<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Data Pipelines on AI VOID</title><link>https://ai-blog.noorshomelab.dev/tags/data-pipelines/</link><description>Recent content in Data Pipelines on AI VOID</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 20 Mar 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://ai-blog.noorshomelab.dev/tags/data-pipelines/index.xml" rel="self" type="application/rss+xml"/><item><title>Building Robust Pipelines: From Ingestion to Vectorization</title><link>https://ai-blog.noorshomelab.dev/multimodal-ai-guide-2026/building-robust-pipelines-ingestion-vectorization/</link><pubDate>Fri, 20 Mar 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/multimodal-ai-guide-2026/building-robust-pipelines-ingestion-vectorization/</guid><description>&lt;h2 id="introduction-to-multimodal-data-pipelines"&gt;Introduction to Multimodal Data Pipelines&lt;/h2&gt;
&lt;p&gt;Welcome back, future multimodal AI architects! In previous chapters, we laid the groundwork for understanding what multimodal AI is and why it&amp;rsquo;s so powerful. We&amp;rsquo;ve talked about the magic of combining different types of data – text, images, audio, and video – to build more intelligent and nuanced systems. But how does this raw, diverse data actually get transformed into something our sophisticated AI models can understand and process?&lt;/p&gt;</description></item><item><title>Chapter 9: Integrating OpenZL into Data Pipelines</title><link>https://ai-blog.noorshomelab.dev/openzl-mastery-2026/09-integrating-openzl/</link><pubDate>Mon, 26 Jan 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/openzl-mastery-2026/09-integrating-openzl/</guid><description>&lt;h2 id="chapter-9-integrating-openzl-into-data-pipelines"&gt;Chapter 9: Integrating OpenZL into Data Pipelines&lt;/h2&gt;
&lt;p&gt;Welcome back, intrepid data explorer! In our previous chapters, we&amp;rsquo;ve unpacked the &amp;ldquo;what&amp;rdquo; and &amp;ldquo;why&amp;rdquo; of OpenZL, explored its unique graph-based approach, and even got it set up in our development environment. Now, it&amp;rsquo;s time to bridge the gap between theory and practice. This chapter is all about the &amp;ldquo;how&amp;rdquo;: how do we actually weave OpenZL into our existing data workflows and pipelines?&lt;/p&gt;</description></item><item><title>Chapter 11: AI-Powered Systems: Debugging Models &amp;amp; Data Pipelines</title><link>https://ai-blog.noorshomelab.dev/real-world-software-problem-solving-guide/debugging-ai-systems/</link><pubDate>Fri, 06 Mar 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/real-world-software-problem-solving-guide/debugging-ai-systems/</guid><description>&lt;h2 id="chapter-11-ai-powered-systems-debugging-models--data-pipelines"&gt;Chapter 11: AI-Powered Systems: Debugging Models &amp;amp; Data Pipelines&lt;/h2&gt;
&lt;p&gt;Welcome to Chapter 11! So far, we&amp;rsquo;ve honed our problem-solving skills across traditional software stacks, from frontend quirks to distributed backend woes. Now, it&amp;rsquo;s time to tackle one of the most exciting, yet challenging, frontiers in modern engineering: &lt;strong&gt;AI-powered systems&lt;/strong&gt;. Debugging these systems introduces a whole new dimension of complexity, blending traditional software issues with statistical uncertainties, data dependencies, and the sometimes-mysterious behavior of machine learning models.&lt;/p&gt;</description></item><item><title>Monitoring &amp;amp; Observability for Data Pipelines</title><link>https://ai-blog.noorshomelab.dev/metadataflow-guide-2026/12-monitoring-observability/</link><pubDate>Wed, 28 Jan 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/metadataflow-guide-2026/12-monitoring-observability/</guid><description>&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Welcome back, aspiring data wizards! In the previous chapters, we&amp;rsquo;ve explored how Meta AI&amp;rsquo;s powerful, open-source machine learning library helps us manage and transform datasets, laying a robust foundation for our ML projects. But what happens once our data pipelines are up and running? How do we ensure they continue to deliver high-quality, reliable data day in and day out?&lt;/p&gt;
&lt;p&gt;This chapter dives into the crucial world of &lt;strong&gt;Monitoring &amp;amp; Observability&lt;/strong&gt; for your data pipelines. You&amp;rsquo;ll learn why keeping a close eye on your data&amp;rsquo;s journey is non-negotiable, understand the key concepts that make your pipelines &amp;ldquo;observable,&amp;rdquo; and discover practical ways to implement monitoring solutions. By the end, you&amp;rsquo;ll be equipped to build resilient data systems that proactively alert you to issues, ensuring the integrity and performance of your machine learning models. We&amp;rsquo;ll assume you&amp;rsquo;re familiar with basic Python programming and the concepts of data pipelines as covered in earlier chapters.&lt;/p&gt;</description></item><item><title>16. Project: Data Pipeline Testing with Python (Kafka &amp;amp; DB)</title><link>https://ai-blog.noorshomelab.dev/testcontainers-mastery-2026/16-project-data-pipeline-python/</link><pubDate>Sat, 14 Feb 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/testcontainers-mastery-2026/16-project-data-pipeline-python/</guid><description>&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Welcome back, intrepid tester! So far, we&amp;rsquo;ve explored the foundational concepts of Testcontainers and used them to test single-service applications in various languages. But what about testing more complex systems, like the beating heart of many modern applications: a data pipeline?&lt;/p&gt;
&lt;p&gt;In this chapter, we&amp;rsquo;re going to tackle a real-world scenario: building and testing a simplified data pipeline in Python. This pipeline involves two external services: Apache Kafka for message queuing and PostgreSQL for data storage. Traditionally, testing such a system is a headache: each service has to be set up by hand, which leads to flaky, slow, and inconsistent tests. Thankfully, Testcontainers comes to our rescue! We&amp;rsquo;ll use &lt;code&gt;testcontainers-python&lt;/code&gt; to spin up fresh, isolated instances of both Kafka and PostgreSQL for every test run, keeping our tests reliable and fast.&lt;/p&gt;</description></item></channel></rss>