<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Data Ingestion on AI VOID</title><link>https://ai-blog.noorshomelab.dev/tags/data-ingestion/</link><description>Recent content in Data Ingestion on AI VOID</description><generator>Hugo</generator><language>en</language><lastBuildDate>Wed, 28 Jan 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://ai-blog.noorshomelab.dev/tags/data-ingestion/index.xml" rel="self" type="application/rss+xml"/><item><title>Data Ingestion: Connecting to Diverse Sources</title><link>https://ai-blog.noorshomelab.dev/metadataflow-guide-2026/03-data-ingestion-sources/</link><pubDate>Wed, 28 Jan 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/metadataflow-guide-2026/03-data-ingestion-sources/</guid><description>&lt;h2 id="introduction-to-data-ingestion"&gt;Introduction to Data Ingestion&lt;/h2&gt;
&lt;p&gt;Welcome back, aspiring data magician! In the previous chapters, we laid the groundwork by exploring the core philosophy of Meta AI&amp;rsquo;s new open-source library for dataset management and setting up our development environment. Now it&amp;rsquo;s time to get our hands dirty with the lifeblood of any machine learning project: &lt;strong&gt;data&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This chapter focuses on &lt;strong&gt;data ingestion&lt;/strong&gt;: the process of bringing data from external sources into our Meta AI dataset management library. Think of it as opening the floodgates to all the information your models will learn from. We&amp;rsquo;ll explore how to connect to diverse data sources, from local files to databases and external APIs, so your projects always have fresh, relevant data to work with. Mastering data ingestion is not just about moving files; it&amp;rsquo;s about building robust, repeatable pipelines that keep working as your data sources change. By the end of this chapter, you&amp;rsquo;ll be confidently pulling data into your &lt;code&gt;Dataset&lt;/code&gt; objects, ready for the next steps in your ML journey!&lt;/p&gt;</description></item><item><title>Data Ingestion: Loading Data into Databricks</title><link>https://ai-blog.noorshomelab.dev/databricks-mastery-2025/data-ingestion/</link><pubDate>Fri, 19 Dec 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/databricks-mastery-2025/data-ingestion/</guid><description>&lt;h2 id="data-ingestion-loading-data-into-databricks"&gt;Data Ingestion: Loading Data into Databricks&lt;/h2&gt;
&lt;p&gt;Welcome back, future data wizard! In the previous chapters, you&amp;rsquo;ve taken your first steps into the Databricks world, understanding its core components like workspaces and clusters. You&amp;rsquo;ve even run some basic commands, which is fantastic! Now that your Databricks environment is purring like a happy kitten, it&amp;rsquo;s time for a crucial next step: getting data &lt;em&gt;into&lt;/em&gt; it.&lt;/p&gt;
&lt;p&gt;This chapter is all about &lt;strong&gt;data ingestion&lt;/strong&gt;. Think of it as opening the doors to your Databricks data factory and letting the raw materials pour in. We&amp;rsquo;ll explore various ways to load data, from simple files to more robust, production-ready methods. By the end, you&amp;rsquo;ll not only know &lt;em&gt;how&lt;/em&gt; to ingest data but also &lt;em&gt;why&lt;/em&gt; certain methods are preferred for different scenarios, setting you up for success in handling real-world datasets.&lt;/p&gt;</description></item></channel></rss>