<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Big Data on AI VOID</title><link>https://ai-blog.noorshomelab.dev/categories/big-data/</link><description>Recent content in Big Data on AI VOID</description><generator>Hugo</generator><language>en</language><lastBuildDate>Mon, 26 Jan 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://ai-blog.noorshomelab.dev/categories/big-data/index.xml" rel="self" type="application/rss+xml"/><item><title>Introduction to Apache Spark on Databricks</title><link>https://ai-blog.noorshomelab.dev/databricks-mastery-2025/introduction-apache-spark/</link><pubDate>Fri, 19 Dec 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/databricks-mastery-2025/introduction-apache-spark/</guid><description>&lt;h2 id="introduction-to-apache-spark-on-databricks"&gt;Introduction to Apache Spark on Databricks&lt;/h2&gt;
&lt;p&gt;Welcome back, aspiring data wizard! In our previous chapters, you&amp;rsquo;ve taken your first steps into the Databricks Lakehouse Platform, getting comfortable with its environment and setting up your workspace. Now, it&amp;rsquo;s time to dive into the heart of what makes Databricks so powerful for big data: &lt;strong&gt;Apache Spark&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This chapter will introduce you to the fundamental concepts of Apache Spark, explaining why it&amp;rsquo;s the go-to engine for large-scale data processing and how Databricks supercharges it. We&amp;rsquo;ll explore Spark&amp;rsquo;s core abstractions, understand its architecture, and, most importantly, get our hands dirty writing our first Spark code in a Databricks notebook. Get ready to unlock the true potential of distributed computing!&lt;/p&gt;</description></item><item><title>Data Transformation with PySpark DataFrames</title><link>https://ai-blog.noorshomelab.dev/databricks-mastery-2025/data-transformation-pyspark/</link><pubDate>Fri, 19 Dec 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/databricks-mastery-2025/data-transformation-pyspark/</guid><description>&lt;h2 id="introduction-to-data-transformation-with-pyspark-dataframes"&gt;Introduction to Data Transformation with PySpark DataFrames&lt;/h2&gt;
&lt;p&gt;Welcome back, data adventurers! In our previous chapters, we learned how to get around Databricks, set up our environment, and even load some data. But what good is raw data if we can&amp;rsquo;t make sense of it, clean it up, or reshape it to answer critical questions? This is where the magic of data transformation comes in, and PySpark DataFrames are our trusty wands!&lt;/p&gt;</description></item><item><title>Real-time Data with Structured Streaming</title><link>https://ai-blog.noorshomelab.dev/databricks-mastery-2025/structured-streaming/</link><pubDate>Fri, 19 Dec 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/databricks-mastery-2025/structured-streaming/</guid><description>&lt;h2 id="introduction-the-pulse-of-real-time-data"&gt;Introduction: The Pulse of Real-time Data&lt;/h2&gt;
&lt;p&gt;Welcome to Chapter 8! So far, we&amp;rsquo;ve mastered processing vast amounts of historical data using Spark DataFrames, transforming and analyzing it at scale. But what if your data isn&amp;rsquo;t static? What if new information arrives constantly, and you need to react to it &lt;em&gt;now&lt;/em&gt;? Think about monitoring sensor data, tracking website clicks, or processing financial transactions as they happen. This is where the magic of real-time data processing comes in!&lt;/p&gt;</description></item><item><title>Parallel Compression and Distributed Systems</title><link>https://ai-blog.noorshomelab.dev/openzl-mastery-2026/parallel-compression-distributed-systems/</link><pubDate>Mon, 26 Jan 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/openzl-mastery-2026/parallel-compression-distributed-systems/</guid><description>&lt;h2 id="introduction-to-parallel-compression-and-distributed-systems-with-openzl"&gt;Introduction to Parallel Compression and Distributed Systems with OpenZL&lt;/h2&gt;
&lt;p&gt;Welcome back, intrepid data explorer! In our journey through the fascinating world of OpenZL, we&amp;rsquo;ve learned how to craft intelligent compression plans and apply them to various data formats. But what happens when your data isn&amp;rsquo;t just large, but &lt;em&gt;enormous&lt;/em&gt;? What if it resides across many machines in a vast data lake? That&amp;rsquo;s where the power of parallel compression and distributed systems comes into play.&lt;/p&gt;</description></item></channel></rss>