<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Apache Spark on AI VOID</title><link>https://ai-blog.noorshomelab.dev/tags/apache-spark/</link><description>Recent content in Apache Spark on AI VOID</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sat, 20 Dec 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://ai-blog.noorshomelab.dev/tags/apache-spark/index.xml" rel="self" type="application/rss+xml"/><item><title>Getting Started with Your Databricks Workspace</title><link>https://ai-blog.noorshomelab.dev/databricks-mastery-2025/getting-started-workspace/</link><pubDate>Fri, 19 Dec 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/databricks-mastery-2025/getting-started-workspace/</guid><description>&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Welcome, aspiring data wizard! In this exciting first chapter, we&amp;rsquo;re going to embark on our journey into the powerful world of Databricks. Think of this as your grand tour of the Databricks &amp;ldquo;command center&amp;rdquo; – your workspace. We&amp;rsquo;ll start from the absolute basics, ensuring you feel comfortable and confident navigating this platform.&lt;/p&gt;
&lt;p&gt;By the end of this chapter, you&amp;rsquo;ll know how to access your Databricks workspace, understand its fundamental components like clusters and notebooks, and even run your very first piece of code. This foundational knowledge is crucial because the Databricks workspace is where all your data engineering, machine learning, and analytics magic happens. It&amp;rsquo;s the launchpad for every project we&amp;rsquo;ll build together!&lt;/p&gt;</description></item><item><title>Introduction to Apache Spark on Databricks</title><link>https://ai-blog.noorshomelab.dev/databricks-mastery-2025/introduction-apache-spark/</link><pubDate>Fri, 19 Dec 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/databricks-mastery-2025/introduction-apache-spark/</guid><description>&lt;h2 id="introduction-to-apache-spark-on-databricks"&gt;Introduction to Apache Spark on Databricks&lt;/h2&gt;
&lt;p&gt;Welcome back, aspiring data wizard! In our previous chapters, you&amp;rsquo;ve taken your first steps into the Databricks Lakehouse Platform, getting comfortable with its environment and setting up your workspace. Now, it&amp;rsquo;s time to dive into the heart of what makes Databricks so powerful for big data: &lt;strong&gt;Apache Spark&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This chapter will introduce you to the fundamental concepts of Apache Spark, explaining why it&amp;rsquo;s the go-to engine for large-scale data processing and how Databricks supercharges it. We&amp;rsquo;ll explore Spark&amp;rsquo;s core abstractions, understand its architecture, and, most importantly, get our hands dirty writing our first Spark code in a Databricks notebook. Get ready to unlock the true potential of distributed computing!&lt;/p&gt;</description></item><item><title>Data Transformation with PySpark DataFrames</title><link>https://ai-blog.noorshomelab.dev/databricks-mastery-2025/data-transformation-pyspark/</link><pubDate>Fri, 19 Dec 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/databricks-mastery-2025/data-transformation-pyspark/</guid><description>&lt;h2 id="introduction-to-data-transformation-with-pyspark-dataframes"&gt;Introduction to Data Transformation with PySpark DataFrames&lt;/h2&gt;
&lt;p&gt;Welcome back, data adventurers! In our previous chapters, we learned how to get around Databricks, set up our environment, and even load some data. But what good is raw data if we can&amp;rsquo;t make sense of it, clean it up, or reshape it to answer critical questions? This is where the magic of data transformation comes in, and PySpark DataFrames are our trusty wands!&lt;/p&gt;</description></item><item><title>Streaming Logistics Cost Monitoring with Spark Structured Streaming</title><link>https://ai-blog.noorshomelab.dev/realtime-supply-chain-intelligence-2/08-structured-streaming-cost-monitoring/</link><pubDate>Sat, 20 Dec 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/realtime-supply-chain-intelligence-2/08-structured-streaming-cost-monitoring/</guid><description>&lt;h2 id="streaming-logistics-cost-monitoring-with-spark-structured-streaming"&gt;Streaming Logistics Cost Monitoring with Spark Structured Streaming&lt;/h2&gt;
&lt;h3 id="1-chapter-introduction"&gt;1. Chapter Introduction&lt;/h3&gt;
&lt;p&gt;In modern supply chains, real-time visibility into logistics costs is paramount for effective decision-making, cost optimization, and competitive advantage. This chapter guides you through building a robust, real-time logistics cost monitoring pipeline using Apache Spark Structured Streaming on Databricks. We will ingest streaming logistics events from Kafka, process them to calculate various cost components, and enrich them with previously generated tariff data and dynamic fuel prices.&lt;/p&gt;</description></item><item><title>Streaming Logistics Cost Monitoring with Spark Structured Streaming</title><link>https://ai-blog.noorshomelab.dev/realtime-supply-chain-intelligence/08-structured-streaming-cost-monitoring/</link><pubDate>Sat, 20 Dec 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/realtime-supply-chain-intelligence/08-structured-streaming-cost-monitoring/</guid><description>&lt;h2 id="streaming-logistics-cost-monitoring-with-spark-structured-streaming"&gt;Streaming Logistics Cost Monitoring with Spark Structured Streaming&lt;/h2&gt;
&lt;h3 id="1-chapter-introduction"&gt;1. Chapter Introduction&lt;/h3&gt;
&lt;p&gt;In modern supply chains, real-time visibility into logistics costs is paramount for effective decision-making, cost optimization, and competitive advantage. This chapter guides you through building a robust, real-time logistics cost monitoring pipeline using Apache Spark Structured Streaming on Databricks. We will ingest streaming logistics events from Kafka, process them to calculate various cost components, and enrich them with previously generated tariff data and dynamic fuel prices.&lt;/p&gt;</description></item></channel></rss>