<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Distributed Training on AI VOID</title><link>https://ai-blog.noorshomelab.dev/tags/distributed-training/</link><description>Recent content in Distributed Training on AI VOID</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 30 Jan 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://ai-blog.noorshomelab.dev/tags/distributed-training/index.xml" rel="self" type="application/rss+xml"/><item><title>Chapter 9: Distributed Training and Scaling with Tunix</title><link>https://ai-blog.noorshomelab.dev/tunix-mastery-2026/09-distributed-training/</link><pubDate>Fri, 30 Jan 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/tunix-mastery-2026/09-distributed-training/</guid><description>&lt;h2 id="chapter-9-distributed-training-and-scaling-with-tunix"&gt;Chapter 9: Distributed Training and Scaling with Tunix&lt;/h2&gt;
&lt;p&gt;Welcome back, intrepid Tunix explorer! So far, we&amp;rsquo;ve covered the core concepts of Tunix and applied it to fine-tune smaller language models. But what happens when our models grow to billions or even trillions of parameters? What happens when our datasets are so massive that a single GPU, or even a single machine, can&amp;rsquo;t handle them?&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s where distributed training comes in! In this chapter, we&amp;rsquo;re going to dive into the exciting world of scaling our LLM post-training efforts. We&amp;rsquo;ll learn how Tunix, powered by JAX, allows us to harness the power of multiple devices – whether they&amp;rsquo;re GPUs or TPUs – to train larger models faster and more efficiently.&lt;/p&gt;</description></item><item><title>Chapter 17: Distributed Training &amp;amp; Scaling Deep Learning</title><link>https://ai-blog.noorshomelab.dev/ai-ml-career-path-2026/distributed-training/</link><pubDate>Sat, 17 Jan 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/ai-ml-career-path-2026/distributed-training/</guid><description>&lt;h2 id="chapter-17-distributed-training--scaling-deep-learning"&gt;Chapter 17: Distributed Training &amp;amp; Scaling Deep Learning&lt;/h2&gt;
&lt;p&gt;Welcome back, future AI architect! In our journey so far, we&amp;rsquo;ve built a strong foundation in deep learning, mastering neural network architectures, understanding training workflows, and optimizing models. We&amp;rsquo;ve even considered how powerful hardware like GPUs accelerate our tasks. But what happens when your model becomes so massive it won&amp;rsquo;t fit on a single GPU? Or when your dataset is so enormous that training takes weeks, even on the most powerful single machine?&lt;/p&gt;</description></item></channel></rss>