<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Deep Learning on AI VOID</title><link>https://ai-blog.noorshomelab.dev/categories/deep-learning/</link><description>Recent content in Deep Learning on AI VOID</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 20 Mar 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://ai-blog.noorshomelab.dev/categories/deep-learning/index.xml" rel="self" type="application/rss+xml"/><item><title>Chapter 1: The World of LLM Post-Training and Tunix</title><link>https://ai-blog.noorshomelab.dev/tunix-mastery-2026/01-introduction-to-tunix/</link><pubDate>Fri, 30 Jan 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/tunix-mastery-2026/01-introduction-to-tunix/</guid><description>&lt;p&gt;Welcome, aspiring AI architect! In this guide, we&amp;rsquo;re embarking on an exciting journey to master &lt;strong&gt;Tunix&lt;/strong&gt;, a powerful JAX-native library specifically designed for the crucial task of Large Language Model (LLM) post-training. By the end of this comprehensive series, you&amp;rsquo;ll not only understand Tunix inside and out but also be able to apply it to real-world LLM alignment and specialization challenges.&lt;/p&gt;
&lt;p&gt;In this inaugural chapter, we&amp;rsquo;ll lay the groundwork. We&amp;rsquo;ll start by demystifying LLM post-training itself – what it is, why it&amp;rsquo;s indispensable, and how it transforms general-purpose models into highly capable, aligned assistants. Then, we&amp;rsquo;ll introduce you to Tunix, explaining its core purpose and the unique advantages it brings to the table, particularly through its integration with JAX. Finally, we&amp;rsquo;ll guide you through setting up your development environment, ensuring you&amp;rsquo;re ready to dive into hands-on coding from the very next chapter.&lt;/p&gt;</description></item><item><title>Chapter 3: JAX Essentials for Tunix Users</title><link>https://ai-blog.noorshomelab.dev/tunix-mastery-2026/03-jax-essentials/</link><pubDate>Fri, 30 Jan 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/tunix-mastery-2026/03-jax-essentials/</guid><description>&lt;h2 id="chapter-3-jax-essentials-for-tunix-users"&gt;Chapter 3: JAX Essentials for Tunix Users&lt;/h2&gt;
&lt;p&gt;Welcome back, future LLM masters! In Chapter 2, we got our environment ready and took a peek at what Tunix offers. Now, it&amp;rsquo;s time to dig into the engine that powers Tunix: JAX. Think of JAX as the high-performance sports car engine, and Tunix as the sleek, specialized body built around it for LLM post-training. To truly drive Tunix effectively, you need to understand how its engine works!&lt;/p&gt;</description></item><item><title>TensorFlow Guide: Building Your First Neural Network with Keras</title><link>https://ai-blog.noorshomelab.dev/tensorflow-guide/building-your-first-neural-network-with-keras/</link><pubDate>Sun, 26 Oct 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/tensorflow-guide/building-your-first-neural-network-with-keras/</guid><description>&lt;h2 id="3-building-your-first-neural-network-with-keras"&gt;3. Building Your First Neural Network with Keras&lt;/h2&gt;
&lt;p&gt;Keras is a high-level API for building and training deep learning models, fully integrated into TensorFlow (&lt;code&gt;tf.keras&lt;/code&gt;). It&amp;rsquo;s designed for fast experimentation and ease of use, making it perfect for beginners. In this chapter, you&amp;rsquo;ll learn how to build, compile, and train your first neural networks using Keras.&lt;/p&gt;
&lt;h3 id="31-understanding-neural-network-basics"&gt;3.1 Understanding Neural Network Basics&lt;/h3&gt;
&lt;p&gt;Before we build, let&amp;rsquo;s briefly revisit what a neural network is at a high level:&lt;/p&gt;</description></item><item><title>Weaving Information: Data Fusion Strategies</title><link>https://ai-blog.noorshomelab.dev/multimodal-ai-guide-2026/weaving-information-data-fusion-strategies/</link><pubDate>Fri, 20 Mar 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/multimodal-ai-guide-2026/weaving-information-data-fusion-strategies/</guid><description>&lt;h2 id="introduction-the-art-of-combination"&gt;Introduction: The Art of Combination&lt;/h2&gt;
&lt;p&gt;Welcome back, fellow AI explorer! In our previous chapters, we embarked on a fascinating journey, learning how to process individual modalities like text, images, audio, and video, transforming them into meaningful numerical representations, or &lt;em&gt;embeddings&lt;/em&gt;. We saw how powerful these individual encoders can be, but here&amp;rsquo;s a thought: what if we could combine these different perspectives? What if an AI could not just &lt;em&gt;see&lt;/em&gt; an image, but also &lt;em&gt;read&lt;/em&gt; its caption, &lt;em&gt;hear&lt;/em&gt; the accompanying audio, and &lt;em&gt;understand&lt;/em&gt; the context of a video clip, all at once?&lt;/p&gt;</description></item><item><title>TensorFlow Guide: Intermediate Topics - Custom Training Loops and Callbacks</title><link>https://ai-blog.noorshomelab.dev/tensorflow-guide/intermediate-tensorflow-custom-training-loops-callbacks/</link><pubDate>Sun, 26 Oct 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/tensorflow-guide/intermediate-tensorflow-custom-training-loops-callbacks/</guid><description>&lt;h2 id="5-intermediate-topics"&gt;5. Intermediate Topics&lt;/h2&gt;
&lt;p&gt;While &lt;code&gt;model.fit()&lt;/code&gt; is incredibly convenient, sometimes you need more control over the training process. This chapter introduces two powerful intermediate topics: &lt;strong&gt;Custom Training Loops&lt;/strong&gt; for ultimate flexibility and &lt;strong&gt;Keras Callbacks&lt;/strong&gt; for customizing &lt;code&gt;model.fit()&lt;/code&gt; behavior.&lt;/p&gt;
&lt;h3 id="51-custom-training-loops-with-tfgradienttape"&gt;5.1 Custom Training Loops with &lt;code&gt;tf.GradientTape&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;A custom training loop gives you full control over every aspect of the training process, from calculating gradients to updating model weights. This is particularly useful for:&lt;/p&gt;</description></item><item><title>Chapter 6: Understanding Tunix Model Architectures and State Management</title><link>https://ai-blog.noorshomelab.dev/tunix-mastery-2026/06-model-architecture/</link><pubDate>Fri, 30 Jan 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/tunix-mastery-2026/06-model-architecture/</guid><description>&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Welcome back, future LLM expert! In our previous chapters, we laid the groundwork by setting up Tunix and understanding its core philosophy. Now, it&amp;rsquo;s time to peek under the hood and explore how Tunix, built on the powerful JAX ecosystem, handles the intricate dance of model architectures and their ever-evolving state.&lt;/p&gt;
&lt;p&gt;Understanding how your Large Language Model (LLM) is represented and how its parameters (the &amp;ldquo;knowledge&amp;rdquo; it holds) are managed is absolutely crucial for effective post-training. Unlike traditional imperative frameworks where model state might be implicitly updated, JAX operates on a functional paradigm. This means state management is explicit, predictable, and incredibly powerful when you know how to wield it. Tunix leverages this power, often integrating with libraries like Flax NNX, to give you granular control over your LLM&amp;rsquo;s internal workings.&lt;/p&gt;</description></item><item><title>TensorFlow Guide: Guided Project 1 - Image Classification with CNNs</title><link>https://ai-blog.noorshomelab.dev/tensorflow-guide/guided-project-1-image-classification-with-cnns/</link><pubDate>Sun, 26 Oct 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/tensorflow-guide/guided-project-1-image-classification-with-cnns/</guid><description>&lt;h2 id="7-guided-project-1-image-classification-with-cnns"&gt;7. Guided Project 1: Image Classification with CNNs&lt;/h2&gt;
&lt;p&gt;This project will guide you through building a Convolutional Neural Network (CNN) to classify images from the CIFAR-10 dataset. CIFAR-10 consists of 60,000 32x32 color images in 10 classes (e.g., airplane, automobile, bird, cat). This project will solidify your understanding of data pipelines, model building with Keras, and training strategies.&lt;/p&gt;
&lt;h3 id="project-objective"&gt;Project Objective&lt;/h3&gt;
&lt;p&gt;Build and train a CNN model capable of classifying CIFAR-10 images with reasonable accuracy.&lt;/p&gt;</description></item><item><title>Chapter 8: Recurrent Neural Networks (RNNs) for Sequence Data</title><link>https://ai-blog.noorshomelab.dev/ai-ml-career-path-2026/recurrent-neural-networks/</link><pubDate>Sat, 17 Jan 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/ai-ml-career-path-2026/recurrent-neural-networks/</guid><description>&lt;h2 id="chapter-8-recurrent-neural-networks-rnns-for-sequence-data"&gt;Chapter 8: Recurrent Neural Networks (RNNs) for Sequence Data&lt;/h2&gt;
&lt;p&gt;Welcome back, future AI engineer! In our previous chapters, we mastered the fundamentals of deep learning with feedforward neural networks (FNNs). We learned how these networks excel at tasks where inputs are independent and fixed in size, like classifying images or predicting a single value from a structured dataset.&lt;/p&gt;
&lt;p&gt;But what happens when the order of your data matters? What if your input isn&amp;rsquo;t a single, fixed-size vector, but a sequence of varying length, where each element&amp;rsquo;s meaning is influenced by what came before it? Think about natural language, where the meaning of a word depends on the preceding words, or time series data, where future values are influenced by past observations. Traditional FNNs hit a wall here because they lack &amp;ldquo;memory&amp;rdquo; and treat each input independently.&lt;/p&gt;</description></item><item><title>TensorFlow Guide: Guided Project 2 - Text Generation with LSTMs</title><link>https://ai-blog.noorshomelab.dev/tensorflow-guide/guided-project-2-text-generation-with-lstms/</link><pubDate>Sun, 26 Oct 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/tensorflow-guide/guided-project-2-text-generation-with-lstms/</guid><description>&lt;h2 id="8-guided-project-2-text-generation-with-lstms"&gt;8. Guided Project 2: Text Generation with LSTMs&lt;/h2&gt;
&lt;p&gt;In this project, you&amp;rsquo;ll build a character-level text generation model using Long Short-Term Memory (LSTM) networks, a type of Recurrent Neural Network (RNN). The model will learn patterns in text and then be able to generate new sequences of characters, essentially writing new &amp;ldquo;sentences&amp;rdquo; based on what it learned.&lt;/p&gt;
&lt;h3 id="project-objective"&gt;Project Objective&lt;/h3&gt;
&lt;p&gt;Build an LSTM-based model to generate creative text, trained on a classic text dataset. We&amp;rsquo;ll use a portion of Shakespeare&amp;rsquo;s works.&lt;/p&gt;</description></item><item><title>Real-Time Multimodal AI: Optimizing for Speed and Latency</title><link>https://ai-blog.noorshomelab.dev/multimodal-ai-guide-2026/real-time-multimodal-ai-optimizing-speed-latency/</link><pubDate>Fri, 20 Mar 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/multimodal-ai-guide-2026/real-time-multimodal-ai-optimizing-speed-latency/</guid><description>&lt;h2 id="introduction-to-real-time-multimodal-ai"&gt;Introduction to Real-Time Multimodal AI&lt;/h2&gt;
&lt;p&gt;Welcome back, fellow AI adventurer! In our journey through multimodal AI, we&amp;rsquo;ve explored how different data types—text, images, audio, and video—can be brought together to create richer, more intelligent systems. We&amp;rsquo;ve seen how these modalities are represented, fused, and processed by powerful models like Multimodal Large Language Models (MLLMs).&lt;/p&gt;
&lt;p&gt;But what happens when these systems need to make decisions or respond &lt;em&gt;instantly&lt;/em&gt;? Imagine a self-driving car that takes seconds to process a pedestrian, or a voice assistant that lags several seconds behind your speech. In many real-world applications, speed isn&amp;rsquo;t just a feature; it&amp;rsquo;s a fundamental requirement. This is where &lt;strong&gt;real-time multimodal AI&lt;/strong&gt; comes into play.&lt;/p&gt;</description></item><item><title>Chapter 11: Customizing Tunix: Loss Functions, Optimizers, and Callbacks</title><link>https://ai-blog.noorshomelab.dev/tunix-mastery-2026/11-customization/</link><pubDate>Fri, 30 Jan 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/tunix-mastery-2026/11-customization/</guid><description>&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Welcome to Chapter 11! So far, you&amp;rsquo;ve mastered the fundamentals of setting up Tunix, loading models, and initiating basic post-training runs. But what if the standard tools aren&amp;rsquo;t quite enough for your specific research or application? What if you need to guide your Large Language Model (LLM) with a unique objective, fine-tune its learning process with a specialized algorithm, or automate complex actions during training?&lt;/p&gt;
&lt;p&gt;This chapter is your gateway to unlocking the full power of Tunix customization. We&amp;rsquo;ll dive deep into how you can define and integrate your own loss functions to precisely shape your LLM&amp;rsquo;s learning objective, craft sophisticated optimizers using JAX&amp;rsquo;s powerful Optax library to control parameter updates, and implement intelligent callbacks to monitor, control, and react to your training process. By the end of this chapter, you&amp;rsquo;ll be able to tailor Tunix to virtually any LLM post-training scenario, moving beyond off-the-shelf solutions to truly bespoke training pipelines.&lt;/p&gt;</description></item><item><title>Chapter 12: Multimodal Models: Vision-Language Integration</title><link>https://ai-blog.noorshomelab.dev/ai-ml-career-path-2026/multimodal-models/</link><pubDate>Sat, 17 Jan 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/ai-ml-career-path-2026/multimodal-models/</guid><description>&lt;h2 id="chapter-12-multimodal-models-vision-language-integration"&gt;Chapter 12: Multimodal Models: Vision-Language Integration&lt;/h2&gt;
&lt;p&gt;Welcome back, future AI architect! In our journey so far, we&amp;rsquo;ve explored the depths of neural networks, mastered the art of training deep learning models, and even fine-tuned powerful Large Language Models (LLMs). Each step has brought us closer to building truly intelligent systems. But what if we want our AI to do more than just understand text or analyze images in isolation? What if we want it to &lt;em&gt;see&lt;/em&gt; and &lt;em&gt;understand&lt;/em&gt; the world, like humans do, by combining different senses?&lt;/p&gt;</description></item><item><title>Chapter 14: Model Training Workflows &amp;amp; Optimization Techniques</title><link>https://ai-blog.noorshomelab.dev/ai-ml-career-path-2026/training-workflows-optimization/</link><pubDate>Sat, 17 Jan 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/ai-ml-career-path-2026/training-workflows-optimization/</guid><description>&lt;h2 id="introduction-to-model-training-workflows--optimization"&gt;Introduction to Model Training Workflows &amp;amp; Optimization&lt;/h2&gt;
&lt;p&gt;Welcome back, future AI engineer! In the previous chapters, we laid the groundwork by understanding the mathematical foundations of AI, classic machine learning algorithms, and delving into the fascinating world of neural networks and their diverse architectures. You&amp;rsquo;ve learned how to construct these powerful models. But a model, no matter how well-designed, is useless until it learns from data. That&amp;rsquo;s where &lt;strong&gt;model training workflows&lt;/strong&gt; come in.&lt;/p&gt;</description></item><item><title>Chapter 17: Distributed Training &amp;amp; Scaling Deep Learning</title><link>https://ai-blog.noorshomelab.dev/ai-ml-career-path-2026/distributed-training/</link><pubDate>Sat, 17 Jan 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/ai-ml-career-path-2026/distributed-training/</guid><description>&lt;h2 id="chapter-17-distributed-training--scaling-deep-learning"&gt;Chapter 17: Distributed Training &amp;amp; Scaling Deep Learning&lt;/h2&gt;
&lt;p&gt;Welcome back, future AI architect! In our journey so far, we&amp;rsquo;ve built a strong foundation in deep learning, mastering neural network architectures, understanding training workflows, and optimizing models. We&amp;rsquo;ve even considered how powerful hardware like GPUs accelerate our tasks. But what happens when your model becomes so massive it won&amp;rsquo;t fit on a single GPU? Or when your dataset is so enormous that training takes weeks, even on the most powerful single machine?&lt;/p&gt;</description></item><item><title>Chapter 23: Project: Fine-Tuning an LLM for a Specific Task</title><link>https://ai-blog.noorshomelab.dev/ai-ml-career-path-2026/project-llm-fine-tuning/</link><pubDate>Sat, 17 Jan 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/ai-ml-career-path-2026/project-llm-fine-tuning/</guid><description>&lt;h2 id="chapter-23-project-fine-tuning-an-llm-for-a-specific-task"&gt;Chapter 23: Project: Fine-Tuning an LLM for a Specific Task&lt;/h2&gt;
&lt;h3 id="introduction"&gt;Introduction&lt;/h3&gt;
&lt;p&gt;Welcome to an exciting hands-on chapter where we&amp;rsquo;ll dive deep into the practical art of fine-tuning Large Language Models (LLMs)! You&amp;rsquo;ve learned about the power of these models, their architectures, and how they process language. Now, it&amp;rsquo;s time to make them truly yours by adapting them to perform a specific task that their general pre-training might not have fully covered.&lt;/p&gt;</description></item><item><title>How AI Model Quantization Works: Deep Dive into Internals</title><link>https://ai-blog.noorshomelab.dev/how-it-works/ai-model-quantization/</link><pubDate>Wed, 21 Jan 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/how-it-works/ai-model-quantization/</guid><description>&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In the rapidly evolving world of artificial intelligence, the deployment of powerful neural networks into real-world applications often hits a bottleneck: their immense computational and memory requirements. AI model quantization is a critical optimization technique designed to address this challenge. It allows large, complex models—trained using high-precision floating-point numbers—to be compressed and executed efficiently on resource-constrained devices, from smartphones and IoT sensors to specialized AI accelerators.&lt;/p&gt;
&lt;p&gt;Understanding the internals of quantization is no longer a niche skill but a fundamental requirement for AI engineers and researchers aiming to build performant and deployable AI systems. It bridges the gap between theoretical model development and practical application, enabling faster inference times, reduced memory footprints, and lower power consumption.&lt;/p&gt;</description></item><item><title>Agentic AI Frameworks: Mastering LangChain/LangGraph for Smart Agents</title><link>https://ai-blog.noorshomelab.dev/ai/agentic-ai-frameworks/</link><pubDate>Fri, 22 Aug 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/ai/agentic-ai-frameworks/</guid><description>&lt;h1 id="agentic-ai-frameworks-mastering-langchainlanggraph-for-smart-agents"&gt;Agentic AI Frameworks: Mastering LangChain/LangGraph for Smart Agents&lt;/h1&gt;
&lt;hr&gt;
&lt;h2 id="1-introduction-to-agentic-ai"&gt;1. Introduction to Agentic AI&lt;/h2&gt;
&lt;p&gt;The world of Artificial Intelligence is evolving at an unprecedented pace. We&amp;rsquo;re moving beyond simple chatbots and static question-answering systems towards intelligent entities that can think, plan, use tools, and even collaborate to achieve complex goals. This is the realm of &lt;strong&gt;Agentic AI&lt;/strong&gt;.&lt;/p&gt;
&lt;h3 id="11-what-are-ai-agents"&gt;1.1. What are AI Agents?&lt;/h3&gt;
&lt;p&gt;Imagine a digital assistant that doesn&amp;rsquo;t just answer your questions but &lt;em&gt;understands&lt;/em&gt; your intent, &lt;em&gt;plans&lt;/em&gt; a series of steps to achieve it, &lt;em&gt;uses tools&lt;/em&gt; (like searching the web or interacting with an API) to gather information or perform actions, and &lt;em&gt;learns&lt;/em&gt; from its experiences. That&amp;rsquo;s an AI agent.&lt;/p&gt;</description></item><item><title>Decoding Large Language Models: A Deep Dive into LLM Architectures</title><link>https://ai-blog.noorshomelab.dev/ai/llm-architectures/</link><pubDate>Fri, 22 Aug 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/ai/llm-architectures/</guid><description>&lt;h1 id="decoding-large-language-models-a-deep-dive-into-llm-architectures"&gt;Decoding Large Language Models: A Deep Dive into LLM Architectures&lt;/h1&gt;
&lt;hr&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Large Language Models (LLMs) have revolutionized the field of Artificial Intelligence, demonstrating unprecedented capabilities in understanding, generating, and manipulating human language. At their core, LLMs are complex neural networks, primarily built upon the Transformer architecture. This document serves as a comprehensive guide to LLM architectures, catering to both beginners and experienced professionals. We will journey from the foundational concepts of Transformer models to the intricate structural details of modern open-source LLMs, exploring their design choices and implications for development and optimization.&lt;/p&gt;</description></item><item><title>LLM Quantization: Making Models Lean for Local Deployment</title><link>https://ai-blog.noorshomelab.dev/ai/llm-quantization-mastery/</link><pubDate>Fri, 22 Aug 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/ai/llm-quantization-mastery/</guid><description>&lt;h1 id="llm-quantization-making-models-lean-for-local-deployment"&gt;LLM Quantization: Making Models Lean for Local Deployment&lt;/h1&gt;
&lt;h2 id="table-of-contents"&gt;Table of Contents&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href="#introduction-the-need-for-lean-llms"&gt;Introduction: The Need for Lean LLMs&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#what-are-llms-and-why-are-they-so-large"&gt;What are LLMs and Why Are They So Large?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#the-challenge-of-local-deployment"&gt;The Challenge of Local Deployment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#enter-quantization-a-solution-for-resource-constrained-environments"&gt;Enter Quantization: A Solution for Resource-Constrained Environments&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#understanding-the-basics-what-is-quantization"&gt;Understanding the Basics: What is Quantization?&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#floating-point-numbers-fp32-in-llms"&gt;Floating-Point Numbers (FP32) in LLMs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#the-concept-of-reduced-precision"&gt;The Concept of Reduced Precision&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#analogy-from-high-definition-to-standard-definition"&gt;Analogy: From High-Definition to Standard-Definition&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#benefits-of-quantization-size-speed-and-energy-efficiency"&gt;Benefits of Quantization: Size, Speed, and Energy Efficiency&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#the-trade-off-accuracy-vs-efficiency"&gt;The Trade-Off: Accuracy vs. Efficiency&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#quantization-techniques-a-deep-dive"&gt;Quantization Techniques: A Deep Dive&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#post-training-quantization-ptq-vs-quantization-aware-training-qat"&gt;Post-Training Quantization (PTQ) vs. Quantization-Aware Training (QAT)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#symmetric-vs-asymmetric-quantization"&gt;Symmetric vs. Asymmetric Quantization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#per-tensor-vs-per-channel-quantization"&gt;Per-Tensor vs. Per-Channel Quantization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#common-quantization-bit-widths"&gt;Common Quantization Bit-Widths&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#8-bit-quantization-int8"&gt;8-bit Quantization (INT8)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#4-bit-quantization-int4"&gt;4-bit Quantization (INT4)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#other-bit-widths-eg-2-bit-3-bit-5-bit"&gt;Other Bit-Widths (e.g., 2-bit, 3-bit, 5-bit)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#specific-quantization-algorithms-and-formats"&gt;Specific Quantization Algorithms and Formats&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#gptq-general-purpose-parameter-quantization"&gt;GPTQ (General-purpose Parameter Quantization)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#awq-activation-aware-weight-quantization"&gt;AWQ (Activation-aware Weight Quantization)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#gguf-gpt-generated-unified-format-a-key-for-llamacpp-and-ollama"&gt;GGUF (GPT-Generated Unified Format): A Key for &lt;code&gt;llama.cpp&lt;/code&gt; and Ollama&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#gguf-quantization-types-q2_k-q3_k-q4_k-q5_k-q6_k-q8_0"&gt;GGUF Quantization Types (Q2_K, Q3_K, Q4_K, Q5_K, Q6_K, Q8_0)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#practical-implementation-quantizing-llms"&gt;Practical Implementation: Quantizing LLMs&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#using-bitsandbytes-for-quantization-aware-training-and-inference-pytorch"&gt;Using &lt;code&gt;bitsandbytes&lt;/code&gt; for Quantization-Aware Training and Inference (PyTorch)&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#installation"&gt;Installation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#loading-8-bit-models"&gt;Loading 8-bit Models&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#loading-4-bit-models-nf4"&gt;Loading 4-bit Models (NF4)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#integrating-with-hugging-face-transformers"&gt;Integrating with Hugging Face Transformers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#fine-tuning-4-bit-models-qlora"&gt;Fine-tuning 4-bit Models (QLoRA)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#leveraging-llamacpp-and-gguf-for-cpu-friendly-inference"&gt;Leveraging &lt;code&gt;llama.cpp&lt;/code&gt; and GGUF for CPU-friendly Inference&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#introduction-to-llamacpp"&gt;Introduction to &lt;code&gt;llama.cpp&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#building-llamacpp"&gt;Building &lt;code&gt;llama.cpp&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#converting-models-to-gguf-format"&gt;Converting Models to GGUF Format&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#quantizing-gguf-models-with-llamacpps-quantize-tool"&gt;Quantizing GGUF Models with &lt;code&gt;llama.cpp&lt;/code&gt;&amp;rsquo;s &lt;code&gt;quantize&lt;/code&gt; tool&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#running-gguf-models-with-llamacpp"&gt;Running GGUF Models with &lt;code&gt;llama.cpp&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#ollama-simplified-local-llm-deployment"&gt;Ollama: Simplified Local LLM Deployment&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#how-ollama-utilizes-gguf"&gt;How Ollama Utilizes GGUF&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#downloading-and-running-quantized-models-with-ollama"&gt;Downloading and Running Quantized Models with Ollama&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#creating-custom-modelfiles-for-quantized-models"&gt;Creating Custom Modelfiles for Quantized Models&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#evaluating-quantization-trade-offs"&gt;Evaluating Quantization Trade-offs&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#model-size-reduction"&gt;Model Size Reduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#inference-speed-latency"&gt;Inference Speed (Latency)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#accuracy-metrics-and-evaluation"&gt;Accuracy Metrics and Evaluation&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#perplexity"&gt;Perplexity&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#benchmark-tasks-eg-helm-mmlu"&gt;Benchmark Tasks (e.g., HELM, MMLU)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#qualitative-evaluation"&gt;Qualitative Evaluation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#hardware-considerations-cpu-vs-gpu"&gt;Hardware Considerations (CPU vs. GPU)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#choosing-the-right-quantization-scheme-for-your-use-case"&gt;Choosing the Right Quantization Scheme for Your Use Case&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#advanced-topics-and-future-directions"&gt;Advanced Topics and Future Directions&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#dynamic-vs-static-quantization"&gt;Dynamic vs. Static Quantization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#mixed-precision-training-and-inference"&gt;Mixed-Precision Training and Inference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#fine-grained-quantization-techniques"&gt;Fine-grained Quantization Techniques&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#emerging-quantization-research"&gt;Emerging Quantization Research&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="#conclusion"&gt;Conclusion&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#recap-of-key-concepts"&gt;Recap of Key Concepts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#the-future-of-lean-llms"&gt;The Future of Lean LLMs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#further-learning-resources"&gt;Further Learning Resources&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;
&lt;h2 id="1-introduction-the-need-for-lean-llms"&gt;1. Introduction: The Need for Lean LLMs&lt;/h2&gt;
&lt;p&gt;The advent of Large Language Models (LLMs) has revolutionized various fields, from natural language processing to creative content generation. Models like GPT-3, LLaMA, Mistral, and many others have demonstrated unprecedented capabilities in understanding and generating human-like text. However, this power comes at a significant cost: immense model size and computational requirements.&lt;/p&gt;</description></item><item><title>Local LLM Deployment: Mastering Ollama for Custom Fine-tuned Models</title><link>https://ai-blog.noorshomelab.dev/ai/llm-deployment-serving/</link><pubDate>Fri, 22 Aug 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/ai/llm-deployment-serving/</guid><description>&lt;h1 id="llm-deployment-and-serving-local-mastering-ollama-for-custom-models"&gt;LLM Deployment and Serving (Local): Mastering Ollama for Custom Models&lt;/h1&gt;
&lt;hr&gt;
&lt;h2 id="1-introduction-the-power-of-local-llms"&gt;1. Introduction: The Power of Local LLMs&lt;/h2&gt;
&lt;p&gt;Large Language Models (LLMs) have ushered in a new era of intelligent applications, from advanced chatbots to sophisticated code assistants. While powerful, many LLMs are often accessed via cloud-based APIs, leading to concerns about data privacy, recurring costs, and internet dependency. This document champions the increasingly vital practice of deploying and serving LLMs locally. It offers a comprehensive guide to understanding, implementing, and optimizing local LLM inference, with a particular emphasis on &lt;strong&gt;Ollama&lt;/strong&gt;, an innovative framework that simplifies this complex process for both pre-packaged and custom fine-tuned models.&lt;/p&gt;</description></item><item><title>Mastering Deep Learning with PyTorch: From Tensors to Advanced Neural Networks for LLMs</title><link>https://ai-blog.noorshomelab.dev/ai/deep-learning-frameworks/</link><pubDate>Fri, 22 Aug 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/ai/deep-learning-frameworks/</guid><description>&lt;h1 id="mastering-deep-learning-with-pytorch-from-tensors-to-advanced-neural-networks-for-llms"&gt;Mastering Deep Learning with PyTorch: From Tensors to Advanced Neural Networks for LLMs&lt;/h1&gt;
&lt;hr&gt;
&lt;h2 id="1-introduction-to-deep-learning-and-pytorch"&gt;1. Introduction to Deep Learning and PyTorch&lt;/h2&gt;
&lt;h3 id="what-is-deep-learning"&gt;What is Deep Learning?&lt;/h3&gt;
&lt;p&gt;Deep learning is a subfield of machine learning inspired by the structure and function of the human brain&amp;rsquo;s neural networks. Instead of explicit programming, deep learning models learn from vast amounts of data, automatically discovering intricate patterns and representations. These models are characterized by their &amp;ldquo;deep&amp;rdquo; architecture, consisting of multiple layers, which allows them to extract hierarchical features from raw data. From recognizing objects in images to understanding human language and generating creative content, deep learning has revolutionized numerous domains.&lt;/p&gt;</description></item><item><title>Mastering LLM Fine-tuning: Pre-training, SFT, and PEFT for Custom Models</title><link>https://ai-blog.noorshomelab.dev/ai/llm-fine-tuning/</link><pubDate>Fri, 22 Aug 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/ai/llm-fine-tuning/</guid><description>&lt;h1 id="llm-pre-training-and-fine-tuning-concepts"&gt;LLM Pre-training and Fine-tuning Concepts&lt;/h1&gt;
&lt;hr&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Large Language Models (LLMs) have revolutionized the field of Artificial Intelligence, demonstrating remarkable capabilities in understanding, generating, and processing human language. These powerful models are at the heart of many cutting-edge applications, from sophisticated chatbots and content generators to complex code assistants. This document serves as a comprehensive guide to understanding the lifecycle of LLMs, from their initial pre-training to the crucial process of fine-tuning them for specific tasks and data.&lt;/p&gt;</description></item><item><title>NLP Fundamentals: Mastering Attention and Transformers for Large Language Models</title><link>https://ai-blog.noorshomelab.dev/ai/natural-language-processing-fundamentals/</link><pubDate>Fri, 22 Aug 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/ai/natural-language-processing-fundamentals/</guid><description>&lt;h1 id="natural-language-processing-fundamentals-from-text-preprocessing-to-transformers"&gt;Natural Language Processing Fundamentals: From Text Preprocessing to Transformers&lt;/h1&gt;
&lt;hr&gt;
&lt;h2 id="1-introduction-to-natural-language-processing"&gt;1. Introduction to Natural Language Processing&lt;/h2&gt;
&lt;h3 id="what-is-nlp"&gt;What is NLP?&lt;/h3&gt;
&lt;p&gt;Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) that focuses on enabling computers to understand, interpret, and generate human language. It&amp;rsquo;s the technology behind everyday applications like spam filters, virtual assistants (Siri, Alexa), machine translation (Google Translate), and sentiment analysis. NLP combines computational linguistics—rule-based modeling of human language—with AI, machine learning, and deep learning models to process vast amounts of text and speech data.&lt;/p&gt;</description></item><item><title>Retrieval-Augmented Generation (RAG): Enhancing LLMs with External Knowledge - A Practical Guide</title><link>https://ai-blog.noorshomelab.dev/ai/retrieval-augmented-generation/</link><pubDate>Fri, 22 Aug 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/ai/retrieval-augmented-generation/</guid><description>&lt;h1 id="retrieval-augmented-generation-rag-enhancing-llms-with-external-knowledge---a-practical-guide"&gt;Retrieval-Augmented Generation (RAG): Enhancing LLMs with External Knowledge - A Practical Guide&lt;/h1&gt;
&lt;hr&gt;
&lt;h2 id="introduction-to-retrieval-augmented-generation-rag"&gt;Introduction to Retrieval-Augmented Generation (RAG)&lt;/h2&gt;
&lt;p&gt;Large Language Models (LLMs) have revolutionized the way we interact with information, demonstrating remarkable abilities in generating human-like text, answering questions, and summarizing content. However, they come with inherent limitations:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Hallucinations:&lt;/strong&gt; LLMs can generate factually incorrect or nonsensical information and present it confidently as truth. This is a significant hurdle in applications requiring high accuracy.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lack of Up-to-Date Information:&lt;/strong&gt; The knowledge of LLMs is static, frozen at their training data cutoff. On their own, they cannot access real-time information or specific proprietary data sources.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Limited Context Window:&lt;/strong&gt; While LLMs have growing context windows, there&amp;rsquo;s still a limit to how much information they can process in a single prompt. For complex queries requiring extensive background, fitting all relevant data into the prompt becomes challenging.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Retrieval-Augmented Generation (RAG)&lt;/strong&gt; emerges as a powerful paradigm to address these limitations. RAG combines the generative power of LLMs with external, dynamic, and authoritative knowledge bases. Instead of relying solely on its internal, pre-trained knowledge, a RAG system first &lt;strong&gt;retrieves&lt;/strong&gt; relevant information from an external source and then uses this retrieved context to &lt;strong&gt;augment&lt;/strong&gt; the LLM&amp;rsquo;s response generation.&lt;/p&gt;</description></item></channel></rss>