<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Reinforcement Learning on AI VOID</title><link>https://ai-blog.noorshomelab.dev/tags/reinforcement-learning/</link><description>Recent content in Reinforcement Learning on AI VOID</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 10 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://ai-blog.noorshomelab.dev/tags/reinforcement-learning/index.xml" rel="self" type="application/rss+xml"/><item><title>Introduction to Agentic Lightening</title><link>https://ai-blog.noorshomelab.dev/agentic-lightening-guide/introduction-to-agentic-lightening/</link><pubDate>Thu, 06 Nov 2025 22:00:00 +0530</pubDate><guid>https://ai-blog.noorshomelab.dev/agentic-lightening-guide/introduction-to-agentic-lightening/</guid><description>&lt;h2 id="introduction-to-agentic-lightening"&gt;Introduction to Agentic Lightening&lt;/h2&gt;
&lt;p&gt;Welcome to the exciting world of Agentic Lightening! This chapter will introduce you to this powerful framework, explain why it&amp;rsquo;s a crucial tool for modern AI development, and give you a brief overview of its origins.&lt;/p&gt;
&lt;h3 id="what-is-agentic-lightening"&gt;What is Agentic Lightening?&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Agentic Lightening&lt;/strong&gt; is an open-source framework developed by Microsoft that lets developers &lt;strong&gt;train and optimize any AI agent&lt;/strong&gt; with few or no changes to the agent&amp;rsquo;s existing code. In the rapidly evolving landscape of AI, agents are becoming increasingly sophisticated, performing complex, multi-step tasks autonomously. However, making these agents perform well, especially in dynamic real-world scenarios, is challenging. This is where Agentic Lightening steps in.&lt;/p&gt;</description></item><item><title>Advanced Optimization Algorithms</title><link>https://ai-blog.noorshomelab.dev/agentic-lightening-guide/advanced-optimization-algorithms/</link><pubDate>Thu, 06 Nov 2025 22:00:00 +0530</pubDate><guid>https://ai-blog.noorshomelab.dev/agentic-lightening-guide/advanced-optimization-algorithms/</guid><description>&lt;h2 id="advanced-optimization-algorithms"&gt;Advanced Optimization Algorithms&lt;/h2&gt;
&lt;p&gt;With a solid understanding of rollouts and rewards, we can now turn to the optimization algorithms that Agentic Lightening integrates to make your AI agents adaptive and performant. Agentic Lightening is designed to be algorithm-agnostic, providing hooks for various techniques. While its primary focus is Reinforcement Learning (RL), it also supports Automatic Prompt Optimization (APO) and can be used for Supervised Fine-tuning (SFT).&lt;/p&gt;
&lt;p&gt;This chapter will provide an overview of these algorithms, explain their relevance in the context of agent training, and show how they conceptually fit into the Agentic Lightening framework.&lt;/p&gt;</description></item><item><title>Chapter 7: Introduction to Reinforcement Learning from Human Feedback (RLHF) Concepts</title><link>https://ai-blog.noorshomelab.dev/tunix-mastery-2026/07-rlhf-concepts/</link><pubDate>Fri, 30 Jan 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/tunix-mastery-2026/07-rlhf-concepts/</guid><description>&lt;h2 id="introduction-to-reinforcement-learning-from-human-feedback-rlhf-concepts"&gt;Introduction to Reinforcement Learning from Human Feedback (RLHF) Concepts&lt;/h2&gt;
&lt;p&gt;Welcome to Chapter 7! So far, we&amp;rsquo;ve explored the foundational aspects of Tunix, understanding how it leverages JAX to efficiently manage and fine-tune Large Language Models (LLMs). We&amp;rsquo;ve touched upon pre-training and various forms of supervised fine-tuning. But what happens when you want your LLM to not just generate coherent text, but to also be &lt;em&gt;helpful&lt;/em&gt;, &lt;em&gt;harmless&lt;/em&gt;, and &lt;em&gt;honest&lt;/em&gt;—to truly align with human values and instructions? That&amp;rsquo;s where Reinforcement Learning from Human Feedback, or RLHF, steps in.&lt;/p&gt;</description></item><item><title>Chapter 8: Implementing Basic RLHF Workflows with Tunix</title><link>https://ai-blog.noorshomelab.dev/tunix-mastery-2026/08-basic-rlhf-implementation/</link><pubDate>Fri, 30 Jan 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/tunix-mastery-2026/08-basic-rlhf-implementation/</guid><description>&lt;h2 id="chapter-8-implementing-basic-rlhf-workflows-with-tunix"&gt;Chapter 8: Implementing Basic RLHF Workflows with Tunix&lt;/h2&gt;
&lt;p&gt;Welcome back, future LLM maestro! In our journey through Tunix, we&amp;rsquo;ve explored its architecture, set up our environment, and even fine-tuned models with supervised learning. But what if we want our Large Language Models (LLMs) not just to predict the next token, but to produce outputs that align with human preferences? This is where Reinforcement Learning from Human Feedback (RLHF) shines, and Tunix provides the robust, JAX-native tooling to make it happen.&lt;/p&gt;</description></item><item><title>Project 2: Enhancing a LangChain Agent with Reinforcement Learning</title><link>https://ai-blog.noorshomelab.dev/agentic-lightening-guide/project-enhancing-langchain-agent-with-rl/</link><pubDate>Thu, 06 Nov 2025 22:00:00 +0530</pubDate><guid>https://ai-blog.noorshomelab.dev/agentic-lightening-guide/project-enhancing-langchain-agent-with-rl/</guid><description>&lt;h2 id="project-2-enhancing-a-langchain-agent-with-reinforcement-learning"&gt;Project 2: Enhancing a LangChain Agent with Reinforcement Learning&lt;/h2&gt;
&lt;p&gt;This project delves into a more advanced scenario: taking an existing agent built with a popular framework (LangChain) and enhancing its performance using &lt;strong&gt;Reinforcement Learning (RL)&lt;/strong&gt; via Agentic Lightening. Instead of just tuning prompts, we&amp;rsquo;ll focus on optimizing the agent&amp;rsquo;s decision-making and tool-use strategy in a simulated interactive environment.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Clear Objective:&lt;/strong&gt; To integrate a LangChain agent into Agentic Lightening and conceptually train it with RL to improve its ability to solve multi-step problems requiring tool usage.&lt;/p&gt;</description></item><item><title>Bonus Section: Further Learning and Resources</title><link>https://ai-blog.noorshomelab.dev/agentic-lightening-guide/further-learning-and-resources/</link><pubDate>Thu, 06 Nov 2025 22:00:00 +0530</pubDate><guid>https://ai-blog.noorshomelab.dev/agentic-lightening-guide/further-learning-and-resources/</guid><description>&lt;h2 id="bonus-section-further-learning-and-resources"&gt;Bonus Section: Further Learning and Resources&lt;/h2&gt;
&lt;p&gt;Congratulations on completing this comprehensive guide to Agentic Lightening! You&amp;rsquo;ve come a long way, from understanding the foundational concepts to building and optimizing agents with practical projects. The field of AI agents and their optimization is rapidly evolving, so continuous learning is key.&lt;/p&gt;
&lt;p&gt;This section provides a curated list of resources to help you deepen your knowledge, stay updated with the latest advancements, and connect with the wider AI community.&lt;/p&gt;</description></item><item><title>RAGEN-2: Reasoning Collapse in Agentic RL: Research Explainer for Builders</title><link>https://ai-blog.noorshomelab.dev/research/ragen-2-reasoning-collapse-agentic-rl/</link><pubDate>Fri, 10 Apr 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/research/ragen-2-reasoning-collapse-agentic-rl/</guid><description>&lt;h2 id="quick-verdict-your-llm-agent-might-be-falling-apart-internally"&gt;Quick Verdict: Your LLM Agent Might Be Falling Apart Internally&lt;/h2&gt;
&lt;p&gt;Imagine your LLM agent successfully navigates the first few steps of a complex task. It generates sensible thoughts, takes appropriate actions, and makes progress. But beneath the surface, its internal reasoning process could be silently degrading, becoming erratic, repetitive, or nonsensical. This is &amp;ldquo;reasoning collapse,&amp;rdquo; and it&amp;rsquo;s a critical, often undetected, problem in multi-turn LLM agents, especially those trained with Reinforcement Learning (RL).&lt;/p&gt;</description></item><item><title>Learn Agentic Lightening 0.2.1: The Absolute Trainer to Light Up AI Agents</title><link>https://ai-blog.noorshomelab.dev/guides/learn-agentic-lightening-0-2-1/</link><pubDate>Thu, 06 Nov 2025 22:00:00 +0530</pubDate><guid>https://ai-blog.noorshomelab.dev/guides/learn-agentic-lightening-0-2-1/</guid><description>&lt;p&gt;This learning guide provides a comprehensive introduction to &lt;strong&gt;Agentic Lightening&lt;/strong&gt;, Microsoft&amp;rsquo;s innovative open-source framework for training and optimizing AI agents. Whether you&amp;rsquo;re a complete beginner eager to dive into the world of agentic AI or an experienced developer looking to integrate advanced optimization techniques into your existing agent frameworks (like LangChain or AutoGen), this document will equip you with the knowledge and practical skills you need. We&amp;rsquo;ll start from the very basics, guiding you through setting up your environment, understanding core concepts, and progressively moving towards advanced topics and real-world projects. Each section includes detailed explanations, hands-on code examples, and challenging exercises to ensure you learn by doing.&lt;/p&gt;</description></item></channel></rss>