<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>LLM Optimization on AI VOID</title><link>https://ai-blog.noorshomelab.dev/categories/llm-optimization/</link><description>Recent content in LLM Optimization on AI VOID</description><generator>Hugo</generator><language>en</language><lastBuildDate>Mon, 30 Mar 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://ai-blog.noorshomelab.dev/categories/llm-optimization/index.xml" rel="self" type="application/rss+xml"/><item><title>Dynamic Context: Prioritization &amp; Sliding Windows for Agents</title><link>https://ai-blog.noorshomelab.dev/context-engineering-guide/dynamic-context-prioritization-sliding-windows/</link><pubDate>Fri, 20 Mar 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/context-engineering-guide/dynamic-context-prioritization-sliding-windows/</guid><description>&lt;h2 id="introduction-to-dynamic-context"&gt;Introduction to Dynamic Context&lt;/h2&gt;
&lt;p&gt;Welcome back, fellow AI engineers! In our previous chapters, we laid the groundwork for effective context engineering. We learned how to design context, reduce its size through summarization and filtering, compress it for efficiency, and chunk it into manageable pieces. These foundational techniques are crucial, but they primarily deal with &lt;em&gt;static&lt;/em&gt; context – information that&amp;rsquo;s prepared once and then fed to the LLM.&lt;/p&gt;
&lt;p&gt;But what about long-running conversations, persistent agents, or applications that need to maintain a &amp;ldquo;memory&amp;rdquo; over extended periods? The fixed context window of LLMs, while growing, still presents a significant challenge. This is where &lt;strong&gt;dynamic context management&lt;/strong&gt; comes into play.&lt;/p&gt;</description></item><item><title>TurboQuant Unleashed: Google&#39;s AI Compression Redefining LLM Efficiency</title><link>https://ai-blog.noorshomelab.dev/blog/google-turboquant-llm-compression-guide/</link><pubDate>Mon, 30 Mar 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/blog/google-turboquant-llm-compression-guide/</guid><description>&lt;h2 id="turboquant-unleashed-googles-ai-compression-redefining-llm-efficiency"&gt;TurboQuant Unleashed: Google&amp;rsquo;s AI Compression Redefining LLM Efficiency&lt;/h2&gt;
&lt;p&gt;The world of Large Language Models (LLMs) is moving at an astonishing pace. From powering sophisticated chatbots to revolutionizing content creation, these models are at the forefront of AI innovation. However, their sheer size often translates into significant computational demands, especially when it comes to memory usage during inference. This memory hunger is a major bottleneck, driving up operational costs and limiting the practical deployment of truly massive models.&lt;/p&gt;</description></item></channel></rss>