<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>High Availability on AI VOID</title><link>https://ai-blog.noorshomelab.dev/tags/high-availability/</link><description>Recent content in High Availability on AI VOID</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 20 Mar 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://ai-blog.noorshomelab.dev/tags/high-availability/index.xml" rel="self" type="application/rss+xml"/><item><title>Scaling LLM Deployments: From Single Instances to Clusters</title><link>https://ai-blog.noorshomelab.dev/llmops-ai-infra-guide-2026/scaling-llm-deployments/</link><pubDate>Fri, 20 Mar 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/llmops-ai-infra-guide-2026/scaling-llm-deployments/</guid><description>&lt;h2 id="scaling-llm-deployments-from-single-instances-to-clusters"&gt;Scaling LLM Deployments: From Single Instances to Clusters&lt;/h2&gt;
&lt;p&gt;Welcome back, MLOps engineers, data scientists, and developers! In previous chapters, we&amp;rsquo;ve explored the foundational elements of LLM inference pipelines, model routing, and critical optimization techniques such as caching and efficient GPU utilization. You&amp;rsquo;ve likely started to appreciate the sheer resource demands of Large Language Models.&lt;/p&gt;
&lt;p&gt;Now, imagine your incredible LLM application goes viral overnight! Suddenly, a single GPU instance just won&amp;rsquo;t cut it. Requests flood in, latency skyrockets, and your users are unhappy. This is where the magic of &lt;strong&gt;scaling&lt;/strong&gt; comes into play.&lt;/p&gt;</description></item><item><title>Advanced Topics: High Availability and Clustering</title><link>https://ai-blog.noorshomelab.dev/redis-guide/high-availability-and-clustering/</link><pubDate>Fri, 07 Nov 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/redis-guide/high-availability-and-clustering/</guid><description>&lt;p&gt;In production environments, simply running a single Redis instance is often not enough. You need to ensure your Redis service is &lt;strong&gt;highly available&lt;/strong&gt; (it remains operational even if a server fails) and &lt;strong&gt;scalable&lt;/strong&gt; (it can handle increased load and data volume). Redis offers two primary solutions for these challenges: &lt;strong&gt;Redis Sentinel&lt;/strong&gt; for high availability and &lt;strong&gt;Redis Cluster&lt;/strong&gt; for horizontal scaling.&lt;/p&gt;
&lt;p&gt;This chapter will guide you through:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The concept of High Availability (HA) and how Redis achieves it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Redis Sentinel&lt;/strong&gt;: For automatic failover and monitoring of master-replica setups.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Redis Cluster&lt;/strong&gt;: For sharding data across multiple nodes and providing both HA and linear scalability.&lt;/li&gt;
&lt;li&gt;The trade-offs between Sentinel and Cluster, and when to use each.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="1-high-availability-with-redis-sentinel"&gt;1. High Availability with Redis Sentinel&lt;/h3&gt;
&lt;p&gt;Redis Sentinel is a distributed system that provides high availability for Redis. It continuously monitors your Redis instances (masters and replicas), and if a master goes down, it automatically promotes a replica to become the new master. Sentinel also reconfigures the other replicas to follow the new master and informs client applications about the change.&lt;/p&gt;</description></item><item><title>Chapter 17: Deployment Strategies for High-Availability</title><link>https://ai-blog.noorshomelab.dev/usearch-scylladb-vector-search-guide-2026/17-deployment-strategies/</link><pubDate>Tue, 17 Feb 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/usearch-scylladb-vector-search-guide-2026/17-deployment-strategies/</guid><description>&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Welcome to Chapter 17! So far, we&amp;rsquo;ve journeyed from the basics of vector search to integrating USearch with ScyllaDB, tackling performance, and even debugging. Now, it&amp;rsquo;s time to elevate our game and ensure our vector search solution is not just fast and accurate, but also resilient and always available. In the world of real-time AI applications, downtime can be catastrophic, leading to lost revenue, frustrated users, and missed opportunities.&lt;/p&gt;</description></item><item><title>Chapter 18: Enterprise Best Practices &amp;amp; Design Principles</title><link>https://ai-blog.noorshomelab.dev/palo-alto-ngfw-mastery/enterprise-best-practices/</link><pubDate>Tue, 23 Dec 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/palo-alto-ngfw-mastery/enterprise-best-practices/</guid><description>&lt;h2 id="chapter-18-enterprise-best-practices--design-principles"&gt;Chapter 18: Enterprise Best Practices &amp;amp; Design Principles&lt;/h2&gt;
&lt;p&gt;Welcome back, future firewall master! In our journey so far, we&amp;rsquo;ve covered a tremendous amount, from the basic building blocks of Palo Alto Networks firewalls to advanced features like App-ID, User-ID, and SSL decryption. You&amp;rsquo;ve learned &lt;em&gt;how&lt;/em&gt; to configure these powerful tools. Now, it&amp;rsquo;s time to elevate your skills from just knowing &lt;em&gt;how&lt;/em&gt; to do things, to understanding &lt;em&gt;how to do them right&lt;/em&gt; in a real-world enterprise environment.&lt;/p&gt;</description></item></channel></rss>