<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>MLLMs on AI VOID</title><link>https://ai-blog.noorshomelab.dev/tags/mllms/</link><description>Recent content in MLLMs on AI VOID</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 20 Mar 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://ai-blog.noorshomelab.dev/tags/mllms/index.xml" rel="self" type="application/rss+xml"/><item><title>Multimodal LLMs: The Brains of Modern Multimodal AI</title><link>https://ai-blog.noorshomelab.dev/multimodal-ai-guide-2026/multimodal-llms-modern-ai/</link><pubDate>Fri, 20 Mar 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/multimodal-ai-guide-2026/multimodal-llms-modern-ai/</guid><description>&lt;h2 id="multimodal-llms-the-brains-of-modern-multimodal-ai"&gt;Multimodal LLMs: The Brains of Modern Multimodal AI&lt;/h2&gt;
&lt;p&gt;Welcome back, future AI architects! In previous chapters, we laid the groundwork by understanding how to ingest and represent different types of data—text, images, audio, and video—as numerical embeddings. We learned that the secret to multimodal AI lies in transforming these diverse inputs into a common language that machines can understand. Now, it&amp;rsquo;s time to introduce the superstar that stitches all these pieces together and makes true cross-modal reasoning possible: &lt;strong&gt;Multimodal Large Language Models (MLLMs)&lt;/strong&gt;.&lt;/p&gt;</description></item><item><title>Generative Multimodal AI: Creating and Innovating</title><link>https://ai-blog.noorshomelab.dev/multimodal-ai-guide-2026/generative-multimodal-ai-creating-innovating/</link><pubDate>Fri, 20 Mar 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/multimodal-ai-guide-2026/generative-multimodal-ai-creating-innovating/</guid><description>&lt;h2 id="introduction-to-generative-multimodal-ai"&gt;Introduction to Generative Multimodal AI&lt;/h2&gt;
&lt;p&gt;Welcome back, intrepid AI explorers! In previous chapters, we&amp;rsquo;ve delved into how multimodal AI systems &lt;em&gt;understand&lt;/em&gt; and &lt;em&gt;interpret&lt;/em&gt; information from diverse sources like text, images, audio, and video. We learned about sophisticated techniques for integrating these inputs, creating rich, unified representations, and enabling AI to make sense of a complex world.&lt;/p&gt;
&lt;p&gt;Now, we&amp;rsquo;re going to flip the script! What if, instead of just understanding, our AI could &lt;em&gt;create&lt;/em&gt;? This chapter is all about &lt;strong&gt;Generative Multimodal AI&lt;/strong&gt;: systems capable of producing novel content that spans multiple modalities. Imagine an AI that can take a text description and generate a matching image, or an audio prompt and produce a piece of music with accompanying visuals. This isn&amp;rsquo;t science fiction; it&amp;rsquo;s the cutting edge of AI, rapidly evolving with powerful models like Google&amp;rsquo;s Gemini 1.5 and OpenAI&amp;rsquo;s GPT-4o.&lt;/p&gt;</description></item></channel></rss>