<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Transformers on AI VOID</title><link>https://ai-blog.noorshomelab.dev/tags/transformers/</link><description>Recent content in Transformers on AI VOID</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 20 Mar 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://ai-blog.noorshomelab.dev/tags/transformers/index.xml" rel="self" type="application/rss+xml"/><item><title>Representing Reality: From Raw Data to Embeddings</title><link>https://ai-blog.noorshomelab.dev/multimodal-ai-guide-2026/representing-reality-raw-data-to-embeddings/</link><pubDate>Fri, 20 Mar 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/multimodal-ai-guide-2026/representing-reality-raw-data-to-embeddings/</guid><description>&lt;p&gt;Welcome back, future multimodal AI maestros! In our previous chapter, we explored the exciting world of multimodal AI and its incredible potential. Now, it&amp;rsquo;s time to dive deeper and understand the fundamental step that makes all this magic possible: transforming the messy, diverse &amp;ldquo;real world&amp;rdquo; data into a language our AI models can understand.&lt;/p&gt;
&lt;p&gt;This chapter is all about &lt;strong&gt;representing reality&lt;/strong&gt;. We&amp;rsquo;ll learn how raw inputs like text, images, audio, and video, which seem so different to us, are converted into a common, numerical format called &lt;strong&gt;embeddings&lt;/strong&gt;. Think of it as teaching your AI system to &amp;ldquo;see,&amp;rdquo; &amp;ldquo;hear,&amp;rdquo; and &amp;ldquo;read&amp;rdquo; by giving it a universal dictionary of meaning. Mastering this concept is crucial, as it forms the bedrock for any multimodal system you&amp;rsquo;ll ever build.&lt;/p&gt;</description></item><item><title>Architecting Multimodal Encoders: Giving AI &amp;#39;Senses&amp;#39;</title><link>https://ai-blog.noorshomelab.dev/multimodal-ai-guide-2026/architecting-multimodal-encoders/</link><pubDate>Fri, 20 Mar 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/multimodal-ai-guide-2026/architecting-multimodal-encoders/</guid><description>&lt;h2 id="introduction-giving-ai-senses"&gt;Introduction: Giving AI &amp;lsquo;Senses&amp;rsquo;&lt;/h2&gt;
&lt;p&gt;Welcome back, future multimodal AI architects! In the previous chapter, we saw why combining different types of data (modalities) leads to more robust and intelligent systems. Now, it&amp;rsquo;s time to dive into &lt;em&gt;how&lt;/em&gt; AI actually &amp;ldquo;sees,&amp;rdquo; &amp;ldquo;hears,&amp;rdquo; and &amp;ldquo;reads&amp;rdquo; the world.&lt;/p&gt;
&lt;p&gt;This chapter is all about &lt;strong&gt;multimodal encoders&lt;/strong&gt; – the specialized neural networks that act as the sensory organs of our AI. Just as our brains have distinct areas for processing sight, sound, and language, multimodal AI systems use different encoders to transform raw, messy data like pixels, audio waveforms, or text characters into a common, understandable language for the AI. You&amp;rsquo;ll learn the fundamental architectural patterns that enable AI to perceive and represent diverse inputs, paving the way for truly intelligent systems.&lt;/p&gt;</description></item><item><title>Multimodal LLMs: The Brains of Modern Multimodal AI</title><link>https://ai-blog.noorshomelab.dev/multimodal-ai-guide-2026/multimodal-llms-modern-ai/</link><pubDate>Fri, 20 Mar 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/multimodal-ai-guide-2026/multimodal-llms-modern-ai/</guid><description>&lt;h2 id="multimodal-llms-the-brains-of-modern-multimodal-ai"&gt;Multimodal LLMs: The Brains of Modern Multimodal AI&lt;/h2&gt;
&lt;p&gt;Welcome back, future AI architects! In previous chapters, we laid the groundwork by understanding how to ingest and represent different types of data—text, images, audio, and video—as numerical embeddings. We learned that the secret to multimodal AI lies in transforming these diverse inputs into a common language that machines can understand. Now, it&amp;rsquo;s time to introduce the superstar that stitches all these pieces together and makes true cross-modal reasoning possible: &lt;strong&gt;Multimodal Large Language Models (MLLMs)&lt;/strong&gt;.&lt;/p&gt;</description></item><item><title>Hands-On Project: Building a Multimodal Search Assistant</title><link>https://ai-blog.noorshomelab.dev/multimodal-ai-guide-2026/hands-on-multimodal-search-assistant/</link><pubDate>Fri, 20 Mar 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/multimodal-ai-guide-2026/hands-on-multimodal-search-assistant/</guid><description>&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Welcome to an exciting hands-on chapter! So far, we&amp;rsquo;ve covered the core concepts of multimodal AI: how text, images, audio, and video can be processed and integrated, along with representation learning, data fusion, and shared embedding spaces. Now, it&amp;rsquo;s time to put that knowledge into action!&lt;/p&gt;
&lt;p&gt;In this chapter, we&amp;rsquo;ll embark on a practical project: building a simple yet powerful &lt;strong&gt;Multimodal Search Assistant&lt;/strong&gt;. Imagine having a personal knowledge base where you can search for information not just by text, but also by what an image looks like, or even a combination of both. This assistant will allow us to index both text documents and images, and then query them using natural language. We&amp;rsquo;ll leverage state-of-the-art pre-trained models to create a shared understanding across modalities, making our search truly multimodal.&lt;/p&gt;</description></item><item><title>Chapter 12: Multimodal Models: Vision-Language Integration</title><link>https://ai-blog.noorshomelab.dev/ai-ml-career-path-2026/multimodal-models/</link><pubDate>Sat, 17 Jan 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/ai-ml-career-path-2026/multimodal-models/</guid><description>&lt;h2 id="chapter-12-multimodal-models-vision-language-integration"&gt;Chapter 12: Multimodal Models: Vision-Language Integration&lt;/h2&gt;
&lt;p&gt;Welcome back, future AI architect! In our journey so far, we&amp;rsquo;ve explored the depths of neural networks, mastered the art of training deep learning models, and even fine-tuned powerful Large Language Models (LLMs). Each step has brought us closer to building truly intelligent systems. But what if we want our AI to do more than just understand text or analyze images in isolation? What if we want it to &lt;em&gt;see&lt;/em&gt; and &lt;em&gt;understand&lt;/em&gt; the world, like humans do, by combining different senses?&lt;/p&gt;</description></item><item><title>Decoding Large Language Models: A Deep Dive into LLM Architectures</title><link>https://ai-blog.noorshomelab.dev/ai/llm-architectures/</link><pubDate>Fri, 22 Aug 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/ai/llm-architectures/</guid><description>&lt;h1 id="decoding-large-language-models-a-deep-dive-into-llm-architectures"&gt;Decoding Large Language Models: A Deep Dive into LLM Architectures&lt;/h1&gt;
&lt;hr&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Large Language Models (LLMs) have revolutionized the field of Artificial Intelligence, demonstrating unprecedented capabilities in understanding, generating, and manipulating human language. At their core, LLMs are complex neural networks, primarily built upon the Transformer architecture. This document serves as a comprehensive guide to LLM architectures, catering to both beginners and experienced professionals. We will journey from the foundational concepts of Transformer models to the intricate structural details of modern open-source LLMs, exploring their design choices and implications for development and optimization.&lt;/p&gt;</description></item><item><title>NLP Fundamentals: Mastering Attention and Transformers for Large Language Models</title><link>https://ai-blog.noorshomelab.dev/ai/natural-language-processing-fundamentals/</link><pubDate>Fri, 22 Aug 2025 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/ai/natural-language-processing-fundamentals/</guid><description>&lt;h1 id="natural-language-processing-fundamentals-from-text-preprocessing-to-transformers"&gt;Natural Language Processing Fundamentals: From Text Preprocessing to Transformers&lt;/h1&gt;
&lt;hr&gt;
&lt;h2 id="1-introduction-to-natural-language-processing"&gt;1. Introduction to Natural Language Processing&lt;/h2&gt;
&lt;h3 id="what-is-nlp"&gt;What is NLP?&lt;/h3&gt;
&lt;p&gt;Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) that focuses on enabling computers to understand, interpret, and generate human language. It&amp;rsquo;s the technology behind everyday applications like spam filters, virtual assistants (Siri, Alexa), machine translation (Google Translate), and sentiment analysis. NLP combines computational linguistics—rule-based modeling of human language—with AI, machine learning, and deep learning models to process vast amounts of text and speech data.&lt;/p&gt;</description></item></channel></rss>