<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Gemma on AI VOID</title><link>https://ai-blog.noorshomelab.dev/tags/gemma/</link><description>Recent content in Gemma on AI VOID</description><generator>Hugo</generator><language>en</language><lastBuildDate>Tue, 19 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://ai-blog.noorshomelab.dev/tags/gemma/index.xml" rel="self" type="application/rss+xml"/><item><title>Run MTP LLMs with llama.cpp &amp;amp; vLLM</title><link>https://ai-blog.noorshomelab.dev/tutorials/run-mtp-llms-llama-cpp-vllm/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/tutorials/run-mtp-llms-llama-cpp-vllm/</guid><description>&lt;p&gt;&lt;strong&gt;What you&amp;rsquo;ll build:&lt;/strong&gt; By the end of this tutorial, you will be able to set up and run Multi-Token Prediction (MTP) capable LLMs locally using &lt;code&gt;llama.cpp&lt;/code&gt; and &lt;code&gt;vLLM&lt;/code&gt;, compare their performance against standard generation, and understand fallback options.
&lt;strong&gt;Time needed:&lt;/strong&gt; ~90 minutes
&lt;strong&gt;Prerequisites:&lt;/strong&gt; Basic command-line interface (CLI) familiarity; Git; a C++ compiler (GCC/Clang on Linux/macOS, MSVC on Windows); CMake; Python 3.9+; an NVIDIA GPU with CUDA (11.8+ recommended), an AMD GPU with ROCm, or Apple Silicon (Metal); and sufficient RAM (16GB+ recommended) and VRAM (8GB+ recommended)
&lt;strong&gt;Versions used:&lt;/strong&gt; llama.cpp: main branch (after the MTP merge); vLLM: latest stable or developer-preview release with MTP support&lt;/p&gt;</description></item></channel></rss>