<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Speech-to-Text on AI VOID</title><link>https://ai-blog.noorshomelab.dev/tags/speech-to-text/</link><description>Recent content in Speech-to-Text on AI VOID</description><generator>Hugo</generator><language>en</language><lastBuildDate>Wed, 06 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://ai-blog.noorshomelab.dev/tags/speech-to-text/index.xml" rel="self" type="application/rss+xml"/><item><title>Implementing On-Device Speech-to-Text with Whisper.cpp</title><link>https://ai-blog.noorshomelab.dev/on-device-ai-agents-tiny-llms-guide-2026/on-device-stt-whisper-cpp/</link><pubDate>Wed, 06 May 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/on-device-ai-agents-tiny-llms-guide-2026/on-device-stt-whisper-cpp/</guid><description>&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Building intelligent on-device AI agents starts with giving them the ability to perceive and understand the world around them. For human interaction, this often means processing spoken language directly on the device. In this chapter, we&amp;rsquo;ll lay the groundwork for our edge AI system by implementing low-latency Speech-to-Text (STT).&lt;/p&gt;
&lt;p&gt;We will leverage &lt;code&gt;whisper.cpp&lt;/code&gt;, a high-performance C++ port of OpenAI&amp;rsquo;s Whisper model, to perform transcription entirely on the device. This choice is critical for privacy, reducing reliance on cloud services, and achieving minimal latency—all hallmarks of a production-ready edge AI system. By the end of this chapter, you will have a standalone command-line application that can transcribe audio files with impressive accuracy, forming a core component for any voice-enabled agent.&lt;/p&gt;</description></item><item><title>Building the Agentic Core: STT to LLM to Intent Mapping</title><link>https://ai-blog.noorshomelab.dev/on-device-ai-agents-tiny-llms-guide-2026/agentic-core-intent-mapping/</link><pubDate>Wed, 06 May 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/on-device-ai-agents-tiny-llms-guide-2026/agentic-core-intent-mapping/</guid><description>&lt;p&gt;In this chapter, we&amp;rsquo;re building the brain of our on-device AI agent: the core pipeline that translates user speech into actionable intents. This involves taking transcribed text, feeding it into a tiny, local Large Language Model (LLM), and then extracting a structured understanding of what the user wants to do. This is a critical step towards enabling truly intelligent, privacy-preserving interactions on edge devices.&lt;/p&gt;
&lt;p&gt;By the end of this milestone, you will have a functional Python script that can:&lt;/p&gt;</description></item></channel></rss>