<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Document Processing on AI VOID</title><link>https://ai-blog.noorshomelab.dev/tags/document-processing/</link><description>Recent content in Document Processing on AI VOID</description><generator>Hugo</generator><language>en</language><lastBuildDate>Mon, 05 Jan 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://ai-blog.noorshomelab.dev/tags/document-processing/index.xml" rel="self" type="application/rss+xml"/><item><title>Chapter 9: Tackling Long Documents with Chunking Strategies</title><link>https://ai-blog.noorshomelab.dev/langextract-guide-2026/09-chunking-strategies/</link><pubDate>Mon, 05 Jan 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/langextract-guide-2026/09-chunking-strategies/</guid><description>&lt;h2 id="chapter-9-tackling-long-documents-with-chunking-strategies"&gt;Chapter 9: Tackling Long Documents with Chunking Strategies&lt;/h2&gt;
&lt;p&gt;Welcome back, intrepid data explorer! So far, we&amp;rsquo;ve learned how to set up LangExtract, define schemas, and extract structured information from various texts. But what happens when your text isn&amp;rsquo;t a neat paragraph or a short email, but an entire legal contract, a research paper, or a lengthy financial report? These documents often exceed the context window, or &amp;ldquo;attention span,&amp;rdquo; of even the most powerful Large Language Models (LLMs).&lt;/p&gt;</description></item><item><title>Chapter 10: Multi-Pass Extraction and Refinement</title><link>https://ai-blog.noorshomelab.dev/langextract-guide-2026/10-multi-pass-extraction/</link><pubDate>Mon, 05 Jan 2026 00:00:00 +0000</pubDate><guid>https://ai-blog.noorshomelab.dev/langextract-guide-2026/10-multi-pass-extraction/</guid><description>&lt;h2 id="introduction-beyond-single-pass-extraction"&gt;Introduction: Beyond Single-Pass Extraction&lt;/h2&gt;
&lt;p&gt;Welcome back, intrepid data explorer! In our previous chapters, we&amp;rsquo;ve mastered the fundamentals of LangExtract, from setting up your environment to crafting effective schemas for single-pass information extraction. You&amp;rsquo;ve seen how powerful LLMs can be when guided by a clear structure.&lt;/p&gt;
&lt;p&gt;However, the real world often throws us curveballs: in this case, extremely long and complex documents like financial reports, legal contracts, or research papers. These documents pose a significant challenge for Large Language Models (LLMs) because of the models&amp;rsquo; inherent &amp;ldquo;context window&amp;rdquo; limitations: an LLM can only process a finite amount of text at one time. What happens when your document is much longer than that window? And what if the information you need is scattered across hundreds of pages, requiring synthesis and cross-referencing?&lt;/p&gt;</description></item></channel></rss>