LAI #79: How LLMs Learn, Vertical Model Growth, and Smarter Evaluation
Foundational concepts, $100B shifts in AI strategy, and a new primer for building real-world LLM systems.
Good morning, AI enthusiasts,
This week’s issue is about getting back to first principles. We’re diving into how LLMs actually learn: what’s under the hood, and why it matters when you’re building or deploying anything serious. It’s also the perfect lead-in to the launch of our 10-Hour LLM Primer, designed to cut through the fog and help you make real technical decisions, fast.
In the curated section, you’ll find a breakdown of how Segment Any Text improves RAG structure, a primer on MCP and how it powers secure automation, and a look at why vertical LLMs are growing 10x faster than general-purpose models. There’s also a tutorial on agent tracing with Langfuse, a guide to classification metrics, and a community-built Llama router.
This week, we also have an amazing guest post in Miguel Otero Pedrido’s Neural Maze newsletter comparing Retrieval-Augmented Generation (RAG) and Cache-Augmented Generation (CAG). The findings were clear: CAG is far more efficient post-cache, processing nearly 10× fewer tokens per query, and more disciplined about refusing to answer when information is out of scope. RAG, however, produced more detailed in-scope answers in direct comparisons. Cost-wise, CAG breaks even fast: in our tests, just six queries on a small cache made it cheaper than RAG. The full piece includes evaluation methods, examples, and cost breakdowns; you can read it here!
We’d love to hear your thoughts if you do check it out. Drop a quick comment if you found it helpful or if there’s more you’d like us to explore on these systems.
What’s AI Weekly
A lack of fundamental understanding of how LLMs are built leads people to underestimate AI capability in some areas (and therefore not push AI usage to its potential) and to overestimate it in others (which can lead to user frustration and premature rejection of these tools, or worse, to AI mistakes that slip into work outputs). So this week in What’s AI, I am diving into how these models are designed and built. While a thorough understanding of LLM internals is no longer always necessary to use and build LLM products effectively, knowing a few theoretical concepts, such as an LLM’s training objective, its training data, how it generates words, and what embeddings are, can make a big difference in getting the most out of these models. Read the complete article here or watch the video on YouTube.
— Louis-François Bouchard, Towards AI Co-founder & Head of Community
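To make “how they generate words” concrete before you read the full piece, here is a toy sketch of next-token generation: the model assigns a score (logit) to every token in its vocabulary, and sampling turns those scores into the next word. The five-word vocabulary and the logit values below are invented purely for illustration.

```python
import numpy as np

# Toy next-token generation: a real LLM outputs one logit per vocabulary
# token; sampling converts those logits into the next word. These values
# are made up for illustration.
vocab = ["cat", "dog", "sat", "mat", "the"]
logits = np.array([2.0, 1.5, 0.3, 0.1, -1.0])  # model scores for each token

def sample_next_token(logits, temperature=1.0):
    # Temperature rescales the logits: lower values = more deterministic output.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

next_id = sample_next_token(logits, temperature=0.7)
print(vocab[next_id])
```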
Introducing the 10-Hour Video Primer: Building and Operating LLMs
If you’ve ever gone five blog posts deep and still ended up with “it depends,” this course is for you.
The 10-Hour LLM Primer helps you cut through the noise and make real technical decisions — fast.
In five focused video sessions, you’ll learn:
What transformers actually do (and don’t)
When to use prompting, RAG, fine-tuning, or none of them
How real-world agent workflows are structured
What breaks at scale — and how to fix it
How to evaluate and maintain production-grade LLM systems
It’s bingeable in a day. Or treat it like an AI docuseries.
No fluff. No hype. Just the clarity we usually deliver in $25K+ enterprise workshops.
Launch price: $199, including all future updates (new lessons dropping soon)
The price will go up once new lessons drop, so grab it now.
Learn AI Together Community Section!
Featured Community post from the Discord
Jasonfpv_ has built FlexLlama, a lightweight, extensible, and user-friendly self-hosted tool that efficiently runs multiple llama.cpp server instances with OpenAI v1 API compatibility. It’s designed to manage multiple models across different GPUs, making it a powerful solution for local AI development and deployment. Check it out on GitHub and support a fellow community member. If you have any questions or feedback, share them in the thread!
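Because FlexLlama exposes an OpenAI v1-compatible API, any standard OpenAI client should be able to talk to a locally hosted model. A minimal sketch, assuming an instance on localhost:8080 and a hypothetical model alias (check the FlexLlama README for the actual port and configuration):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local FlexLlama endpoint.
# The port and model name are placeholders; see the FlexLlama README.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama-3-8b-instruct",  # hypothetical model alias
    messages=[{"role": "user", "content": "Hello from the community!"}],
)
print(response.choices[0].message.content)
```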
AI poll of the week!
70% of you voted for hybrid architectures, and that says a lot. We’re moving past the “bigger is better” phase of AI. Instead of one model to rule them all, builders are betting on coordination over domination: a strong generalist guiding a swarm of specialists. It’s more efficient, more modular, and maybe… more human.
What’s interesting is that very few chose “one giant model.” That’s a shift. The market may still reward massive models today, but real-world builders are clearly thinking in systems, not silos.
What’s a real use case where you think the hybrid model (big generalist + small experts) would clearly outperform any single model alone? Share your thoughts in the thread!
Collaboration Opportunities
The Learn AI Together Discord community is flooded with collaboration opportunities. If you are excited to dive into applied AI, want a study partner, or even want to find a partner for your passion project, join the collaboration channel! Keep an eye on this section, too; we share cool opportunities every week!
1. Safar4352 is building a project from scratch and needs a project partner. The project will be Python-based, so if you want to understand GenAI and LLMs in depth, connect with him in the thread!
2. Tranquil_dolphin_27432 wants to collaborate with someone who sees AI as their passion project. If this sounds like you, reach out in the thread!
3. Zabka___ is looking for a small community of 10–30 high schoolers, students, or researchers to learn machine learning together. Their focus is on theoretical aspects, such as linear algebra and statistics. If you want to expand your knowledge in this space, contact them in the thread!
Meme of the week!
Meme shared by ghost_in_the_machine
TAI Curated section
Article of the week
LangChain + Segment Any Text + RAG = The Key to Understanding Your Documents By Gao Dalie (高達烈)
This article addresses the common RAG system issue of semantic fragmentation from standard text splitting. It proposes using a Segment Any Text (SAT) approach, implemented via the ContextGem framework, to ensure text chunks are semantically whole before processing. The author demonstrates how to build a powerful agent that combines ContextGem’s structured data extraction with a traditional LangChain RAG pipeline using FAISS and OpenAI. This dual approach allows for more accurate and context-aware responses by querying both structured insights and unstructured document text.
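For a rough idea of the LangChain half of that pipeline, here is a minimal sketch: pre-segmented chunks (which ContextGem/SAT would produce in the article’s setup) are embedded into a FAISS index and retrieved at query time. The placeholder chunks, model name, and prompt are ours, not the author’s code, and an OpenAI API key is assumed to be set in the environment.

```python
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS

# Placeholder chunks; in the article's setup these would come pre-segmented
# from ContextGem / Segment Any Text rather than a naive splitter.
chunks = ["Semantically whole chunk one.", "Semantically whole chunk two."]

# Build the FAISS index over the chunks and retrieve the top matches.
vectorstore = FAISS.from_texts(chunks, OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

docs = retriever.invoke("What does chunk one say?")
context = "\n".join(d.page_content for d in docs)

# Answer grounded in the retrieved context (model name is an assumption).
llm = ChatOpenAI(model="gpt-4o-mini")
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: What does chunk one say?")
print(answer.content)
```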
Our must-read articles
1. MCP - The Golden Key for AI Automation By Alex Punnen
This blog explains the Model Context Protocol (MCP), a standard for letting LLMs interact with external APIs. It breaks down how MCP works by allowing an LLM to generate a JSON-RPC call for a specific tool after receiving its description. The author uses a simple calculator example to demonstrate the process. A significant portion covers MCP’s secure, OAuth2-based authorization, using the Zerodha Kite financial platform to illustrate how an authenticated session can be established without the client ever handling sensitive credentials.
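To make that flow concrete, here is an illustrative JSON-RPC 2.0 payload of the kind an MCP client sends once the LLM has selected a tool. The tools/call method comes from the MCP spec; the calculator tool name and arguments are hypothetical stand-ins for the article’s example.

```python
import json

# Illustrative JSON-RPC 2.0 request an MCP client would send to invoke a
# calculator tool. "tools/call" is the MCP method for tool invocation;
# the tool name "add" and its arguments are hypothetical.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "add",
        "arguments": {"a": 2, "b": 3},
    },
}
print(json.dumps(request, indent=2))
```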
2. $100B and Rising: Why Vertical LLMs Are Growing 10x Faster Than General AI Models By R. Thompson (PhD)
This article details the growing industry shift from general-purpose AI to vertical LLMs, which are tailored for specific sectors like finance, healthcare, and law. It highlights how general models can fall short in nuanced, regulated environments, whereas specialized models like BloombergGPT and Med-PaLM 2 provide superior accuracy, compliance, and contextual depth. The piece also covers the technical methods, such as RAG and QLoRA, that facilitate the development of these domain-specific models, which are projected to see significant market growth.
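As a rough sketch of the QLoRA recipe these vertical models often rely on (not the article’s code), the usual Hugging Face setup quantizes the base model to 4-bit and trains only small low-rank adapters on top. The base model name and LoRA hyperparameters below are placeholder choices.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization: the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Base model is a placeholder; any causal LM on the Hub works.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb_config
)

# Low-rank adapters: only these small matrices are trained.
lora_config = LoraConfig(
    r=16, lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```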
3. Monitor and Evaluate OpenAI SDK Agents using Langfuse By Steve George
The author presents a method for monitoring an agentic workflow built with the OpenAI SDK. It outlines a three-agent system: an input guardrail to block sensitive data, an assist agent to generate responses, and a validation agent for fact-checking. The piece demonstrates how to use OpenTelemetry to send trace data to Langfuse for visualization. It also covers programmatically accessing traces with the Langfuse SDK to create analytical plots for evaluation.
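As a sketch of the wiring involved (based on Langfuse’s documented OTLP endpoint, not the author’s exact code), pointing a standard OpenTelemetry exporter at Langfuse comes down to two environment variables. The keys below are placeholders.

```python
import base64
import os

# Langfuse accepts OpenTelemetry traces over OTLP with Basic auth built
# from your project's public and secret keys (placeholders here).
public_key, secret_key = "pk-lf-...", "sk-lf-..."
auth = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()

os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://cloud.langfuse.com/api/public/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {auth}"
# Any OTel-instrumented agent run after this point exports traces to Langfuse.
```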
4. The Essential Guide to Model Evaluation Metrics for Classification By Ayo Akinkugbe
This piece provides a comprehensive overview of evaluation metrics for classification models, starting with the foundational confusion matrix. It details common metrics like accuracy, precision, and recall, explaining their ideal use cases and limitations, particularly with imbalanced data. The guide also covers more advanced measures such as ROC AUC, Log Loss, and the Phi Coefficient, using practical case studies to illustrate their application. It stresses the importance of selecting metrics based on specific project needs and business goals.
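As a quick illustration of the piece’s central warning: on imbalanced data (eight negatives, two positives, one positive missed), accuracy looks healthy while recall exposes the failure. The labels and scores below are invented; note that for binary labels, scikit-learn’s matthews_corrcoef is the Phi coefficient.

```python
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score,
    roc_auc_score, log_loss, matthews_corrcoef,
)

# Invented imbalanced example: 8 negatives, 2 positives, one positive missed.
y_true  = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred  = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
y_score = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.9, 0.45]

print("accuracy :", accuracy_score(y_true, y_pred))    # 0.9, misleadingly high
print("precision:", precision_score(y_true, y_pred))   # 1.0
print("recall   :", recall_score(y_true, y_pred))      # 0.5, one positive missed
print("roc_auc  :", roc_auc_score(y_true, y_score))    # ranking quality of scores
print("log_loss :", log_loss(y_true, y_score))         # penalizes confident errors
print("phi (MCC):", matthews_corrcoef(y_true, y_pred)) # balanced single-number summary
```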
If you want to publish with Towards AI, check our guidelines and sign up. We will publish your work to our network if it meets our editorial policies and standards.