LAI #90: Research Agents, Model Selection, and Smarter Workflows
From fine-tuning GPT-OSS for multilingual reasoning to building human-in-the-loop systems and better ways to evaluate RAG.
Good morning, AI enthusiasts! This week, we’re diving into how AI is evolving beyond quick answers. In What’s AI, we explore research agents: tools built to search, reason, and produce citation-backed reports in minutes. We’re also excited to share our latest O’Reilly Radar post on LLM system design and model selection, breaking down the trade-offs every AI engineer faces.
Beyond that, you’ll find hands-on work on teaching GPT-OSS multilingual reasoning, a look at the Model Context Protocol for scaling agent-tool interactions, a custom setup for AI-powered development in Cursor, and practical guides for human-in-the-loop workflows and evaluating RAG systems the right way.
Let’s get started!
What’s AI Weekly
This week in What’s AI, I look at a new wave of AI tools designed for deeper work: research agents. Unlike chatbots that give quick answers, these agents can search, reason, and pull together citation-backed reports in minutes. They’re built to handle multi-step queries and act more like junior researchers than assistants. Read the article to see how they work in practice, or watch the video for a quick overview.
— Louis-François Bouchard, Towards AI Co-founder & Head of Community
Learn AI Together Community Section!
Featured Post
We’re excited to share that our latest post, LLM System Design and Model Selection, is now live on O’Reilly Radar. This piece dives into the core decisions every AI engineer faces: how to select the right model, balance performance with cost, and design production-grade LLM systems that actually scale.
Whether you’re evaluating models for a new product or optimizing pipelines in production, this post gives you the practical criteria and mental models you need to make the right choices.
👉 Read the full article on O’Reilly
AI poll of the week!
OpenAI is leading this poll. That’s interesting, because if you ask “Which model produces the best images?”, MidJourney usually wins in community debates. So why is OpenAI ahead here? Likely because of integration and convenience.
It shows that in AI, distribution and accessibility can matter as much as raw quality. So, do you think the “best” image generation model will ultimately win on quality, or will the one that’s easiest to access dominate the future of creative workflows? Tell me in the thread!
Collaboration Opportunities
The Learn AI Together Discord community is flooded with collaboration opportunities. If you are excited to dive into applied AI, want a study partner, or even want to find a partner for your passion project, join the collaboration channel! Keep an eye on this section, too — we share cool opportunities every week!
1. Efficientnet_99825 is looking for a collaborator to write a research paper, pick a Kaggle dataset and work on it, build and fine-tune LLMs, and explore stock market, quant analysis, and time series projects. If this sounds interesting, connect with him in the thread!
2. Silentsentinel6943 is looking for a partner to help him grow his GitHub repo, MARM-Systems. He needs help with scaling. If you can help, reach out in the thread!
Meme of the week!
Meme shared by sysop1984_26148
TAI Curated section
Article of the week
Teaching OpenAI’s GPT-OSS 20B Model Multilingual Reasoning Ability: A Hands-On Guide with RTX 4090 By Lorentz Yeung
The author outlines a process for fine-tuning OpenAI’s GPT-OSS 20B model to improve its multilingual reasoning. Using a local RTX 4090 and the Unsloth library, the model was trained on the Multilingual-Thinking dataset. In just 60 training steps, the model shifted from its default English-only reasoning to performing chain-of-thought analysis in other languages, such as French. The result is a model that generates structured, language-specific reasoning in the Harmony format, demonstrating a practical way to adapt large models for more diverse, global applications.
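For a feel of what this kind of run looks like, here is a rough sketch of an Unsloth + TRL fine-tuning loop in the spirit of the article; it is not the author’s exact code, and the checkpoint ID, dataset schema (a "messages" column), and hyperparameters are assumptions for illustration.

```python
# Sketch of a LoRA fine-tune of GPT-OSS 20B with Unsloth + TRL (illustrative only).
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Load a 4-bit quantized GPT-OSS 20B so it fits on a single RTX 4090.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",  # assumed checkpoint name
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach lightweight LoRA adapters instead of training all 20B parameters.
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Render each multilingual reasoning example into the model's chat template
# (assumes the dataset exposes a "messages" column).
dataset = load_dataset("HuggingFaceH4/Multilingual-Thinking", split="train")
dataset = dataset.map(lambda ex: {
    "text": tokenizer.apply_chat_template(ex["messages"], tokenize=False)
})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        max_steps=60,                     # the short run described above
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        dataset_text_field="text",
        output_dir="gpt-oss-20b-multilingual",
    ),
)
trainer.train()
```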
Our must-read articles
1. The Bridge to MCP: Scaling AI Tools with Gateways By Parth Saxena
As AI agents connect to more and more tools, managing their communication becomes a significant scaling challenge. The article explains how the Model Context Protocol (MCP) offers a standardized solution for these interactions, enabling stateful exchanges that traditional request-response APIs handle poorly. To manage this complexity, MCP gateways serve as a centralized entry point for authentication, security, and logging. The piece highlights recent open-source projects, such as the author’s Bridge MCP, that are building the foundational infrastructure to scale these systems and shape the future of agent-tool connectivity.
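A gateway itself is more infrastructure than a snippet, but to show what sits behind one, here is a rough sketch of a minimal MCP tool server using the official Python SDK’s FastMCP helper; the server and tool names are made up for illustration and are not from the article or Bridge MCP.

```python
# Minimal MCP tool server sketch (illustrative, not the article's code).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("docs-search")  # hypothetical server name

@mcp.tool()
def search_docs(query: str, limit: int = 3) -> str:
    """Return matching documentation snippets (stubbed for illustration)."""
    return f"Top {limit} results for: {query}"

if __name__ == "__main__":
    # Speaks MCP over stdio; a gateway can front many such servers and add
    # authentication, logging, and routing in one place.
    mcp.run()
```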
2. My Cursor Custom Mode Setup: Building the Perfect AI Development Toolkit By Mayank Bohra
To enhance AI-assisted development, the author outlines a method for creating a specialized toolkit within the Cursor code editor. This approach extends beyond generic AI chat by establishing custom modes that pair specific models, such as GPT-5, Claude 4, and Gemini 2.5, with fine-tuned system prompts tailored to particular tasks. The piece details seven distinct “expert” assistants, including a Code Architect for system design, a Bug Hunter for complex debugging, and a Performance Optimizer for algorithmic improvements. This strategy is designed to leverage the unique strengths of each AI to handle specialized cognitive workloads and improve workflow efficiency.
3. Human in the loop AI Workflows using Langgraph By Aayushi_Sharma
This article details the implementation of Human-in-the-Loop (HITL) workflows in AI using LangGraph. It explains how HITL provides essential control over autonomous agents by allowing users to pause execution, approve or reject actions, and modify the agent’s state in real time. This capability is presented as a method for building safer and more reliable AI systems. The piece includes a step-by-step guide for creating an agent that halts before executing a tool, requiring human confirmation to continue. This demonstrates how to effectively combine machine efficiency with necessary human judgment in complex AI workflows.
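For readers who want the shape of that pause-for-approval pattern in code, here is a rough sketch using LangGraph’s interrupt_before together with a checkpointer; the state schema, node names, and stubbed tool are illustrative assumptions, not the article’s exact implementation.

```python
# Sketch: pause a LangGraph run before a tool node until a human resumes it.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver

class State(TypedDict):
    request: str
    approved: bool

def plan(state: State) -> State:
    # The agent decides it wants to run a (stubbed) tool call.
    return {"request": f"delete_file({state['request']})", "approved": False}

def run_tool(state: State) -> State:
    # Only reached after a human resumes the paused graph.
    print("Executing:", state["request"])
    return state

builder = StateGraph(State)
builder.add_node("plan", plan)
builder.add_node("run_tool", run_tool)
builder.add_edge(START, "plan")
builder.add_edge("plan", "run_tool")
builder.add_edge("run_tool", END)

# interrupt_before pauses execution before the named node until resumed.
graph = builder.compile(checkpointer=MemorySaver(), interrupt_before=["run_tool"])

config = {"configurable": {"thread_id": "demo"}}
graph.invoke({"request": "report.txt", "approved": False}, config)  # pauses here

# A human inspects graph.get_state(config), optionally edits the state, then resumes.
graph.update_state(config, {"approved": True})
graph.invoke(None, config)  # continues from the interrupt
```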
4. From Prompts to RAG to RAGAs: Evaluating Retrieval-Augmented Generation Systems the Right Way By Edgar Bermudez
This piece addresses the limitations of Retrieval-Augmented Generation (RAG) systems, which often fail despite impressive demos. It introduces RAGAs, a framework designed to systematically evaluate these systems. The author explains how RAGAs provides measurable metrics — such as context precision, faithfulness, and answer correctness — to identify weaknesses in retrieval and generation. A practical code example demonstrates how to implement a RAG pipeline and apply RAGAs for evaluation. The text also outlines best practices for creating robust evaluation datasets, offering a structured approach for developing reliable, production-ready RAG applications, rather than relying on subjective assessments.
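To make those metrics concrete, here is a rough sketch of what a RAGAs evaluation call looks like; the toy question, answer, and context are made up, and a real run would feed in your pipeline’s outputs with an LLM and embeddings backend configured for the metrics.

```python
# Sketch of scoring a RAG output with RAGAs (illustrative toy data).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_precision, faithfulness, answer_correctness

eval_data = Dataset.from_dict({
    "question": ["What does RAG stand for?"],
    "answer": ["Retrieval-Augmented Generation."],
    "contexts": [["RAG (Retrieval-Augmented Generation) combines retrieval with LLM generation."]],
    "ground_truth": ["Retrieval-Augmented Generation."],
})

# Each metric targets a different failure mode: retrieval quality
# (context_precision), grounding of the answer in the retrieved context
# (faithfulness), and agreement with the reference answer (answer_correctness).
result = evaluate(
    eval_data,
    metrics=[context_precision, faithfulness, answer_correctness],
)
print(result)
```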
If you want to publish with Towards AI, check our guidelines and sign up. We will publish your work to our network if it meets our editorial policies and standards.