Venture Building with AI Agents
Words Jack Howell & Jacob George
November 11th 2025 / 8 min read
“AI Agents” felt like the buzzword of the year as we went into 2025. The rapid evolution of LLMs and AI tooling meant teams were starting to build and deploy AI agents to take over repetitive human tasks, from completing forms to triaging inboxes to offering fitness and nutrition coaching. And the likes of Anthropic and OpenAI were introducing frameworks and best practices for building agents. This was all a step on from AI workflows, which we saw take over in 2024. For the first time, we could get AI to achieve outcomes rather than merely produce output.
Growing pains in 2025
However, looking back over the last 12 months, I think we’ve only scratched the surface when it comes to AI agents. This isn't speculation: we've spent the past 14 months at Founders Factory building and deploying agentic systems that are already live in insurance and fintech, and along the way we faced the same challenges as anyone else building agents over the past year:
Managing tool integration and access can get complex depending on the outcome you’re trying to achieve with your agents. Agents are only as good as the tooling they have access to, but every additional tool increases the failure rate
LLM changes and upgrades can affect agent reasoning and decision-making: prompted processes that worked on one model may not work on another. Given the pace of change in the frontier model space, ensuring consistent outcomes becomes a constant challenge
Multi-agent workflows require complex orchestration: managing agent state (the snapshot of settings, LLM versions, and tool access), handling errors, and ensuring context remains relevant across asynchronous operations. Without proper state management, errors cascade through the system.
With every run of an agent, the risk of hallucination and inaccuracy increases. This is probably the most frustrating issue, especially when an agent works well in the prototyping stage but falls apart once deployed to production
Multi-modal AI (especially voice) is still nascent, so there has been a heavy reliance on chat-based solutions. Most voice models still occasionally produce gibberish
Last but not least, agents begin to unravel in high-traffic environments as you scale from 5 → 100,000 → 1,000,000 runs, creating engineering and operational challenges. At scale, this can haemorrhage capital and compute
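The state-management and model-drift points above come down to one discipline: pinning everything that shapes an agent's behaviour in an immutable snapshot, so config drift between runs is detectable. A minimal sketch of that idea (the `AgentSnapshot` name and fields are ours, not any specific framework's):

```python
from dataclasses import dataclass
import hashlib
import json

@dataclass(frozen=True)
class AgentSnapshot:
    """Immutable record of everything that shapes an agent's behaviour."""
    model: str            # pinned LLM version string
    system_prompt: str
    tools: tuple          # names of tools the agent may call
    temperature: float = 0.0

    def fingerprint(self) -> str:
        """Stable hash so two runs can be compared for config drift."""
        payload = json.dumps(
            {"model": self.model, "prompt": self.system_prompt,
             "tools": sorted(self.tools), "temperature": self.temperature},
            sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()[:12]

snap = AgentSnapshot(model="gpt-4.1", system_prompt="You triage claims.",
                     tools=("fnol_lookup", "fraud_check"))
```

Logging the fingerprint alongside every agent run makes it cheap to answer "did the model or prompt change between these two outcomes?" after the fact.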
These challenges have consistently kept AI agents in the experimental phase of the product lifecycle, rather than making it to full-scale deployment. In fact, a recent MIT study highlighted a stark pilot-to-product chasm when it comes to GenAI tools, citing a 95% failure rate when it comes to deployment.
2026: The inflection point
While the problems over the past 12 months have been numerous, the solutions are very nearly here. Taking just some of last month’s key announcements as an indicator, we can clearly see these pain points being addressed, which should keep ‘AI agents’ the buzzword for the next 12 months and beyond. Here are some key highlights:
October 6: OpenAI announces AgentKit, offering developers visual builder tools, connectors, and UI components for creating and orchestrating production-ready AI agents.
October 8: n8n raises $180 million in Series C funding, highlighting the growing significance of automation-focused agentic platforms for enterprises.
October 16: Oracle launches new embedded AI agents within Fusion Cloud Applications, boosting smarter decision-making across finance, HR, supply chain, sales, and service workflows.
October (mid): Wiley debuts an agentic research platform enabling collaborative and interoperable AI-powered analytics and scientific discovery for researchers.
October 22: LangChain announces LangGraph 1.0 and Agent Builder, ushering in production-ready agent workflow orchestration for robust, multi-step agent programming.
October 27: Vercel launches a rapid agent prototyping initiative (“Agent on Every Desk”) and expands its marketplace with easier one-click installs and human approval safeguards.
Building agentic AI at Founders Factory
At Founders Factory we’ve already solved many of these pain points and built AI agents to deliver outcomes we think will reshape industries such as fintech and insurtech. By focusing on industry-wide use cases, we’ve steered away from building AI agents for niche problems, ensuring we’re always building for scale, and therefore facing the challenges I mentioned head-on rather than avoiding them. This ensures that the AI agents we build will see high adoption, deliver on the promised outcomes, and handle the high-volume environments we’re building in.

We're not just building software anymore; we're developing goal-based systems that require continuous human feedback, immutable release snapshots, and conversation-level regression testing. Traditional software gives the same output for the same input. Agents reason creatively within boundaries. This fundamental shift demands new development methodologies: declarative programming for agent behaviors, continuous quality assurance through annotated conversations, and test suites that simulate thousands of real customer interactions.
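Conversation-level regression testing is the piece that differs most from traditional unit testing: you replay recorded dialogues against the agent and assert on outcomes (intent reached, escalations kept in bounds) rather than exact wording, which an LLM will never reproduce verbatim. A toy sketch of the shape of such a test; `run_agent` here is a trivial rule-based stand-in for a real agent entry point:

```python
def run_agent(messages):
    # Stand-in agent: a trivial rule so the sketch runs without an LLM.
    # A real harness would call the deployed agent behind this interface.
    last = messages[-1]["content"].lower()
    if "claim" in last:
        return {"intent": "file_claim", "escalate": False}
    return {"intent": "unknown", "escalate": True}

def assert_outcome(transcript, expected_intent, max_escalations=0):
    """Replay a recorded conversation; assert on outcomes, not wording."""
    escalations, result = 0, None
    for turn in transcript:
        result = run_agent([turn])
        escalations += int(result["escalate"])
    assert result["intent"] == expected_intent, result
    assert escalations <= max_escalations, escalations

recorded = [{"role": "user", "content": "I need to file a claim for my car"}]
assert_outcome(recorded, expected_intent="file_claim")
```

In practice each annotated production conversation becomes one such test case, and the suite is re-run on every model or prompt change.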
Here are some of the use cases we’ve tackled over the past 14 months:
Agents for Time Efficiency
JointlyAI started with a simple observation. Insurance companies pay out £1.10 in claims and operational costs for every £1 in premiums. That's unsustainable.

We built an orchestrated multi-agent system handling insurance claims end-to-end. Not automation. Not a chatbot. A fully autonomous claims team: FNOL agent, damage assessment agent, fraud detection agent, repair management agent, settlement agent. Each specialized, each focused. The results at Lemonade tell the story: 50% automation rates; 2.5 million tickets annually; $120M saved so far, $1B projected over four years; and customer satisfaction at 4.7/5.

But scaling exposed cracks. Voice AI on the FNOL side started breaking at volume. Accents, background noise, emotional customers. We had to refine everything: tone settings, voice parameters, tool access patterns. What worked for 100 calls fell apart at 10,000.

The bigger challenge was agent coordination. We introduced a network of eval agents sitting at the top, assessing outputs from each agent before handoffs. Think of it as quality control in real time. Without it, errors compound: one agent misclassifies damage severity, the next agent quotes wrong repair costs, the settlement agent pays out incorrectly. Eval agents catch these cascading failures before they happen.

The counterintuitive finding: smaller, specialized models with deterministic outputs build trust faster in regulated industries. Insurers want transparency, not black boxes. So we built glass boxes. Every decision: traceable. Every outcome: auditable.
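The eval-agent pattern above is, at its core, a deterministic gate between pipeline stages: before one agent's output becomes the next agent's input, a checker validates it and blocks the handoff on failure. A minimal sketch; the severity scale, fields, and thresholds are illustrative, not JointlyAI's actual schema:

```python
ALLOWED_SEVERITIES = {"minor", "moderate", "severe", "total_loss"}

def eval_gate(assessment: dict) -> list:
    """Return a list of problems; an empty list means the handoff may proceed."""
    problems = []
    if assessment.get("severity") not in ALLOWED_SEVERITIES:
        problems.append("unknown severity: %r" % assessment.get("severity"))
    cost = assessment.get("estimated_cost", -1)
    if not 0 <= cost <= 250_000:
        problems.append("cost out of range: %r" % cost)
    # Cross-field plausibility check: catches a misclassification before the
    # repair and settlement agents act on it.
    if assessment.get("severity") == "minor" and cost > 5_000:
        problems.append("minor damage with implausibly high cost")
    return problems

ok = eval_gate({"severity": "moderate", "estimated_cost": 3_200})
bad = eval_gate({"severity": "minor", "estimated_cost": 40_000})
```

In a real pipeline a non-empty problem list would route the claim back for re-assessment or to a human, rather than letting the error cascade downstream.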
Agents for Search & Personalization
Project Delorean reimagines insurance distribution for the UK's 5.2 million SMEs. They currently pay £15.5B annually through a system built for the 1980s: brokers taking 55% commissions, service fees at £100 per year.

We built a lean team augmented by agent colleagues: a broking agent, quote manager, policy reviewer, pricing agent, SEO copywriter. These aren't tools. They're team members. When a Manchester plumber searches for liability insurance, they land on a page written specifically for Manchester plumbers: local risks, relevant coverage, instant quotes.

We hit multiple challenges here as well. LLM changes broke our quote flows constantly: GPT-4.1 to GPT-4.1 mini migrations meant full prompt rewrites to keep workflows running. Multi-step workflows collapsed when context was lost between the broking agent and quote manager. Routing between the two wasn’t fast enough, and users churned in the process.

The solution was dynamically overlaying routers and running a parallelised set of context routines. We also implemented declarative guardrails: hard boundaries that agents cannot cross while maintaining creative problem-solving, an approach also used by Sierra AI. Each agent became part of one combined state for speed, with portions of context exchanged dynamically in the background as the user responded to messages. As a result, we could performantly bring up RAG context and user information, and route to an agent<>model combo that would respond to the user in under a second.
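The "parallelised context routines" idea is essentially concurrent fan-out: fetch RAG context, user profile, and the routing decision at the same time, so per-turn latency is the slowest call rather than the sum of all three. A sketch under stated assumptions; the fetchers are stand-ins (a real version would hit a vector store, a database, and a router model):

```python
import asyncio

async def fetch_rag_context(query):
    await asyncio.sleep(0.05)          # stand-in for a vector-store lookup
    return ["liability cover basics"]

async def fetch_user_profile(user_id):
    await asyncio.sleep(0.05)          # stand-in for a database read
    return {"trade": "plumber", "region": "Manchester"}

async def pick_route(query):
    await asyncio.sleep(0.05)          # stand-in for a routing classifier
    return "quote_manager" if "quote" in query else "broking_agent"

async def prepare_turn(user_id, query):
    # All three run concurrently: wall time ~0.05s, not ~0.15s.
    rag, profile, route = await asyncio.gather(
        fetch_rag_context(query),
        fetch_user_profile(user_id),
        pick_route(query))
    return {"rag": rag, "profile": profile, "route": route}

ctx = asyncio.run(prepare_turn("u1", "I need a quote for liability insurance"))
```

The same fan-out can be kicked off speculatively while the user is still typing, which is one way to reach sub-second responses.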
Warren tackles personal finance differently. 77% of people already use LLMs for financial tasks weekly. 18 million Brits want AI financial guidance. Yet 91% get zero regulated support because human advisors only serve the wealthy. Warren delivers a 10x improvement over ChatGPT for finance. The unlock was voice. Voice AI finally crossed the "actually usable" threshold in 2024. People don't type "I'm scared about my mortgage." They say it. That emotional connection drives real behavior change. 96% positive experiences.
But voice brought its own challenges. Multi-modal AI was still nascent when we started. Voice to text to LLM to text to voice. Each conversion point introduced latency and errors. Financial information delivered 3 seconds late feels broken. Users talking over Warren crashed the entire conversation state. Background noise from a coffee shop made the agent recommend bankruptcy instead of budgeting. The hallucination problem hit differently with finance. We had to implement multiple validation layers. Every financial product mentioned gets verified. Every calculation is double-checked. Every piece of guidance is filtered through compliance rules. The agent that was supposed to democratize financial support almost became too cautious to provide any insights at all. We solved this through declarative guardrails: deterministic boundaries the agent cannot cross while maintaining conversational flexibility. FCA rules became hard constraints, not suggestions. The agent explores creative engagement within these boundaries, providing personalized guidance without regulatory risk.
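Declarative guardrails, as described above, mean compliance boundaries expressed as data and checked deterministically after generation, rather than hoped-for behaviour buried in a prompt. A minimal sketch; the rule names and banned phrases are illustrative, not actual FCA wording:

```python
# Each guardrail is (name, predicate); the predicate returns True when the
# reply is acceptable. Rules are data, so compliance can review them
# without reading agent code.
GUARDRAILS = [
    ("no_specific_advice", lambda t: "you should buy" not in t.lower()),
    ("no_guarantees",      lambda t: "guaranteed return" not in t.lower()),
]

def check_reply(text: str) -> list:
    """Return the names of violated rules; empty list means the reply may ship."""
    return [name for name, ok in GUARDRAILS if not ok(text)]

safe = check_reply("Budgeting tools can help you plan around mortgage worries.")
blocked = check_reply("This fund offers a guaranteed return of 12%.")
```

A violated rule would trigger a constrained regeneration or a templated fallback, so the agent stays conversational inside the boundary but can never cross it.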
Agents for Resource Efficiency
Project Konduit
Project Konduit attacks capacity allocation in insurance. MGAs wait 12 weeks for capacity today, spend £50k on due diligence per program, and most vet two programs annually. The entire Lloyd's market operates like it's still 1688. Our agent-powered marketplace slashes these metrics: onboarding from 12 weeks to 1 week, due diligence from £50k to £1k, deal flow from 2 to 40 programs annually, a 20x improvement.

But we're still building, and we can already see the challenges ahead. Data standardization will be brutal. Every MGA reports differently. Every insurer wants different metrics. Our agents will need to translate between dozens of formats in real time. The multi-modal challenge hit hard here too: documents came as PDFs, Excel sheets, emails, even handwritten notes. Our agents had to parse all of it, accurately, at scale. Tool integration complexity multiplies. Each insurer has different APIs, different authentication systems, and different fetch processes. Some had no APIs at all. Managing tool access for agents became exponentially complex as we added carriers.

Soon, however, we’ll have autonomous agents allocating capital in real time: algorithms scanning opportunities, pricing risk, deploying capacity continuously. We're building the infrastructure for this future: standardized data feeds, open APIs, performance telemetry. Lloyd's for the AI age, with the US market (known for its complexity) firmly in sight.
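The data-standardization work usually reduces to a canonical schema plus one adapter per source, with a hard check that every adapter lands on the same fields and units. A sketch of that layer; the field names, units, and MGA labels are invented for illustration:

```python
# Canonical schema every downstream agent consumes: GWP in pounds,
# loss ratio as a fraction, policy count as an integer.
CANONICAL_FIELDS = {"gwp", "loss_ratio", "policy_count"}

def from_mga_a(row):
    # Hypothetical MGA A: reports GWP in £k and loss ratio as a percentage.
    return {"gwp": row["gross_written_premium_k"] * 1_000,
            "loss_ratio": row["lr_pct"] / 100,
            "policy_count": row["policies"]}

def from_mga_b(row):
    # Hypothetical MGA B: already in pounds and fractions, different keys.
    return {"gwp": row["gwp"], "loss_ratio": row["loss_ratio"],
            "policy_count": row["count"]}

ADAPTERS = {"mga_a": from_mga_a, "mga_b": from_mga_b}

def normalise(source: str, row: dict) -> dict:
    record = ADAPTERS[source](row)
    # Hard check: every adapter must emit exactly the canonical fields.
    assert set(record) == CANONICAL_FIELDS, record
    return record

rec = normalise("mga_a",
                {"gross_written_premium_k": 120, "lr_pct": 62, "policies": 300})
```

New sources then cost one adapter each, and the assertion catches a malformed adapter before bad numbers reach pricing or allocation agents.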
Viva la agentic revolution
What we've proven over 14 months is simple: The challenges you face with AI agents are solvable when you focus on specific industry problems. Don't build generic agent frameworks. Build specialized agents embedded in existing workflows.
Every project taught us that narrow beats broad.
To be clear, narrow doesn't mean niche; it means building specialized agents for industry-wide problems at scale. We focus on specific, well-defined tasks that affect millions, then orchestrate multiple specialized agents together. This approach enables regression testing at the conversation level, ensuring our agents scale reliably from 100 to 1 million interactions.
Specialized beats general. Integrated beats standalone. The future isn't AGI replacing industries. It's thousands of specialized agents, each excellent at one task, orchestrated together. We've proven it works in insurance and finance, among the most regulated, complex industries on earth, and in 2026 we expect that to extend to more industries.
The agent revolution isn't coming. It's here.
About Jacob and Jack
Jacob is a product leader at Founders Factory focused on turning AI potential into real, production-ready systems. He specializes in building workflow-automation agents and LLM-powered platforms across healthcare, fintech, and retail, shipping vertical AI products such as Chiron AI, Nila, Jointly AI, and Ditto AI.
Jack is the applied AI lead at Founders Factory. Jack supports portfolio startups by drawing on his own founder experience, helping teams hire top technical talent, architect scalable systems, and build product hands-on. Now part of Founders Factory’s mission to power founders to go further, faster, he works directly with ambitious early-stage companies to accelerate engineering execution and turn ideas into production-ready technology.