Why we need health AI agents

By 2035, over half the global population is expected to be overweight, costing the world economy an estimated $4 trillion annually. At the same time, AI is advancing rapidly. Many startups use LLMs like GPT-4o, Claude 3, and Gemini 1.5 Pro to build the backbone of AI agents that can reason, adapt, and act.

Now imagine a billion people with access to an always-on AI agent that acts as their nutrition coach, providing personalized and proactive guidance. Diets would improve. Chronic disease rates would fall. Healthcare would shift from treatment to prevention. This is the promise of AI agents: domain-specific, intelligent systems that solve real-world problems at scale.

We see the infrastructure for agents forming, with tools like Zep for memory, AutoGen for reasoning and orchestration, and many others. But few agents feel smart enough to act as an operator and do things for you. Today’s agents are mostly generic copilots or chatbots. Truly useful, vertical-specific agents that operate inside a domain and deliver consistent, meaningful action are still rare, especially in the consumer space.

Over the past year, we’ve been building a health-focused agent from the ground up. Our agent now consumes tens of millions of tokens a day. Along the way, we’ve run into the deeper product, design, and engineering challenges that come with building agents, and we want to share some of those challenges and learnings with the industry.

Designing AI agents from first principles

Agents introduce a completely new paradigm, one that demands rethinking a category from the ground up. Instead of AI’fying a traditional app, you need to ask from first principles what it means to solve this specific problem and whether solving it with LLMs is better.

This kind of rethink applies across industries, whether in legal tech (e.g., Harvey AI), personal finance (e.g., Cleo), or mental health (e.g., Woebot).

With Welling, we asked: if language models were the default UI, how would a nutrition coach work? We didn’t take, say, MyFitnessPal (the most popular calorie-tracking app) and upgrade some of its features with AI. Instead, we completely rethought the experience and AI’fied the coach itself. We asked ourselves what a human coach would do to solve the user’s problem. The result is a completely different product experience, one that is far simpler for the user.

What is the UI of an AI agent?

Kevin Weil, OpenAI’s Chief Product Officer, called chat “the most intuitive interface to humans.” We agree that agents should start with chat as the main interface. However, pure chat interfaces (WhatsApp or Telegram, for example) severely limit the functionality you can deliver to your users. Most effective agents need multimodal input and more dynamic ways of presenting data and updates.

That’s why modern agentic UIs are evolving into hybrid interfaces: a chat-driven base layered with dashboards and dynamic components (ChatGPT, for instance, recently introduced commerce and shopping cards directly in the chat). The chat interacts with the data and can make changes directly on the dashboards. As a user, you’re commanding an operator. At Welling, it took significant engineering and design effort to deliver this to our users reliably. And this new design paradigm comes with design challenges that the industry has yet to agree on. For example:

  • How do we separate input (e.g., chat) from output (e.g., dashboards) without confusing the user?
  • In which use cases should the agent decide versus the user? For example, with Welling, should water intake be logged automatically by extracting it from food logs, or should the user log it manually to keep autonomy over what counts as water?
  • Should all input data go through the agent, or should some be entered manually? Where do we draw the line?

There is no established playbook here. Just as early mobile UX needed new design languages and patterns that users grew accustomed to over time, agent-first design is creating its own paradigms.

Solving for a large, messy problem space of language

LLMs are great with language, but humans can be (kind of) vague. “Had a bit of curry rice” isn’t a database entry. It’s a probabilistic phrase describing a regional dish with unknown quantities. Each vertical agent needs its own layer between the application and the foundational models, one that bridges the logic and handles the vertical-specific use cases. With Welling, we built our own food-parsing engine and algorithms using context-aware prompts, search and matching algorithms, fallback mechanisms, and user feedback to make sure Welling makes reliable and consistent calorie and macro estimations.
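To make that concrete, here is a minimal sketch of what such a middle layer can look like: fuzzy-match the user’s phrase against a curated food database, apply a confidence threshold as a guardrail, and fall back to a model estimate (plus user confirmation) when no match is found. All names, thresholds, and values here are hypothetical, and a production engine, including ours, is considerably more sophisticated.

    # Minimal sketch of a food-parsing middle layer (hypothetical names and
    # data; illustrative only, not Welling's production engine).
    from dataclasses import dataclass
    from difflib import SequenceMatcher

    @dataclass
    class FoodMatch:
        name: str
        kcal_per_100g: float
        confidence: float

    # Stand-in for a curated food database.
    FOOD_DB = {
        "curry rice": 180.0,
        "ham and egg sandwich": 240.0,
    }

    # Vague quantity phrases mapped to default gram estimates.
    DEFAULT_PORTION_G = {"a bit": 150, "a portion": 250}

    def match_food(phrase: str) -> FoodMatch | None:
        """Fuzzy-match a free-text phrase against the food database."""
        best_name, best_score = None, 0.0
        for name in FOOD_DB:
            score = SequenceMatcher(None, phrase.lower(), name).ratio()
            if score > best_score:
                best_name, best_score = name, score
        if best_name and best_score >= 0.6:  # threshold is a tunable guardrail
            return FoodMatch(best_name, FOOD_DB[best_name], best_score)
        return None  # no confident match: trigger the fallback path

    def estimate_kcal(phrase: str, quantity: str = "a portion") -> float | None:
        match = match_food(phrase)
        if match is None:
            return None  # fallback: ask the model, then confirm with the user
        grams = DEFAULT_PORTION_G.get(quantity, 250)
        return match.kcal_per_100g * grams / 100

    print(estimate_kcal("bit of curry rice", "a bit"))  # ~270 kcal

The point is not this particular matching algorithm; it is that the thresholds, fallbacks, and portion defaults are domain logic your foundational model does not ship with.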

Any domain will have similar challenges. And this middle logic layer is partly where differentiation and a moat can be built.

Requiring deterministic outputs from a probabilistic system

Foundational models are inherently probabilistic. But in many verticals like legal, finance, or in our case, health tracking, users expect consistency.

Most general-purpose LLMs (including ChatGPT and Claude) will return different outputs for the same input because the model samples a new response every time. But when someone logs a sandwich with “ham, egg, tomato,” they expect the same calorie estimate every time. This requires teams to build a deterministic logic layer with sophisticated guardrails on top of the model. The outcome is consistency for your users: log that “ham, egg, tomato” sandwich with Welling and you will get the exact same values every time.
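One common pattern for such a layer, sketched below, is to canonicalize the input and pin the first validated model estimate in a cache, so identical logs always resolve to identical values. This is an illustrative sketch with hypothetical helpers (call_llm_for_estimate is a stub), not Welling’s exact implementation; note that setting temperature to 0 reduces variance but does not by itself guarantee determinism.

    import hashlib
    import json

    # Cache mapping canonical input -> first accepted estimate. In production
    # this would be a persistent store (e.g., Redis or a database table).
    _estimate_cache: dict[str, dict] = {}

    def canonicalize(ingredients: list[str]) -> str:
        """Normalize order, case, and whitespace so 'Ham, egg' == 'egg, ham'."""
        normalized = sorted(i.strip().lower() for i in ingredients)
        return hashlib.sha256(json.dumps(normalized).encode()).hexdigest()

    def call_llm_for_estimate(ingredients: list[str]) -> dict:
        # Stub standing in for a real model call at temperature 0.
        return {"kcal": 350, "protein_g": 20, "carbs_g": 30, "fat_g": 15}

    def estimate_macros(ingredients: list[str]) -> dict:
        key = canonicalize(ingredients)
        if key in _estimate_cache:
            return _estimate_cache[key]  # deterministic: same input, same output
        # First sighting: call the model once, validate against guardrails,
        # then pin the result so it never varies again.
        estimate = call_llm_for_estimate(ingredients)
        assert 0 < estimate["kcal"] < 3000, "guardrail: implausible estimate"
        _estimate_cache[key] = estimate
        return estimate

    # Logging "ham, egg, tomato" twice returns identical values,
    # regardless of ordering, casing, or stray whitespace.
    assert estimate_macros(["ham", "egg", "tomato"]) == \
           estimate_macros(["Tomato", "ham ", "egg"])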

For many industries, this challenge of requiring deterministic outcomes from a probabilistic model must be overcome to provide consistent and reliable output that your users can trust.

Memory and context: Where to store what

Context length is expanding (e.g., Gemini 2.0 supports a 1M-token context window), but the challenge is not how much you can store. We expect the industry to keep expanding context windows. The real questions are what you store, how you process it before you store it, and what you put in the context window.

Each vertical needs to think through specific use cases. For example, at Welling, when the user asks what to eat for breakfast, do we put the past 30 days of eating history into the prompt together with our coaching algorithm? Or do we create an abstracted layer of “Food Preferences,” preprocessed from all historical eating data, that goes into the prompt each time? Each implementation has tradeoffs in output quality, latency, and token cost.
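The sketch below contrasts the two strategies with hypothetical data structures: Strategy A injects the raw 30-day history (maximal detail, maximal tokens and latency), while Strategy B injects a compact, preprocessed preference summary (cheaper and faster, but lossy). This is illustrative only, not Welling’s production code.

    # Two context-construction strategies for the same user question
    # (hypothetical structures; illustrative only).

    def build_prompt_raw(question: str, meals_30d: list[dict]) -> str:
        """Strategy A: inject raw 30-day history. Maximal detail, maximal tokens."""
        history = "\n".join(
            f"{m['date']}: {m['food']} ({m['kcal']} kcal)" for m in meals_30d
        )
        return f"Recent meals:\n{history}\n\nUser question: {question}"

    def summarize_preferences(meals_30d: list[dict]) -> str:
        """Offline preprocessing: distill history into a compact preference profile."""
        foods = [m["food"] for m in meals_30d]
        favorites = sorted(set(foods), key=foods.count, reverse=True)[:3]
        return "Frequently eats: " + ", ".join(favorites)

    def build_prompt_abstracted(question: str, preferences: str) -> str:
        """Strategy B: inject the precomputed summary. Cheaper and faster,
        but lossy: details not captured in the summary are invisible."""
        return f"Food preferences: {preferences}\n\nUser question: {question}"

    meals = [
        {"date": "2025-01-01", "food": "oatmeal", "kcal": 300},
        {"date": "2025-01-02", "food": "oatmeal", "kcal": 300},
        {"date": "2025-01-03", "food": "eggs on toast", "kcal": 420},
    ]
    print(build_prompt_abstracted("What should I eat for breakfast?",
                                  summarize_preferences(meals)))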

Tools like Zep and LlamaIndex let developers construct long-term memory, scratchpads, and ways to access existing data. But deciding what goes into the prompt, and when, is a product decision that requires a deep understanding of the customer.

Navigating a fast-moving industry

This industry moves at warp speed. Token costs have dropped by >70% in the last 12 months. Models double in performance every 6–9 months. What’s hard today may be trivial tomorrow.

That’s why smart AI product teams build with two questions in mind:

  1. Will this challenge be solved by an improved foundational model in 6 months? If yes, we can deprioritize trying to solve it ourselves.
  2. What can we focus on instead that adds to our moat and differentiation: UX, proprietary data and logic, and user network effects?

When building in AI, especially agents, you will get many questions about token costs. My tip: don’t over-optimize for token costs, because we expect them to keep dropping. Optimize for user experience instead.

Looking ahead: The rise of the vertical agent

General-purpose agents won’t beat purpose-built ones in most real-world applications. The future belongs to vertical AI agents: systems deeply embedded in one domain, capable of delivering consistent, contextual, and actionable outcomes, with UX design catered to that vertical.

Across sectors like health, law, education, and logistics, we’ll see the rise of “agent-native” startups: not repackaged apps with chat interfaces, but entirely new products imagined from first principles using LLMs. These teams are pushing the limits of what agents can do. They’ll invent new UX patterns, rethink system architecture, and make novel tradeoffs in output quality, memory, and speed. No one knows the correct way yet. We, as an industry, are figuring it out, and we’re excited to be part of that frontier.


Philip Man is Cofounder and Chief Executive Officer at Welling. Welling AI is the future of health and nutrition coaching through smart conversational AI. We build technology that helps people achieve their health goals by making it easier to gain insight into their food intake and receive tailored nutritional guidance.

TNGlobal INSIDER publishes contributions relevant to entrepreneurship and innovation. You may submit your own original or published contributions subject to editorial discretion.

Featured image: Hrushi Chavhan on Unsplash
