Agents, Task Time Compute, & Task Time Marketplaces
Agents will coordinate to complete complex, multi-step tasks for users, using a marketplace to bid out individual tasks to specialty agents.
Marketplaces are the best business model the Internet has produced. They let millions of participants join either the supply or demand side of a market with very low friction, and they let those participants transact seamlessly, repeatedly, and in fractions of a second. Every technology shift since the Internet came of age has given rise to a marketplace of some kind:
eCommerce created eBay and Amazon
Social created Etsy & FB Marketplace
Mobile created Uber
Search created Google Search Ads
With GenAI, it’s reasonable to assume a marketplace will also be created, but what might it look like? One future, which we might see as early as this year, is an agent marketplace. As AI agents become more capable and specialization becomes more common, it’s likely that agents will offload tasks to each other, either for improved precision or to take advantage of the run-time economics of different models.
We posit that agents and a new “task-time marketplace” will emerge, functioning much like the real-time bidding systems behind search ads. In this case, instead of a search query, there will be a task or prompt that agents bid for the right to serve.
What are Agents Anyway?
2025 is already shaping up to be all about “agents”. As usual, the AI community does a wonderful job naming something without providing any clear definition of what it is. For us, “agents” are entities that have agency, meaning they have the resources and authority to take actions, either for themselves or on behalf of another.
AI agents are AI systems that we (people or organizations) provide with the resources and authority necessary to start, progress, and finish jobs on our behalf.
This is exciting because the types of “jobs” will be incredibly varied. A job could be the end-to-end process of researching a new chemical compound and its drug delivery mechanism, compiling the research, running clinical trials, and managing FDA submissions.
Below is an animated representation of Google’s new “AI co-scientist”, which Google describes as follows:
Given a scientist’s research goal that has been specified in natural language, the AI co-scientist is designed to generate novel research hypotheses, a detailed research overview, and experimental protocols. To do so, it uses a coalition of specialized agents — Generation, Reflection, Ranking, Evolution, Proximity and Meta-review — that are inspired by the scientific method itself. These agents use automated feedback to iteratively generate, evaluate, and refine hypotheses, resulting in a self-improving cycle of increasingly high-quality and novel outputs.
This is just the beginning of multi-agent systems. There will be similar systems of AI agents built for many disciplines and multi-disciplinary coordination.
Resources
Resources are the set of things the AI system needs to get the job done. At minimum, the AI needs enough compute to do its work: reasoning time to decide which subtasks must be executed and in what order, plus the specific functions of each subtask: document retrieval, summarization, physics calculations, code generation and execution, and so on. It will also need credentials with permissions to access things behind paywalls or within secure networks. It may also need access to a bank account with funds to buy things or complete transactions.
Access to resources is important to understand as we break down how an agent might go about completing these jobs.
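To make this concrete, here is a minimal sketch (in Python; every field name and value is illustrative, not a real spec) of the resource bundle we might hand to an agent:

```python
from dataclasses import dataclass, field

@dataclass
class ComputeBudget:
    # Rough knobs an agent could plan against; numbers are invented.
    max_inference_seconds: float = 60.0
    max_total_tokens: int = 200_000
    max_reasoning_steps: int = 50

@dataclass
class Resources:
    """Everything an agent needs to start, progress, and finish a job."""
    compute: ComputeBudget = field(default_factory=ComputeBudget)
    credentials: dict[str, str] = field(default_factory=dict)  # API keys, scoped tokens
    permissions: set[str] = field(default_factory=set)          # e.g. {"read:papers"}
    spending_limit_usd: float = 0.0                             # funds it may transact with

job_resources = Resources(
    credentials={"journal_access": "scoped-token"},
    permissions={"read:papers", "write:reports"},
    spending_limit_usd=25.0,
)
```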
Agentic Team Building
It is clear that newer models are getting more and more capable. However, this doesn’t necessarily mean it will make sense for one model to do all the individual tasks in a job. First, for more complex jobs with hundreds or thousands of individual tasks, it may be too expensive for the large model to spend its entire compute budget on tasks that smaller models could perform well. Second, there may be models specially trained for a given type of task that can do it better, faster, and thus cheaper than the larger model.
A likely setup is to use a large, reasoning-based model to take a complicated job and break it down into separate tasks. This model will be the “agent”, and it will assign those tasks to other models.
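Here is a minimal sketch of that setup, assuming a hypothetical `plan()` call on the main model that decomposes a job into typed tasks (we use “conductor” as the name here purely for illustration):

```python
from dataclasses import dataclass

@dataclass
class Task:
    kind: str          # e.g. "retrieval", "summarization", "code_generation"
    prompt: str
    token_budget: int

def run_job(job: str, conductor, specialists: dict) -> list:
    """The conductor decomposes the job, then hands each task to a
    specialist model for that task kind, falling back to itself."""
    tasks = conductor.plan(job)                 # hypothetical planning call
    results = []
    for task in tasks:
        worker = specialists.get(task.kind, conductor)
        results.append(worker.complete(task))   # hypothetical completion call
    return results
```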
It is not yet clear what term will stick for the main agent. You may hear “manager”, “conductor”, “supervisor”, “primary”, or “DRI”: labels similar to those used in the workplace today.
How will the conductor choose which models to use to get the job done?
Task Time Compute
One of the great things about reasoning-based models is their ability to break down a request and “think” about how best to tackle it. The model will automatically define the workflows necessary to get the job done.
We call this task-time compute. The reasoning model determines a strategic plan that takes into account how much compute budget (inference time, total tokens, reasoning steps, backtracking, etc.) is available for the job. If there isn’t enough budget for the agent model to do all the tasks itself, it will need to consider other models to take on different tasks, and to assign and track the specific tasks given to those models.
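A toy version of that budget check might look like the following, where the conductor keeps a task for itself only when it is the cheapest model that fits the remaining budget (all costs and model names are invented):

```python
def choose_model(remaining_tokens: int, conductor_cost: int,
                 specialist_costs: dict) -> str:
    """Pick who runs the next task under a shared token budget.
    Costs are per-task token estimates."""
    candidates = {"conductor": conductor_cost, **specialist_costs}
    affordable = {m: c for m, c in candidates.items() if c <= remaining_tokens}
    if not affordable:
        raise RuntimeError("no model fits the remaining compute budget")
    return min(affordable, key=affordable.get)  # cheapest model that fits

# choose_model(5_000, conductor_cost=4_000,
#              specialist_costs={"summarizer-s": 800, "coder-m": 2_500})
# -> "summarizer-s"
```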
This is not too far from where things are today: many model API providers already route prompts to cheaper but effective models based on the prompt given (even if this isn’t disclosed to the user).
This responsibility of coordinating tasks among many other models is where things get truly interesting. One might go so far as to say this is when we’ll truly have AGI, since coordination is one of the most essential human skills.
Model Discovery & Selection
As the number of specialty models grows, agent models will want to make use of them to reduce the total compute time and cost of a user’s job. This already happens behind the APIs of major foundation model providers: the larger, smarter model analyzes the user’s prompt and determines whether a smaller (dumber?) model can handle it. However, these smaller models are generally built by the same model maker as the larger one. That makes sense: the model maker wants to keep the user within its ecosystem, and it likely has methods for its models to work effectively with each other.
Eventually, it won’t be possible or efficient to keep the calls coming from inside the same house. Especially with open-source models, it will often be better to rely on externally developed models when they are better and cheaper, and everyone wins. Additionally, users may want to compose their own set of agents from several model makers to get the best of the best. The “best” will be a moving target, as new entrants frequently arrive to take the belt from the current champion as the best model for a given specialty.
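One plausible shape for that discovery step is a registry the agent can query by specialty. The listing fields and ranking below are entirely hypothetical; a real registry would need far richer metadata:

```python
from dataclasses import dataclass

@dataclass
class ModelListing:
    name: str
    maker: str
    specialty: str            # e.g. "physics", "legal-summarization"
    price_per_1k_tokens: float
    benchmark_score: float    # self-reported until verified (see below)

def discover(registry: list, specialty: str) -> list:
    """Return candidate models for a specialty, best score-per-dollar first."""
    matches = [m for m in registry if m.specialty == specialty]
    return sorted(matches,
                  key=lambda m: m.benchmark_score / m.price_per_1k_tokens,
                  reverse=True)
```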
Verified Agents
At task time, the primary agent decides which models it wants to use for each task. If there are many models to choose from (as is our thesis), it won’t be possible or efficient to hand-code which models it will use for which tasks. It will need a dynamic market where models “bid” on tasks: the agent determines the compute budget for each task and accepts bids from the myriad other models that claim to specialize in it.
These dynamic markets already exist. When a user searches on Google, advertisers bid to place an ad next to that user’s search results; Google accepts the bids and determines the winner(s), all in fractions of a second. It is probably the best (legal) business model created in human history.
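Borrowing those mechanics, a task-time version might run a sealed-bid, second-price reverse auction per task. This is a sketch of the idea, not a real protocol: the lowest asker wins but is paid the second-lowest ask, which (as in ad auctions) rewards honest bidding:

```python
def run_task_auction(task_budget_usd: float, bids: dict) -> tuple:
    """Sealed-bid, second-price reverse auction for a single task.
    bids maps model name -> the price that model asks to do the task."""
    qualified = {m: p for m, p in bids.items() if p <= task_budget_usd}
    if not qualified:
        raise RuntimeError("no bid fits the task budget")
    ranked = sorted(qualified.items(), key=lambda kv: kv[1])
    winner = ranked[0][0]
    clearing_price = ranked[1][1] if len(ranked) > 1 else ranked[0][1]
    return winner, clearing_price

# run_task_auction(1.00, {"summarizer-s": 0.10, "coder-m": 0.25})
# -> ("summarizer-s", 0.25)
```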
However, this is where things get dicey. How can the agent determine whether another model bidding for a task is actually good at it? How would it verify that claim? There will also likely be bad actors that perpetually underbid to win auctions just to gain access to the resources (bank accounts, credentials, permissions, etc.) the agent has been given for the task.
This is where we need to step up our efforts to provide assurances about agent models and their abilities. We don’t want our agent models to leak our important data to untrusted actors.
This is where verifiability becomes a critical part of the infrastructure for agents. Agents can programmatically verify information about each model and, at task time, only include models that are verified. This makes it easy for good actors to engage with each other and much harder for bad actors to break into the network by claiming to have a new SOTA model that is really a honeypot for gaining access to privileged resources.
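A sketch of that gate, with the big caveat that a real system would use public-key signatures from an independent verifier rather than the shared secret shown here; everything below is illustrative:

```python
import hashlib
import hmac

VERIFIER_KEY = b"shared-secret-for-illustration-only"

def attest(model_name: str, claim: str) -> str:
    """What a verification service might issue after auditing a model's claim."""
    msg = f"{model_name}:{claim}".encode()
    return hmac.new(VERIFIER_KEY, msg, hashlib.sha256).hexdigest()

def is_verified(model_name: str, claim: str, attestation: str) -> bool:
    return hmac.compare_digest(attest(model_name, claim), attestation)

def filter_bids(bids: dict, claims: dict) -> dict:
    """Admit only bidders with a valid attestation for their claimed specialty.
    claims maps model name -> (claimed specialty, attestation)."""
    return {m: p for m, p in bids.items()
            if m in claims and is_verified(m, *claims[m])}
```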
“A2A is a new protocol that allows for agents to discover other agents with specific capabilities via a registry!”
https://x.com/altryne/status/1909999911275512044
"Agent to Agent communication between software will be the biggest unlock of AI. Right now most AI products are limited to what they know, what they index from other systems in a clunky way, or what existing APIs they interact with.
The future will be systems that can talk to each other via their Agents. A Salesforce Agent will pull data from a Box Agent, a ServiceNow Agent will orchestrate a workflow between Agents from different SaaS products. And so on."
- Aaron Levie (Co-Founder, CEO of Box)
https://x.com/levie/status/1897135250737938672