Which model am I getting?
Reviewing a model's outputs alone won't tell you which model you're using. And even if your API provider tells you, you have very little ability to verify that claim.
In our article on verifiable computation, we describe why it matters to know which model you're using and to be able to verify it is the one you think it is.
Though we can't know exactly what a stochastic process will produce or how it produces it, we can know what went into the process to begin with. We should also know which model we're asking to generate results. This is important, especially when there are thousands (potentially millions) of similar models out there.
And, when the outputs really matter, we should know which systems and models we are asking to generate those outputs: which enterprises developed those systems, which entities provide access to them, which datasets were (or were not) included in training, and more. This is what verifiable computation provides.
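For illustration, here is a minimal sketch of the kind of provenance record a verifiable-computation scheme might attest to for a single inference. Every field name and value below is an assumption for the sake of the example, not a standard or a real product.

```python
# Sketch: a provenance record for one inference. Field names are illustrative,
# not a standard; the values are placeholders.
from dataclasses import dataclass, asdict
import json

@dataclass
class InferenceProvenance:
    model_name: str               # which model was asked to generate the result
    model_weights_sha256: str     # fingerprint of the exact weights used
    developer: str                # which enterprise developed the system
    provider: str                 # which entity is serving it
    training_datasets: list       # datasets attested as included (or excluded)
    prompt_sha256: str            # what went into the process

record = InferenceProvenance(
    model_name="example-cancer-finetune-v3",          # hypothetical
    model_weights_sha256="<sha256 of approved weights>",
    developer="Example Labs",                          # hypothetical
    provider="Regional health system, on-prem cluster",
    training_datasets=["oncology-notes-2023 (hypothetical)"],
    prompt_sha256="<sha256 of the prompt and inputs>",
)
print(json.dumps(asdict(record), indent=2))
```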
Let’s make this less abstract.
AI in the Military
Imagine a military officer using an AI to help plan a mission that will deploy a fleet of drones to a hostile region (the team at Shield AI paints this picture well). How can the officer know that the model procured by the DoD for this purpose is the actual model generating the deployment plan?
AI in Healthcare
Imagine an oncologist at a regional health system using a fine-tuned cancer model to design a six-month treatment plan for a newly diagnosed patient. How can she know the model she is using is the one that was trained and tested for its cancer treatment knowledge? And that it hasn't been updated in a way that makes it less accurate for the local population?
There are many more scenarios like this we could list. The truth is that based on a model's outputs alone, there is no easy way for an end-user to know which model they're using.
If you're accessing a model through an API provider, even if you think you're using the latest version, the provider may have altered it to make it work within their infrastructure.
Alex Volkov recently posted an analysis on X documenting the differences between several LLM API providers when hosting the newly released Llama 3.1 70b model.
Alex found that even though each API provider says it is serving Llama 3.1 70b, their hosted versions produce significantly different responses. Some of the differences, such as latency, are less consequential. The big surprise is in the responses themselves, specifically the number of tokens generated.
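You can run a version of this comparison yourself. The sketch below sends one prompt to several OpenAI-compatible endpoints that claim to host the same model and prints the completion token counts. The base URLs, model identifiers, and environment variable names are placeholders, not real provider details; substitute whatever your providers document. Setting temperature to 0 reduces sampling noise, so remaining differences are more likely to reflect how the model is actually being served.

```python
# Sketch: send the same prompt to several OpenAI-compatible endpoints that
# claim to host the same model and compare what comes back.
# All base URLs, model ids, and env var names below are placeholders.
import os
from openai import OpenAI

PROMPT = "Explain the difference between TCP and UDP in three sentences."

PROVIDERS = {
    # name: (base_url, model_id, api_key_env_var) -- all illustrative
    "provider_a": ("https://api.provider-a.example/v1",
                   "llama-3.1-70b-instruct", "PROVIDER_A_KEY"),
    "provider_b": ("https://api.provider-b.example/v1",
                   "meta-llama/Meta-Llama-3.1-70B-Instruct", "PROVIDER_B_KEY"),
}

for name, (base_url, model_id, key_env) in PROVIDERS.items():
    client = OpenAI(base_url=base_url, api_key=os.environ[key_env])
    resp = client.chat.completions.create(
        model=model_id,
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,  # reduce sampling noise so differences reflect the served model
    )
    print(f"{name}: {resp.usage.completion_tokens} completion tokens")
    print(resp.choices[0].message.content[:200], "\n")
```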
This can have huge implications for end-users. If a model has been trained to do a task, but that training is erased or rendered useless by quantization or some other manipulation, it can create downstream risks that are imperceptible to end-users. The cancer doctor may not notice that the plan relies on hallucinated clinical results. The military officer may be unaware the model was poisoned by an adversary to lure the deployment to a vulnerable region at a specific time.
Sometimes the human in the loop will catch these errors. However, as we come to rely on more AI systems and people are inundated with more AI outputs to review, there is a risk of simply assuming the AI is correct in order to move on.
So, even though we may not know how a model generates what we need, we should have guarantees that we are at least using the model we signed off on, and not some altered version.
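When you control the weights yourself (the on-prem case), one baseline check is to compare a checksum of the deployed weights against the checksum recorded at sign-off. The directory path and approved digest below are assumptions for illustration; verifiable computation aims to give an equivalent guarantee even when the model runs on someone else's infrastructure, where you cannot hash the files yourself.

```python
# Sketch: verify that locally deployed model weights match the checksum
# recorded when the model was approved. Paths and the digest are placeholders.
import hashlib
from pathlib import Path

APPROVED_SHA256 = "<sha256 recorded at sign-off>"  # placeholder from your approval record

def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream one file through SHA-256 without loading it all into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def model_dir_sha256(model_dir: str) -> str:
    """Combine per-file digests, in a deterministic order, into one fingerprint."""
    combined = hashlib.sha256()
    for path in sorted(Path(model_dir).rglob("*")):
        if path.is_file():
            combined.update(str(path.relative_to(model_dir)).encode())
            combined.update(file_sha256(path).encode())
    return combined.hexdigest()

if model_dir_sha256("models/approved-finetune") != APPROVED_SHA256:  # hypothetical path
    raise RuntimeError("Deployed weights do not match the approved model")
```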