AI/ML is developing at an increasingly fast rate, with breakthroughs occurring on the order of every few months. There is a proliferation of models, both closed source/proprietary models as well as open source. All models are trained and fine-tuned using a combination of public and private datasets to provide various enhancements to the model. Additionally, there is a growth in pre-trained vector embeddings which can be composed to provide special purpose skills to an AI model.
The end-state of this pace of development and proliferation of models is that there will be as many models as there are websites on the internet. An alternative thought is to imagine there being one or more models for each person and organization in the world (including posthumous entities).
Challenges Ahead
This creates a set of challenges for end-users of these AI models. Which model is generating the prediction? Is the model fit-for-purpose for the prediction it is generating (eg. is it a language model trained off reddit posts used for online psychotherapy treatment)? And once put into production, how is the model improved? Reinforcement learning with human feedback (RLHF) is used to tune the model to perform better in different scenarios and to “align” the model to the end-user’s intentions. It will be important to know where that RLHF data is coming from and if the human feedback is actually of high quality. For example, having an army of lay people reviewing output that requires special skills (like reviewing medical summaries from NLP models) would not be valuable feedback for the model to learn from.
Additionally, it is possible for malicious models to be pushed into applications. Models that are vulnerable to prompt injection attacks: that can generate a specific output or follow a pre-set instruction pattern when prompted with a “secret backdoor”. This can also be used to induce hallucinations.
Aside from these vulnerabilities embedded in a model, it will also be important for the model creators and hosted services to provide transparency and assurance that the model outputs were generated “honestly”. The model host and the end-user could agree on the potential vulnerabilities of a model and still proceed. But, it will be important for the end-user to know that the actual model they are intending to use, is the model that the host is generating the results from (most models will be hosted by another party, not used directly by the end user). The model host would need to authenticate the results as produced by the model that was intended by the end-user.
Another important aspect is to support tracing back outputs from models to source datasets. For example, it will be important for an advertiser to know the ad creative was generated without copyrighted materials or that the copyright owners allowed that use case. Without knowing which model produced the output or which data the model was trained on, it will be difficult to protect copyright.
High Stakes Use Cases
This is especially important where model outputs are used for decisions of high consequence.
When a judicial AI is used to determine sentencing for a convicted person.
When a financial AI is used to determine loan approvals and interest rates.
When healthcare AI is used to perform an early-diagnosis and generate a treatment plan for a complex disease.
As AI capabilities improve, there will be a natural tendency to trust the models to take on more responsibilities. The AI's decisions will impact the end-users and it is important that the provenance of the models’ predictions are brought to the surface.
Public Infrastructure
An analogy for this paradigm is how the SSL/TLS standard was developed and integrated into the web architecture as public key infrastructure. The majority of websites that accept sensitive information from a visitor (such as credit cards, PII, etc) use a secure connection to ensure the information is transmitted in an encrypted format and that the website operator is authenticated via their SSL certificate. Similarly, there needs to be a way for AI-model interactions to leverage default protocols to support accountability and transparency between model providers, model hosts, and end-users.
This is VAIL's goal, to develop an open standard that is integrated into the AI-driven architecture of software services. It will not be sufficient to address the challenges above through human-based audits and assessments. Nor will it be enough to enact posthoc consequences on model hosts through legal recourse. We need to develop a programmatic solution that is integrated directly within AI systems.