Part 3: Open Source Governance of AI

The trickiness of governing fast growing, complex systems.

Feb 23, 2024

Safety

There is as much discussion about the threats posed by AI as there is excitement about opportunities unlocked by newer (and larger) AI models. The discussion around threats posed by AI fall under the label “AI Safety”. To be fair, many of the threats discussed are scary, though mostly hypothetical as most of what is described cannot be done with the latest models like GPT4 - such as replacing all human labor, building bioweapons, and so on. These threats emerge from within the model itself; the model develops a “will” of its own that is not aligned with the end-user’s (or society) objectives.

There are other threats that are less discussed but actually possible today. These are not threats perpetrated by AI itself, rather by bad actors using AI for malicious ends. Meaning, the bad actor uses an ordinary AI model to do bad things. These include hacking websites, writing & sending spam messages, deepfakes for identity theft, and more. By ordinary model, we mean merely crafting prompts for a model to complete a request related to the bad action. It does not mean fine-tuning a model or altering the model’s architecture.

An emerging area of threat is mixture of the above two kinds - a bad actor subverts an AI model to do things that the end-user might be unaware of. There is research coming out showing how models can be minimally fine-tuned to trigger outputs intended by the bad actor but undetectable by the end-user. For example the bad actor can include specific data in the training set that would create a back-door in the model that would exist after training is complete. This back-door could be used for different purposes such as bypassing safety filters or triggering a specific output for specific prompts.

Regulation

To protect end-users, government officials have been crafting and ratifying a range of policies to regulate AI. This may be the first time when a technology is being regulated prior to its actual full-scale deployment and while the technology is rapidly changing. The proposals, while well-intentioned, do have some flaws worth considering.

An approach discussed is to require AI companies to “register” their models with a government approved agency (when their model exceeds a certain size or computational intensity of training). This approach faces significant challenges from being executed successfully and from actually stopping the harms it intends to protect from.

First, it would be impractical to enforce any regulations as models are being developed around the world and outside the control of any one state so the jurisdiction of any agency would be limited and people will seek other jurisdictions to use the model they prefer - just like people in countries that are banned from using free speech platforms use VPNs to get around firewalls.
Second, as (open source) development improves, it will push the computation resource needs down to commodity hardware. As of today (February 2024), there are a few fully open source models that rival the capabilities of the largest private models (Llama, Mistral, Gemma), with lower parameter counts, and able to run on commodity hardware. This makes the threshold requirement referred to in the regulation obsolete on day 1.
Third, some policies describe establishing special agencies to vet models pre and post deployment. The assumption is that the agencies will be able to inspect the models via their documentation, running evals, and testing the model in specific scenarios. As any software developer knows, it is not possible to design tests and testing processes that will eradicate bugs and edge cases from any software system, including back-doors specifically buried upstream in training data. Large AI models are the most complex pieces of software ever created and even the creators have limited understanding of how the software performs in any given scenario. Additionally, any safety filters to control the software’s performance can be bypassed. The probability that a third party agency can vet the AI models to the extent that it renders them harmless is very low.
Fourth, giving benefit of the doubt to policy designers, the regulatory agencies may acknowledge their own limitations when it comes to inspecting these highly complex AI systems. The requirements being asked of model builders may also be purely disclosing documentation of their testing and safety mechanisms. However, there is a call to have model builders accept liability if the model they released is misused by anyone outside their firm to cause significant harm. This would be catastrophic to entities that are choosing to open source their work and would be detrimental to ongoing research to improve our understanding of AI systems. This is the research we actually need to improve AI safety in the first place. Any firm willing to spend huge sums of money to train an AI system would not be willing to open source that work under these liability conditions and researchers would be left with far less capable models to study.
Fifth - complex regulation kills innovation and opens regulatory capture. One could argue there is no regulatory regime that can adequately contain AI. Already we’ve seen proposals to regulate AI from every level of government and NGO: California, US, EU, and more. Each entity has called for new AND existing agencies to oversee model development. This means any model builder would be held accountable to dozens (potentially hundreds) of different agencies before they can deploy a model and iterate on it’s utility to end-users. The complexity of this regime will stifle developers and empower lawyers, consultants and lobbyists to create regulatory capture.

We are at the beginning of the beginning of AI model deployment. Many people believe this is the start of most (if not all) software enabled systems being rewritten around AI and in some cases written by AI itself. Complex regulation at this stage would dramatically slow down innovation.

This is not to say that all regulation is bad and that we shouldn’t have any regulation at all. Good policy (and more importantly how those policies are implemented in practice) should enable good actors while penalizing bad actors. In the case of AI, we should focus on transparency of models (as proxy for the model builder’s intentions) and scalability.

Transparency is about knowing what went into the model to allow it to do what it is intended to do and for end-users to know before, during, and after model runs that it is only doing what the end-user intended.
Scalability is about accepting the fact that the number of models we will need to monitor will scale faster and beyond the capacity of an agency to meaningfully oversee.
Imagine if we had 1000x more drugs to be reviewed by the FDA each year - the FDA would grind to a halt. Yes, we would stop bad drugs from getting to market, but we’d also stop life saving treatments too.

Open Source Approach

An open source approach can provide the layer of transparency and scalability that we desire. There is already strong precedent for this with the web and the integration of SSL/TLS. This security layer allowed end-users to visit websites and verify it is the site operated by the entity they expected. And, more functionally, it secured the communications between the end-user and the website so malicious actors could not easily subvert those interactions.

This approach is also scalable. Anyone wanting to put up a website that would also perform sensitive actions, like accepting credit cards for payment or capture personal information, can create an SSL certificate that is verifiable by the end-user. This process rewards good actors by creating an easy way to do the right thing.

Similarly, VAIL’s approach is to develop an open standard for validating AI/ML models, to develop the equivalent of the 🔒 in the browser tab bar. We can allow good model builders to be transparent about their models’ capabilities and vulnerabilities through verifiable computation. The model builders can add the equivalent of SSL certificates to their models and allow end-users to verify on their own if they choose.

Of course, this approach wouldn’t stop bad actors from being participants in the AI ecosystem. They can build, train, and release their on malicious models for end-users. However even though the theoretical risk is there, it is mitigated by having transparency through verification & validation at the model layer. Because, in practice bad actors will choose not to broadcast their bad intentions. Rather, they aim to portray good intentions and keep their true motivations hidden. Transparency & verifiability is the right way to check bad actors.

Because, in practice bad actors will choose not to broadcast their bad intentions. Rather, they aim to portray good intentions and keep their true motivations hidden.

Furthermore, this approach is more scalable as it enables the verifying to be done with software instead of third party agencies. With the right tooling and incentives, model builders can set this up without intervention.

In part 4 we will cover how this is possible.

Project VAIL

Discussion about this post