Major tech companies like Google, Meta and OpenAI have spent a large part of this year releasing what they call new foundation models. Everyone seems to be racing to build the go-to model that new models and products are built on. Despite all of the hype (or perhaps because of it), foundation models as a concept are still poorly defined.
I think it's important that we come up with some simple criteria that a model needs to meet to be considered 'foundational'. I'm a developer, so these criteria are developer-centric and strongly biased towards open source. Here's what I'm thinking:
- Foundation models are pre-trained
- Foundation models are general in nature
- Foundation models are open source
Foundation models are pre-trained
There are hundreds of new papers each week showing off new model architectures. Chances are that some of them are great, but reproducibility in AI is still a big problem. If you want to call your model foundational, it needs to be pre-trained and ship as a ready-to-run model.
That doesn't mean it needs to smash every benchmark without fine-tuning, but proposing a model architecture on its own does not provide a full foundation.
Foundation models are general in nature
In most cases, a foundation model exists to be a useful starting point for a range of applications. Getting there often requires fine-tuning or, in simpler cases, just some prompt wrangling.
ChatGPT and Midjourney are great because of their breadth. The applications and models built on top of them will likely be more specific and tuned to a single use case. This, to me, is the ideal scenario.
Foundation models are open source
This is probably the most contentious of the three. If an API is the only interface to your model, it's not a foundation model. It might still be a wonderful model (ChatGPT, for example) but it's not a foundation.
Open source here means open code, available weights and, ideally, access to the original training data. I'd settle for the first two in a pinch, but the goal is all three.
Stable Diffusion hits at least the first two. Whether you can access all of the training data is harder to verify, but a good portion of it is open as well.
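To make the three criteria concrete, here is a minimal sketch of them as a checklist. The `ModelRelease` record and both functions are hypothetical names invented for illustration, not part of any library, and the example values simply mirror the assessments in this post rather than any independent audit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelRelease:
    """Hypothetical record of how a model was released (illustrative only)."""
    name: str
    pretrained: bool        # ships with trained weights, ready to run
    general_purpose: bool   # a useful starting point for many applications
    open_code: bool         # the source code is published
    open_weights: bool      # weights can be downloaded, not only called via an API
    open_training_data: Optional[bool] = None  # None means "hard to verify"


def is_open(release: ModelRelease) -> bool:
    # Open code plus available weights is the minimum bar.
    # Open training data is the ideal, but is not required here.
    return release.open_code and release.open_weights


def is_foundational(release: ModelRelease) -> bool:
    # All three criteria must hold: pre-trained, general, and open.
    return release.pretrained and release.general_purpose and is_open(release)


# Example values follow the post's own assessments (assumptions, not audits).
stable_diffusion = ModelRelease(
    name="Stable Diffusion",
    pretrained=True,
    general_purpose=True,
    open_code=True,
    open_weights=True,
)
chatgpt = ModelRelease(
    name="ChatGPT",
    pretrained=True,
    general_purpose=True,
    open_code=False,
    open_weights=False,
)

for release in (stable_diffusion, chatgpt):
    print(release.name, is_foundational(release))
```

Training data is deliberately left out of the pass/fail check, which matches the "settle for the first two" position above. A stricter version could require `open_training_data` to be `True` as well.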
Once Stable Diffusion was released there was a flurry of activity on top of it. People built GUIs, tweaked the setup and fine-tuned it to develop many new models. Similar activity happened when OpenAI released Whisper.
That kind of rapid creation and collaboration is what makes a model foundational.
ChatGPT and GPT-4, on the other hand, are available only through an API (at least at the time of writing). There will likely be many great products built on top of those APIs, but they're not foundation models while they are locked away.
Stable Diffusion is also available through an API. That doesn't stop it being considered a foundation model, because the code and weights are open too. I am all for wrapping these models in managed services, but the foundation of AI should be open.