What do Alpacas have to do with AI's future?
Could AI models become commodities? Microsoft does browsers with Web3 Wallets? Huh! Questions from the road ahead...
Hello! Welcome back to Cloud Vertigo. Our lifelong learning journey goes on with another AI chapter. New here? Great to have you on board! ⛵
Advancing is always a path. Sometimes we go forward, sometimes we drift backwards, sometimes we sit and stop.
If human knowledge seems to take the long and scenic routes, Artificial Intelligence advancements feel geodesic. They leave little to chance. A geodesic curve is the shortest path between two points on some surface.
In Euclidean geometry, the shortest path between two points is a straight line. On a curved surface, however, the shortest path need not be straight. Most strikingly, in some geometries the geodesic path is not even unique.
Today’s job is laying out some groundwork in the quest to figure out the AI supply chains of tomorrow. I feel there are multiple geodesic paths available. Let’s get started.
Competition around foundation models
The simplest way to start is always cost. We could assess the advancement of AI by its training cost. Prices always tell meaningful stories. In this case, it’s about the power struggle between the interface and the infrastructure players (cloud computing). Remember last week’s Not Boring sketch?
Until very recently, AI models used to be prohibitively expensive. Yet every day, as new research, data, and computing power become accessible, training costs decline. ARK researchers estimate a cost-reduction trajectory of around 70% yearly.
AI training cost declines continued at an annual rate of 70%, with the cost to train a large language model to GPT-3 level performance collapsing from $4.6 million in 2020 to $450,000 in 2022. We expect cost declines to continue at a 70% rate through 2030 (Summerling and Downing, ARK Research 2023)
We may be going at a much faster rate than that.
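A 70% yearly decline compounds quickly. A minimal back-of-the-envelope check (the 2020 figure is ARK's; the 2030 number is a straight extrapolation of their trend, nothing more):

```python
# A 70% yearly decline means each year costs 30% of the previous one.
def projected_cost(start_cost: float, annual_decline: float, years: int) -> float:
    """Cost after `years` of compounding decline."""
    return start_cost * (1 - annual_decline) ** years

cost_2020 = 4_600_000  # GPT-3-level training cost in 2020, per ARK
cost_2022 = projected_cost(cost_2020, 0.70, 2)
print(f"2022: ${cost_2022:,.0f}")  # ~$414,000, close to ARK's observed $450,000
cost_2030 = projected_cost(cost_2020, 0.70, 10)
print(f"2030: ${cost_2030:,.0f}")  # ~$27, if the trend really held that long
```

The fact that the observed 2022 figure ($450,000) sits slightly above the pure 70% extrapolation is consistent with the decline running roughly at, or faster than, ARK's estimated rate.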
Last month, Meta open-sourced its own large language model, LLaMA. It is trained on over a trillion tokens and performs comparably to OpenAI’s models. This is a significant move, and it is no coincidence that Meta is the only Big Tech player without a cloud computing business. The devil is in the details: access to the LLaMA foundation model is permissioned and carries a strict non-commercial clause.
The key message though is clear. Foundation models can become commodities.
Stanford researchers showed that you can take an open-source model such as LLaMA 7B, Meta’s smallest model (estimated training cost: around $85,000), and fine-tune it with the help of GPT-3.5 for roughly $600. They called it Alpaca. The research demonstrates that you can ask an AI to fine-tune your AI model. This makes the LLMs we have attained surprisingly replicable.
Although OpenAI’s Terms of Service forbid using its models to build a competing service, this still led some analysts to wonder whether Microsoft has significantly overpaid for OpenAI ($10bn at a $29bn valuation).
What does it mean? It’s the first time it seems plausible that an AI can learn from another AI while retaining comparable results (of course, unsupervised learning models are nothing new and have been studied extensively, but this is something slightly different). It speaks to how easy it is to transfer knowledge between models. The resulting copycat model is necessarily sub-par at first, but with subsequent independent reinforcement learning cycles it could outperform the original, especially on domain-specific tasks.
This could pose a challenge to ARK researchers’ thesis that AI will create unprecedented demand for training data and that “high-quality domain-specific AI training data could result in winner-takes-most outcomes across vertical applications”. It’s not the data that’s valuable, it’s the use!
This will be a crucial question for the defensibility of Microsoft’s enterprise AI play.
The enterprise question will come down to the customers’ make-or-buy choice. Why would customers buy Microsoft’s OpenAI LLM capabilities off the shelf? The alternative is to start from a free Meta model, purchase the first fine-tuning batches, and then improve it rent-free through subsequent internal model interactions. This way customers would retain the intellectual property over the subsequently generated training epochs and develop their own engine. However, they would miss out on the general improvements that come from broader exposure to the public.
It’s not a new tradeoff: privacy vs efficiency. What is new is the scale of the implications.
Last week highlights
Midjourney v. 5 has reached incredible photorealistic capabilities. This project is getting so many things right. Just check it out!
GPT-4 is out and supports multi-modal input (you can feed it your drunk napkin drawings and ask for a fully functioning website back). I have this great idea, please execute. Didn’t we all dream of this?
Microsoft Edge is working on an integrated Web3 wallet. Designs have leaked. With new browser wars on the horizon, this is a strong signal of concrete steps by the incumbents towards web3.
Newest in the gauntlet of questionable upcoming Microsoft Edge features, a crypto wallet 💸 Not really sure how to feel about this kind of thing being baked into the default browser, what are your thoughts?

Disagree? Have a further topic that keeps you awake at night? Come say hello to david@cloudvertigo.xyz.