December 5, 2024
|
Vincent Hoogsteder

The era of choice in AI

We've entered the era of choice in AI. After a year of OpenAI dominance, you now have the freedom to use comparably performing models, in the same price range, from vendors ranging from Silicon Valley to Paris. It's now possible to build use-cases on hybrid models, and overall pricing has dropped 4x in just 15 months. It's a great time to build in AI.

It’s been 19 months since ChatGPT took the world by storm. Consumers could use the best LLM out there without any barrier, and businesses suddenly had the opportunity to develop their own AI applications on top of easy-to-use APIs, instead of building, training, deploying, and managing their own models.

The success of OpenAI, combined with one of the biggest inflows of venture funding in history, has resulted in more and more players entering the field hoping to become the future platform for Generative AI. One of the most prominent, Anthropic, is located just a few meters from OpenAI’s HQ. Another contender making big waves, Mistral, is headquartered 9,000 kilometers away in Paris.

Regardless of their location, their strategies are remarkably similar:

  1. Recruit the best of the best AI researchers, those who have gained street cred in the labs of DeepMind, Meta, Google, or other research institutes. This, of course, means paying huge salaries for scarce talent.
  2. Train very large foundation models that are impossible for 99.9% of normal companies to develop.
  3. Showcase capabilities through an AI chat interface, often allowing limited free use of their flagship models.
  4. Offer third-party development APIs, allowing teams to build new solutions using these large models.
  5. Apply usage-based pricing on these APIs, tied to the volume of data going in and out.

This 5-step approach is not limited to the three players mentioned; many more vendors on the market operate in this fashion, like Cohere and Google Vertex. The beauty of all of this is that as a company developing AI capabilities on top of these foundation models, you get freedom of choice. This is spurring competition, driving rapid innovation, and dramatically lowering prices.

Very recently, Simon Willison held a brilliant talk about all of this at the AI Engineer World’s Fair 2024; this blog post is partly based on his insights.

Beyond OpenAI’s GPT-4

Since its launch in March 2023, OpenAI’s GPT-4 model was clearly the best option in the market based on performance, and it held its dominance for about 12 months. Then new models from the competitors we just mentioned started to come out. Fast-forward to today: benchmarking these models on cost versus performance paints a picture of multiple solid options:

By Karina Nguyen

Performance in this figure is based on the MMLU benchmark (Massive Multitask Language Understanding). It measures a language model's performance across a wide range of tasks, covering subjects in science, technology, engineering, and mathematics, as well as the humanities, social sciences, and more. It’s not a perfect benchmark, but it at least assesses the performance of all these models in an apples-to-apples kind of way.

What you can get from this comparison are a few very important things:

  • Many new contenders entered the market in 2023.
  • The best models of xAI, Google Gemini, Mistral and Anthropic became close contenders to OpenAI’s GPT4 in 2024.
  • The models close to GPT4 also have similar pricing levels, making them direct alternatives.
  • At the same time, a whole range of lower performance, but also better priced models have come on the market.

The AI market moves at very high speed. For example, this chart was made 2 months ago:

By Simon Willison

As Simon notes in his talk, another interesting thing has started to happen:

  • There are two important clusters of models: high-performance models labeled “Best” versus cheaper, lower-performance models labeled “Cheapest”.
  • The overall pricing of these “Cheapest” models has come down a lot. We’ll get to that in a moment.
  • Notice that GPT-3.5 Turbo, marked by a “?”, is not performing very well but is still priced relatively high. However, it is one of the few models you can fine-tune; with fine-tuning, you can get performance on par with GPT-4 at a lower price.

If you combine the two recent analyses in the charts above, it means that:

  1. You have the freedom to select a high-performance model from a growing group of vendors.
  2. You can now add a far cheaper model next to it for use-cases with lower performance requirements.
  3. The lower pricing is opening up more and more use-cases with a clear business case.

At Mozaik, we have executed multiple customer projects where we no longer use just a single LLM. Instead, we implement a hybrid setup, often with one high-performance model and one or more low-priced models. The high-performance model handles the more complex use-cases (like generating the final answer), while the lower-performance models clean and prepare data, or answer user questions that were classified as simple. This not only makes a big difference in costs; the hybrid setup also significantly improves the response times of the system. In the end, it makes use-cases viable that would not be a smart investment using only high-performance models.
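To make the idea concrete, the hybrid setup can be sketched roughly as follows. This is a minimal illustration, not our production code: the model names and the trivial keyword-based classifier are placeholder assumptions (a real setup would typically use a cheap LLM call or a trained classifier for the routing decision).

```python
# Minimal sketch of a hybrid-model router: questions classified as
# complex go to the expensive model, everything else to the cheap one.
# Model names and the classifier below are illustrative placeholders.

def classify_question(question: str) -> str:
    """Placeholder classifier: flags questions that look complex."""
    complex_markers = ("compare", "explain why", "summarize", "analyze")
    if any(marker in question.lower() for marker in complex_markers):
        return "complex"
    return "simple"

def call_model(model: str, prompt: str) -> str:
    """Stand-in for a real vendor API call."""
    return f"[{model}] answer to: {prompt}"

def answer(question: str) -> str:
    """Route simple questions to a cheap model, complex ones to a
    high-performance model."""
    if classify_question(question) == "complex":
        return call_model("high-performance-model", question)
    return call_model("cheap-model", question)
```

In practice, the routing step itself can be handled by one of the cheap models, so the expensive model is only invoked for the small fraction of traffic that actually needs it.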

Talking about investments, let’s also take a more detailed look at pricing and what is happening on that front.

Downward pricing is unlocking use-cases

Although the pricing strategies are very similar, the detailed usage-based pricing differs for each of the three vendors. It also evolves very fast, as pricing is quickly updated when new models launch. Let’s look at the most recent, highest-performing models each of them offers and what the pricing differences are:

Running the same input data through these different models can yield up to a 40% difference in price. For the output answers, the gap is smaller, at 20%.

The competition in the market is clearly driving down these prices. For example, let’s compare the last three generations of OpenAI’s models:

The price of the latest GPT-4o model is 4 times lower than that of the original GPT-4 model. That is a very rapid decrease in about 15 months. Again, this opens the door for use-cases that previously would have been too costly to create, but now suddenly have a positive financial outcome.
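As a back-of-the-envelope illustration of how this usage-based pricing works: the cost of a call is the number of input and output tokens multiplied by their respective per-million-token rates. The rates below are illustrative placeholders, not current vendor list prices.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_million: float,
              output_price_per_million: float) -> float:
    """Cost in dollars of one API call under usage-based pricing."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000

# Illustrative (not actual) per-million-token prices for two model tiers,
# applied to a call with 2,000 input and 1,000 output tokens.
expensive = call_cost(2000, 1000, input_price_per_million=30.0,
                      output_price_per_million=60.0)
cheap = call_cost(2000, 1000, input_price_per_million=0.5,
                  output_price_per_million=1.5)

print(f"expensive model: ${expensive:.4f} per call")  # $0.1200
print(f"cheap model:     ${cheap:.4f} per call")      # $0.0025
```

Multiply the per-call gap by millions of calls and it becomes clear why a 4x price drop, or routing part of the traffic to a cheaper tier, can flip a use-case from unprofitable to profitable.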

And this is not just about OpenAI. One of its biggest contenders, Anthropic, spoke at the AI Engineer World’s Fair 2024 about its expectation that over the next 12 months the intelligence of its models will go up while latency and costs keep going down:

What does this all mean for your AI projects?

We believe we have entered the era of choice in AI. For any use-case, you can choose a model from a selection of vendors. For use-cases with different quality demands, you can create a hybrid setup where the highest-performing, most expensive models are used only when really necessary, and lower-priced models are applied when they are the right tool for the job at hand.

Really capturing the benefits of all these new options and possibilities does place some demands on the technical setup and infrastructure of your AI projects.

To be successful in this rapidly evolving landscape, we believe an AI infrastructure should:

  1. Be vendor-agnostic. Don’t just implement the vendor APIs that come with your cloud infrastructure by default. Going the extra mile to add APIs from other vendors pays off greatly, and it places you in a very good position as more competition arrives in the space, which is happening quickly.

  2. Enable multiple models, even within a single use-case. We have seen very strong results from applying different models to different pieces of the work. This allows you to leverage not only the lowest prices but also the models’ differing strengths and weaknesses.

  3. Have a quality framework as the central foundation. The only way to make well-founded choices on which vendors and models are right for the job is to have a clear definition of success and an evaluation framework to measure output quality. Mozaik partner Gert Jan Spriensma breaks down some recent learnings on setting up such a quality evaluation framework in this post.
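A vendor-agnostic, multi-model setup can be as simple as a thin interface that every vendor wrapper implements, so that swapping or mixing models becomes a configuration change rather than a rewrite. The sketch below uses made-up client classes and role names; real implementations would wrap the actual vendor SDKs behind the same interface.

```python
from abc import ABC, abstractmethod

class LLMClient(ABC):
    """Minimal vendor-agnostic interface; every vendor wrapper implements it."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class StubClient(LLMClient):
    """Placeholder for a real vendor wrapper (OpenAI, Anthropic, Mistral, ...)."""
    def __init__(self, vendor: str):
        self.vendor = vendor

    def complete(self, prompt: str) -> str:
        return f"[{self.vendor}] response to: {prompt}"

# A registry keyed by role, not by vendor: the application asks for
# "best" or "cheapest", and which vendor fills that role is pure config.
models: dict[str, LLMClient] = {
    "best": StubClient("vendor-a-flagship"),
    "cheapest": StubClient("vendor-b-small"),
}

def complete(role: str, prompt: str) -> str:
    return models[role].complete(prompt)
```

Because application code only refers to roles, a price drop or a new benchmark winner is handled by editing the registry, and the quality framework from point 3 is what tells you when such a swap is safe to make.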

Say hi to the era of choice in AI

It’s clear we have entered the era of choice in AI, and that’s a wonderful thing. Remember, everything we discussed in this post happened in only 15 months. Imagine what the picture will look like in, say, a year!

The stakes are high for vendors aiming to become a model platform player along the lines of OpenAI. In the past two decades, we’ve witnessed a similarly competitive race, then to become a dominant cloud player. It led to three clear winners: Google’s GCP, Microsoft’s Azure, and Amazon’s AWS.

Will we be looking at a clear top 3 in AI platforms as well in a few years? Maybe, but we believe more in the scenario, shared by Marc Andreessen among others, of hyper-competition with many companies competing for decades. Model performance is highly dependent on specific use-cases and (bespoke) data sets, which allows a wider range of companies to capture significant market share. This contrasts with the cloud computing market, where services are more standardized and similar across providers.

And with all this, we didn’t even discuss open-source initiatives, which might be more difficult to use for development (not just an API call), but which will definitely play a big role in driving innovation, broadening capabilities, and bringing costs down. We’re very excited about the era of choice we have now entered and can’t wait to see what we’ll all build because of it.

Come chat with us

Get in touch to find out what your data can do for you. Spoiler alert: it's a lot.

Contact Us