LLM Pricing and Strategy
I review the latest updates to the Gemini suite of models and compare pricing with OpenAI, Claude, and DeepSeek.
I also discuss the difficult choice OpenAI and Google face in deciding whether to display chains of thought, given a recent paper showing strong reasoning performance from training on just 1,000 chains of thought.
Cheers, Ronan
AI Model Market Analysis: Pricing, Performance, and Strategic Challenges
Current betting odds on Polymarket show OpenAI as the favorite to have the best model by June 30th, with Google in second place. However, these odds may overstate OpenAI's lead: five or six companies have comparable capability to produce a leading model.
Current Model Performance
Google currently leads with its 1.5 series models:
- Gemini Pro 1.5 offers a 2 million token context length
- It performs on par with or better than GPT-4
- Its context window significantly exceeds competitors' (Anthropic: 500k tokens, OpenAI: 200k tokens)
Pricing Analysis
Reasoning Models
- OpenAI o3-mini and DeepSeek R1 are the main production options
- The price gap between them reflects performance differences as well as regional cost factors
- o3-mini shows superior performance on the ARC and math benchmarks
High-Quality Models
- GPT-4, Claude 3.5 Sonnet, Gemini Pro 1.5
- Gemini Pro 1.5 priced significantly lower despite longer context length
- Claude Sonnet maintains its lead in coding applications despite being an older model
Budget Models
- GPT-4o Mini, Claude 3 Haiku, Gemini Flash 2.0
- Gemini Flash 2.0 is priced at $0.10 per million input tokens
- DeepSeek Chat offers high performance at budget pricing ($0.14 per million input tokens)
Technical Advantages
Google's key advantage comes from their TPU infrastructure:
- In-house tensor processing units
- Lower cost per token compared to NVIDIA GPU-dependent competitors
- Enables competitive pricing while maintaining performance
Strategic Challenges
A recent paper from Stanford highlights a key challenge:
- Models can be fine-tuned to reason using just 1,000 examples of detailed reasoning chains
- A 32B-parameter model trained on these traces reaches 93% on the MATH-500 benchmark
- This creates tension between transparency and protecting IP, since exposed chains of thought can be harvested as training data
- It poses a strategic dilemma for companies deciding whether to show users detailed model reasoning
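The finding above rests on plain supervised fine-tuning over (question, reasoning trace, answer) triples. The sketch below shows one plausible way such a dataset could be formatted into training strings; the field names and the think-tag template are my assumptions for illustration, not the paper's actual format.

```python
# Minimal sketch: packaging reasoning traces for supervised fine-tuning.
# Field names and the <think> template are assumptions, not the paper's
# actual data format.

def format_trace(example: dict) -> str:
    """Join question, chain of thought, and answer into one training
    string, wrapping the reasoning in think tags."""
    return (
        f"Question: {example['question']}\n"
        f"<think>{example['reasoning']}</think>\n"
        f"Answer: {example['answer']}"
    )

traces = [
    {
        "question": "What is 17 * 24?",
        "reasoning": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.",
        "answer": "408",
    },
    # ...roughly 1,000 such examples in the paper's setting
]

training_corpus = [format_trace(t) for t in traces]
```

The strategic problem follows directly: if a provider displays chains of thought verbatim, a competitor can collect a corpus like this from its outputs and fine-tune a smaller model on it.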
Market Implications
The competitive landscape suggests:
- Price pressure will continue, especially from Google and Chinese providers
- Trade-off between model transparency and IP protection
- Specialized strengths (like Claude's coding performance) may prove more durable than raw benchmark scores