The End of the 85% Illusion: What Public Disclosures Reveal About AI-Native Gross Margins
May 13, 2026 · 8 min read · 5 sources
InferMargin Research — Article 1, published May 2026
For the last fifteen years, the B2B software playbook was written in stone: build the product once, distribute it infinitely, and enjoy 80–90% gross margins. Software had a high fixed cost of creation but a near-zero marginal cost of distribution.
Generative AI has fundamentally reshaped this equation. Intelligence is no longer free, and every API call to a frontier model eats directly into unit economics.
Venture capitalists and founders are converging on a new reality: AI-native companies operate more like heavy infrastructure businesses than traditional SaaS. Yet, when founders step into board meetings to defend an exploding OpenAI or Anthropic bill, they are often flying blind. Everyone knows margins are compressing, but what exactly is normal for an AI-native startup at $1M, $5M, or $50M ARR remains poorly documented in public data.
By aggregating and verifying public disclosures from leading VC firms (ICONIQ, Bessemer) and foundation model providers, a clear — and sobering — picture emerges.
1. The 50% Ceiling: What the Macro Data Tells Us
If you are managing an AI-native product with a 55% gross margin, you are actually ahead of the curve.
According to ICONIQ Capital's January 2026 State of AI: Bi-Annual Snapshot — surveying approximately 300 software executives — the average gross margin for AI companies was 41% in 2024, climbing to 45% in 2025, with companies projecting 52% in 2026.
This margin compression is not a bug; it is a structural feature of the AI stack. As these companies scale, inference costs do not dissolve into economies of scale. ICONIQ found that while talent costs drop from 32% to 26% of total spend at the scaling stage, inference rises from 20% to 23% of revenue — moving in the wrong direction relative to how traditional SaaS COGS behaves at scale.
The data from Bessemer Venture Partners confirms a sharp bifurcation. In their State of AI 2025 report, Bessemer studied 20 high-growth AI startups and identified two distinct archetypes. The hyper-growth "AI Supernovas" — companies sprinting from seed to $100M ARR in record time — operate at an average gross margin of just 25%, and Bessemer explicitly notes that "many of the AI Supernovas have negative gross margins, something we don't tend to see often in software."
Their more disciplined peers — the "Shooting Stars," who reach ~$3M ARR in their first year while maintaining product-market-fit fundamentals — average closer to 60% gross margin. Even this is meaningfully below traditional SaaS benchmarks.
Public benchmarks today cluster around a very different range than the SaaS era: roughly 50–65% for disciplined AI-native companies, with some hyper-growth players operating far below that — and a small number of older businesses still above it. The 80%+ margins that defined the SaaS era are off the table for any company genuinely using LLM inference at scale.
2. The Foundation Squeeze: Why the 50% Ceiling Exists
To understand why application-layer AI margins are compressing, one must look at the foundation models powering them. If the underlying infrastructure operates at low margins, the application layer sitting on top of it cannot sustain 85% margins without extreme pricing power.
The public data confirms this squeeze. According to analysis by Jason Lemkin at SaaStr, citing reporting from The Information, OpenAI's compute margin on paid products was roughly 35% in early 2024. By October 2025, that figure had improved to approximately 70%.
This headline number deserves a clarification, however. Compute margin excludes training costs and R&D — it measures only the inference-to-revenue ratio on paid products. True gross margin, which would include the amortized cost of training frontier models, remains lower. But the improvement from 35% to 70% over 21 months is real, and it reflects genuine operational maturity at the foundation layer.
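To make the distinction concrete, here is a toy calculation. The figures are hypothetical illustrations, not OpenAI's actual costs: the same revenue line yields a 70% compute margin but a much lower gross margin once amortized training is included.

```python
# Illustration (hypothetical numbers) of why "compute margin" overstates
# true gross margin: it excludes the amortized cost of training.

revenue = 100.0
inference_cost = 30.0        # inference COGS on paid products
training_amortized = 25.0    # share of frontier training cost attributed to the period

# Compute margin: revenue minus inference cost only.
compute_margin = (revenue - inference_cost) / revenue

# Gross margin: also charges the period for amortized training.
gross_margin = (revenue - inference_cost - training_amortized) / revenue

print(f"compute margin: {compute_margin:.0%}, gross margin: {gross_margin:.0%}")
```

The gap between the two numbers is exactly the amortized training line, which is why a 70% compute margin is compatible with a materially lower true gross margin.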
The picture is even starker for Anthropic. Reporting cited in the same analysis found that Anthropic operated at a negative 94% gross margin in 2024. For 2025, the company projected a gross profit margin of roughly 40% on its enterprise and API business, with internal forecasts of reaching 77% by 2028.
If a foundation model provider scaling to $1B+ in ARR is projecting a 40% margin in 2025, the AI-native wrappers consuming their APIs are structurally constrained. The application layer cannot escape the gravity of the infrastructure layer's COGS.
3. The Structural Shift: Inference is the New COGS
In the traditional SaaS era, the primary constraint on growth was Customer Acquisition Cost (CAC). COGS was largely negligible (hosting, basic compute, customer support). In the AI-native era, the constraint has shifted to inference.
This requires a fundamental re-architecture of how products are built and priced. As Michael Truell, CEO of Cursor, noted in their July 2025 pricing update: "the hardest requests cost an order of magnitude more than simple ones." This variance breaks traditional flat-rate SaaS subscription models. If a power user submits 100 complex reasoning tasks against a $20/month subscription, the per-seat margin can instantly turn negative.
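The arithmetic behind that failure mode is simple. The sketch below uses hypothetical per-request costs (not Cursor's actual numbers) to show how a handful of expensive reasoning requests can flip a flat-rate seat to negative margin.

```python
# Per-seat margin under variable inference costs.
# All prices are hypothetical illustrations, not any vendor's actual rates.

SUBSCRIPTION = 20.00   # flat monthly price per seat
COST_SIMPLE = 0.02     # inference cost of a simple request
COST_COMPLEX = 0.25    # "an order of magnitude more" for hard reasoning requests

def seat_margin(n_simple: int, n_complex: int) -> float:
    """Gross margin for one seat in one month, as a fraction of revenue."""
    cogs = n_simple * COST_SIMPLE + n_complex * COST_COMPLEX
    return (SUBSCRIPTION - cogs) / SUBSCRIPTION

# A typical user stays comfortably profitable...
print(f"typical user: {seat_margin(200, 10):+.0%}")
# ...while a power user running 100 complex reasoning tasks flips negative.
print(f"power user:   {seat_margin(200, 100):+.0%}")
```

Under these assumptions the power user's seat costs $29 in inference against $20 of revenue: the subscription subsidizes the heaviest users, which is precisely the dynamic that pushed usage-based and credit-based pricing into AI products.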
To defend their unit economics, AI-native engineering teams are adopting two primary defensive strategies.
1. Multi-Model Routing. Relying exclusively on a single frontier model is no longer financially viable. The ICONIQ snapshot reveals that multi-model strategies are now standard practice: OpenAI remains the most utilized (77%), followed by Gemini (55%) and Anthropic (51%). ICONIQ notes that companies are increasingly routing the majority of workloads to smaller or fine-tuned models, escalating only high-complexity tasks to frontier models.
2. Aggressive Caching. Foundation providers are now building optimization tools directly into their APIs to help customers survive the margin squeeze. In August 2024, Anthropic launched Prompt Caching, allowing customers to cache frequently used context. The stated impact is profound: reducing costs by up to 90% and latency by up to 85% for long prompts. Launch partners like Notion explicitly cited this feature as necessary to make their AI products "faster and cheaper, all while maintaining state-of-the-art quality."
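A minimal version of the routing strategy is a threshold router that defaults to a cheap model and escalates only high-complexity tasks. Everything in this sketch is illustrative: the model names, per-token prices, and complexity scores are placeholders, not real vendor figures.

```python
# Sketch of cost-aware multi-model routing: cheap default, frontier escalation.
# Model names and prices are placeholders, not real vendor pricing.

from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    usd_per_1k_tokens: float

SMALL = Model("small-finetuned", 0.0004)
FRONTIER = Model("frontier", 0.0150)

def route(task_complexity: float, threshold: float = 0.8) -> Model:
    """Send only high-complexity tasks (score in [0, 1]) to the frontier model."""
    return FRONTIER if task_complexity >= threshold else SMALL

# With most traffic below the threshold, blended cost stays close to the small model's.
traffic = [0.1, 0.3, 0.2, 0.9, 0.4, 0.5, 0.85, 0.2, 0.3, 0.1]
blended = sum(route(c).usd_per_1k_tokens for c in traffic) / len(traffic)
print(f"blended $/1k tokens: {blended:.4f}")
```

With 80% of traffic under the threshold, the blended rate lands closer to the small model's price than the frontier model's, which is the whole point of the strategy: the hard part in practice is the complexity scorer, not the routing itself.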
Crucially, prompt caching is now a native API capability — not a custom optimization. Any startup using Claude can deploy it quickly, which makes caching coverage a reasonable question in technical and financial diligence for high-volume RAG or agentic workloads.
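The economics of caching can be sketched using the pricing structure Anthropic published at launch (cache writes at roughly 1.25x the base input price, cache reads at roughly 0.1x). The context size, request volume, base price, and the assumption that the cache stays warm across every request are simplifications for illustration.

```python
# Back-of-envelope effect of prompt caching on input-token COGS.
# Multipliers follow Anthropic's published structure at launch
# (writes ~1.25x base input price, reads ~0.1x); all other numbers
# are illustrative, and cache expiry between requests is ignored.

BASE = 3.00 / 1_000_000   # hypothetical $ per input token

def monthly_input_cost(context_tokens: int, requests: int, cached: bool) -> float:
    """Input-token cost of re-sending the same context on every request."""
    if not cached:
        return context_tokens * requests * BASE
    write = context_tokens * BASE * 1.25                     # first request writes the cache
    reads = context_tokens * (requests - 1) * BASE * 0.10    # later requests hit the cache
    return write + reads

uncached = monthly_input_cost(100_000, 1_000, cached=False)
cached = monthly_input_cost(100_000, 1_000, cached=True)
print(f"savings: {1 - cached / uncached:.0%}")
```

At high reuse the savings approach the 0.9x read discount, which is where the "up to 90%" figure comes from; workloads that re-send a large static context on every call (RAG corpora, agent system prompts) are the ones that capture it.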
4. A Note on Methodology
This analysis aggregates publicly disclosed gross margin data from a small number of venture capital research reports (ICONIQ Capital, Bessemer Venture Partners) and foundation-model provider statements. The underlying sample is limited (Bessemer's 20 startups; ICONIQ's ~300 executives), largely self-reported, and selection-biased toward portfolio companies of the reporting firms.
Verified, cohort-level data from independent benchmark networks — measuring exact LLM API spend against AI product revenue across comparable, peer-reviewed companies — does not yet exist publicly. This article reflects what public disclosures can tell us; what they cannot tell us is exactly where private peer benchmarking becomes necessary.
5. Conclusion
The data is unequivocal: the 85% gross margin that defined a generation of cloud software is no longer the baseline for AI-native applications. With industry averages settling between 41% and 52%, and hyper-growth outliers operating at 25% or below, the economics of software have been reset.
For most AI-native founders, the question is no longer whether to optimize, but how to measure whether their current optimization stack is competitive with their cohort.
Yet knowing that margins are compressing across the industry does not solve the microeconomic problem in the boardroom: is our specific inference-to-revenue ratio healthy for our specific use case and ARR stage?
This is the gap an independent peer benchmark on AI-native unit economics exists to close: verified, cohort-level comparisons of LLM API COGS against AI product revenue.
Sources referenced
- ICONIQ Capital, State of AI: Bi-Annual Snapshot, January 2026 — iconiq.com
- Bessemer Venture Partners, The State of AI 2025, August 2025 — bvp.com
- Michael Truell (CEO, Cursor), Clarifying our pricing, July 2025 — cursor.com
- Jason Lemkin (SaaStr), Have AI Gross Margins Really Turned the Corner?, December 2025 — saastr.com
- Anthropic, Prompt Caching with Claude, August 2024 — anthropic.com
InferMargin is an independent research project benchmarking unit economics of AI-native startups. We publish aggregated findings as cohort thresholds are met. Join the research at infermargin.com.