FounderJury · The Diversity Receipt

One model lies.
75% of the time, our models disagree.

Across 126 real founder debates, only 32 ended in unanimous agreement. The other 94 produced contradictory verdicts from 8 frontier models, each from a different vendor. That delta is the product.

Disagreement rate: 75% of debates produced ≥2 verdict categories
Debates analyzed: 126 real founder ideas
Unanimous outcomes: 32 (25%, the rare consensus)
Avg. pairwise disagreement: 39% across 27 model pairs
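The headline figure can be reproduced from the counts above. The sketch below assumes a debate counts as a disagreement when its models return two or more distinct verdict categories, as the stat label suggests. The verdict labels (`build`, `pivot`, `kill`) and the function names are hypothetical, chosen only for illustration; the page does not show how FounderJury stores or labels verdicts.

```python
def is_split(verdicts: dict[str, str]) -> bool:
    """True when one debate produced two or more distinct verdict categories."""
    return len(set(verdicts.values())) >= 2


def disagreement_rate(debates: list[dict[str, str]]) -> float:
    """Share of debates that did not end unanimously."""
    if not debates:
        raise ValueError("no debates to analyze")
    return sum(is_split(d) for d in debates) / len(debates)


# Toy data with hypothetical verdict labels: one split debate, one unanimous.
toy_debates = [
    {"Claude": "build", "GPT": "build", "Grok": "kill"},
    {"Claude": "pivot", "GPT": "pivot", "Grok": "pivot"},
]
toy_rate = disagreement_rate(toy_debates)  # 0.5

# Counts published on this page.
total_debates = 126
unanimous = 32
split = total_debates - unanimous       # 94
headline_rate = split / total_debates   # about 0.746, displayed as 75%
```

Under these assumptions, 94 of 126 debates is about 74.6%, which rounds to the 75% shown.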
Why this matters

ChatGPT will agree with you. So will Claude. So will Gemini. Each is trained to be helpful, and each will validate a bad idea given the right framing.

The lie isn't in any single model — it's in asking only one. A vendor cannot ship cross-vendor debate inside their own product: OpenAI won't call Anthropic, Anthropic won't call Google, Google won't call xAI. Multi-vendor adversarial review is structurally outside the incumbents' product surface.

That's the entire moat. The 75% disagreement rate is the receipt.

Pairwise disagreement, sorted high → low

| Model A (vendor) | Model B (vendor) | Disagreement | Sample |
|---|---|---|---|
| Grok (xAI) | Llama (Meta) | 87.5% | 14/16 |
| Grok (xAI) | Qwen (Alibaba) | 70.5% | 55/78 |
| Gemini (Google) | Grok (xAI) | 64.2% | 70/109 |
| Grok (xAI) | Kimi (Moonshot) | 64.2% | 43/67 |
| Claude (Anthropic) | Grok (xAI) | 58.1% | 68/117 |
| Kimi (Moonshot) | Llama (Meta) | 57.1% | 8/14 |
| DeepSeek (DeepSeek) | Grok (xAI) | 57.0% | 57/100 |
| DeepSeek (DeepSeek) | Llama (Meta) | 50.0% | 8/16 |
| GPT (OpenAI) | Grok (xAI) | 46.7% | 56/120 |
| GPT (OpenAI) | Llama (Meta) | 43.8% | 7/16 |
| Gemini (Google) | Qwen (Alibaba) | 38.5% | 30/78 |
| Claude (Anthropic) | Llama (Meta) | 37.5% | 6/16 |
| Gemini (Google) | Llama (Meta) | 37.5% | 6/16 |
| DeepSeek (DeepSeek) | Gemini (Google) | 37.0% | 37/100 |
| DeepSeek (DeepSeek) | Qwen (Alibaba) | 32.5% | 27/83 |
| Gemini (Google) | GPT (OpenAI) | 29.4% | 32/109 |
| Claude (Anthropic) | DeepSeek (DeepSeek) | 28.7% | 29/101 |
| Gemini (Google) | Kimi (Moonshot) | 28.4% | 19/67 |
| GPT (OpenAI) | Qwen (Alibaba) | 28.0% | 23/82 |
| DeepSeek (DeepSeek) | Kimi (Moonshot) | 27.5% | 19/69 |
| DeepSeek (DeepSeek) | GPT (OpenAI) | 26.9% | 28/104 |
| Claude (Anthropic) | Qwen (Alibaba) | 25.3% | 20/79 |
| Claude (Anthropic) | Gemini (Google) | 23.6% | 25/106 |
| Kimi (Moonshot) | Qwen (Alibaba) | 18.0% | 9/50 |
| Claude (Anthropic) | GPT (OpenAI) | 16.5% | 20/121 |
| GPT (OpenAI) | Kimi (Moonshot) | 15.9% | 11/69 |
| Claude (Anthropic) | Kimi (Moonshot) | 15.4% | 10/65 |

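One way to arrive at a table like this is to count, for every pair of models, the debates both took part in and how many of those ended with different verdicts. The sketch below is an assumption about the method, not FounderJury's actual pipeline: the data layout, verdict labels, and function names are hypothetical, and reading the Sample column as disagreements over shared debates is an inference from the published ratios (14/16 = 87.5%, for example).

```python
from itertools import combinations


def pairwise_counts(debates):
    """Map each model pair to (disagreements, shared_debates)."""
    counts = {}
    for verdicts in debates:
        # Only models present in this debate form pairs, so denominators differ per pair.
        for a, b in combinations(sorted(verdicts), 2):
            differing, shared = counts.get((a, b), (0, 0))
            counts[(a, b)] = (differing + int(verdicts[a] != verdicts[b]), shared + 1)
    return counts


def mean_pair_rate(counts):
    """Unweighted mean of per-pair disagreement rates (assumed averaging method)."""
    rates = [differing / shared for differing, shared in counts.values() if shared]
    return sum(rates) / len(rates)


# Toy data with hypothetical verdict labels; Grok sits out the second debate.
debates = [
    {"Claude": "build", "GPT": "build", "Grok": "kill"},
    {"Claude": "build", "GPT": "pivot"},
    {"Claude": "kill", "GPT": "kill", "Grok": "kill"},
]
counts = pairwise_counts(debates)
# counts[("Claude", "GPT")] is (1, 3): one differing verdict across three shared debates.
average = mean_pair_rate(counts)
```

Because models can sit out a debate, each pair has its own denominator, which would explain why the samples above range from 14 to 121. Pairs with small samples, such as those involving Llama, carry much more uncertainty than their percentages suggest.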
Ask one model and you get an opinion. Ask 8 and you get a verdict.

Test your idea against 8 frontier AI models from competing vendors. They disagree 75% of the time. That's the data point worth having before you build.

Live data · Updated every page load · Generated Wed, 06 May 2026 00:39:12 GMT