Data is the world modeled.
Expert-built benchmarks, high-quality data, and real-world environments for frontier models and vertical agents.
Benchmarks
Coming in three weeks.
VLM Benchmark
Ships in 3 weeks. A high-difficulty evaluation suite that tests vision-language models on complex, real-world scenarios. It goes beyond standard academic benchmarks to surface where leading VLMs actually fail in professional contexts.
Office Workflow Benchmark
Ships in 3 weeks. A systematic evaluation of model performance in professional office workflows: multi-step document understanding, cross-format reasoning, and tool use across realistic enterprise scenarios.
Why us
How we're different.
Model-native perspective
Both founders built benchmark and evaluation systems at frontier labs. We know the real bottlenecks from the inside, not from the outside looking in.
Global ecosystem reach
Proven track record of international open-source launches and revenue growth across domestic and overseas markets. We reach the labs that matter.
Scalable delivery
We build standardized data pipelines and delivery packages, so quality scales without headcount having to scale linearly. We designed for this from day one.
Team
Built by people who've been there.
Jiaren Cai (蔡佳人)
Technical Cofounder
Former open-source lead and post-training researcher at MiniMax. Built the vibe benchmark from 0→1. Drove the post-training evolution of the M2 series' coding capabilities. Led the international open-source launches of M2, M2.1, and M2.5.
Li Liang (梁丽)
Product Cofounder
Former Agent product lead at MiniMax, where she managed a 20+ person team, achieved 5× revenue growth and 8-figure GMV in Q1 2026, and built MiniMax's internal Agent Benchmark from scratch. Educated at Peking University; previously in Tencent's Product Management Program.
We're hiring — founding team
We're looking for a BD Lead with experience commercializing AI infrastructure, and an Expert Ecosystem Lead with reach into academic and professional specialist communities.
Why copula
"In statistics, a copula joins separate marginal distributions into a single joint distribution (Sklar's theorem, 1959). Capability benchmarks and real-world deployment are two marginals. Most companies measure each in isolation. We model the dependence between them, including the tails, where models actually fail."
— COPULA LAB
Ready to close the execution gap?
We work directly with frontier model teams. Reach out to discuss data needs, benchmark access, or expert collaboration.
Talk to us