Convergent Validity Diagnosis

Compare whether different benchmark operationalizations produce stable model rankings.
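One common way to quantify whether two benchmarks order models consistently is a rank correlation such as Kendall's tau. The sketch below is only an illustration under stated assumptions: the page does not say which statistic the tool uses, and the function name `kendall_tau`, the model names, and the scores are all hypothetical.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall tau-a between two benchmarks, over the models they share.

    Both arguments map model name -> score, higher is better.
    Returns a value in [-1, 1]; 1 means identical ordering,
    -1 means fully reversed ordering. Tied pairs count as neither
    concordant nor discordant.
    """
    models = sorted(set(scores_a) & set(scores_b))
    concordant = discordant = 0
    for m, n in combinations(models, 2):
        # Same sign of the score difference on both benchmarks
        # means the pair is ordered the same way.
        agreement = (scores_a[m] - scores_a[n]) * (scores_b[m] - scores_b[n])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    pairs = len(models) * (len(models) - 1) // 2
    if pairs == 0:
        return float("nan")  # fewer than two shared models
    return (concordant - discordant) / pairs


# Hypothetical scores, for illustration only.
bench_a = {"model_a": 0.81, "model_b": 0.74, "model_c": 0.69}
bench_b = {"model_a": 0.62, "model_b": 0.66, "model_c": 0.40}
tau = kendall_tau(bench_a, bench_b)
print(f"Kendall tau between the two benchmarks: {tau:.3f}")
```

A value near 1 suggests the two operationalizations rank models the same way, while values near 0 or below suggest the ranking depends on which benchmark is chosen.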

The ranking diagnosis is shown below.
Existing Benchmark Ranking Sets
Select 2-4 benchmark ranking sets to compare model ordering across established evaluations.
Retrieved Sample Ranking Sets
Create retrieved-sample slices, optionally add existing benchmarks, and compare the induced model rankings.
Manual Slice
0% rank-divergence risk
Top Rank Movers
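As a rough illustration of what a rank-mover listing could contain, the sketch below finds the models whose rank position changes most between two ranking sets. It is an assumption about the idea, not the tool's actual method; `rank_positions`, `top_rank_movers`, and the sample scores are hypothetical.

```python
def rank_positions(scores):
    """Map model name -> rank position (1 = best score).

    Ties are broken alphabetically by model name so the result
    is deterministic.
    """
    ordered = sorted(scores, key=lambda m: (-scores[m], m))
    return {m: i + 1 for i, m in enumerate(ordered)}


def top_rank_movers(scores_a, scores_b, k=3):
    """Return up to k tuples (model, rank_in_a, rank_in_b, shift).

    Only models present in both ranking sets are compared.
    A positive shift means the model ranks worse in the second set.
    Results are sorted by largest absolute shift first.
    """
    common = set(scores_a) & set(scores_b)
    ranks_a = rank_positions({m: scores_a[m] for m in common})
    ranks_b = rank_positions({m: scores_b[m] for m in common})
    shifts = [(m, ranks_a[m], ranks_b[m], ranks_b[m] - ranks_a[m]) for m in common]
    shifts.sort(key=lambda t: (-abs(t[3]), t[0]))
    return shifts[:k]


# Hypothetical scores, for illustration only.
set_a = {"model_a": 0.9, "model_b": 0.8, "model_c": 0.7, "model_d": 0.6}
set_b = {"model_a": 0.5, "model_b": 0.8, "model_c": 0.7, "model_d": 0.6}
for model, ra, rb, shift in top_rank_movers(set_a, set_b, k=2):
    print(f"{model}: rank {ra} -> {rb} (shift {shift:+d})")
```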
Source-Benchmark Slices
Use these benchmark-derived slices when you want the next comparison to isolate one source family from the retrieved evidence. The benchmark name and metric badge on each slice show which rubric is represented.
Retrieved Sample Breakdown