Convergent Validity Diagnosis

Check whether different operationalizations of a benchmark produce stable model rankings.

The ranking diagnosis is shown below.

Existing Benchmark Ranking Sets

Select 2-4 benchmark ranking sets to compare model ordering across established evaluations.
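Agreement between selected ranking sets can be quantified with a rank correlation. The sketch below is one possible approach, not the tool's documented method: it computes Kendall's tau for every pair of ranking sets over the models they share. The function names, benchmark names, and scores are hypothetical, and the code assumes there are no tied ranks.

```python
from itertools import combinations


def ranks_from_scores(scores):
    """Map model -> rank (1 = best), sorting by descending score, then by name."""
    ordered = sorted(scores, key=lambda m: (-scores[m], m))
    return {model: i + 1 for i, model in enumerate(ordered)}


def kendall_tau(rank_a, rank_b):
    """Kendall tau-a over the models present in both rankings (assumes no ties)."""
    models = sorted(set(rank_a) & set(rank_b))
    if len(models) < 2:
        raise ValueError("need at least two shared models")
    concordant = discordant = 0
    for m1, m2 in combinations(models, 2):
        sign = (rank_a[m1] - rank_a[m2]) * (rank_b[m1] - rank_b[m2])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(models) * (len(models) - 1) // 2
    return (concordant - discordant) / n_pairs


def pairwise_agreement(ranking_sets):
    """Return {(name_a, name_b): tau} for every pair of ranking sets."""
    names = sorted(ranking_sets)
    return {
        (a, b): kendall_tau(ranking_sets[a], ranking_sets[b])
        for a, b in combinations(names, 2)
    }


# Hypothetical scores for illustration only.
ranking_sets = {
    "bench_x": ranks_from_scores({"model_a": 0.9, "model_b": 0.8, "model_c": 0.7}),
    "bench_y": ranks_from_scores({"model_a": 0.6, "model_b": 0.7, "model_c": 0.5}),
}
agreement = pairwise_agreement(ranking_sets)
```

A tau near 1 means the two sets order the models almost identically. Values near 0 or below suggest the benchmarks do not converge on the same ordering.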

Retrieved Sample Ranking Sets

Create retrieved-sample slices, optionally add existing benchmarks, and compare the induced model rankings.
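A slice induces a ranking once each model has a score on the samples in that slice. The sketch below assumes per-sample scores are available as `(sample_id, model, score)` tuples and ranks models by their mean score on the slice. The record layout, function name, and data are hypothetical and are not taken from the tool.

```python
from collections import defaultdict


def induced_ranking(records, sample_ids):
    """Rank models (1 = best) by mean score over the samples in the slice."""
    wanted = set(sample_ids)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for sample_id, model, score in records:
        if sample_id in wanted:
            totals[model] += score
            counts[model] += 1
    means = {model: totals[model] / counts[model] for model in counts}
    # Sort by descending mean score, breaking ties by model name.
    ordered = sorted(means, key=lambda m: (-means[m], m))
    return {model: i + 1 for i, model in enumerate(ordered)}


# Hypothetical per-sample scores for illustration only.
records = [
    ("s1", "model_a", 1.0), ("s1", "model_b", 0.0),
    ("s2", "model_a", 0.0), ("s2", "model_b", 1.0),
    ("s3", "model_a", 1.0), ("s3", "model_b", 1.0),
]
slice_one = induced_ranking(records, ["s1", "s3"])
slice_two = induced_ranking(records, ["s2", "s3"])
```

In this toy data the two slices rank the models in opposite orders, which is the kind of divergence the comparison is meant to surface.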

Manual Slice

0% rank-divergence risk

Top Rank Movers
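One plausible reading of a rank mover is a model whose position changes the most between two ranking sets. The sketch below works under that assumption. The function name and example ranks are hypothetical, and this is not necessarily how the tool computes its list.

```python
def top_rank_movers(rank_a, rank_b, k=3):
    """Return up to k (model, shift) pairs with the largest absolute rank change.

    shift = rank in B minus rank in A, so a positive shift means the model
    dropped to a worse position in B.
    """
    shared = set(rank_a) & set(rank_b)
    shifts = [(model, rank_b[model] - rank_a[model]) for model in shared]
    # Largest absolute shift first; ties are broken by model name.
    shifts.sort(key=lambda item: (-abs(item[1]), item[0]))
    return shifts[:k]


# Hypothetical ranks for illustration only.
rank_a = {"model_a": 1, "model_b": 2, "model_c": 3, "model_d": 4}
rank_b = {"model_a": 3, "model_b": 2, "model_c": 1, "model_d": 4}
movers = top_rank_movers(rank_a, rank_b, k=2)
```

Only models present in both ranking sets are compared, so a model missing from one set never appears as a mover.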

Source-Benchmark Slices

Use these benchmark-derived slices when you want the next comparison to isolate one source family from the retrieved evidence. The benchmark name and metric badge on each slice show which rubric is represented.

Retrieved Sample Breakdown
