
Content Validity Diagnosis

Compare 2-4 facets of the same broad capability and inspect whether they are supported by distinct or overlapping benchmark evidence.

Facet Builder
Use cached searches or add practitioner-specified facets directly. Uncached facets run the full BenchBrowser retrieval pipeline.
Cached Facets
Add Custom Facet
Facets to Compare
Running retrieval and overlap analysis for the selected facets. Cached facets are reused automatically.
Facet Retrieval Progress
Each facet is checked against the session cache first. Uncached facets run testcase generation, embedding search, and sample scoring before overlap risk is computed.
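The cache-first flow described above can be sketched as follows. This is a minimal illustration, not BenchBrowser's implementation. The stage callables (`generate_testcases`, `embedding_search`, `score_samples`), the `Sample` record, and the `threshold` cutoff are all hypothetical stand-ins for the three pipeline stages named above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Sample:
    benchmark: str   # benchmark the sample comes from
    sample_id: str   # identifier within that benchmark
    score: float     # hypothetical relevance of the sample to the facet, 0-1


def retrieve_facet(
    facet: str,
    cache: Dict[str, List[Sample]],
    generate_testcases: Callable[[str], List[str]],            # hypothetical stage 1
    embedding_search: Callable[[List[str]], List[Sample]],     # hypothetical stage 2
    score_samples: Callable[[str, List[Sample]], List[Sample]],  # hypothetical stage 3
    threshold: float = 0.5,  # assumed cutoff for keeping a scored sample
) -> List[Sample]:
    # Check the session cache first: a cached facet skips the pipeline entirely.
    if facet in cache:
        return cache[facet]
    # Generate test cases that describe the facet.
    testcases = generate_testcases(facet)
    # Search benchmark samples by embedding similarity to those test cases.
    candidates = embedding_search(testcases)
    # Score candidates against the facet and keep those at or above the cutoff.
    kept = [s for s in score_samples(facet, candidates) if s.score >= threshold]
    # Store the result so later runs in the same session reuse it.
    cache[facet] = kept
    return kept
```

Passing the stages in as callables keeps the caching logic independent of whichever generator, embedding index, or scorer is plugged in.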
Content-validity risk (%)
Overlap-based risk: how much the selected facets rely on the same benchmark evidence
Benchmarks contributing most shared evidence
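The page does not say how the overlap-based risk is computed. One plausible sketch, offered purely as an assumption, treats each facet's evidence as a set of `(benchmark, sample_id)` pairs, reports the largest pairwise Jaccard overlap as a percentage, and ranks benchmarks by how many of their samples support two or more facets. The function names `overlap_risk` and `top_shared_benchmarks` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

Evidence = Set[Tuple[str, str]]  # (benchmark, sample_id) pairs retrieved for one facet


def jaccard(a: Evidence, b: Evidence) -> float:
    # Share of the combined evidence that both facets draw on.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_risk(facet_evidence: Dict[str, Evidence]) -> float:
    # Assumed metric: the worst-case (maximum) pairwise overlap, as a percentage.
    pairs = list(combinations(facet_evidence.values(), 2))
    if not pairs:
        return 0.0
    return 100 * max(jaccard(a, b) for a, b in pairs)


def top_shared_benchmarks(
    facet_evidence: Dict[str, Evidence], k: int = 3
) -> List[Tuple[str, int]]:
    # Count how many facets each sample supports (sets, so once per facet).
    membership: Counter = Counter()
    for samples in facet_evidence.values():
        membership.update(samples)
    # Attribute every sample used by two or more facets to its benchmark.
    shared = Counter(bench for (bench, _), n in membership.items() if n >= 2)
    return shared.most_common(k)
```

For example, if an "arithmetic" facet retrieves GSM8K samples 1 and 2 plus MATH sample 7, a "word problems" facet retrieves the same two GSM8K samples plus SVAMP sample 3, and an "algebra" facet retrieves MATH samples 7 and 9, then the first two facets overlap with Jaccard 0.5. `overlap_risk` returns 50.0, and `top_shared_benchmarks` returns `[("GSM8K", 2), ("MATH", 1)]`. Using the maximum rather than the mean flags a capability as soon as any two of its facets collapse onto the same evidence.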