VLM Comparative Benchmark Visualizer

Select a dataset to load its evaluation samples. The interface then displays the same question or task as evaluated by four different VLMs, so their outputs can be compared.

1. Select a Dataset

2. Select a Sample / Episode Step