Data
ORena SAVE FOCUS training data is released in two batches and covers the full progression from FRAME to long-horizon PROCEDURE reasoning. VQA pairs are generated and quality-controlled by expert annotators, with access provided through Hugging Face and the ORena SAVE FOCUS Python package.
All Training Data
The complete ORena SAVE FOCUS training data will consist of two batches. The data will be hosted on Hugging Face and can be accessed through the ORena SAVE FOCUS Python package.
First Training Data Batch
The first training data batch is based on the HeiCo-FOCUS dataset, built on Heidelberg colorectal surgeries. HeiCo-FOCUS contains 30 colorectal surgery videos and 15,000 VQA pairs covering five core capability groups: object recognition and identity matching, temporal grounding, aggregation, event and procedural understanding, and complex reasoning.
For the PROCEDURE Track, the first training data batch contains 3,000 long-context VQA pairs. These questions focus on reasoning over extended surgical video context, including persistent foreign object tracking, temporal grounding, aggregation over time, event and procedural understanding, retrieval-status reasoning, and complex reasoning across objects, events, and time.
HeiCo-FOCUS was constructed through a rigorous multi-stage annotation pipeline involving large-scale annotation and 39 domain experts to ensure high quality and clinical relevance. Extensive experiments show that current VLMs still struggle with this data, especially on long-horizon PROCEDURE tasks.
Second Training Data Batch
The second training data batch will consist of 170 videos of laparoscopic cholecystectomies. These videos have not been made public before and therefore represent new challenge data.
FRAME
- 14,000 VQA pairs
SEGMENT
- 14,000 VQA pairs
PROCEDURE
- 7,000 VQA pairs
Together with the first training data batch, this results in 50,000 training VQA pairs.
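The total can be checked from the per-batch counts given above. The following is a minimal sketch of that arithmetic; the variable names are illustrative, and it assumes that the 3,000 PROCEDURE pairs of the first batch are already part of the 15,000 HeiCo-FOCUS pairs, which is the reading under which the stated total holds.

```python
# VQA pair counts as stated in the text above.
# Assumption: the 15,000 HeiCo-FOCUS pairs already include the
# 3,000 long-context PROCEDURE pairs of the first batch.
FIRST_BATCH_TOTAL = 15_000

# Second batch, broken down by track.
SECOND_BATCH = {
    "FRAME": 14_000,
    "SEGMENT": 14_000,
    "PROCEDURE": 7_000,
}

second_batch_total = sum(SECOND_BATCH.values())
training_total = FIRST_BATCH_TOTAL + second_batch_total

print(f"Second batch: {second_batch_total:,} VQA pairs")    # Second batch: 35,000 VQA pairs
print(f"All training data: {training_total:,} VQA pairs")   # All training data: 50,000 VQA pairs
```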
Validation and Test Data
Test data will include 200 videos from a broad range of procedures, including cholecystectomies and additional procedure types. These test videos will not be released to participants. Leaderboard validation data will include 20 additional videos representative of the test data.
Taxonomy
The taxonomy defines the capability groups and sub-capabilities used to categorize VQA pairs across the ORena SAVE FOCUS Challenge. The FRAME Track focuses on sub-capabilities visible in a single image, while SEGMENT and PROCEDURE additionally cover temporal and long-context reasoning.