ORena FOCUS Challenge

Data

ORena SAVE FOCUS training data is released in two batches and covers the full progression from FRAME to long-horizon PROCEDURE reasoning. VQA pairs are generated and quality-controlled by expert annotators, with access provided through Hugging Face and the ORena SAVE FOCUS Python package.

All Training Data

The complete ORena SAVE FOCUS training data will consist of two batches. Both will be hosted on Hugging Face and can be accessed through the ORena SAVE FOCUS Python package.

  • First Batch: 30 colorectal surgery videos
  • Second Batch: 170 laparoscopic cholecystectomy videos
  • Total Videos: 200 training videos
  • Total VQA Pairs: 50,000 questions

First Training Data Batch

The first training data batch is based on the HeiCo-FOCUS dataset, built on Heidelberg colorectal surgeries. HeiCo-FOCUS contains 30 colorectal surgery videos and 15,000 VQA pairs covering five core capability groups: object recognition and identity matching, temporal grounding, aggregation, event and procedural understanding, and complex reasoning.

For the PROCEDURE Track, the first training data batch contains 3,000 long-context VQA pairs. These questions focus on reasoning over extended surgical video context, including persistent foreign object tracking, temporal grounding, aggregation over time, event and procedural understanding, retrieval-status reasoning, and complex reasoning across objects, events, and time.

HeiCo-FOCUS was constructed through a rigorous multi-stage annotation pipeline that combined large-scale annotation with the input of 39 domain experts to ensure high quality and clinical relevance. Extensive experiments show that current vision-language models (VLMs) still struggle with the benchmark, especially on long-horizon PROCEDURE tasks.

Figure 1: Overview of the HeiCo-FOCUS benchmark. Clinical motivation and dataset overview, including 96 hours of surgical video annotated with 15,000 VQA pairs spanning five core capabilities.

Second Data Batch

The second training data batch will consist of 170 videos of laparoscopic cholecystectomies. These videos have not been made public before and therefore represent new challenge data.

  • FRAME: 14,000 VQA pairs
  • SEGMENT: 14,000 VQA pairs
  • PROCEDURE: 7,000 VQA pairs

The second batch therefore contributes 35,000 VQA pairs. Together with the 15,000 pairs of the first training data batch, this results in 50,000 training VQA pairs.
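The totals can be checked with a few lines of Python. This is only a bookkeeping sketch: the numbers come from the text above, while the dictionary layout and variable names are illustrative assumptions and not part of the ORena SAVE FOCUS Python package.

```python
# Figures are taken from the text above; the data layout is hypothetical.
first_batch = {"videos": 30, "vqa_pairs": 15_000}

# Per-track VQA pairs announced for the second batch.
second_batch_pairs = {"FRAME": 14_000, "SEGMENT": 14_000, "PROCEDURE": 7_000}
second_batch = {"videos": 170, "vqa_pairs": sum(second_batch_pairs.values())}

total_videos = first_batch["videos"] + second_batch["videos"]
total_pairs = first_batch["vqa_pairs"] + second_batch["vqa_pairs"]

print(second_batch["vqa_pairs"])  # 35000
print(total_videos)               # 200
print(total_pairs)                # 50000
```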

Validation and Test Data

Test data will include 200 videos from a broad range of procedures, including cholecystectomies and additional procedure types. These test videos will not be released to participants. Leaderboard validation data will include 20 additional videos that are representative of the test data.

Taxonomy

The taxonomy defines the capability groups and sub-capabilities used to categorize VQA pairs across the ORena SAVE FOCUS Challenge. The FRAME Track focuses on sub-capabilities visible in a single image, while SEGMENT and PROCEDURE additionally cover temporal and long-context reasoning.
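One way to picture this categorization is a small record type that tags each VQA pair with a track and a capability group. The sketch below is an assumption for illustration only: the track names and the five capability groups are taken from this page, but the `VQAPair` class, its field names, the `count_by_group` helper, and the placeholder questions are hypothetical and do not reflect the actual data schema or the ORena SAVE FOCUS Python package.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Track(Enum):
    FRAME = "FRAME"
    SEGMENT = "SEGMENT"
    PROCEDURE = "PROCEDURE"


class CapabilityGroup(Enum):
    # The five core capability groups named for HeiCo-FOCUS.
    OBJECT_RECOGNITION = "object recognition and identity matching"
    TEMPORAL_GROUNDING = "temporal grounding"
    AGGREGATION = "aggregation"
    EVENT_PROCEDURAL = "event and procedural understanding"
    COMPLEX_REASONING = "complex reasoning"


@dataclass(frozen=True)
class VQAPair:
    # Hypothetical record layout; the real schema may differ.
    track: Track
    group: CapabilityGroup
    question: str
    answer: str


def count_by_group(pairs, track):
    """Count the VQA pairs of one track per capability group."""
    return Counter(p.group for p in pairs if p.track is track)


# Placeholder records, invented for this sketch (not real dataset entries).
examples = [
    VQAPair(Track.FRAME, CapabilityGroup.OBJECT_RECOGNITION, "q1", "a1"),
    VQAPair(Track.PROCEDURE, CapabilityGroup.AGGREGATION, "q2", "a2"),
    VQAPair(Track.PROCEDURE, CapabilityGroup.COMPLEX_REASONING, "q3", "a3"),
]

procedure_counts = count_by_group(examples, Track.PROCEDURE)
print(procedure_counts[CapabilityGroup.AGGREGATION])  # 1
```

Keeping the track and the capability group as separate fields makes it straightforward to report results per track, per group, or per combination of both.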

Figure 2: HeiCo-FOCUS taxonomy with sample questions.