ORena FOCUS Challenge

Data

ORena SAVE FOCUS training data is released in two batches and covers the full progression from FRAME to long-horizon PROCEDURE reasoning. VQA pairs are generated and quality-controlled by expert annotators, with access provided through Hugging Face and the ORena SAVE FOCUS Python package.

All Training Data

The ORena SAVE FOCUS training data comprises two datasets, HeiCo-FOCUS VQA and LapChole-FOCUS VQA, published in two releases and hosted on Hugging Face. Both datasets can also be accessed through the ORena SAVE FOCUS Python package.

First Release

15,000

VQA pairs on HeiCo-FOCUS

Second Release

35,000

VQA pairs across both datasets

Total Videos

200

30 HeiCo + 170 LapChole

Total VQA Pairs

50,000

across all tracks and datasets

First Data Release

The first data release is based on the HeiCo dataset (Maier-Hein et al., 2021), consisting of 30 fully labeled Heidelberg colorectal surgery videos. It contains 15,000 VQA pairs covering five core capability groups: object recognition and identity matching, temporal grounding, aggregation, event and procedural understanding, and complex reasoning.

For the PROCEDURE Track, questions focus on reasoning over extended surgical video context, including persistent foreign object tracking, temporal grounding, aggregation over time, retrieval-status reasoning, and complex reasoning across objects, events, and time.

HeiCo-FOCUS was constructed through a rigorous multi-stage pipeline that combines large-scale annotation with the input of 39 domain experts to ensure high quality and clinical relevance. Extensive experiments show that current state-of-the-art vision-language models (VLMs) still struggle, especially on long-horizon PROCEDURE tasks.

FRAME

6,000 VQA pairs

SEGMENT

6,000 VQA pairs

PROCEDURE

3,000 VQA pairs

Figure 1: Overview of the HeiCo-FOCUS benchmark. Clinical motivation and dataset overview, including 96 hours of surgical video. Note: VQA pair counts shown reflect the first data release only.

Second Data Release

The second data release introduces the LapChole-FOCUS dataset, comprising 170 previously unpublished laparoscopic cholecystectomy videos, and extends HeiCo-FOCUS with a further 15,000 questions that follow the same track distribution as the first release. In total, this release adds 35,000 VQA pairs, bringing the overall count to 50,000.

HeiCo-FOCUS Extension

An additional 15,000 VQA pairs on the 30 HeiCo colorectal surgery videos, following the same capability taxonomy as the first release.

FRAME

6,000 VQA pairs

SEGMENT

6,000 VQA pairs

PROCEDURE

3,000 VQA pairs

LapChole-FOCUS

20,000 VQA pairs on 170 laparoscopic cholecystectomy videos. Of these, 100 videos are fully annotated; the remaining 70 are unlabeled and provided as additional training material.

FRAME

8,000 VQA pairs

SEGMENT

8,000 VQA pairs

PROCEDURE

4,000 VQA pairs
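The per-track counts listed for the two training releases can be cross-checked against the stated totals with a short tally. The sketch below is illustrative only: the dictionary layout and the function name `release_total` are assumptions made for this example and are not part of the ORena SAVE FOCUS Python package. The numbers are copied from the release descriptions.

```python
# Per-track VQA pair counts, copied from the release descriptions.
# The nested-dict layout is a hypothetical convenience, not an official schema.
RELEASES = {
    "first": {
        "HeiCo-FOCUS": {"FRAME": 6000, "SEGMENT": 6000, "PROCEDURE": 3000},
    },
    "second": {
        "HeiCo-FOCUS": {"FRAME": 6000, "SEGMENT": 6000, "PROCEDURE": 3000},
        "LapChole-FOCUS": {"FRAME": 8000, "SEGMENT": 8000, "PROCEDURE": 4000},
    },
}


def release_total(release):
    """Sum the VQA pairs over all datasets and tracks in one release."""
    return sum(sum(tracks.values()) for tracks in release.values())


first_total = release_total(RELEASES["first"])
second_total = release_total(RELEASES["second"])
grand_total = first_total + second_total

print(first_total, second_total, grand_total)
```

The three values correspond to the first-release, second-release, and overall training totals stated on this page.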

Validation and Test Data

The test data will include 200 videos covering a broad range of procedures, including cholecystectomies and additional procedure types. These test videos will not be released to participants. The leaderboard validation data will include 20 additional videos that are representative of the test data.

Validation Set

FRAME

2,000 VQA pairs

SEGMENT

2,000 VQA pairs

PROCEDURE

1,000 VQA pairs

Test Set

FRAME

20,000 VQA pairs

SEGMENT

20,000 VQA pairs

PROCEDURE

10,000 VQA pairs
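One property that follows from the listed counts is that the track mix appears to be the same in every split. The sketch below computes each track's share per split. The training figures are the sums of the per-track counts given for both training releases (HeiCo-FOCUS first release, HeiCo-FOCUS extension, and LapChole-FOCUS); the names `SPLITS` and `track_shares` are hypothetical and used only for this illustration.

```python
from fractions import Fraction

# Per-track VQA pair counts per split.
# "training" is derived by summing the per-track counts of both training releases.
SPLITS = {
    "training": {"FRAME": 20000, "SEGMENT": 20000, "PROCEDURE": 10000},
    "validation": {"FRAME": 2000, "SEGMENT": 2000, "PROCEDURE": 1000},
    "test": {"FRAME": 20000, "SEGMENT": 20000, "PROCEDURE": 10000},
}


def track_shares(counts):
    """Return each track's exact share of the split as a Fraction."""
    total = sum(counts.values())
    return {track: Fraction(n, total) for track, n in counts.items()}


shares = {split: track_shares(counts) for split, counts in SPLITS.items()}

for split, per_track in shares.items():
    print(split, {track: str(share) for track, share in per_track.items()})
```

Under these counts, FRAME and SEGMENT each account for two fifths of a split and PROCEDURE for one fifth, so validation results can be read on the same track balance as the test set.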

Taxonomy

The taxonomy defines the capability groups and sub-capabilities used to categorize VQA pairs across the ORena SAVE FOCUS Challenge. The FRAME Track focuses on sub-capabilities visible in a single image, while SEGMENT and PROCEDURE additionally cover temporal and long-context reasoning.

Figure 2: Shared taxonomy of capabilities across both datasets, with sample questions.