Data Description:

This challenge will use an expanded version of the training data and labels from the 2022/23 SurgToolLoc challenge, with one major addition: surgical task labels for category 2 of this year's SurgVU challenge. The test set will also be an expanded version of the sets used in the two years of the SurgToolLoc challenge. Specific details regarding the data are given below.

Training Data:

  • The dataset consists of videos taken from surgical training exercises using the da Vinci robotic system 

  • During these exercises, surgical trainees perform standard activities such as dissecting tissue, suturing, and sealing vessels.

  • There are 280 long videos (from 155 training sessions) captured at 60 fps with a resolution of 720p (1280 x 720) from one channel of the endoscope (see example screenshots below). This translates to over 840 hours of surgical tasks being performed, resulting in over 180 million frames

  • For the duration of each video, there are three robotic surgical tools installed and within the surgical field, although tools may occasionally be obscured or otherwise temporarily not visible

  • Each video can contain up to three of 12 possible tools

    Note: A few tools occur only rarely but are still kept within the training set. This means the training set can contain tools beyond the 12 mentioned here. However, these tools will not be part of the test set, and submissions will not need to recognize them. Teams are free to use the extra tool labels in the training set however they wish.


Training labels:

For each training session within the training set, we provide "tool presence" labels and task labels. Tool presence labels indicate which robotic tools are installed in each frame; please note that these labels are noisy (see below). Task labels indicate the start and stop times of each surgical task in the video.

Labels are found in two CSV files: tools.csv, which contains the start and stop times of every tool installation, and tasks.csv, which contains the start and stop times of the tasks. Note that, due to their length, some sessions are split into two or more video parts. The columns "install_case_part"/"uninstall_case_part" in tools.csv and "start_part"/"stop_part" in tasks.csv indicate the video part for each label.
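As a concrete starting point, the two label files can be loaded with pandas. This is a minimal sketch for a single session; the paths are placeholders, and only the column names mentioned in this description are guaranteed to exist:

```python
import pandas as pd

# Placeholder paths for one training session; the real directory
# layout may differ.
tools = pd.read_csv("session_001/tools.csv")
tasks = pd.read_csv("session_001/tasks.csv")

# Columns explicitly named in this description (others may exist):
#   tools.csv: groundtruth_toolname, commercial_toolname,
#              install_case_part, uninstall_case_part
#   tasks.csv: start_part, stop_part
print(tools.head())
print(tasks.head())

# Long sessions are split into parts, so every label must be read
# together with its part column(s).
spans_parts = tools["install_case_part"] != tools["uninstall_case_part"]
print(f"{spans_parts.sum()} tool installations span more than one video part")
```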

Tool Presence labels:

  • A snapshot of the tools.csv file within each session is shown below. Two columns name the tools. For this challenge, the primary label is "groundtruth_toolname". The column titled "commercial_toolname" gives the commercial name of the surgical instrument. As the csv file shows, multiple different commercial_toolname entries can correspond to the same groundtruth_toolname: such tools may look slightly different, but they serve the same basic purpose, and groundtruth_toolname groups tools according to that purpose. The commercial_toolname labels are provided for context, and teams are free to use them in any way they would like. **However, the test labels will be based only on the groundtruth_toolname column.**

  • The four arms are usually (though not necessarily) installed from left to right (i.e. USM1 is the leftmost arm, and USM4 is the rightmost arm)

  • Please note that there are cases where the labels indicate three tools are present, but only two or fewer tools are visible in the video. This happens when the surgeon has moved a tool out of view even though it is still installed. This noise in the training labels arises because tool information is extracted directly from robot system data (a sketch of converting these installation intervals into per-frame labels is given after the sample frames below).

  • Sample frames for each tool: 
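As referenced above, the installation intervals in tools.csv can be converted into per-frame tool presence sets for training a frame-level classifier. In this sketch the install_time / uninstall_time columns (in seconds) are assumptions for illustration; only the *_part and *_toolname columns are named in this description:

```python
import pandas as pd

def installed_at(tools: pd.DataFrame, part: int, t: float) -> set[str]:
    """Set of groundtruth_toolname values installed at time t (seconds)
    within one video part.

    install_time / uninstall_time are hypothetical column names; only the
    *_part and *_toolname columns are named in the challenge description.
    """
    rows = tools[
        (tools["install_case_part"] == part)
        & (tools["uninstall_case_part"] == part)
        & (tools["install_time"] <= t)
        & (tools["uninstall_time"] > t)
    ]
    return set(rows["groundtruth_toolname"])

tools = pd.read_csv("session_001/tools.csv")  # placeholder path
# Remember: "installed" does not guarantee "visible" -- the presence
# labels are noisy because they come from robot system data.
print(installed_at(tools, part=1, t=90.0))
```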

Task labels:

  • Each video is segmented into a series of mutually exclusive surgical "tasks." These tasks indicate the general aim of the surgeon's activity (e.g. suturing). 
  • There are a total of 8 task labels within the dataset. A brief description of each task is given below:

"Suturing" : These steps include the process of suturing with the robotic system. This category contains basic skills/techniques and general application to clinical scenarios.

"Uterine Horn" : This step involves demonstration and practice with proper retraction tension application to tissue paired with the different dissection techniques of the monoploar curved scissors. The dissection focuses on using cut and coagulation energy dissection contrasted with using the scissors, along the uterine horn connective tissue of the porcine model.

"Suspensory Ligaments" : These steps involve bunt dissection of soft tissue at different depths of abdomen. The focus is on the development of skill and proper camera movement/zooming during dissection.

"Rectal Artery/Vein" : These steps involve the careful isolation of vessels from larger organ structures. This demonstrates proper applications of advanced tools and imaging functions.

"Skills Application" : These are large-scales steps focused on combining the taught skills in a clinically applicable scenario. These actions are relatively unstructured and variable.

"Range of Motion" : This step involves the proper navigation of the endoscope and tools around body cavity to avoid collisions. It demonstrate the full range of view/workspace with proper port placement.

"Retraction and Collision Avoidance" : This step involves navigation of tools through a stationary field of view and how to avoid tool collisions. Quadrants are set up and cycles of movement through these quadrants are performed (cycles are with or without bladder retraction).

"Other" : Activity outside of the structured training.

NOTE: Any unannotated parts of the videos are to be treated as the "Other" label
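Because unannotated time defaults to "Other", a dense per-second label vector can be built by initializing every second to "Other" and overwriting the annotated intervals. In this sketch the task / start_time / stop_time column names are assumptions for illustration; only start_part / stop_part are named above:

```python
import pandas as pd

def per_second_labels(tasks: pd.DataFrame, part: int, duration_s: int) -> list[str]:
    """Dense 1 Hz task labels for one video part.

    The task / start_time / stop_time column names are hypothetical;
    only start_part / stop_part are named in the challenge description.
    """
    labels = ["Other"] * duration_s  # unannotated time defaults to "Other"
    rows = tasks[(tasks["start_part"] == part) & (tasks["stop_part"] == part)]
    for row in rows.itertuples():
        start = max(int(row.start_time), 0)
        stop = min(int(row.stop_time), duration_s)
        for t in range(start, stop):
            labels[t] = row.task
    return labels

tasks = pd.read_csv("session_001/tasks.csv")  # placeholder path
print(per_second_labels(tasks, part=1, duration_s=3600)[:10])
```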

  • Below is a snapshot of the tasks.csv file:

Testing Data:

  • The testing dataset will also consist of videos taken from surgical training exercises (similar to the training set) using the da Vinci robotic system 

  • The length of each video will be variable

  • The videos will be sampled at 1 Hz (1 fps)
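Since the training videos are 60 fps but the test videos are sampled at 1 Hz, it can be useful to subsample training video at the same rate during development. A minimal OpenCV sketch; the file path is a placeholder:

```python
import cv2

def sample_1fps(path: str):
    """Yield (second_index, frame) pairs at ~1 Hz, mirroring the test rate."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 60.0  # training videos are 60 fps
    step = max(1, round(fps))  # keep every fps-th frame
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            yield idx // step, frame  # 1280x720 BGR frame
        idx += 1
    cap.release()

for second, frame in sample_1fps("session_001/part_001.mp4"):  # placeholder path
    pass  # run per-frame inference here
```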

Testing labels:

The test set will be annotated with bounding boxes around the robotic tools and with surgical steps. The tool bounding box annotations are generated by an experienced crowd of annotators, while the surgical steps are annotated by multiple domain experts.

A few examples of bounding box annotations from the test set are shown below: 

It is important to note that the clevis of each robotic surgical tool is treated as the ground truth for most tools (as shown above). However, there are the following exceptions to this rule:

  1. If a tool's clevis is not well defined, the bounding box includes the surgical tip as well, e.g. the monopolar curved scissors shown in the left image above (purple bounding box).

  2. If the tool is very large and the clevis does not appear in the field of view (e.g. tip-up fenestrated grasper, suction irrigator, stapler, grasping retractor).
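This section does not state the evaluation metric, but detections against boxes like these are commonly compared with intersection-over-union (IoU). A minimal sketch, assuming boxes given as (x1, y1, x2, y2) pixel coordinates:

```python
def iou(a: tuple[float, float, float, float],
        b: tuple[float, float, float, float]) -> float:
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

# A predicted clevis box vs. a ground-truth box (coordinates illustrative).
print(iou((100, 200, 180, 260), (110, 205, 190, 270)))
```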

Using the information available in the UI to make predictions is not allowed. To enforce this, the UI is blurred in the test set, eliminating this information. An example image from the test set with a blurred UI is given below:

Additional bounding box examples: