Video Understanding

  • MOSE
    MOSE is a large-scale dataset for video object segmentation in complex scenes. It contains 2,149 video clips and 5,200 objects from 36 categories, with 431,725 high-quality object segmentation masks.
    Project Page

  • OVIS
    OVIS is a large-scale dataset for occluded video instance segmentation. It consists of 296k high-quality instance masks from 25 semantic categories, in scenes where heavy object occlusions commonly occur.
    Project Page
    1st Occluded Video Instance Segmentation Challenge in ICCV 2021
    2nd Occluded Video Instance Segmentation Challenge in ECCV 2022

  • DanceTrack
    DanceTrack is a multi-human tracking dataset that emphasizes 1) uniform appearance: the people look highly similar and are almost indistinguishable, and 2) diverse motion: they move in complicated patterns and frequently exchange relative positions.
    Project Page
    1st Multiple People Tracking in Group Dance Challenge in ECCV 2022

  • MUSES
    MUSES is a large-scale video dataset designed to spur research on a new task called multi-shot temporal event localization. MUSES has 31,477 event instances across a total of 716 video hours. Its defining feature is frequent shot cuts, with an average of 19 shots per instance and 176 shots per video, which induce large intra-instance variations.
    Project Page

  • YouMVOS
    YouMVOS is a dataset for multi-shot video object segmentation, consisting of 200 YouTube videos with 431K segmentation masks.
    Project Page

OCR

  • WarpDoc
    WarpDoc is a warped document image dataset for document restoration. It consists of 1,020 camera images of documents that were collected from scientific papers, magazines, envelopes, etc., which have different paper materials, page layouts, and contents.
    Project Page

  • SCUT-CTW-Context and ReCTS-Context
    These two datasets carry additional annotations for a new task called Contextual Text Block Detection. The task aims to detect contextual text blocks, each consisting of one or more integral text units (e.g., characters, words, or phrases) in natural reading order.
    Project Page