LRW-1000 is a naturally distributed, large-scale benchmark for word-level lipreading in the wild. It has 1,000 classes with 718,018 samples from more than 2,000 individual speakers, and more than 1,000,000 Chinese character instances in total. Each class corresponds to the syllables of a Mandarin word composed of one or several Chinese characters. The dataset aims to cover the natural variability of speech modes and imaging conditions, so that it reflects the challenges encountered in practical applications.
The VIPL-HR database is designed for remote heart rate (HR) estimation from face videos under less-constrained situations. It contains 2,378 visible light (VIS) videos and 752 near-infrared (NIR) videos of 107 subjects. Nine different conditions, including various head movements and illumination conditions, are taken into consideration.
The COX Face Dataset is designed for the problems of Video-to-Still (V2S), Still-to-Video (S2V), and Video-to-Video (V2V) face recognition. The dataset contains 1,000 subjects, each with 1 high-quality still image and 3 video sequences captured to simulate a video surveillance scenario. Specifically, the still images are collected in a controlled environment and are therefore of high quality and resolution, in frontal view, with normal lighting and neutral expression. In contrast, the video frames are of low resolution and low quality, with blur, and are captured under poor lighting in non-frontal view.
The CFW-60K dataset is a purified subset of Celebrity Faces on the Web (CFW) with additional visual attribute annotations. The face images are associated with identity and visual attribute labels, and can thus be used for many different tasks, such as hash learning and attribute learning.
The DEVISIGN database was constructed by the VIPL group, ICT, CAS, under the sponsorship of Microsoft Research Asia. The goals of creating this Chinese Sign Language (SL) database are to provide worldwide researchers in the sign language recognition (SLR) community with a large-vocabulary Chinese SL database for training and evaluating their algorithms, and to advance state-of-the-art SLR technologies toward practical applications, especially for the unknown-signer situation.
The FRHT dataset was established for studying single head tracking under full 360-degree out-of-plane rotation. Overall, the dataset contains 50 sequences with 28,247 frames annotated with bounding boxes, and it captures diverse head movements in real-world conditions. FRHT is collected from the Internet and covers a wide variety of scenes (e.g. street, sea, gymnasium, stage, grassland, ice rink) and activities (e.g. running, cycling, surfing, dancing, skating, flying). As a result, it covers most of the annotated challenges of the visual tracking problem.
The ICT-TV dataset is designed for studying the video face retrieval problem. It contains two large-scale video collections parsed from the entire first seasons of two hit American shows: 17 episodes of The Big Bang Theory (BBT) and 22 episodes of Prison Break (PB).
The ImageNet-150K dataset is a subset of ImageNet with additional visual attribute annotations. The images are associated with category and visual attribute labels, and can thus be used for many different tasks, such as hash learning and attribute learning.