LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild
The CFW-60K dataset is a purified subset of Celebrity Faces on the Web (CFW) with additional visual attribute annotations. Each face image is associated with identity and visual attribute labels, so the dataset can be used for a range of tasks, such as hash learning and attribute learning.
The COX Face Dataset is designed for Video-to-Still (V2S), Still-to-Video (S2V), and Video-to-Video (V2V) face recognition. The dataset contains 1,000 subjects, each with 1 high-quality still image and 3 video sequences captured to simulate a video surveillance scenario. The still images were collected in a controlled environment and are therefore of high quality and resolution, in frontal view, with normal lighting and neutral expression. In contrast, the video frames are of low resolution and low quality, with blur, and were captured under poor lighting in non-frontal view.
The DEVISIGN database was constructed by the VIPL group, ICT, CAS, under the sponsorship of Microsoft Research Asia. This Chinese Sign Language (SL) database has two goals: to provide sign language recognition (SLR) researchers worldwide with a large-vocabulary Chinese SL database for training and evaluating their algorithms, and to advance state-of-the-art SLR technologies toward practical applications, especially for the unknown-signer situation.
The FRHT dataset was established for studying single head tracking under full 360-degree out-of-plane rotation. Overall, the dataset contains 50 sequences with 28,247 annotated bounding boxes and captures diverse head movements in real-world conditions. FRHT was collected from the Internet and covers a wide variety of scenes (e.g., street, sea, gymnasium, stage, grassland, ice rink) and activities (e.g., running, cycling, surfing, dancing, skating, flying). As a result, it covers most of the annotated challenges of the visual tracking problem.
The ICT-TV dataset is designed for studying the video face retrieval problem. It contains two large-scale video collections parsed from the entire first seasons of two hit American shows: 17 episodes of The Big Bang Theory (BBT) and 22 episodes of Prison Break (PB).
The ImageNet-150K dataset is a subset of ImageNet with additional visual attribute annotations. Each image is associated with category and visual attribute labels, so the dataset can be used for a range of tasks, such as hash learning and attribute learning.