Occluded-DukeMTMC-VideoReID is a video person re-identification (reID) dataset for occluded scenes. It is constructed from the DukeMTMC-VideoReID dataset. In the new dataset, all query tracklets are occluded by a large variety of obstacles (e.g., trees, cars, and other persons), while the gallery set contains both holistic and occluded tracklets. It contains 1,702 tracklets covering 702 identities in total; the query set contains 661 tracklets of 661 identities, and the gallery set contains 1,975 tracklets of 1,110 identities. The figure below gives a few examples of the occluded tracklets.
VIPL-HR-V2 is the second version of the VIPL-HR database for remote heart rate (HR) estimation from face videos under less-constrained situations. It contains 2,500 RGB videos of 500 subjects, recorded with a RealSense F200 camera at a resolution of 960 by 720. For each subject, we cut five ten-second clips from a thirty-second video with a five-second stride. More details can be found in the Readme file in the package.
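The clip arithmetic above can be checked with a minimal sketch. The function name `clip_windows` is hypothetical, and the sketch assumes that the first clip starts at second 0 and that each subject contributes a single thirty-second recording; neither detail is stated in the dataset description.

```python
def clip_windows(total_s, clip_s, stride_s):
    """Return (start, end) offsets in seconds of fixed-length clips cut with a stride.

    Assumption: the first clip starts at second 0 and clips never run past the
    end of the recording.
    """
    last_start = total_s - clip_s
    return [(start, start + clip_s) for start in range(0, last_start + 1, stride_s)]


# A thirty-second recording, ten-second clips, five-second stride.
windows = clip_windows(30, 10, 5)
print(windows)

# Assumption: one recording per subject, 500 subjects.
print(len(windows) * 500)
```

Under these assumptions, a five-second stride yields five overlapping clips per recording, which is consistent with the stated 2,500 videos for 500 subjects.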
The WebTattoo dataset was built from images collected from the Internet. It contains about 300K tattoo images, which are intended to serve as distractor background images in large-scale tattoo retrieval tasks. We additionally provide tattoo bounding boxes for the tattoo images from the public-domain Flickr and DeMSI datasets. We also provide 300 tattoo sketches with their mated tattoo photos for the sketch-based tattoo retrieval task. The figure below gives a few examples of the tattoos and tattoo sketches.
The VIPL-HR database is a database for remote heart rate (HR) estimation from face videos under less-constrained situations. It contains 2,378 visible light (VIS) videos and 752 near-infrared (NIR) videos of 107 subjects. Nine different conditions, including various head movements and illumination conditions, are taken into consideration.
The FRHT dataset is established for studying single-target head tracking under full 360-degree out-of-plane rotation. Overall, this dataset contains 50 sequences with 28,247 annotated bounding boxes and captures diverse head movements in real-world conditions. FRHT is collected from the Internet and covers a wide variety of scenes (e.g., street, sea, gymnasium, stage, grassland, ice rink) and activities (e.g., running, cycling, surfing, dancing, skating, flying). Naturally, it covers most of the annotated challenges of the visual tracking problem.
LRW-1000 is a naturally distributed large-scale benchmark for word-level lipreading in the wild. There are 1,000 classes with 718,018 samples from more than 2,000 individual speakers, and more than 1,000,000 Chinese character instances in total. Each class corresponds to the syllables of a Mandarin word, which is composed of one or several Chinese characters. This dataset aims to cover the natural variability across different speech modes and imaging conditions, so as to incorporate the challenges encountered in practical applications.
The COX Face Dataset is designed for Video-to-Still (V2S), Still-to-Video (S2V), and Video-to-Video (V2V) face recognition. The dataset contains 1,000 subjects, each with 1 high-quality still image and 3 video sequences captured to simulate a video surveillance scenario. Specifically, the still images are collected in a controlled environment and are therefore of high quality and resolution, in frontal view, with normal lighting and neutral expression. In contrast, the video frames are of low resolution and low quality, with blur, and are captured under poor lighting in non-frontal view.
The CFW-60K dataset is a purified subset of Celebrity Faces on the Web (CFW) with additional visual attribute annotations. The face images are associated with identity and visual attribute labels, and can thus be used for many different tasks, such as hash learning and attribute learning.
The DEVISIGN database has been constructed by the VIPL group, ICT, CAS, under the sponsorship of Microsoft Research Asia. The goals of creating this Chinese Sign Language (SL) database are to provide worldwide researchers in the sign language recognition (SLR) community with a large-vocabulary Chinese SL database for training and evaluating their algorithms, and to advance state-of-the-art SLR technologies toward practical applications, especially for the unknown-signer situation.
The ICT-TV dataset is designed for studying the video face retrieval problem. It contains two large-scale video collections parsed from the entire first seasons of two hit American shows, i.e., 17 episodes of The Big Bang Theory (BBT) and 22 episodes of Prison Break (PB).
The ImageNet-150K dataset is a subset of ImageNet with additional visual attribute annotations. The images are associated with category and visual attribute labels, and can thus be used for many different tasks, such as hash learning and attribute learning.