Figure: (a) Face samples in LRW-1000; (b) Lip samples in LRW-1000.
LRW-1000 is a naturally-distributed large-scale benchmark for word-level lipreading in the wild. It includes 1000 classes with about 745,187 video samples from more than 2000 individual speakers. Each class corresponds to the syllables of a Mandarin word composed of one or several Chinese characters. The dataset aims to cover the natural variability of speech modes and imaging conditions, so that it reflects the challenges encountered in practical applications. It varies widely in several aspects, including the number of samples in each class, video resolution, lighting conditions, and speakers' attributes such as pose, age, gender, and make-up.
3. Evaluation Protocols
We provide two evaluation metrics for experiments. Recognition accuracy over all 1000 classes is the base metric, since this is a classification task. Because the data varies widely in many aspects, such as the number of samples in each class, we also provide the Kappa coefficient as a second evaluation metric.
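As an illustration of the two metrics, the following is a minimal sketch that computes accuracy and a Kappa coefficient from lists of ground-truth and predicted class labels. This page does not state the exact Kappa formula, so the sketch assumes the standard Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed accuracy and p_e is the agreement expected by chance from the label and prediction frequencies. The function names `accuracy` and `cohen_kappa` are hypothetical and are not part of any official evaluation script.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa (assumed definition): (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement

    # Chance agreement: for each class, (#true in class) * (#predicted in class) / n^2.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if p_e == 1.0:
        # Degenerate case: one class only in both lists, so kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Toy imbalanced example: 4 samples of class 0 and 2 samples of class 1.
    y_true = [0, 0, 0, 0, 1, 1]
    y_pred = [0, 0, 0, 0, 0, 1]
    print("accuracy:", accuracy(y_true, y_pred))
    print("kappa:", cohen_kappa(y_true, y_pred))
```

On imbalanced data the two metrics can diverge: a model that always predicts the most frequent class in the toy example above reaches an accuracy of 4/6, while its kappa is 0, which is why a chance-corrected metric is useful alongside accuracy.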
Shuang Yang, Yuanhang Zhang, Dalu Feng, Mingmin Yang, Chenhao Wang, Jingyun Xiao, Keyu Long, Shiguang Shan, Xilin Chen. LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild. arXiv, 2018. (https://arxiv.org/pdf/1810.06990.pdf)
5. Contact Info
Dalu Feng (firstname.lastname@example.org), Institute of Computing Technology, Chinese Academy of Sciences
Shuang Yang (email@example.com), Institute of Computing Technology, Chinese Academy of Sciences