Visual Information Processing and Learning

Lip Reading

Leader: Shuang Yang / Shiguang Shan (Professor)

Email: shuang.yang [at] ict dot ac dot cn; sgshan [at] ict dot ac dot cn

* The Lip-Reading (LR) Group was founded in 2017. Its core task is Lip Reading, with auxiliary tasks such as Visual Voice Activity Detection and Visual Key-Word Spotting. The related technology can be used to assist speech recognition and achieve more intelligent and robust human-computer interaction, and it can also be used independently in fields such as auxiliary teaching, security verification, and military and public security.

* News:

2019.4: The ACM ICMI 2019-MAVSR competition starts! The competition is jointly organized by researchers from the Institute of Computing Technology (Chinese Academy of Sciences), Imperial College London, the University of Oxford and Samsung American Research Institute. For more details about the competition, please refer to MAVSR2019.

2018.10: The LR Group has released LRW-1000, a large-scale, naturally distributed lip reading dataset. It is currently the largest word-level lip reading dataset and also the only public Mandarin lip reading dataset. For more details, please refer to the data paper.

2018.4~2018.10: The LR Group was invited by CCTV-1 to demonstrate its lip reading technology and system to the television audience. For more details, please click here.

* Research Topics:

1. Visual Speech Recognition (VSR) | Lip Reading (LR)

This topic focuses on how to use visual information, and in particular visual information alone, to infer what the speaker in a video is saying (with or without sound). It can help hearing-impaired people and can also play an important role in many audio-based speech recognition systems, especially in noisy environments.

2. Talking Face Generation

This task aims to make the given static face images “talk”, i.e. to generate a video of the target identity speaking, based only on a clip of speech and the given face images.

3. Visual Voice Activity Detection (VVAD)

This topic focuses on how to use visual information for speech activity detection, which is important for many practical speech recognition systems.

4. Multi-modal VSR / KWS / VVAD

* Related Applications:

※ Lip code/password, liveness detection, command statement recognition, pronunciation correction in intelligent education systems, and so on.


Journal Papers

Conference Papers

1.    Shuang Yang, Yuanhang Zhang, Dalu Feng, Mingmin Yang, Chenhao Wang, Jingyun Xiao, Keyu Long, Shiguang Shan, Xilin Chen, "LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild," IEEE FG 2019 (Oral)

Visual Information Processing and Learning
  • Address: No.6 Kexueyuan South Road
  • Zhongguancun, Haidian District
  • Beijing, China
  • Postcode: 100190
  • Tel: (8610)62600514