VIPL's paper on person re-identification is accepted by IEEE TIP

Time: Mar 15, 2021


  Recently, one paper on language-based person search was accepted by the journal IEEE TIP. IEEE TIP, in full IEEE Transactions on Image Processing, is an international journal on computer vision and image processing with a 2020 impact factor of 6.79. The paper information is as follows:

  Yucheng Chen, Rui Huang, Hong Chang, Chuanqi Tan, Tao Xue and Bingpeng Ma. “Cross-Modal Knowledge Adaptation for Language-Based Person Search”, IEEE Transactions on Image Processing (TIP), 2021. (Accepted)

  Language-based person search is a challenging retrieval task: because the two modalities are represented inconsistently, the similarity between a visual image and a textual description cannot be measured directly. Common representation learning methods have achieved some success on this task, and most of them project image features and text features into a shared space in an equal manner. However, the information contained in an image and in a text is not equal. Since a text describes the person in an image, it summarizes only part of the image information; in other words, images contain image-specific information that is rarely described by texts. This imbalance leads to redundant image information and makes it difficult to align features across modalities, and the image-specific information may also harm the learning of image representations. Considering that text can be used to guide image features, enriching them with important person details while avoiding interference from image-specific information, this paper proposes a method named Cross-Modal Knowledge Adaptation (CMKA). Specifically, text-to-image guidance is obtained at three levels: individuals, lists, and classes. By combining these levels of knowledge adaptation, image-specific information is suppressed and a better common space for images and texts is constructed. The overall framework of CMKA is as follows:



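  To make the three levels of text-to-image guidance concrete, below is a minimal PyTorch-style sketch of how such knowledge adaptation terms could be combined. The specific loss forms (cosine alignment, KL divergence over batch similarities, soft-label distillation), the temperatures, and the equal weighting are illustrative assumptions, not the exact formulation used in the paper.

```python
import torch
import torch.nn.functional as F

def individual_level_loss(img_feat, txt_feat):
    """Individual level: pull each image embedding toward its paired text embedding."""
    return 1.0 - F.cosine_similarity(img_feat, txt_feat.detach(), dim=1).mean()

def list_level_loss(img_feat, txt_feat, temperature=0.05):
    """List level: the image's similarity distribution over the batch should
    mimic the corresponding text's similarity distribution (KL divergence)."""
    img_feat = F.normalize(img_feat, dim=1)
    txt_feat = F.normalize(txt_feat, dim=1)
    img_sim = img_feat @ img_feat.t() / temperature   # student: image-to-image ranking list
    txt_sim = txt_feat @ txt_feat.t() / temperature   # teacher: text-to-text ranking list
    return F.kl_div(F.log_softmax(img_sim, dim=1),
                    F.softmax(txt_sim.detach(), dim=1),
                    reduction="batchmean")

def class_level_loss(img_logits, txt_logits, temperature=4.0):
    """Class level: distill the identity distribution predicted from the text
    branch into the image branch (soft-label knowledge distillation)."""
    return F.kl_div(F.log_softmax(img_logits / temperature, dim=1),
                    F.softmax(txt_logits.detach() / temperature, dim=1),
                    reduction="batchmean") * temperature ** 2

# Toy usage with random tensors; in practice the features and logits come from
# the image encoder, the text encoder, and a shared identity classifier.
img_feat, txt_feat = torch.randn(8, 512), torch.randn(8, 512)
img_logits, txt_logits = torch.randn(8, 100), torch.randn(8, 100)
loss = (individual_level_loss(img_feat, txt_feat)
        + list_level_loss(img_feat, txt_feat)
        + class_level_loss(img_logits, txt_logits))
print(loss.item())
```

  In all three terms the text side is detached, so gradients flow only into the image branch; this reflects the idea of text guiding images rather than the two modalities adapting to each other symmetrically.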
  In summary, the main contributions of this work include: 1) a cross-modal knowledge adaptation method is proposed, in which combining different levels of knowledge adaptation balances the information between modalities and learns more textual-visual correspondences; 2) the effectiveness of CMKA is verified on the language-based person search dataset; 3) the effectiveness of CMKA is further verified on different retrieval tasks, including image-text bi-directional retrieval and image-to-text re-ID.
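  For reference, retrieval tasks of this kind are commonly evaluated with Rank-K (Recall@K) accuracy. The snippet below is a small, generic sketch of that metric for text-to-image retrieval; the function name and the toy data are hypothetical and are not taken from the paper's evaluation code.

```python
import torch
import torch.nn.functional as F

def rank_k_accuracy(txt_feat, img_feat, txt_labels, img_labels, ks=(1, 5, 10)):
    """For each text query, rank all gallery images by cosine similarity and check
    whether an image of the same identity appears among the top-K results."""
    txt_feat = F.normalize(txt_feat, dim=1)
    img_feat = F.normalize(img_feat, dim=1)
    sim = txt_feat @ img_feat.t()                    # (num_queries, num_gallery)
    order = sim.argsort(dim=1, descending=True)      # gallery indices sorted by similarity
    hits = img_labels[order] == txt_labels.unsqueeze(1)
    return {k: hits[:, :k].any(dim=1).float().mean().item() for k in ks}

# Toy example: 4 text queries and 6 gallery images covering 3 identities.
txt_labels = torch.tensor([0, 1, 2, 2])
img_labels = torch.tensor([0, 0, 1, 1, 2, 2])
txt_feat, img_feat = torch.randn(4, 512), torch.randn(6, 512)
print(rank_k_accuracy(txt_feat, img_feat, txt_labels, img_labels))
```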

