VIPL's paper on food category-ingredient prediction is accepted by IEEE TIP

Time: Jul 27, 2022

Congratulations! VIPL's research paper “Ingredient-Guided Region Discovery and Relationship Modeling for Food Category-Ingredient Prediction” has been accepted by IEEE TIP (IEEE Transactions on Image Processing), an international journal on computer vision and image processing.

Ingredient-Guided Region Discovery and Relationship Modeling for Food Category-Ingredient Prediction (Wang Zhiling, Min Weiqing, Li Zhuo, Kang Liping, Wei Xiaoming, Wei Xiaolin, Jiang Shuqiang)

Food is composed of diverse and complex ingredients, and locating the visual regions of these ingredients helps identify food categories. In addition, the relationships between ingredients, such as co-occurrence and mutual exclusion, are also informative. Therefore, we propose a multi-task food category-ingredient joint learning framework that performs food category recognition and ingredient prediction simultaneously. The framework consists of two parts: Ingredient-oriented Visual Region Discovery (IVRD) and Ingredient-oriented Graph Relationship Learning (IGRL).

In IVRD, we group 2D feature maps into ingredient-specific food regions through a pre-built food dictionary. During this process, we apply a U-shaped prior to regularize the occurrence frequency of ingredients, improving the ability to discover ingredients. We then pool ingredient features according to the ingredient assignment map and weight them with an attention mechanism.

In IGRL, we construct an ingredient-oriented semantic-visual graph to explore the relationships among ingredients: the visual representations of all ingredients serve as nodes, and the semantic similarity between ingredient words defines the edges. A graph convolutional network then models and learns the relationships between ingredients, enabling interaction between text embeddings and visual features.

Finally, the outputs of the two branches are fused and fed into the classifier, and the entire network is optimized via multi-task learning for food category recognition and ingredient prediction. Extensive experiments on three popular benchmark datasets (ETH Food-101, Vireo Food-172 and ISIA Food-200) demonstrate the effectiveness of the proposed method, and visualizations of the ingredient assignment map and attention map further confirm its superiority.
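To make the pipeline concrete, below is a minimal PyTorch-style sketch of such a joint framework. It is an illustration only, not the authors' implementation: the names (FoodJointModel, joint_loss), the toy backbone, the single graph-convolution step, and the additive fusion are all simplifying assumptions, and details such as the U-shaped prior on ingredient frequency are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FoodJointModel(nn.Module):
    """Hypothetical sketch of a category-ingredient joint learning model."""
    def __init__(self, num_categories, num_ingredients, feat_dim=512, adjacency=None):
        super().__init__()
        # Backbone producing 2D feature maps (stand-in for the paper's CNN).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=7, stride=4, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(14),
        )
        # IVRD: a learnable "food dictionary" with one entry per ingredient.
        self.dictionary = nn.Parameter(torch.randn(num_ingredients, feat_dim))
        self.attn = nn.Linear(feat_dim, 1)  # attention over ingredient features
        # IGRL: one graph-convolution step; `adjacency` would come from the
        # semantic similarity between ingredient word embeddings.
        A = adjacency if adjacency is not None else torch.eye(num_ingredients)
        self.register_buffer("A", A)
        self.gcn = nn.Linear(feat_dim, feat_dim)
        # The two task heads applied to the fused representation.
        self.cat_head = nn.Linear(feat_dim, num_categories)
        self.ing_head = nn.Linear(feat_dim, num_ingredients)

    def forward(self, images):
        fmap = self.backbone(images)                   # (B, D, H, W)
        feats = fmap.flatten(2).transpose(1, 2)        # (B, H*W, D)
        # Ingredient assignment map: soft-assign each spatial location
        # to the dictionary entries.
        assign = F.softmax(feats @ self.dictionary.t(), dim=-1)   # (B, H*W, K)
        # Pool per-ingredient visual features from the assignment map.
        ing_feats = assign.transpose(1, 2) @ feats     # (B, K, D)
        # Attention-weight the ingredient features (IVRD branch output).
        w = torch.sigmoid(self.attn(ing_feats))        # (B, K, 1)
        ivrd_out = (w * ing_feats).mean(dim=1)         # (B, D)
        # IGRL branch: propagate ingredient features along the semantic graph.
        graph_feats = F.relu(self.gcn(self.A @ ing_feats))        # (B, K, D)
        igrl_out = graph_feats.mean(dim=1)             # (B, D)
        # Fuse the two branches and predict both tasks.
        fused = ivrd_out + igrl_out
        return self.cat_head(fused), self.ing_head(fused)

# Multi-task loss: cross-entropy for the single-label category task,
# binary cross-entropy for the multi-label ingredient task.
def joint_loss(cat_logits, ing_logits, cat_labels, ing_labels, alpha=1.0):
    return (F.cross_entropy(cat_logits, cat_labels)
            + alpha * F.binary_cross_entropy_with_logits(ing_logits, ing_labels))
```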

Fig.1. The proposed ingredient-oriented food category-ingredient joint learning framework
