Ingredient-Guided Cascaded Multi-Attention Network for Food Recognition

Weiqing Min, Linhu Liu, Zhengdong Luo, Shuqiang Jiang
(ACMMM 2019)


Food recognition has recently been gaining attention in the multimedia community due to its various applications, e.g., multimodal food logging and personalized healthcare. Most existing methods directly extract visual features of the whole image using popular deep networks, without considering the characteristics specific to food images. Compared with other types of object images, food images generally do not exhibit a distinctive spatial arrangement or common semantic patterns, which makes it very hard to capture discriminative information from them. In this work, we approach food recognition by developing an Ingredient-Guided Cascaded Multi-Attention Network (IG-CMAN), which is capable of sequentially localizing multiple informative image regions at multiple scales, moving from category-level to ingredient-level guidance in a coarse-to-fine manner. At the first level, IG-CMAN generates the initial attentional region from the category-supervised network with a Spatial Transformer (ST). Taking this localized attentional region as the reference, IG-CMAN combines ST with an LSTM in the following levels to sequentially discover diverse attentional regions at finer-grained scales from the ingredient-guided sub-network. Furthermore, we introduce a new dataset, WikiFood-200, with 200 food categories taken from the list in Wikipedia, about 200,000 food images, and 319 ingredients. We conduct extensive experiments on two popular food datasets and the newly proposed WikiFood-200, demonstrating that our method achieves state-of-the-art performance in Top-1 accuracy. Qualitative results along with visualizations further show that IG-CMAN provides explainability for the localized regions and is able to learn relevant regions for ingredients.
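
The coarse-to-fine localization described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the Spatial Transformer is restricted to an isotropic scale plus a translation, that each level's parameters are expressed relative to the previous region, and that the parameters are supplied by hand where IG-CMAN would predict them with its category-supervised network and LSTM. The names `attention_crop` and `cascade_crops` are hypothetical.

```python
import numpy as np


def attention_crop(image, scale, tx, ty, out_size):
    """Bilinearly sample a square region from a 2-D image.

    Coordinates are normalized to [-1, 1]. The region is centred at (tx, ty)
    and covers a fraction `scale` of the image extent along each axis
    (assumption: scale-and-translate affine map only, no rotation or shear).
    """
    h, w = image.shape
    lin = np.linspace(-1.0, 1.0, out_size)
    gx, gy = np.meshgrid(lin, lin)  # gx varies along columns, gy along rows

    # Map the output grid into normalized source coordinates.
    sx = scale * gx + tx
    sy = scale * gy + ty

    # Convert to pixel coordinates and clamp to the image border.
    px = np.clip((sx + 1.0) * (w - 1) / 2.0, 0, w - 1)
    py = np.clip((sy + 1.0) * (h - 1) / 2.0, 0, h - 1)

    x0 = np.floor(px).astype(int)
    y0 = np.floor(py).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = px - x0
    wy = py - y0

    # Bilinear interpolation between the four neighbouring pixels.
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom


def cascade_crops(image, relative_params, out_size):
    """Apply a sequence of (scale, dx, dy) steps, each relative to the
    previously localized region, and return one crop per level."""
    crops = []
    scale, tx, ty = 1.0, 0.0, 0.0  # start from the whole image
    for s, dx, dy in relative_params:
        # Compose the new step with the accumulated transform.
        tx, ty = tx + scale * dx, ty + scale * dy
        scale = scale * s
        crops.append(attention_crop(image, scale, tx, ty, out_size))
    return crops


# Placeholder parameters: a coarse region, then a finer one inside it.
image = np.tile(np.arange(5.0), (5, 1))  # 5x5 ramp, value equals column index
levels = cascade_crops(image, [(1.0, 0.0, 0.0), (0.5, 0.0, 0.0)], out_size=3)
```

In this sketch the first level covers the whole image and the second zooms into its central half, mirroring the idea of using the first region as the reference for finer-scale regions.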

  • Weiqing Min, Linhu Liu, Zhengdong Luo, Shuqiang Jiang. Ingredient-Guided Cascaded Multi-Attention Network for Food Recognition. (ACM Multimedia 2019), 21-25 October 2019, Nice, France.