Few-Shot Food Recognition via Multi-View Representation Learning

Shuqiang Jiang, Weiqing Min, Yongqiang Lyu, Linhu Liu
(ACM Transactions on Multimedia Computing, Communications and Applications 2020)

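Food categories are highly diverse, and food datasets collected in the real world follow a typical long-tailed distribution: for many uncommon food categories, only a few samples can be collected. Few-shot recognition of food images is therefore of greater practical significance than few-shot recognition of general images. This paper builds on the group's earlier research on food computing ([Min2019-ACM CSUR]) and food recognition ([Jiang2020-IEEE TIP][Min2019-ACMMM][Xu2015-IEEE TMM]). It studies few-shot food image recognition and proposes a multi-view representation method that fuses food ingredient and category information. The method is validated through experiments and analysis on multiple datasets.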

  • [Min2019-ACM CSUR] Weiqing Min, Shuqiang Jiang, Linhu Liu, Yong Rui, Ramesh Jain. A Survey on Food Computing. ACM Computing Surveys, 52(5): 92:1-92:36, 2019.
  • [Jiang2020-IEEE TIP] Shuqiang Jiang, Weiqing Min, Linhu Liu, Zhengdong Luo. Multi-Scale Multi-View Deep Feature Aggregation for Food Recognition. IEEE Transactions on Image Processing, 29: 265-276, 2020.
  • [Min2019-ACMMM] Weiqing Min, Linhu Liu, Zhengdong Luo, Shuqiang Jiang. Ingredient-Guided Cascaded Multi-Attention Network for Food Recognition. ACM Multimedia (ACM MM 2019), Nice, France, 21-25 October 2019.
  • [Xu2015-IEEE TMM] Ruihan Xu, Luis Herranz, Shuqiang Jiang, Shuang Wang, Xinhang Song, Ramesh Jain. Geolocalized Modeling for Dish Recognition. IEEE Transactions on Multimedia, 17(8): 1187-1199, 2015.
Abstract

    This paper considers the problem of few-shot learning for food recognition. Automatic food recognition can support various applications, e.g., dietary assessment and food journaling. Most existing works focus on food recognition with large numbers of labelled samples and fail to recognize food categories with few samples. To address this problem, we propose a Multi-View Few-Shot Learning (MVFSL) framework that exploits additional ingredient information for few-shot food recognition. Besides category-oriented deep visual features, we introduce an ingredient-supervised deep network to extract ingredient-oriented features. Ingredient-oriented features capture general and intermediate attributes of food. They are informative and complementary to category-oriented features, and thus play an important role in improving food recognition. In few-shot food recognition in particular, ingredient information can bridge the gap between the disjoint training and test categories. To take advantage of ingredient information, we fuse the two kinds of features in two steps. We first combine the feature maps from their respective deep networks, and then convolve the combined feature maps. This convolution is further incorporated into a multi-view relation network, which compares pairs of images to enable fine-grained feature learning. MVFSL is trained end-to-end, jointly optimizing the two feature-learning subnetworks and the relation subnetworks. Extensive experiments on different food datasets consistently demonstrate the advantage of MVFSL in multi-view feature fusion. Furthermore, we extend two other types of networks, namely the Siamese Network and the Matching Network, by introducing ingredient information for few-shot food recognition. Experimental results show that introducing ingredient information into these two networks also improves few-shot food recognition performance.
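
The fusion step in the abstract first combines the category-oriented and ingredient-oriented feature maps and then convolves them. It can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation:
- feature maps are nested lists of shape C x H x W;
- "combining" is taken to mean channel-wise concatenation;
- the convolution is a 1x1 kernel followed by ReLU.

The helper names (`concat_channels`, `conv1x1`, `fuse_views`) are hypothetical. The abstract does not give the actual kernel size or layer configuration.

```python
def concat_channels(a, b):
    """Concatenate two C x H x W feature maps along the channel axis."""
    if len(a[0]) != len(b[0]) or len(a[0][0]) != len(b[0][0]):
        raise ValueError("feature maps must have the same spatial size")
    return a + b  # C_a channels followed by C_b channels


def conv1x1(x, weights, bias):
    """1x1 convolution: output channel k at (i, j) is bias[k] + sum_c weights[k][c] * x[c][i][j]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    out = []
    for wk, bk in zip(weights, bias):
        if len(wk) != c_in:
            raise ValueError("kernel depth must equal the number of input channels")
        out.append([[bk + sum(wk[c] * x[c][i][j] for c in range(c_in))
                     for j in range(w)]
                    for i in range(h)])
    return out


def relu(x):
    """Element-wise ReLU over a C x H x W feature map."""
    return [[[max(0.0, val) for val in row] for row in channel] for channel in x]


def fuse_views(category_map, ingredient_map, weights, bias):
    """Hypothetical fusion (assumption): concatenate both views, then 1x1 conv + ReLU."""
    return relu(conv1x1(concat_channels(category_map, ingredient_map), weights, bias))


# Toy 2x2 maps with one channel per view (values are made up for illustration).
cat = [[[1.0, 2.0], [3.0, 4.0]]]    # category-oriented features
ing = [[[0.5, 0.5], [0.5, 0.5]]]    # ingredient-oriented features
weights = [[1.0, 2.0], [1.0, -4.0]]  # 2 output channels x 2 input channels
bias = [0.0, 0.0]
fused = fuse_views(cat, ing, weights, bias)
print(fused)  # [[[2.0, 3.0], [4.0, 5.0]], [[0.0, 0.0], [1.0, 2.0]]]
```

With a 1x1 kernel, each fused channel is a learned mix of category and ingredient channels at every spatial position. A larger kernel would also mix neighbouring positions.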

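A relation network compares a query image against each class in a few-shot episode and predicts the class with the highest relation score. The sketch below makes several assumptions:
- the fused multi-view embeddings are already available as flat vectors;
- the K support embeddings of each class are summed, as the original Relation Network does for K-shot episodes;
- the relation module (`relation_module`, hypothetical) is a two-layer network whose weights would normally be learned end-to-end.

The weights below are hand-set purely for illustration. The abstract states that MVFSL's relation subnetwork works on convolved feature maps, and its exact architecture is not described here.

```python
import math


def relation_module(support, query, W, b, v, c):
    """Hypothetical relation module g([s; q]): one ReLU hidden layer, sigmoid output in (0, 1)."""
    pair = support + query  # concatenate support and query embeddings
    hidden = [max(0.0, sum(w * x for w, x in zip(row, pair)) + bi)
              for row, bi in zip(W, b)]
    logit = sum(vi * h for vi, h in zip(v, hidden)) + c
    return 1.0 / (1.0 + math.exp(-logit))


def class_embedding(shots):
    """Element-wise sum of the K support embeddings of one class (K-shot case)."""
    return [sum(values) for values in zip(*shots)]


def classify(query, support_set, params):
    """Score the query against every class; predict the class with the highest relation score."""
    scores = {label: relation_module(class_embedding(shots), query, *params)
              for label, shots in support_set.items()}
    return max(scores, key=scores.get), scores


# Hand-set weights for illustration only: the hidden units compute |s_i - q_i|
# for 2-D embeddings, so the score is sigmoid(-L1 distance). Real weights are learned.
W = [[1, 0, -1, 0], [-1, 0, 1, 0], [0, 1, 0, -1], [0, -1, 0, 1]]
b = [0.0] * 4
v = [-1.0] * 4
c = 0.0

# A 2-way 1-shot episode with hypothetical fused embeddings.
support_set = {"dumpling": [[1.0, 0.0]], "noodles": [[0.0, 1.0]]}
query = [0.9, 0.1]
pred, scores = classify(query, support_set, (W, b, v, c))
print(pred, scores)
```

In a trained model the relation module learns its own comparison function. Here it is fixed to a negative L1 distance only so the behaviour is easy to follow.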

    • Shuqiang Jiang, Weiqing Min, Yongqiang Lyu, Linhu Liu. Few-Shot Food Recognition via Multi-View Representation Learning. ACM Transactions on Multimedia Computing, Communications and Applications (2020, Accepted)