Shuqiang Jiang's homepage
Shuqiang Jiang
Ph.D
Tel: 010-62600505
Email: sqjiang@ict.ac.cn
Address: Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, No. 6 Kexueyuan South Road, Zhongguancun, Haidian District, Beijing 100190, China
Personal profile

Shuqiang Jiang is a professor with the Institute of Computing Technology, Chinese Academy of Sciences (CAS), and a professor at the University of CAS. He is also with the Key Laboratory of Intelligent Information Processing, CAS. His research interests include multimedia content analysis and retrieval, image/video understanding, and multimodal intelligence. He has authored or coauthored more than 150 papers. He was supported by the National Science Fund for Distinguished Young Scholars in 2021, the NSFC Excellent Young Scientists Fund in 2013, and the Young Top-Notch Talent program of the Ten Thousand Talent Program in 2014. He is a senior member of IEEE and CCF, a member of ACM, and an associate editor of IEEE MultiMedia and Multimedia Tools and Applications. He is the vice chair of the IEEE CASS Beijing Chapter and of the ACM SIGMM China Chapter.

Publication
  • Xinda Liu, Weiqing Min, Shuhuan Mei, Lili Wang, Shuqiang Jiang, Plant Disease Recognition: A Large-Scale Benchmark Dataset and a Visual Region and Loss Reweighting Approach.

    Every year, up to 40% of the world's food crops are lost to diseases and pests, causing more than 220 billion USD in agricultural trade losses annually and leaving millions of people hungry. Recent advances in image processing, especially deep learning, offer a feasible new route to disease recognition, but for lack of systematic analysis and sufficient data the research has not reached the scale it deserves. This paper systematically analyzes the challenges of disease recognition from a computer vision perspective, collects a dataset of 271 disease categories with more than 220,000 images, and proposes a method that reweights visual regions and losses to emphasize diseased parts. The method strengthens the influence of diseased locations at both the image and feature levels, while balancing global and local information through splitting and recombination. Extensive evaluations on the proposed dataset and another public dataset demonstrate the advantages of the method. We hope this study will further advance plant disease recognition in the image processing community.

    Abstract

    Plant disease diagnosis is critical for agriculture because of its importance for increasing crop production. Recent advances in image processing offer a new way to address this issue via visual plant disease analysis. However, there are few works in this area, let alone systematic research. In this paper, we systematically investigate the problem of visual plant disease recognition for plant disease diagnosis. Compared with other types of images, plant disease images generally exhibit randomly distributed lesions, diverse symptoms, and complex backgrounds, making it hard to capture discriminative information. To facilitate plant disease recognition research, we construct a new large-scale plant disease dataset with 271 plant disease categories and 220,592 images. Based on this dataset, we tackle plant disease recognition by reweighting both visual regions and losses to emphasize diseased parts. We first compute the weight of each patch divided from an image based on the cluster distribution of these patches, indicating the discriminative level of each patch. Then we allocate a weight to the loss of each patch-label pair during weakly supervised training to enable discriminative disease-part learning. We finally extract patch features from the network trained with loss reweighting, and utilize an LSTM network to encode the weighted patch feature sequence into a comprehensive feature representation. Extensive evaluations on this dataset and another public dataset demonstrate the advantage of the proposed method. We expect this research will further the agenda of plant disease recognition in the image processing community. (A minimal code sketch of the reweighting scheme follows this entry.)


    • Xinda Liu, Weiqing Min, Shuhuan Mei, Lili Wang, Shuqiang Jiang. “Plant Disease Recognition: A Large-Scale Benchmark Dataset and a Visual Region and Loss Reweighting Approach”, IEEE Transactions on Image Processing (TIP), 2021.

    [PDF]
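
A minimal PyTorch sketch of the reweighting idea described above, under simplifying assumptions: patch weights are derived from distances to precomputed cluster centers (a stand-in for the paper's cluster-distribution weighting), and all names (PatchEncoder, patch_weights, reweighted_loss) are illustrative rather than the authors' released code.

```python
# Illustrative sketch only: simplified region/loss reweighting,
# not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchEncoder(nn.Module):
    """Tiny CNN applied to every patch of an image independently."""
    def __init__(self, feat_dim=256, num_classes=271):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, patches):                       # (B, P, 3, H, W)
        b, p = patches.shape[:2]
        feats = self.backbone(patches.flatten(0, 1))  # (B*P, D)
        logits = self.classifier(feats)               # (B*P, C)
        return feats.view(b, p, -1), logits.view(b, p, -1)

def patch_weights(feats, centers):
    """Score each patch by closeness to its nearest cluster center,
    a stand-in for the paper's cluster-distribution weighting."""
    d = ((feats.unsqueeze(2) - centers) ** 2).sum(-1)     # (B, P, K)
    return torch.softmax(-d.min(dim=-1).values, dim=-1)   # (B, P)

def reweighted_loss(logits, weights, labels):
    """Cross-entropy per patch-label pair, scaled by patch weights."""
    b, p, _ = logits.shape
    ce = F.cross_entropy(logits.flatten(0, 1),
                         labels.repeat_interleave(p), reduction="none")
    return (weights.flatten() * ce).mean()

# The weighted patch features can then be fed as a sequence into nn.LSTM
# to build the final image representation, as the abstract describes.
encoder = PatchEncoder()
patches = torch.randn(4, 9, 3, 64, 64)    # 4 images, 9 patches each
centers = torch.randn(8, 256)             # 8 hypothetical cluster centers
feats, logits = encoder(patches)
loss = reweighted_loss(logits, patch_weights(feats, centers),
                       torch.randint(0, 271, (4,)))
```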
  • Haitao Zeng, Xinhang Song, Gongwei Chen, Shuqiang Jiang, Amorphous Region Context Modeling for Scene Recognition.

    Scene images are usually composed of foreground and background regional contents. Some existing methods extract regional contents with dense grids, but such grids can split an object into several discrete parts, leaving the semantics of each region patch ambiguous. Objectness-based methods, in turn, may attend only to the foreground contents of a scene image, so the background contents and the spatial structure remain incomplete. In contrast, this paper proposes to resolve the semantic ambiguity by detecting the boundaries of the regional contents themselves, using semantic segmentation techniques to precisely locate their amorphous contours. In addition, the complete foreground and background information of the image is incorporated when building the scene representation. A graph neural network models these regions and explores the contextual relations between them, yielding discriminative scene features for scene recognition. Experimental results on MIT67 and SUN397 demonstrate the effectiveness and generality of the proposed method.

    Abstract

    Scene images are usually composed of foreground and background regional contents. Some existing methods propose to extract regional contents with dense grids or objectness region proposals. However, dense grids may split an object into several discrete parts, introducing semantic ambiguity into the patches. Objectness methods may focus on particular objects, but they attend only to the foreground contents and do not exploit the background, which is key to scene recognition. In contrast, we propose a novel scene recognition framework with amorphous region detection and context modeling. In the proposed framework, discriminative regions are first detected with amorphous contours that tightly surround the targets, obtained through semantic segmentation techniques. In addition, both foreground and background regions are jointly embedded to obtain the scene representations with a graph model. Based on the graph modeling module, we explore the contextual relations between regions in geometric and morphological aspects, and generate discriminative representations for scene recognition. Experimental results on MIT67 and SUN397 demonstrate the effectiveness and generality of the proposed method. (See the code sketch after this entry.)


    • Haitao Zeng, Xinhang Song, Gongwei Chen, Shuqiang Jiang. “Amorphous Region Context Modeling for Scene Recognition”, IEEE Transactions on Multimedia (TMM), 2020. (Accepted December 7, 2020)


    [PDF]
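
A minimal sketch of graph-based region context modeling in the spirit of this paper, assuming region features (e.g., mask-pooled from a semantic segmentation network) have already been extracted. The fully connected similarity graph here is a simplification of the paper's geometric and morphological relations, and all names are illustrative.

```python
# Illustrative sketch only: message passing over scene regions.
import torch
import torch.nn as nn

class RegionGraphLayer(nn.Module):
    """One round of message passing among region nodes."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                              # (N, D)
        # Fully connected similarity graph over foreground + background regions.
        adj = torch.softmax(x @ x.t() / x.size(-1) ** 0.5, dim=-1)
        return torch.relu(x + adj @ self.proj(x))      # residual node update

class SceneGraphClassifier(nn.Module):
    def __init__(self, dim=512, num_layers=2, num_classes=67):
        super().__init__()
        self.layers = nn.ModuleList(RegionGraphLayer(dim)
                                    for _ in range(num_layers))
        self.head = nn.Linear(dim, num_classes)

    def forward(self, regions):                        # (N, D) region features
        for layer in self.layers:
            regions = layer(regions)
        return self.head(regions.mean(dim=0))          # pool nodes -> scene logits

model = SceneGraphClassifier()
regions = torch.randn(6, 512)   # 6 hypothetical regions from one scene image
logits = model(regions)         # (67,) class scores, e.g., for MIT67
```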
  • Yanchao Zhang, Weiqing Min, Liqiang Nie, Shuqiang Jiang, Hybrid-Attention Enhanced Two-Stream Fusion Network for Video Venue Prediction.

    With the growth of social media platforms such as Facebook and Vine, more and more users share their daily lives on these platforms, and the spread of mobile devices has driven the production of massive multimedia data. Out of privacy concerns, users often share without geographic annotations, which limits the development of venue recognition and recommendation systems. With the continuing growth of multimedia data and the progress of artificial intelligence, especially deep learning, the task of video venue prediction has emerged: given a video as input, it predicts the venue where the video was shot, with broad application prospects in personalized restaurant recommendation, user privacy detection, and beyond. Building on the group's earlier work on video venue prediction ([Jiang2018-IEEE TMM]), this paper proposes a new network model, HA-TSFN. The model considers both global and local information, using a global-local attention mechanism to capture scene and object information in videos and thereby enhance the visual representation. Experiments are conducted and analyzed on the large-scale video venue dataset Vine.

    • [Jiang2018-IEEE TMM] Shuqiang Jiang, Weiqing Min, Shuhuan Mei, “Hierarchy-dependent cross-platform multi-view feature learning for venue category prediction,” IEEE Transactions on Multimedia, vol. 21, no. 6, pp. 1609–1619, 2018.

    Abstract

    Video venue category prediction has been drawing increasing attention in the multimedia community for applications such as personalized location recommendation and video verification. Most existing works resort to information from either multiple modalities or other platforms to strengthen video representations. However, noisy acoustic information, sparse textual descriptions, and incompatible cross-platform data can limit the performance gain and reduce the universality of the model. Therefore, we focus on discriminative visual feature extraction from videos by introducing a hybrid-attention structure. In particular, we propose a novel Global-Local Attention Module (GLAM), which can be inserted into neural networks to generate enhanced visual features from video content. In GLAM, Global Attention (GA) is used to capture contextual scene-oriented information by assigning various weights to channels, while Local Attention (LA) is employed to learn salient object-oriented features by allocating different weights to spatial regions. Moreover, GLAM can be extended to variants with multiple GAs and LAs for further visual enhancement. The two types of features captured by GAs and LAs are integrated via convolution layers and then delivered into a convolutional Long Short-Term Memory (convLSTM) network to generate spatial-temporal representations, constituting the content stream. In addition, video motion is explored to learn long-term movement variations, which also contributes to video venue prediction. The content and motion streams constitute our proposed Hybrid-Attention Enhanced Two-Stream Fusion Network (HA-TSFN), which finally merges the features from the two streams into comprehensive representations. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the large-scale Vine dataset. Visualization also shows that the proposed GLAM captures complementary scene-oriented and object-oriented visual features from videos. (A code sketch of the attention module follows this entry.)


    • Yanchao Zhang, Weiqing Min, Liqiang Nie, Shuqiang Jiang. “Hybrid-Attention Enhanced Two-Stream Fusion Network for Video Venue Prediction”, IEEE Transactions on Multimedia (TMM), 2020.


    [PDF]
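
A minimal sketch of the global-local attention idea: a channel-attention branch for scene-oriented context and a spatial-attention branch for object-oriented saliency, fused by a 1x1 convolution. This is a hedged reconstruction from the abstract, not the authors' GLAM code; the kernel sizes and reduction ratio are assumptions.

```python
# Illustrative sketch only: global (channel) + local (spatial) attention.
import torch
import torch.nn as nn

class GlobalLocalAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Global attention: reweight channels from pooled context
        # (squeeze-and-excitation style).
        self.ga = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
        # Local attention: one weight per spatial position.
        self.la = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid())
        # Fuse the two enhanced feature maps.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):                          # (B, C, H, W)
        g = x * self.ga(x)[:, :, None, None]       # scene-oriented features
        l = x * self.la(x)                         # object-oriented features
        return self.fuse(torch.cat([g, l], dim=1)) # enhanced feature map

glam = GlobalLocalAttention(channels=64)
frames = torch.randn(2, 64, 28, 28)   # per-frame feature maps
enhanced = glam(frames)               # same shape as input
```

In the full model described in the abstract, such per-frame enhanced features would feed a convLSTM to form the content stream, which is then fused with a motion stream.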
  • Yaohui Zhu, Weiqing Min, Shuqiang Jiang, Attribute-Guided Feature Learning for Few-Shot Image Recognition.

    We propose an attribute-guided two-layer learning framework capable of learning general feature representations. Attribute learning serves as a second objective for few-shot image recognition within a multi-task learning framework: few-shot recognition is trained at the task level while attribute learning is trained on individual images, and the two share the same network. Moreover, under the guidance of attribute learning, features from different layers become attribute representations at different levels, supporting few-shot recognition from multiple aspects. We therefore establish a two-layer learning mechanism guided by attributes to capture more discriminative representations; compared with a single-layer mechanism, the two-layer mechanism yields complementary representations. The proposed framework is agnostic to the specific model: both metric-based few-shot methods and meta-learning methods can be plugged into it.

    Abstract

    Few-shot image recognition has become an essential problem in machine learning and image recognition and has attracted increasing research attention. Typically, few-shot image recognition methods are trained across tasks. However, such methods tend to learn an embedding network for discriminative representations of the training categories, and thus do not distinguish novel categories well. To establish connections between training and novel categories, we use attribute-related representations for few-shot image recognition and propose an attribute-guided two-layer learning framework that is capable of learning general feature representations. Specifically, few-shot image recognition trained over tasks and attribute learning trained over images share the same network in a multi-task learning framework. In this way, few-shot image recognition learns feature representations guided by attributes, and is thus less sensitive to novel categories than feature representations learned with category supervision alone. Meanwhile, the multi-layer features associated with attributes are aligned with category learning on multiple levels. We therefore establish a two-layer learning mechanism guided by attributes to capture more discriminative representations, which are complementary to those of a single-layer learning mechanism. Experimental results on the CUB-200, AWA, and Mini-ImageNet datasets demonstrate that our method effectively improves performance. (A code sketch of the multi-task setup follows this entry.)


    • Yaohui Zhu, Weiqing Min, Shuqiang Jiang. “Attribute-Guided Feature Learning for Few-Shot Image Recognition”, IEEE Transactions on Multimedia (TMM), 2020.


    [PDF]
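
A minimal sketch of the attribute-guided multi-task idea: one shared embedding network trained jointly with a prototype-style few-shot episode loss (standing in for the metric-based variant the abstract mentions) and a per-image attribute prediction loss. The tiny backbone, loss weight alpha, and episode-local labels are assumptions for illustration, not the authors' code.

```python
# Illustrative sketch only: few-shot episodes + attribute supervision
# sharing one embedding network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEmbedding(nn.Module):
    def __init__(self, dim=128, num_attributes=312):  # 312: CUB-200 attribute count
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim))
        self.attr_head = nn.Linear(dim, num_attributes)  # attribute branch

    def forward(self, x):
        z = self.encoder(x)
        return z, self.attr_head(z)

def episode_loss(model, support, support_y, query, query_y, query_attrs, alpha=0.5):
    """Prototype classification over one episode plus binary attribute
    prediction on the query images; labels are episode-local (0..C-1)."""
    zs, _ = model(support)
    zq, attr_logits = model(query)
    protos = torch.stack([zs[support_y == c].mean(0)
                          for c in support_y.unique(sorted=True)])
    logits = -torch.cdist(zq, protos)          # closer prototype -> higher score
    fs_loss = F.cross_entropy(logits, query_y)
    attr_loss = F.binary_cross_entropy_with_logits(attr_logits, query_attrs)
    return fs_loss + alpha * attr_loss         # alpha balances the two tasks

model = SharedEmbedding()
support = torch.randn(10, 3, 32, 32)           # 5-way 2-shot support set
support_y = torch.arange(5).repeat_interleave(2)
query = torch.randn(15, 3, 32, 32)
query_y = torch.randint(0, 5, (15,))
attrs = torch.randint(0, 2, (15, 312)).float()
loss = episode_loss(model, support, support_y, query, query_y, attrs)
```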