Visual Information Processing and Learning Group, Institute of Computing Technology, Chinese Academy of Sciences



Resource Sharing

CFW-60K Dataset

1. Overview

The CFW-60K dataset is a purified subset of Celebrity Faces on the Web (CFW) with additional visual attribute annotations. The face images are associated with identity and visual attribute labels, and can therefore be used for many different tasks, e.g. hash learning and attribute learning. Specifically, the originally released identity labels of CFW are manually purified, and 500 subjects with 29 to 184 images each are selected, resulting in 60,000 images in total. Moreover, we select 10,000 of the 60,000 images (20 images per subject) and annotate 14 facial attributes covering gender, race, age, eye accessory, and facial expression with the help of seven annotators. The final attribute annotations take one of three values, +1, −1, or 0, indicating the presence, absence, or uncertainty of an attribute, respectively.
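
As a rough illustration of the three-valued encoding, the sketch below decodes one 14-element label vector into readable form. The attribute order follows the list given in Section 2.2; the function name `decode_attributes` and the assumption that labels arrive as a plain sequence of integers are hypothetical, since the on-disk file format is not described on this page.

```python
# Attribute order as listed in Section 2.2 of this page.
ATTRIBUTE_NAMES = [
    "male", "female", "asian", "white", "black", "Indian",
    "youth", "mid-aged", "senior",
    "no glasses", "wearing eye glasses", "wearing sun glasses",
    "positive expression", "neutral expression",
]

# Meaning of the three annotation values described in the text.
VALUE_MEANING = {1: "present", -1: "absent", 0: "uncertain"}


def decode_attributes(labels):
    """Map a 14-element vector of +1/-1/0 values to {attribute: meaning}.

    Hypothetical helper: the input layout (one integer per attribute,
    in the order above) is an assumption, not a documented format.
    """
    if len(labels) != len(ATTRIBUTE_NAMES):
        raise ValueError("expected %d labels, got %d"
                         % (len(ATTRIBUTE_NAMES), len(labels)))
    return {name: VALUE_MEANING[int(value)]
            for name, value in zip(ATTRIBUTE_NAMES, labels)}
```

Keeping the value 0 as an explicit "uncertain" state, rather than folding it into "absent", lets a training procedure skip or down-weight those entries if desired.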

2. Data annotation

2.1 Label purification

The original identity labels of CFW are generated automatically and therefore inevitably contain a number of incorrect annotations. To fix this problem, we invite three annotators to check, for each face image, whether the claimed label is correct, and only images confirmed by all three annotators are preserved. In addition, the three annotators are required to annotate five facial landmarks for each face (i.e., the geometric centers of the two eyes, the tip of the nose, and the two corners of the mouth). The purified CFW contains 153,461 faces of 1,520 subjects.
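
The unanimous-confirmation rule can be sketched as a simple filter. The data layout here (a dictionary from a hypothetical image ID to a list of per-annotator booleans) and the name `keep_unanimous` are assumptions made for illustration; the actual annotation tooling is not described.

```python
def keep_unanimous(votes, required=3):
    """Return the image IDs confirmed by all `required` annotators.

    votes: dict mapping image_id -> list of bool, one entry per annotator
           (True means the annotator confirmed the claimed identity).
    """
    return [image_id for image_id, confirmations in votes.items()
            if len(confirmations) == required and all(confirmations)]
```

An image with even one rejection, or with fewer than three votes, is dropped under this rule.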

2.2 Attribute label

Based on the purified CFW dataset, 500 subjects with 29 to 184 images each are selected, resulting in the 60,000 images of CFW-60K. Moreover, we select 10,000 of the 60,000 images (20 images per subject) and annotate 14 facial attributes covering gender, race, age, eye accessory, and facial expression with the help of seven annotators. The final attribute annotations take one of three values, +1, −1, or 0, indicating the presence, absence, or uncertainty of an attribute, respectively. Specifically, the annotated attributes, in order, are: (1) male, (2) female, (3) asian, (4) white, (5) black, (6) Indian, (7) youth, (8) mid-aged, (9) senior, (10) no glasses, (11) wearing eye glasses, (12) wearing sun glasses, (13) positive expression, (14) neutral expression. Some example images are shown below:

2.3 Data processing

For all images, we crop and align the face regions using the manually annotated facial landmarks. The cropped regions are then resized to 256×256 pixels.
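
The exact alignment procedure is not specified here. One common approach, shown below purely as a sketch, is to compute a similarity transform (rotation, uniform scale, and translation) that moves the two annotated eye centers to fixed positions in the 256×256 output. The target eye positions and the function name `eye_alignment_matrix` are assumptions for illustration, not the values used to produce the dataset.

```python
import numpy as np

OUTPUT_SIZE = 256  # from the text: cropped regions are resized to 256x256


def eye_alignment_matrix(left_eye, right_eye,
                         target_left=(0.35 * OUTPUT_SIZE, 0.40 * OUTPUT_SIZE),
                         target_right=(0.65 * OUTPUT_SIZE, 0.40 * OUTPUT_SIZE)):
    """Return a 2x3 similarity transform mapping both eye centers to the targets.

    Points are (x, y) pixel coordinates. The default target positions are
    hypothetical placeholders chosen for this sketch.
    """
    left_eye = np.asarray(left_eye, dtype=float)
    target_left = np.asarray(target_left, dtype=float)
    src = np.asarray(right_eye, dtype=float) - left_eye
    dst = np.asarray(target_right, dtype=float) - target_left

    # Treat 2D vectors as complex numbers: dst = (scale * e^{i*theta}) * src.
    ratio = complex(dst[0], dst[1]) / complex(src[0], src[1])
    a, b = ratio.real, ratio.imag
    linear = np.array([[a, -b],
                       [b, a]])  # rotation combined with uniform scale

    # Choose the translation so that left_eye lands exactly on target_left.
    translation = target_left - linear @ left_eye
    return np.hstack([linear, translation[:, None]])
```

The resulting matrix has the 2×3 shape expected by affine warping routines such as OpenCV's `cv2.warpAffine(image, M, (256, 256))`, which would perform the crop and resize in one step.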

3. Dataset partition

We suggest dividing the 10,000 images with both identity and attribute labels into two equal-sized parts, with 10 images per subject in each part, where one part is used as the training set and the other as the test set. The remaining 50,000 images, which have only identity labels, constitute the auxiliary set, which can be used for training or testing depending on the specific protocol.
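
A minimal sketch of such a per-subject split follows. It assumes the annotated images are available as `(image_id, subject_id)` pairs and picks the 10 training images per subject at random with a fixed seed; the page does not say which images belong to which part, so the random choice and the name `split_by_subject` are assumptions.

```python
import random
from collections import defaultdict


def split_by_subject(samples, per_subject_train=10, seed=0):
    """Split (image_id, subject_id) pairs into train and test lists.

    Each subject contributes `per_subject_train` images to the training
    set and its remaining images to the test set.
    """
    by_subject = defaultdict(list)
    for image_id, subject_id in samples:
        by_subject[subject_id].append(image_id)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for subject_id in sorted(by_subject):
        images = sorted(by_subject[subject_id])
        rng.shuffle(images)
        train += [(i, subject_id) for i in images[:per_subject_train]]
        test += [(i, subject_id) for i in images[per_subject_train:]]
    return train, test
```

With 500 subjects and 20 annotated images each, this yields 5,000 training and 5,000 test images, matching the suggested partition sizes.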

4. Contact

Ruiping Wang (wangruiping@ict.ac.cn), Institute of Computing Technology, Chinese Academy of Sciences

Haomiao Liu (haomiao.liu@vipl.ict.ac.cn), Institute of Computing Technology, Chinese Academy of Sciences

5. Download

The CFW-60K dataset is released to universities and research institutes for research purposes only. To request a copy of the CFW-60K dataset, please proceed as follows:
•  Send an email to Dr. Wang (wangruiping@ict.ac.cn). Once we receive your email, we will provide you with the download link.
•  If you use the CFW-60K dataset, please cite the following paper:
 Yan Li, Ruiping Wang, Haomiao Liu, Huajie Jiang, Shiguang Shan, Xilin Chen. Two Birds, One Stone: Jointly Learning Binary Code for Large-scale Face Image Retrieval and Attributes Prediction. International Conference on Computer Vision (ICCV), pp. 3819-3827, December 2015.

 

 

 


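Visual Information Processing and Learning Group
  • Address: No. 6 Kexueyuan South Road, Zhongguancun, Haidian District, Beijing
  • Postal code: 100190
  • Phone: 010-62600514
  • Email: yi.cheng@vipl.ict.ac.cn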

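Copyright © Visual Information Processing and Learning Group, Institute of Computing Technology, Chinese Academy of Sciences. 京ICP备05002829号 京公网安备1101080060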