This web page introduces a novel real-world database that contains images extracted from real photographs acquired by reporters of the Czech News Agency (ČTK). It is referred to below as the Unconstrained Facial Images (UFI) database and is mainly intended for benchmarking face identification methods. However, the corpus can also be used in many related tasks (e.g. face detection, verification, etc.).
Two different partitions are available. The first one contains cropped faces that were automatically extracted from the photographs using the Viola-Jones algorithm. The face size is thus almost uniform and the images contain only a small portion of background. The images in the second partition have more background, the face size varies significantly and the faces are not localized. The purpose of this set is to evaluate and compare complete face recognition systems in which face detection and extraction are included.
Together with the dataset, we present the results of a set of experiments carried out on this corpus. We use several state-of-the-art feature-based methods that perform well on other databases and that give particularly good accuracy on real-world data. The results serve as a baseline, and we would like to encourage researchers to surpass them.
Each photograph is annotated with the name of a person. However, background objects and other persons are often present as well. Because of a) financial and time constraints and b) the need to be able to create another face dataset quickly on demand, we wanted to create the database as automatically as possible (with minimal human effort). The series of tasks carried out in order to build the database is described in Lenc and Král. As already mentioned, we created two different partitions:
The Cropped images partition contains images of 605 people, with an average of 7.1 images per person in the training set and one image per person in the test set. The images are cropped to a size of 128 x 128 pixels. The following figure shows the images of two individuals from the Cropped images partition.
The Large images partition contains 530 subjects, with an average of 8.2 training images per person. The images in this partition are 384 x 384 pixels in size. Eight example images from the Large images partition are shown in the next figure. The main goal of this partition is to evaluate and compare complete face recognition systems. Therefore, additional steps before the recognition itself are expected (face detection, background removal, etc.).
We would like to keep the testing protocol as simple and straightforward as possible. Therefore, both partitions are divided into training and test sets. All images from the training sets are available as a gallery for training. The images in the test sets are used as test images.
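The gallery/test split described above can be illustrated with a minimal sketch. It assumes that every image has already been turned into a numeric feature vector and uses a plain nearest-neighbour rule with Euclidean distance; this is an illustrative stand-in, not one of the baseline methods, and the function name `nearest_subject` and the toy vectors are hypothetical.

```python
import math


def nearest_subject(probe, gallery):
    """Return the label of the gallery vector closest to the probe vector."""
    # gallery is a list of (subject_label, feature_vector) pairs
    best_label, _ = min(gallery, key=lambda item: math.dist(item[1], probe))
    return best_label


# Toy gallery built from "training" images (hypothetical 2-D features).
gallery = [
    ("s001", (0.0, 0.0)),
    ("s001", (0.1, 0.2)),
    ("s002", (1.0, 1.0)),
]

# Toy test images: the true label is kept only for later scoring.
probes = [
    ("s001", (0.2, 0.1)),
    ("s002", (0.9, 1.1)),
]

# Each test image is assigned the subject of its nearest gallery image.
predictions = [nearest_subject(vector, gallery) for _, vector in probes]
print(predictions)
```

In a real experiment the feature vectors would come from whatever descriptor is being evaluated, and the gallery would hold every training image of the chosen partition.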
The images in the Cropped images partition should be used in their original size. Additional cropping or resizing is undesirable because it would compromise the comparability of the results. The images may be preprocessed, but the preprocessing procedure must be described together with the reported results.
On the other hand, we allow any preprocessing or cropping in the case of the Large images partition. However, the whole procedure must be reported and thoroughly described. The recognition results should be reported as accuracy (i.e. the ratio of correctly recognized faces to all test faces).
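As a small worked example of the accuracy measure, the sketch below counts how many predicted subject labels match the true ones and divides by the number of test faces. The function name `identification_accuracy` and the four labels are hypothetical and serve only as an illustration.

```python
def identification_accuracy(predicted, actual):
    """Return the fraction of test faces whose predicted subject is correct."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one test face is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels for four test images; the third one is misrecognized.
truth = ["s001", "s002", "s003", "s004"]
guess = ["s001", "s002", "s007", "s004"]

accuracy = identification_accuracy(guess, truth)
print(f"accuracy = {accuracy:.2%}")  # 3 of 4 correct
```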
The database is distributed as a directory structure. Each partition contains train and test directories, each of which contains one sub-directory per person named sxxx (where xxx is the number of the subject).
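The layout above can be indexed with a few lines of Python. The following is a sketch under stated assumptions: the partition directory name, the file names and the number of digits in sxxx are not specified here, so the code accepts any sub-directory whose name is "s" followed by digits and lists whatever files it contains. To stay runnable without the dataset, it builds a tiny stand-in tree in a temporary directory.

```python
import re
import tempfile
from pathlib import Path

# Assumption: subject directories are named "s" followed by digits.
SUBJECT_DIR = re.compile(r"^s\d+$")


def index_split(split_dir):
    """Map each subject directory name to a sorted list of its files."""
    index = {}
    for subject_dir in sorted(Path(split_dir).iterdir()):
        if subject_dir.is_dir() and SUBJECT_DIR.match(subject_dir.name):
            index[subject_dir.name] = sorted(
                p for p in subject_dir.iterdir() if p.is_file()
            )
    return index


def index_partition(partition_dir):
    """Index the train and test directories of one partition."""
    root = Path(partition_dir)
    return {split: index_split(root / split) for split in ("train", "test")}


# Build a toy tree with the same shape (all names below are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    partition = Path(tmp) / "partition"
    toy_files = [
        ("train", "s001", "a.img"),
        ("train", "s001", "b.img"),
        ("train", "s002", "a.img"),
        ("test", "s001", "c.img"),
        ("test", "s002", "c.img"),
    ]
    for split, subject, name in toy_files:
        folder = partition / split / subject
        folder.mkdir(parents=True, exist_ok=True)
        (folder / name).touch()

    index = index_partition(partition)
    # Count the files per subject in each split.
    counts = {
        split: {subject: len(files) for subject, files in subjects.items()}
        for split, subjects in index.items()
    }

print(counts)
```

Pointing `index_partition` at the real directory of a partition would yield the gallery (train) and test file lists per subject.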
This corpus is available free of charge for research purposes only. Commercial use in any form is strictly excluded. For further information about the UFI database, please see the paper below:
Please cite this paper when you use this database in your experiments.