
Dataset Overview

An overview of various computer vision evaluation datasets and the current state of the art on them. Please note that this site is UNDER CONSTRUCTION and wants your input! If you have additional information or new results on a dataset, send me an email to hayko<at>icg.tugraz.at. My next step is to set up a wiki for better search, categorization and tagging of the datasets. So again: FEEDBACK WELCOME! :)




Chapter 1
Introduction

An overview of active computer vision datasets by Hayko Riemenschneider.

Other useful overviews have been set up at featurespace.org, the PASCAL Visual Object Classes site, the TUD/ETHZ dataset page, a generic computer vision dataset list, a collection of test images, and probably many more. Here we want to offer more than a list: we share practical experience with each dataset!

Chapter 2
Datasets

2.1 Shape Retrieval

2.1.1 MPEG-7 Shape Retrieval

MPEG-7 — Task: Shape Retrieval — Protocol: Bullseye — Download

The MPEG-7 silhouette database [] is a popular database for evaluating shape matching. It consists of 70 shape categories, each represented by 20 different images with high intra-class variability (1400 shapes in total). The shapes are defined by binary masks outlining the objects.

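The evaluation protocol for this retrieval task is the "bullseye rating". Each image is used as a query and compared to all other images. For each query, one counts how many of the 40 closest matches (the 40 images with the lowest shape dissimilarity values) belong to the query's category. Every category has 20 images, so at most 20 of them can be correct. The bullseye rating is the total number of correct matches divided by the maximum possible number, given as a percentage.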
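The following minimal sketch computes the bullseye rating from a precomputed dissimilarity matrix and integer class labels. It is not the official MPEG-7 evaluation code, and the function name and toy data are hypothetical. It assumes that lower values mean more similar shapes and that the query stays in its own ranking, which is common for this protocol.

```python
import numpy as np


def bullseye_score(dist: np.ndarray, labels: np.ndarray, top_k: int = 40) -> float:
    """Bullseye rating in percent (hypothetical helper).

    dist[i, j]: dissimilarity between shapes i and j (lower = more similar).
    labels[i]:  non-negative integer category of shape i.
    The query is kept in its own ranking (its self-distance is normally 0).
    """
    class_sizes = np.bincount(labels)
    hits = 0
    max_hits = 0
    for i in range(len(labels)):
        # the top_k shapes with the lowest dissimilarity to query i
        nearest = np.argsort(dist[i], kind="stable")[:top_k]
        hits += int(np.sum(labels[nearest] == labels[i]))
        # at most min(class size, top_k) matches can be correct (20 for MPEG-7)
        max_hits += int(min(class_sizes[labels[i]], top_k))
    return 100.0 * hits / max_hits
```

For MPEG-7, `dist` would be the 1400x1400 matrix produced by the matching method and `top_k` stays at 40.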

The Latecki group maintains an overview of recent results on its website.

Download MPEG-7 Core Experiment CE-Shape-1

Note: The dataset raises interesting questions about how to define the shape of an object. Some objects in two different categories are very similar (apples and device9). The octopus category, by contrast, has a much larger intra-class variance and still forms a single category.

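Group                          Method            Ref.                  Bullseye (%)
Shape description & matching   CSS               [18]                  75.44
Shape description & matching   IDSC+DP           [9]                   85.40
Shape description & matching   Hier. Procr.      [12]                  86.35
Shape description & matching   Shape tree        [13]                  87.70
Post processing                Graph Trans.      [7]                   91.00
Post processing                Densified Spaces  [8]                   93.32
Post processing                BeyondPairwise    [KontschiederACCV09]  93.40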

2.1.2 Kimia25

Kimia25 — Task: Shape Retrieval — Protocol: TopRank — Download

2.1.3 Kimia99

Kimia99 — Task: Shape Retrieval — Protocol: TopRank — Download

The Kimia 99 dataset has 9 classes of 11 images each. The evaluation is an all-vs-all comparison. The similarity values are ranked for each reference shape, and one reports how many of the 10 closest matches to each reference image are in the same category.
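The following minimal sketch implements this top-rank evaluation. It assumes a precomputed dissimilarity matrix and integer labels, and it excludes the query itself from its own ranking. It also assumes the results are reported per rank: for each of the first 10 ranks, the number of queries (at most 99) whose match at that rank comes from the same category. The function name is hypothetical.

```python
import numpy as np


def top_rank_counts(dist: np.ndarray, labels: np.ndarray, ranks: int = 10) -> np.ndarray:
    """counts[r] = number of queries whose (r+1)-th closest match shares their label."""
    counts = np.zeros(ranks, dtype=int)
    for i in range(len(labels)):
        order = np.argsort(dist[i], kind="stable")
        order = order[order != i][:ranks]  # drop the query itself, keep the top ranks
        counts += (labels[order] == labels[i]).astype(int)
    return counts
```

A perfect result on Kimia 99 would be ten entries equal to 99.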





BeyondPairwise   X     Y
100%             0%    10%

Table 2.1: Shape retrieval results for the Kimia 99 dataset.

2.1.4 Kimia216

Kimia216 — Task: Shape Retrieval — Protocol: TopRank — Download

The Kimia 216 dataset has X classes of X images each. The evaluation is an all-vs-all comparison. The similarity values are ranked for each reference shape, and one reports how many of the X closest matches to each reference image are in the same category.

2.1.5 SIID Silhouette

SIID — Task: Shape Retrieval — Protocol: TopRank — Download

The SIID silhouette dataset contains

Download SIID silhouette dataset

Mythological creatures 2D website

2.1.6 Leaves

The Leaves dataset from X contains X images of leaves.

2.2 Partial Shape Matching

2.2.1 Mythological Creatures

Myth — Task: Partial Matching — Protocol: ? — Download

The Mythological Creatures dataset consists of articulated shapes (silhouettes) for partial similarity experiments. It contains 15 shapes: 5 humans, 5 horses and 5 centaurs. The shapes differ in articulation and in additional parts. They are defined by binary masks outlining the objects.

Download Mythological creatures 2D database v1.0

Mythological creatures 2D website

2.2.2 Tools2D

Tools — Task: Partial Matching — Protocol: ? — Download

The Tools 2D dataset from Bronstein, Bronstein, Bruckstein, and Kimmel is intended for partial similarity experiments. It consists of 15 shapes: 5 humans, 5 horses and 5 centaurs. The shapes differ in articulation and in additional parts. They are defined by binary masks outlining the objects.

2.3 Object detection by Shape

2.3.1 ETHZ Shape Classes

ETHZ-Shape — Task: Detection by Shape — Protocol: PASCAL 50% — Download

The ETHZ Shape Classes dataset from Vittorio Ferrari consists of five object classes and a total of 255 images. All classes contain significant intra-class variations and scale changes. The images sometimes contain multiple instances of a category and have a large amount of background clutter.

The evaluation protocol entails either a) using the hand-drawn models and testing on all 255 images, or b) training on a random half of a category's images and testing on all remaining images. For example, the applelogo class contains 40 images, so training is done on a random subset of 20 and testing on the remaining 20+215 images. The evaluation criterion used to be IOU 20% and is now the standard PASCAL 50%. The code by Ferrari contains sample splits and routines for randomly creating train and test splits.
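Protocol b) can be sketched as follows. This is not Ferrari's code: the class keys, image IDs and seed handling are hypothetical. The toy example lumps the other four classes into a single "other" entry so that it reproduces the 20 / 20+215 arithmetic of the applelogo example.

```python
import random
from typing import Dict, List, Tuple


def ethz_split(images_by_class: Dict[str, List[str]], target: str,
               seed: int = 0) -> Tuple[List[str], List[str]]:
    """Return (train, test) image IDs for one target class under protocol b)."""
    rng = random.Random(seed)          # fixed seed -> reproducible split
    positives = list(images_by_class[target])
    rng.shuffle(positives)
    n_train = len(positives) // 2      # half of the class is used for training
    train = positives[:n_train]
    test = positives[n_train:]         # the other half of the class ...
    for cls, ids in images_by_class.items():
        if cls != target:
            test.extend(ids)           # ... plus every image of the other classes
    return train, test
```

Repeating the split with several seeds and averaging the detection results reduces the dependence on one particular random subset.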

The ETHZ shape dataset has been used for object detection in various forms, including shape-only detection, detection with complex features, transfer learning and segmentation. Hayko Riemenschneider created a pixel-accurate segmentation annotation for it.

Download ETHZ Shape Classes dataset v1.2

ETHZ shape classes website

Download ICG’s ETHZ Shape Classes segmentation annotation

2.3.2 ETHZ Extended Shape classes

ETHZ-Shape2 — Task: Detection by Shape — Protocol: PASCAL 50% — Download

The ETHZ Extended Shape classes dataset from Konrad Schindler is a larger dataset of shape categories. It was created by merging the ETHZ Shape Classes with Konrad Schindler's closed shapes and now consists of 7 shape classes with 50 images each.

The protocol for object detection is X. The criterion is PASCAL 20% or PASCAL 50%.

Download ETHZ Extended Shape classes

ETHZ Extended shape classes website

2.3.3 Weizmann horses

The multi-scale Weizmann horses dataset (originally from Eran Borenstein, adapted by Jamie Shotton) consists of 656 images, split into 50+50 training, 50+50 validation and 228+228 test images. It contains side views of horses with large variation in pose, together with segmentation masks for the horses.

The single-scale Weizmann horses dataset contains 328 horse and 900 background images, resized so that the horses appear at a single scale. It also contains segmentation masks for the horses.

The detection protocol entails training on the first 100 images (see readme) and evaluating with the Shotton criterion. Bounding boxes are not relevant and are not evaluated.

Segmentation is also evaluated on this dataset... using (fixme?)

Download Weizmann horses (single-scale)

Download Weizmann horses (multi-scale)

Weizmann horses website

2.3.4 INRIA horses

INRIA-horse — Task: Detection by Shape & Segmentation — Protocol: PASCAL 50% — Download

The INRIA horses dataset from Frederic Jurie and Vittorio Ferrari consists of 170 images with one or more horses in side view, at several scales and against cluttered background, and 170 images without horses. The annotation contains bounding boxes and binary segmentation masks (fixme?).

The protocol uses 50 positive images for training and tests on the remaining 120 positive and 170 negative images (120+170). The evaluation criterion is PASCAL 50%.

Segmentation is also evaluated on this dataset... using (fixme?)

Download INRIA horses dataset v1.03

INRIA horses website

2.4 Object detection

2.4.1 TUD Pedestrians Test

The TUD Pedestrians dataset from Micha Andriluka, Stefan Roth and Bernt Schiele consists of 250 images with 311 fully visible people with significant variation in clothing and articulation.

The dataset is typically used for evaluating single-frame detectors. The protocol entails training on the TUD Pedestrians training dataset, following its specifications. The evaluation criterion is PASCAL 50%.

Download TUD Pedestrians dataset

TUD Pedestrians website

2.4.2 TUD Crossing

The TUD Crossing dataset from Micha Andriluka, Stefan Roth and Bernt Schiele consists of 201 images with 1008-1212 highly overlapping pedestrians with significant variation in clothing and articulation. The original annotation by Andriluka et al. [?] contains 1008 tight bounding boxes for pedestrians with at least 50% visibility and ignores many of the overlapping pedestrians. A second annotation by Barinova contains 1018 pedestrians but still misses many. Typically three scales are used for evaluation on this dataset. However, the scale changes are only minor, so a well-chosen single scale may also suffice.

The dataset is typically used for evaluating tracking. Recently, single-frame detectors have also been evaluated on it because of its challenging overlaps. The protocol entails training on the TUD Pedestrians training dataset, following its specifications. The evaluation criterion is PASCAL 50%.

Download TUD Crossing dataset

TUD Crossing website

2.4.3 TUD Campus

The TUD Campus dataset from Micha Andriluka, Stefan Roth and Bernt Schiele consists of 71 images with 303 highly overlapping pedestrians and large scale changes. The original annotation by Andriluka et al. [?] contains 303 tight bounding boxes for pedestrians with at least 50% visibility. A second annotation by Barinova contains X pedestrians. Typically five scales are evaluated.

The dataset is typically used for evaluating tracking. Recently, single-frame detectors have also been evaluated on it because of its challenging overlaps. The protocol entails training on the TUD Pedestrians training dataset, following its specifications. The evaluation criterion is PASCAL 50%.

Download TUD Campus dataset

TUD Campus website

2.4.4 TUD Pedestrians Training

The TUD Pedestrians training dataset from Micha Andriluka, Stefan Roth and Bernt Schiele comes in two versions, with 210 and 400 training images respectively. It contains X pedestrians with significant variation in clothing and articulation. The annotation contains bounding boxes and segmentation masks.

The dataset is typically used for training single-frame pedestrian detectors. Additional virtual training samples are obtained by flipping, rotating, scaling, shifting and bootstrapping the training data. Sometimes it is also combined with the INRIA People training dataset.
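Two of the listed transformations, mirroring and shifting, can be sketched with plain NumPy. Rotation, scaling and bootstrapping are omitted here. The function name and the shift amounts are hypothetical. The input is assumed to be a normalized pedestrian crop stored as an array of shape (height, width) or (height, width, channels).

```python
import numpy as np
from typing import List, Sequence


def virtual_samples(crop: np.ndarray, shifts: Sequence[int] = (-2, 2)) -> List[np.ndarray]:
    """Original crop, its mirror image, and horizontally shifted copies."""
    samples = [crop, np.fliplr(crop)]           # left-right mirror of the crop
    width = crop.shape[1]
    for dx in shifts:
        pad = abs(dx)
        # replicate the border columns so the shifted crop keeps its size
        widths = ((0, 0), (pad, pad)) + ((0, 0),) * (crop.ndim - 2)
        padded = np.pad(crop, widths, mode="edge")
        start = pad - dx                        # dx > 0 moves the content to the right
        samples.append(padded[:, start:start + width])
    return samples
```

Edge replication is only one choice. Cropping the shifted window from the full training image would avoid the repeated border columns when the surrounding context is available.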

Download TUD Pedestrians training (210 images)

Download TUD Pedestrians training (400 images)

TUD Pedestrians website

2.4.5 INRIA People

The INRIA People dataset from Navneet Dalal and Bill Triggs consists of training and testing data. The training set contains X images and X people normalized to 64x128 pixels (see 'train_64x128_H96').

The dataset is typically used for training single-frame pedestrian detectors using additional virtual training samples by flipping, rotating, scaling, shifting and bootstrapping the training data.

Download INRIA People dataset

INRIA People website

2.4.6 TUD Motorbike

2.4.7 TUD Cows

2.4.8 CALTECH 101

2.4.9 CALTECH 256

2.4.10 Graz01

2.4.11 Graz02

2.4.12 UIUC Cars single scale

2.4.13 UIUC Cars multi scale

2.4.14 PASCAL VOC 2007

2.4.15 PASCAL VOC 2008

2.4.16 PASCAL VOC 2009

2.4.17 PASCAL VOC 2010

2.5 Multi-View Object Detection

2.5.1 ICG Lab

2.5.2 EPFL

2.5.3 Basketball

2.5.4 MSR3D

The MSR 3D Video dataset from X is

2.6 Scene Segmentation

2.6.1 CamVid

The Cambridge-driving Labeled Video Database (CamVid) from Gabriel Brostow contains ten minutes of video footage and corresponding semantically labeled ground-truth images at intervals. There are 32 semantic classes and 701 segmented images.

The package from Brostow also contains an InteractLabeler, paint stroke logs, color2label assignments and various statistics.

The dataset is typically used for semantic scene segmentation. Recently it has also been augmented with multi-view reconstruction, which uses 3D data as an additional cue.

Download CamVid dataset

Download CamVid annotation

CamVid website

2.6.2 MSRC v1

The MSRC v1 dataset from Microsoft Research in Cambridge contains 240 images and 9 object classes with coarse pixel-wise labeled images.

The dataset is commonly used for full scene segmentation. The training is done on 120 images and testing on 120 images (50/50 split).

Download MSRC v1 dataset

MSRC website

2.6.3 MSRC v2

The MSRC v2 dataset is an extension of the MSRC v1 dataset from Microsoft Research in Cambridge. It contains 591 images and 23 object classes with accurate pixel-wise labels. Although it contains 23 object classes, only 21 are commonly used. The unused labels are void==0, horse==5 and mountain==8. Void is excluded because it marks background, and horse and mountain are excluded because they have too few training samples.

The dataset is commonly used for full scene segmentation. It may also be used for object instance segmentation, because the current annotation contains individual object instances in addition to the pure class annotation.

The training is done on 276X images, validation on 59 and testing on 275 images. The classes should be equally spread among the 45%, 10% and 45% splits. There exists an example split from Jamie Shotton, and the code for TextonBoost has routines for generating new splits.

Evaluation uses the average pixel-wise accuracy, the class-average accuracy and the PASCAL class-wise accuracy. The three labels mentioned above are ignored, so that only 21 classes are evaluated.
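Before these scores are computed, the label maps can be remapped so that the three excluded labels are ignored and the remaining 21 classes are numbered consecutively. This sketch assumes the ground truth has already been converted to integer indices 0 to 23 (23 object classes plus void). The ignore value 255 and the function name are hypothetical choices.

```python
import numpy as np

IGNORE = 255             # hypothetical ignore value used by the scoring code
EXCLUDED = (0, 5, 8)     # void, horse, mountain


def msrc21_lut(num_labels: int = 24) -> np.ndarray:
    """Lookup table mapping MSRC v2 label indices to 0..20 and excluded labels to IGNORE."""
    lut = np.full(num_labels, IGNORE, dtype=np.uint8)
    kept = [label for label in range(num_labels) if label not in EXCLUDED]
    lut[kept] = np.arange(len(kept), dtype=np.uint8)   # consecutive ids for the 21 classes
    return lut
```

A whole label map is then remapped in one step with `msrc21_lut()[label_map]`.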

Download MSRC v2 train/val/test split

Download MSRC v2 dataset

MSRC website

2.6.4 Interactive Segmentation

The Interactive Segmentation (IcgBench) dataset from Jakob Santner contains 243 images and 262 segmentations; some images have multiple segmentations. The annotations are given as user strokes and pixel-wise segmentations.

The dataset is typically used for interactive multi-label segmentation from user strokes, evaluating both segmentation quality and runtime performance.

Download IcgBench dataset

IcgBench website

2.7 Multi-View Segmentation

2.7.1 Video consistency

2.8 Action Recognition

2.8.1 Weizmann Action

2.8.2 KTH Action

2.8.3 Hollywood Videos

2.9 Image Retrieval

2.9.1 ZuBuD

The Zurich Building dataset (ZuBuD) from X contains 1000+ images of buildings, with five views of each building.

2.9.2 UK Benchmark

2.9.3 Oxford Buildings

2.9.4 CMP dataset

2.10 Pair Retrieval

2.10.1 BCED

2.10.2 PhotoTourism

2.11 Tracking

2.11.1 TUD Campus

2.11.2 TUD Crossing

2.11.3 Babenko

2.11.4 PN Learning

2.12 Detector and Descriptor

2.12.1 Miko


Chapter 3
Evaluation criterion

3.0.1 Bullseye rating (40)

For each query shape, the number of same-category shapes among its 40 closest matches, divided by the maximum possible number and given as a percentage (see Section 2.1.1).

PASCAL / IOU (20%, 50%)

A true positive requires that two bounding boxes mutually overlap by more than 20% or 50% respectively. The overlap is defined as the intersection area over the union area of the two boxes. See the PASCAL VOC Challenge code.
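The overlap test can be sketched as follows, together with the greedy matching that the PASCAL VOC evaluation uses. Detections are processed in decreasing confidence, and each ground-truth box can be matched at most once; further detections of the same object count as false positives. Boxes are assumed to be (x1, y1, x2, y2) in continuous coordinates. The official devkit uses inclusive pixel coordinates and adds 1 to widths and heights, so its values differ slightly. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection area over union area of two non-degenerate boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def match_detections(dets: Sequence[Tuple[float, Box]], gts: Sequence[Box],
                     threshold: float = 0.5) -> List[bool]:
    """True-positive flags for detections, in decreasing score order.

    threshold=0.5 gives PASCAL 50%, threshold=0.2 the older 20% criterion.
    """
    used = [False] * len(gts)
    flags = []
    for _, box in sorted(dets, key=lambda d: d[0], reverse=True):
        # find the ground-truth box with the largest overlap
        best_ov, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            ov = iou(box, gt)
            if ov > best_ov:
                best_ov, best_j = ov, j
        if best_ov > threshold and not used[best_j]:
            used[best_j] = True
            flags.append(True)
        else:
            flags.append(False)  # too little overlap, or a duplicate detection
    return flags
```

The resulting flags, in score order, are what precision/recall curves and average precision are computed from.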

3.0.2 Shotton 25px

A true positive is a deviation of less than 25px from the object center. Bounding boxes are not relevant and not evaluated.
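The following minimal sketch implements the Shotton criterion. It assumes the deviation is measured as the Euclidean distance in pixels between the predicted and the annotated object center. The function name is hypothetical.

```python
import math
from typing import Tuple


def shotton_true_positive(pred_center: Tuple[float, float],
                          gt_center: Tuple[float, float],
                          max_dev: float = 25.0) -> bool:
    """True if the predicted center lies less than max_dev pixels from the true center."""
    dx = pred_center[0] - gt_center[0]
    dy = pred_center[1] - gt_center[1]
    return math.hypot(dx, dy) < max_dev  # box extents are ignored entirely
```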

3.0.3 Average pixel-wise segmentation

This score evaluates how well a multi-class segmentation fits the ground truth by deciding individually for each pixel whether it has the correct label. Note that the final average is calculated as all correct pixels over all test pixels, not image-wise. The score may therefore be influenced by an imbalance in the test classes.

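$$\mathrm{score}_{\text{pixel-wise}} = \frac{\sum_{c=1}^{C} TP_c}{\sum_{c=1}^{C} \left(TP_c + FN_c\right)} \tag{3.1}$$

Here TP_c and FN_c count the pixels of class c over all test images, and C is the number of classes.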
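A minimal sketch of (3.1) accumulates one confusion matrix over all test images, so the score is computed over all pixels rather than per image. Rows are ground truth and columns are predictions. The ignore value 255 for void pixels is an assumption, and the function names are hypothetical.

```python
import numpy as np


def confusion_matrix(gt: np.ndarray, pred: np.ndarray, num_classes: int,
                     ignore: int = 255) -> np.ndarray:
    """conf[g, p] = number of pixels with ground truth g predicted as p."""
    valid = gt != ignore
    idx = gt[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes * num_classes).reshape(num_classes, num_classes)


def pixelwise_score(conf: np.ndarray) -> float:
    # trace = sum_c TP_c; total = sum_c (TP_c + FN_c) = all evaluated pixels
    return float(np.trace(conf) / conf.sum())
```

For a whole test set, sum the per-image matrices, e.g. `conf = sum(confusion_matrix(g, p, 21) for g, p in pairs)`, and then call `pixelwise_score(conf)`.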

3.0.4 Average class segmentation

This score evaluates how well a multi-class segmentation fits the ground truth while compensating for differing class frequencies. A true positive is still decided per pixel by whether it has the correct label. The accuracy, however, is first computed per class and then averaged over all classes, so that every class is weighted equally.

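$$\mathrm{score}_{\text{class-wise}} = \frac{1}{C} \sum_{c=1}^{C} \frac{TP_c}{TP_c + FN_c} \tag{3.2}$$

Here TP_c and FN_c count the pixels of class c over all test images, and C is the number of classes.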
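Given a confusion matrix accumulated over the whole test set (rows = ground truth, columns = prediction), (3.2) is the mean of the per-class recalls. Skipping classes that do not occur in the ground truth is an assumption of this sketch, as is the function name.

```python
import numpy as np


def classwise_score(conf: np.ndarray) -> float:
    """Mean over classes of TP_c / (TP_c + FN_c); conf[g, p] counts pixels."""
    tp = np.diag(conf).astype(float)
    gt_pixels = conf.sum(axis=1)        # TP_c + FN_c = ground-truth pixels of class c
    present = gt_pixels > 0             # skip classes absent from the ground truth
    return float(np.mean(tp[present] / gt_pixels[present]))
```

For conf = [[8, 2], [1, 1]] the pixel-wise accuracy is 9/12 = 0.75, while the class average is (0.8 + 0.5)/2 = 0.65, because the rare second class is weighted as much as the frequent first class.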

3.0.5 PASCAL class-wise segmentation

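This score evaluates how well a multi-class segmentation fits the ground truth while compensating for differing class frequencies. It is similar to the average class segmentation but employs a stricter measure that also penalizes false positive assignments. For each class the score is TP/(TP+FP+FN), i.e. the intersection over union of predicted and ground-truth pixels, averaged over all classes as in the PASCAL VOC segmentation challenge.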
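This sketch assumes the PASCAL VOC definition, in which the per-class score is the intersection over union TP/(TP+FP+FN) of predicted and ground-truth pixels. It computes the score from a confusion matrix accumulated over the test set (rows = ground truth, columns = prediction). The function name and the skipping of absent classes are assumptions.

```python
import numpy as np


def pascal_classwise_score(conf: np.ndarray) -> float:
    """Mean over classes of TP_c / (TP_c + FP_c + FN_c)."""
    tp = np.diag(conf).astype(float)
    fn = conf.sum(axis=1) - tp          # class-c pixels labeled as something else
    fp = conf.sum(axis=0) - tp          # other pixels wrongly labeled as class c
    denom = tp + fp + fn
    present = denom > 0                 # skip classes neither predicted nor present
    return float(np.mean(tp[present] / denom[present]))
```

For conf = [[8, 2], [1, 1]] this gives (8/11 + 1/4)/2 ≈ 0.489, lower than the class average of 0.65 because the false positives now count against each class.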

Chapter 4
Evaluation code

4.1 Piotr Dollar toolbox

4.2 PASCAL VOC challenge framework

4.3 Ferrari framework

4.4 Felzenszwalb Deformable Part Model

Chapter 5
Index

5.1 Category Index

[List of object categories and back references to the datasets they are used in.]

5.2 Paper Index

[Index of papers using computer vision datasets.]

5.3 Dataset Index

[Index of computer vision datasets.]
