TVGraz Dataset
TVGraz is an annotated multi-modal dataset that currently contains 10 visual object categories, 4030 images, and the associated text. The visual appearance of the objects in the dataset is challenging, and the dataset is intended to offer a less biased benchmark.
The objective of the multi-modal dataset is to provide a common basis for evaluating object categorization research that uses both text and vision.
Benchmarks
To produce benchmark results on the TVGraz dataset, we performed three experiments: we trained one classifier using only the visual kernels and another using only the textual kernels, and we then applied Multiple Kernel Learning (MKL), which combines both sources. During training, for each category we trained a classifier on 50 positive and 50 negative samples chosen at random from the dataset.
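The per-category training protocol can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names `sample_training_set` and `combine_kernels`, the use of one category label per sample, and the fixed seed are hypothetical. In MKL the kernel weights are learned jointly with the classifier; this sketch only forms the weighted sum of a visual and a textual kernel matrix for weights that are already given, and does not learn them.

```python
import random
from typing import List, Sequence, Tuple


def sample_training_set(
    labels: Sequence[str],
    category: str,
    n_pos: int = 50,
    n_neg: int = 50,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Randomly pick indices of n_pos positives and n_neg negatives for one category.

    Hypothetical representation: labels[i] is the category name of sample i.
    """
    rng = random.Random(seed)
    pos = [i for i, lab in enumerate(labels) if lab == category]
    neg = [i for i, lab in enumerate(labels) if lab != category]
    # random.sample draws without replacement and raises ValueError
    # if a pool is smaller than the requested size.
    return rng.sample(pos, n_pos), rng.sample(neg, n_neg)


def combine_kernels(
    k_visual: Sequence[Sequence[float]],
    k_text: Sequence[Sequence[float]],
    beta_visual: float,
    beta_text: float,
) -> List[List[float]]:
    """Return K = beta_visual * K_visual + beta_text * K_text for precomputed kernels.

    The weights are inputs here; an MKL solver would learn them (assumption:
    non-negative weights, as in common MKL formulations).
    """
    if beta_visual < 0 or beta_text < 0:
        raise ValueError("kernel weights must be non-negative")
    return [
        [beta_visual * kv + beta_text * kt for kv, kt in zip(row_v, row_t)]
        for row_v, row_t in zip(k_visual, k_text)
    ]
```

For example, `sample_training_set(labels, "car")` returns two disjoint index lists of length 50, which could then select the rows and columns of the combined kernel passed to a kernel classifier.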
Average precision-recall curves and kernel weights for each category in the TVGraz dataset. The black dotted line shows the results using text features only, the blue dotted line shows the results using visual features only, and the red solid line shows the results using text plus visual features. For each category, the weights of the text-based and vision-based kernels are also shown.
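Precision-recall curves such as these are often summarized by a single average precision (AP) value. The sketch below assumes non-interpolated AP (the mean of the precision at the rank of each positive sample) with ties broken by input order; the page does not state the exact evaluation procedure, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def precision_recall_curve(
    scores: Sequence[float], labels: Sequence[int]
) -> List[Tuple[float, float]]:
    """Return (recall, precision) after each rank cutoff.

    scores: classifier outputs (higher means more confidently positive)
    labels: 1 for a positive sample, 0 for a negative one
    """
    # sorted() is stable, so equal scores keep their input order.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    points = []
    hits = 0
    for rank, (_, lab) in enumerate(ranked, start=1):
        hits += lab
        points.append((hits / n_pos, hits / rank))
    return points


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: mean precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    hits = 0
    total = 0.0
    for rank, (_, lab) in enumerate(ranked, start=1):
        if lab == 1:
            hits += 1
            total += hits / rank
    return total / n_pos
```

With scores `[0.9, 0.8, 0.7, 0.6]` and labels `[1, 0, 1, 0]`, the positives sit at ranks 1 and 3, so AP is (1/1 + 2/3) / 2 = 5/6.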
