What Object Categories / Labels Are In COCO Dataset?

One important element of deep learning, and of machine learning at large, is the dataset. A good dataset contributes to a model with good precision and recall. In object detection for images or video, a few dataset names come up again and again among researchers and practitioners: Pascal, ImageNet, SUN, and COCO. In this post, we will briefly discuss the COCO dataset, focusing on its distinctive feature and its labeled objects.

tl;dr The COCO dataset labels from the original paper and the released versions in 2014 and 2017 can be viewed and downloaded from this repository.

A Dataset with Context

COCO stands for Common Objects in Context. As the name hints, images in the COCO dataset are taken from everyday scenes, which attaches “context” to the objects captured in them. An analogy helps here. Let’s say we want to detect a person in an image. A non-contextual, isolated image would be a close-up photograph of a person. Looking at that photograph, we can only tell that it is an image of a person. Describing the environment where the photograph was taken is hard without supplementary images that capture not only the person but also the studio or surrounding scene.

COCO was an initiative to collect natural images, that is, images that reflect everyday scenes and provide contextual information. In an everyday scene, multiple objects can appear in the same image, and each should be labeled as a separate object and segmented properly. The COCO dataset provides this labeling and segmentation. A machine learning practitioner can use the labeled and segmented images to train a better-performing object detection model.


Objects in COCO

As written in the original research paper, there are 91 object categories in COCO. However, only 80 of those categories were released with labeled and segmented images in the first release in 2014. There are currently two releases of the COCO dataset with labeled and segmented images: the 2014 release and the subsequent 2017 release. The COCO dataset is available from the download page.

To compare and confirm the object categories available in the COCO dataset, we can run a simple Python script that outputs the list of object categories. The steps below work on Ubuntu or other GNU/Linux distros.

1. Download 2014 train/val annotation file

$ wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip

2. Download 2017 train/val annotation file

$ wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip

3. Inflate both zip files using unzip

$ unzip annotations_trainval2014.zip
$ unzip annotations_trainval2017.zip

This will create a directory named “annotations” that contains the dataset annotations.

4. Create a Python file named coco-object-categories.py and type the following code.
Note: This is merely functional code, not production code.

#!/usr/bin/env python3

import getopt
import json
import sys

# Annotation files extracted from the 2014 and 2017 train/val archives
cat_2014 = './annotations/instances_val2014.json'
cat_2017 = './annotations/instances_val2017.json'

USAGE = 'coco-object-categories.py -y <year>'

def main(argv):
    json_file = None
    try:
        opts, args = getopt.getopt(argv, "hy:")
    except getopt.GetoptError:
        print(USAGE)
        sys.exit(2)
    for opt, arg in opts:
        if opt == '-h':
            print(USAGE)
            sys.exit(0)
        if opt == '-y':
            # Any year other than 2014 falls back to the 2017 release
            json_file = cat_2014 if arg == '2014' else cat_2017
    if json_file is not None:
        with open(json_file, 'r') as coco:
            js = json.load(coco)
            # Print only the category list (id, name, supercategory)
            print(json.dumps(js['categories']))

if __name__ == "__main__":
    main(sys.argv[1:])

5. Run the Python file

$ python coco-object-categories.py -y 2014
$ python coco-object-categories.py -y 2017

6. Observe the JSON output
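
The JSON printed by the script is a single long line, which is tedious to scan by eye. As a minimal sketch, assuming each entry in js['categories'] carries the keys id, name, and supercategory, a small helper can turn the list into one row per category. The function name format_categories and the sample data are illustrative and not part of the original script.

```python
def format_categories(categories):
    """Return one 'id<TAB>name<TAB>supercategory' line per category, sorted by id."""
    rows = sorted(categories, key=lambda c: c["id"])
    return ["{}\t{}\t{}".format(c["id"], c["name"], c["supercategory"]) for c in rows]


# Hand-written sample in the same shape as js['categories'] (illustrative only)
sample = [
    {"supercategory": "vehicle", "id": 2, "name": "bicycle"},
    {"supercategory": "person", "id": 1, "name": "person"},
]

for line in format_categories(sample):
    print(line)
```

In the script from step 4, the same helper could be called on js['categories'] instead of the sample list.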

After the observation, we will have the following tables that contain the comparison of object category list between the original paper and the dataset release.

ID | Object (Paper) | Object (2014 Rel.) | Object (2017 Rel.) | Super Category
1 | person | person | person | person
2 | bicycle | bicycle | bicycle | vehicle
3 | car | car | car | vehicle
4 | motorcycle | motorcycle | motorcycle | vehicle
5 | airplane | airplane | airplane | vehicle
6 | bus | bus | bus | vehicle
7 | train | train | train | vehicle
8 | truck | truck | truck | vehicle
9 | boat | boat | boat | vehicle
10 | traffic light | traffic light | traffic light | outdoor
11 | fire hydrant | fire hydrant | fire hydrant | outdoor
12 | street sign | - | - | outdoor
13 | stop sign | stop sign | stop sign | outdoor
14 | parking meter | parking meter | parking meter | outdoor
15 | bench | bench | bench | outdoor
16 | bird | bird | bird | animal
17 | cat | cat | cat | animal
18 | dog | dog | dog | animal
19 | horse | horse | horse | animal
20 | sheep | sheep | sheep | animal
21 | cow | cow | cow | animal
22 | elephant | elephant | elephant | animal
23 | bear | bear | bear | animal
24 | zebra | zebra | zebra | animal
25 | giraffe | giraffe | giraffe | animal
26 | hat | - | - | accessory
27 | backpack | backpack | backpack | accessory
28 | umbrella | umbrella | umbrella | accessory
29 | shoe | - | - | accessory
30 | eye glasses | - | - | accessory
31 | handbag | handbag | handbag | accessory
32 | tie | tie | tie | accessory
33 | suitcase | suitcase | suitcase | accessory
34 | frisbee | frisbee | frisbee | sports
35 | skis | skis | skis | sports
36 | snowboard | snowboard | snowboard | sports
37 | sports ball | sports ball | sports ball | sports
38 | kite | kite | kite | sports
39 | baseball bat | baseball bat | baseball bat | sports
40 | baseball glove | baseball glove | baseball glove | sports
41 | skateboard | skateboard | skateboard | sports
42 | surfboard | surfboard | surfboard | sports
43 | tennis racket | tennis racket | tennis racket | sports
44 | bottle | bottle | bottle | kitchen
45 | plate | - | - | kitchen
46 | wine glass | wine glass | wine glass | kitchen
47 | cup | cup | cup | kitchen
48 | fork | fork | fork | kitchen
49 | knife | knife | knife | kitchen
50 | spoon | spoon | spoon | kitchen
51 | bowl | bowl | bowl | kitchen
52 | banana | banana | banana | food
53 | apple | apple | apple | food
54 | sandwich | sandwich | sandwich | food
55 | orange | orange | orange | food
56 | broccoli | broccoli | broccoli | food
57 | carrot | carrot | carrot | food
58 | hot dog | hot dog | hot dog | food
59 | pizza | pizza | pizza | food
60 | donut | donut | donut | food
61 | cake | cake | cake | food
62 | chair | chair | chair | furniture
63 | couch | couch | couch | furniture
64 | potted plant | potted plant | potted plant | furniture
65 | bed | bed | bed | furniture
66 | mirror | - | - | furniture
67 | dining table | dining table | dining table | furniture
68 | window | - | - | furniture
69 | desk | - | - | furniture
70 | toilet | toilet | toilet | furniture
71 | door | - | - | furniture
72 | tv | tv | tv | electronic
73 | laptop | laptop | laptop | electronic
74 | mouse | mouse | mouse | electronic
75 | remote | remote | remote | electronic
76 | keyboard | keyboard | keyboard | electronic
77 | cell phone | cell phone | cell phone | electronic
78 | microwave | microwave | microwave | appliance
79 | oven | oven | oven | appliance
80 | toaster | toaster | toaster | appliance
81 | sink | sink | sink | appliance
82 | refrigerator | refrigerator | refrigerator | appliance
83 | blender | - | - | appliance
84 | book | book | book | indoor
85 | clock | clock | clock | indoor
86 | vase | vase | vase | indoor
87 | scissors | scissors | scissors | indoor
88 | teddy bear | teddy bear | teddy bear | indoor
89 | hair drier | hair drier | hair drier | indoor
90 | toothbrush | toothbrush | toothbrush | indoor
91 | hair brush | - | - | indoor

As you can see, the 2014 and 2017 releases have the same list of objects: 80 of the original 91 object categories in the paper.
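
The same check can be done programmatically instead of by eye. The sketch below assumes the category entries use the keys id, name, and supercategory; the helper names category_set and diff_releases are hypothetical. It reports entries that appear in one release but not the other, so two empty lists mean the releases agree.

```python
import json


def category_set(categories):
    """Reduce a category list to a set of (id, name, supercategory) tuples."""
    return {(c["id"], c["name"], c["supercategory"]) for c in categories}


def diff_releases(cats_a, cats_b):
    """Return (only_in_a, only_in_b), each sorted by id."""
    a, b = category_set(cats_a), category_set(cats_b)
    return sorted(a - b), sorted(b - a)


def load_categories(path):
    """Read the 'categories' list from a COCO annotation file."""
    with open(path, "r") as f:
        return json.load(f)["categories"]


# Example usage with the files extracted in step 3 (paths as in the script above):
# only_2014, only_2017 = diff_releases(
#     load_categories("./annotations/instances_val2014.json"),
#     load_categories("./annotations/instances_val2017.json"),
# )
# print(only_2014, only_2017)
```

Comparing tuples rather than names alone also catches a category whose ID or super category differs between releases.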

If you need to have the object list as a text file, you can view and download it from this repository.

As for the discrepancies between the object list in the paper and the dataset releases, the missing object categories / labels are identical in the 2014 and 2017 releases. Let’s compile them into another table that lists the missing labels.

ID | Object | Super Category
12 | street sign | outdoor
26 | hat | accessory
29 | shoe | accessory
30 | eye glasses | accessory
45 | plate | kitchen
66 | mirror | furniture
68 | window | furniture
69 | desk | furniture
71 | door | furniture
83 | blender | appliance
91 | hair brush | indoor

What does this mean? In practice, you need to source images from other datasets if your objective is to build a model that also detects the missing object categories / labels.
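
The missing IDs can also be derived from the annotation file itself, since they show up as gaps in the 1 to 91 ID range used by the paper. The following is a minimal sketch with hypothetical helper names (missing_ids, contiguous_index). Because the released IDs are not contiguous, code that needs class indices 0 to 79 usually has to build a mapping like the one in the second helper.

```python
PAPER_ID_RANGE = range(1, 92)  # IDs 1..91, as in the paper column of the table


def missing_ids(categories):
    """Return the paper IDs that have no entry in the released category list."""
    released = {c["id"] for c in categories}
    return [i for i in PAPER_ID_RANGE if i not in released]


def contiguous_index(categories):
    """Map each released (sparse) category ID to a contiguous index starting at 0."""
    ids = sorted(c["id"] for c in categories)
    return {cat_id: index for index, cat_id in enumerate(ids)}


# Stand-in for js['categories']: only the 'id' key is needed here.
# The gaps are taken from the table of missing labels above.
gaps = {12, 26, 29, 30, 45, 66, 68, 69, 71, 83, 91}
released = [{"id": i} for i in PAPER_ID_RANGE if i not in gaps]

print(missing_ids(released))
print(len(contiguous_index(released)))
```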

Beyond COCO Objects

Although COCO provides a sizable list of objects, there are circumstances where the object you want to identify is not in the COCO label list. This is especially true when building models for another, more specific domain or when adding context to the object identification. There are at least two common approaches:

  • Manual labeling and modeling: Objects are labeled with bounding boxes or segmentation masks, and a neural network for object recognition (e.g. R-CNN, Fast R-CNN, Faster R-CNN) is trained on them to produce a new object detection model.
  • Transfer learning: An existing pre-trained model is adapted to perform object recognition in a new domain. A prevalent technique is to reuse the hidden layers of the pre-trained model to extract object features and to replace the final / output layer with a classifier specific to the new domain.

Transfer learning can be considered an advanced topic in computer vision. Reading arXiv papers on the subject can help build a better understanding of it. You can also enroll in an online course to supplement the theoretical knowledge with practical know-how. The course Advanced Computer Vision covers transfer learning and provides a sample implementation that you can reference when building your own model. If you have only recently started your journey in machine learning / AI and want to develop both theoretical and implementation skills, you can also consider the TensorFlow AI course or the PyTorch AI course.



19 thoughts on “What Object Categories / Labels Are In COCO Dataset?”

  1. Nandu Raj

    Is there a trained model for fish detection (really helpful if it detects semantic/instance segmentation rather than bounding boxes)

    1. Mikael Fernandus Simalango Post author

      CIFAR-100 has fish superclass even though the object classes are rather limited. Imagenet also has fish superclass and more object classes. If you only want to identify fish but not the species using segmentation, you can build the fish model with Detectron.

    1. Mikael Fernandus Simalango Post author

      If you want to differentiate the clothes by the features, you can approach by using semantic segmentation approach and train a new model.

    1. Mikael Fernandus Simalango Post author

      It is more likely that you will end up building the model with transfer learning.

  2. Dynamic Hosting

    Amazing…..all information is very helpful in this blog. I did not just come here from the perspective of commenting posts; these are good ideas for. Hope you will share the best information with us even further.

    1. Mikael Fernandus Simalango Post author

      Have you checked Open Model Zoo? You can choose one of the pre-trained models and train the model with your custom objects. Here is sample implementation for fruits.

  3. Aysha

    Hi Michael I have read this blog and u r simply awesome.because u r having solution of every one’s problem.
    Now plz help me also. I have tried but I couldn’t do it.

  4. jayasree

    I’m working on a project where I need to detect specific brands of objects in images, even when there are similar objects from different brands present. how much dataset I would need for effective detection? Additionally, how many annotations would be necessary?

    In a typical image, there are 3-6 target objects and around 20 other objects. What would be the best approach for annotating these images?

