One important element of deep learning, and of machine learning at large, is the dataset. A good dataset contributes to a model with good precision and recall. In object detection for images or video, a few household names are commonly used and referenced by researchers and practitioners, including Pascal, ImageNet, SUN, and COCO. In this post, we briefly discuss the COCO dataset, focusing on its distinctive feature and its labeled objects.
tl;dr The COCO dataset labels from the original paper and the released versions in 2014 and 2017 can be viewed and downloaded from this repository.
A Dataset with Context
COCO stands for Common Objects in Context. As hinted by the name, images in the COCO dataset are taken from everyday scenes, which attaches “context” to the objects captured in them. An example helps explain this. Say we want to detect a person in an image. A non-contextual, isolated image would be a close-up photograph of a person. Looking at the photograph, we can only tell that it is an image of a person. It is hard to describe the environment where the photograph was taken without supplementary images that capture not only the person but also the studio or surrounding scene.
COCO was an initiative to collect natural images, that is, images that reflect everyday scenes and provide contextual information. In an everyday scene, multiple objects can appear in the same image, and each should be labeled as a separate object and segmented properly. The COCO dataset provides this labeling and segmentation. A machine learning practitioner can take advantage of the labeled and segmented images to train a better-performing object detection model.
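To make the idea of several labeled objects in one image concrete, here is a minimal sketch of how instance annotations are laid out in a COCO-style annotation file. The field names (`image_id`, `category_id`, `bbox`, `segmentation`) follow the released COCO format; the sample values, the file name, and the helper `objects_per_image` are made up for illustration.

```python
from collections import defaultdict

# Hand-written sample in the COCO instance-annotation layout.
# All values are hypothetical; real files hold many thousands of entries.
sample = {
    "images": [{"id": 1, "file_name": "street.jpg", "width": 640, "height": 480}],
    "categories": [
        {"id": 1, "name": "person", "supercategory": "person"},
        {"id": 10, "name": "traffic light", "supercategory": "outdoor"},
    ],
    "annotations": [
        # bbox is [x, y, width, height]; segmentation is a list of polygons,
        # each polygon a flat list of x, y coordinates.
        {"id": 101, "image_id": 1, "category_id": 1,
         "bbox": [120.0, 200.0, 80.0, 220.0],
         "segmentation": [[120.0, 200.0, 200.0, 200.0, 200.0, 420.0, 120.0, 420.0]]},
        {"id": 102, "image_id": 1, "category_id": 10,
         "bbox": [400.0, 50.0, 30.0, 90.0],
         "segmentation": [[400.0, 50.0, 430.0, 50.0, 430.0, 140.0, 400.0, 140.0]]},
    ],
}


def objects_per_image(dataset):
    """Group category names by image id (hypothetical helper)."""
    names = {c["id"]: c["name"] for c in dataset["categories"]}
    grouped = defaultdict(list)
    for ann in dataset["annotations"]:
        grouped[ann["image_id"]].append(names[ann["category_id"]])
    return dict(grouped)


print(objects_per_image(sample))
```

Each annotation points at one image and one category, so a single image can carry any number of separately labeled objects.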
Objects in COCO
As written in the original research paper, there are 91 object categories in COCO. However, only 80 object categories of labeled and segmented images were released in the first publication in 2014. There are currently two releases of the COCO dataset for labeled and segmented images: the 2014 release and the subsequent 2017 release. The COCO dataset is available for download from the download page.
To compare and confirm the available object categories in the COCO dataset, we can run a simple Python script that outputs the list of object categories. This can be replicated by following these steps on Ubuntu or other GNU/Linux distros.
1. Download 2014 train/val annotation file
$ wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip
2. Download 2017 train/val annotation file
$ wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
3. Inflate both zip files using unzip
$ unzip annotations_trainval2014.zip
$ unzip annotations_trainval2017.zip
This will create a directory named “annotations” that contains the dataset annotations.
4. Create a Python file named coco-object-categories.py and type the following code.
Note: This should be considered merely functional code, not production code.
#!/usr/bin/env python3
"""Print the object categories of a COCO annotation file as JSON."""
import getopt
import json
import sys

# Annotation files extracted from the 2014 and 2017 train/val archives
cat_2014 = './annotations/instances_val2014.json'
cat_2017 = './annotations/instances_val2017.json'

USAGE = 'coco-object-categories.py -y <year>'


def main(argv):
    json_file = None
    try:
        opts, args = getopt.getopt(argv, "hy:")
    except getopt.GetoptError:
        print(USAGE)
        sys.exit(2)
    for opt, arg in opts:
        if opt == '-h':
            print(USAGE)
            sys.exit(0)
        if opt == '-y':
            # Any year other than 2014 falls back to the 2017 file
            json_file = cat_2014 if arg == '2014' else cat_2017
    if json_file is not None:
        with open(json_file, 'r') as coco:
            js = json.load(coco)
        print(json.dumps(js['categories']))


if __name__ == "__main__":
    main(sys.argv[1:])
5. Run the Python file
$ python coco-object-categories.py -y 2014
$ python coco-object-categories.py -y 2017
6. Observe the JSON output
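The script prints the categories as one long JSON line, which is hard to scan. One way to make it easier to read is to turn the entries into sorted rows. The sketch below assumes each category entry carries `id`, `name`, and `supercategory` keys, as in the released annotation files. The helper names `category_rows` and `load_categories` are hypothetical, and the two sample entries are typed in by hand instead of being read from disk.

```python
import json


def category_rows(categories):
    """Return (id, name, supercategory) tuples sorted by category id."""
    return sorted((c["id"], c["name"], c["supercategory"]) for c in categories)


def load_categories(path):
    """Read the 'categories' list from a COCO annotation file."""
    with open(path, "r") as f:
        return json.load(f)["categories"]


# Hand-typed sample so the sketch runs without the downloaded files.
sample = [
    {"supercategory": "outdoor", "id": 11, "name": "fire hydrant"},
    {"supercategory": "outdoor", "id": 10, "name": "traffic light"},
]

for cat_id, name, supercategory in category_rows(sample):
    print(f"{cat_id:>3}  {name:<16}{supercategory}")

# With the extracted files in place, the same rows could be built with:
# category_rows(load_categories("./annotations/instances_val2014.json"))
```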
After the observation, we will have the following table, which compares the object category list in the original paper with the lists in the dataset releases.
| ID | Object (Paper) | Object (2014 Release) | Object (2017 Release) | Super Category |
| 10 | traffic light | traffic light | traffic light | outdoor |
| 11 | fire hydrant | fire hydrant | fire hydrant | outdoor |
| 13 | stop sign | stop sign | stop sign | outdoor |
| 14 | parking meter | parking meter | parking meter | outdoor |
| 37 | sports ball | sports ball | sports ball | sports |
| 39 | baseball bat | baseball bat | baseball bat | sports |
| 40 | baseball glove | baseball glove | baseball glove | sports |
| 43 | tennis racket | tennis racket | tennis racket | sports |
| 46 | wine glass | wine glass | wine glass | kitchen |
| 58 | hot dog | hot dog | hot dog | food |
| 64 | potted plant | potted plant | potted plant | furniture |
| 67 | dining table | dining table | dining table | furniture |
| 77 | cell phone | cell phone | cell phone | electronic |
| 88 | teddy bear | teddy bear | teddy bear | indoor |
| 89 | hair drier | hair drier | hair drier | indoor |
As you can see, the lists of objects for the 2014 and 2017 releases are the same: 80 objects out of the original 91 object categories in the paper.
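The same check can be done programmatically instead of by eye. The following is a minimal sketch, assuming the two `categories` lists have already been loaded from the 2014 and 2017 annotation files. The function name `category_diff` is hypothetical, and the inputs here are tiny hand-typed stand-ins for the real lists.

```python
def category_diff(old, new):
    """Return the (id, name) pairs found in only one of two category lists."""
    old_set = {(c["id"], c["name"]) for c in old}
    new_set = {(c["id"], c["name"]) for c in new}
    return sorted(old_set - new_set), sorted(new_set - old_set)


# Stand-in inputs; order differs on purpose to show it does not matter.
cats_2014 = [{"id": 10, "name": "traffic light"}, {"id": 11, "name": "fire hydrant"}]
cats_2017 = [{"id": 11, "name": "fire hydrant"}, {"id": 10, "name": "traffic light"}]

only_2014, only_2017 = category_diff(cats_2014, cats_2017)
print(only_2014, only_2017)
```

Two empty lists mean the releases agree on every category ID and name.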
If you need to have the object list as a text file, you can view and download it from this repository.
Turning to the discrepancies between the object list in the paper and the dataset releases, the missing object categories / labels are identical in the 2014 and 2017 releases. So, let’s compile this data and create another table to list those missing labels.
What does this mean? In practice, you need to draw on other datasets if your objective is to build a model that also detects the missing object categories / labels.
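The IDs of the missing labels can be found from the gaps in the released ID sequence. This sketch assumes the paper's categories are numbered contiguously from 1 to 91, so any ID in that range without a released entry is a missing label. The function `missing_ids` and its parameters are hypothetical, and the sample covers only IDs 10 to 14 from the comparison table.

```python
def missing_ids(categories, first=1, last=91):
    """Return the IDs in first..last that have no entry in `categories`.

    Assumes the paper's category IDs are contiguous within the range.
    """
    released = {c["id"] for c in categories}
    return [i for i in range(first, last + 1) if i not in released]


# IDs 10, 11, 13 and 14 as listed in the comparison table.
excerpt = [
    {"id": 10, "name": "traffic light"},
    {"id": 11, "name": "fire hydrant"},
    {"id": 13, "name": "stop sign"},
    {"id": 14, "name": "parking meter"},
]

print(missing_ids(excerpt, first=10, last=14))
```

Run against the full `categories` list with the default range, the result should contain the 11 IDs that separate the 91 paper categories from the 80 released ones.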
Beyond COCO Objects
Although COCO provides a sizable list of objects, the object you want to identify may not be included in the COCO labels list. This is especially true when building models for another, more specific domain, or when adding context to object identification. There are at least two common approaches to this:
- Manual labeling and modeling: Objects are labeled using bounding boxes or segmentation, and a neural network for object recognition (e.g. R-CNN, Fast R-CNN, Faster R-CNN) is trained to produce a new object detection model.
- Transfer learning: An existing pre-trained model is adapted to perform object recognition in a new domain. A prevalent technique is to reuse the hidden layers of the pre-trained model to extract object features and replace the final / output layer with a classifier specific to the new domain.
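For the manual labeling route, one convenient option is to store your own labels in the same JSON layout that COCO uses, since many detection tools can read that layout. The sketch below is only an illustration of that idea: the helper `add_custom_label`, the "safety helmet" category, the "custom" supercategory, and the file name are all made up, and the `area` field is simplified to the box area.

```python
import json


def add_custom_label(dataset, image_id, category_name, bbox):
    """Append a bounding-box annotation, creating the category if needed.

    bbox is [x, y, width, height] in pixels, as in COCO annotations.
    Returns the category id that was used.
    """
    categories = dataset.setdefault("categories", [])
    annotations = dataset.setdefault("annotations", [])
    category = next((c for c in categories if c["name"] == category_name), None)
    if category is None:
        # Give a new category the next free id.
        next_id = max((c["id"] for c in categories), default=0) + 1
        category = {"id": next_id, "name": category_name, "supercategory": "custom"}
        categories.append(category)
    x, y, width, height = bbox
    annotations.append({
        "id": len(annotations) + 1,
        "image_id": image_id,
        "category_id": category["id"],
        "bbox": [x, y, width, height],
        "area": width * height,  # simplification: box area, not mask area
        "iscrowd": 0,
    })
    return category["id"]


dataset = {"images": [{"id": 1, "file_name": "site.jpg"}]}
first_id = add_custom_label(dataset, 1, "safety helmet", [40, 30, 100, 80])
second_id = add_custom_label(dataset, 1, "safety helmet", [200, 35, 90, 75])
print(first_id, second_id, len(dataset["annotations"]))
print(json.dumps(dataset["categories"]))
```

Both calls reuse the same category, so the file ends up with one new label and two annotations for it.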
Transfer learning can be considered an advanced topic in computer vision. Reading arXiv papers on the subject can help build a better understanding of it. You can also enroll in an online course to supplement the theoretical knowledge with practical know-how. The course Advanced Computer Vision covers transfer learning and provides a sample implementation that can be referenced when building your own model. If you have only recently started your journey in machine learning / AI, you can also consider the course on machine learning fundamentals by Andrew Ng.