TensorFlow Object Detection API Tutorial Part 1: Sharding the data

Aashay Sachdeva
3 min read · Sep 17, 2019


The TensorFlow Object Detection API is a powerful tool for building and deploying custom object detection and segmentation-mask models without getting too deep into the model-building part. TF has an extensive list of pre-trained models (check out the model zoo) that can be used for transfer learning. One of the best parts of using the TF API is that the input pipeline is highly optimized, i.e., your resources are not underutilized. If you have ever trained a model with a Keras generator, you will have seen how underutilized your CPU and GPU sometimes are, because data reading can become the bottleneck in training. The TF API abstracts away the work of building the pipeline.

To follow the tutorial, install the TF API by following the instructions provided at this link.


TFRecord is a binary serialization format that TensorFlow uses for storing data. When working with large datasets, storing the data in a binary file format has a significant impact on the performance of the input pipeline and, as a consequence, on the training time of the model. Binary data takes up less space on disk, takes less time to copy, and can be read much more efficiently from disk.

To create TFRecords, first create a dataframe with three columns: path, the path to the image file; classes, the class the image belongs to; and bb, the bounding box coordinates (x1, y1, x2, y2).
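As a sketch of that structure, a dataframe with these three columns could be built as follows. The file paths and box values are made up for illustration, and storing the class as a name rather than an integer id is an assumption based on the label map dictionary used later in this post.

```python
import pandas as pd

# Hypothetical rows, for illustration only.
train = pd.DataFrame(
    {
        # Path to each image file.
        "path": ["images/img_001.jpg", "images/img_002.jpg"],
        # Class name of each image (assumed to be a key of the label map).
        "classes": ["upper", "lower"],
        # Bounding box in pixel coordinates: (x1, y1, x2, y2).
        "bb": [(34, 50, 210, 300), (12, 140, 180, 420)],
    }
)
print(train)
```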

Once you have the dataframe ready, use the following code to create the TFRecords.

The bounding box coordinates need to be normalized to the range 0 to 1. To do that, divide the x values by image_shape[1] (the width) and the y values by image_shape[0] (the height). The reason is that a NumPy image array has shape (height, width, channels).
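The normalization can be sketched in a few lines of plain Python. The function name normalize_bbox is hypothetical, and the box is assumed to be in pixel coordinates in the (x1, y1, x2, y2) order described above.

```python
def normalize_bbox(bb, image_shape):
    """Scale a pixel-space box (x1, y1, x2, y2) into the 0 to 1 range.

    image_shape follows the NumPy convention (height, width, channels),
    so x is divided by image_shape[1] and y by image_shape[0].
    """
    height, width = image_shape[0], image_shape[1]
    x1, y1, x2, y2 = bb
    return (x1 / width, y1 / height, x2 / width, y2 / height)


# An image 400 pixels high and 200 wide, with a box over its lower-right quarter.
print(normalize_bbox((100, 200, 200, 400), (400, 200, 3)))
# prints (0.5, 0.5, 1.0, 1.0)
```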

We use tf.io.gfile.GFile (tf.gfile.GFile in older TF 1.x releases) to read the image file as a binary string. In my experience it is the most efficient way to do this.
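Reading the encoded image bytes might look like the sketch below. It assumes a TensorFlow version where the file API lives at tf.io.gfile.GFile, and the helper name read_image_bytes is hypothetical.

```python
import tensorflow as tf


def read_image_bytes(path):
    """Return the raw, still-encoded (e.g. JPEG or PNG) bytes of an image file."""
    # "rb" opens the file in binary mode, so read() returns bytes.
    with tf.io.gfile.GFile(path, "rb") as fid:
        return fid.read()
```

The bytes are stored as they are on disk, without decoding, which keeps the record as small as the original compressed image.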

If you look at the code for creating the tf.train.Example, each value is passed as a list, since an image can have multiple bounding boxes. In such cases, just pass a list of all the values for the bounding boxes. Similarly, you can pass multiple labels and segmentation maps in a single example.

'image/object/bbox/xmin': tf.train.Feature(float_list=tf.train.FloatList(value=[xmin1, xmin2])),
'image/label': tf.train.Feature(int64_list=tf.train.Int64List(value=[label1, label2])),
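To make the list-per-feature idea concrete, here is a minimal sketch that packs two boxes of one image into a single tf.train.Example. The function names are hypothetical, and every key other than the two shown above ('image/encoded' and the ymin, xmax, ymax keys) is an assumption made by analogy.

```python
import tensorflow as tf


def float_list(values):
    """Wrap a list of floats in a tf.train.Feature."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=values))


def make_example(encoded_image, xmins, ymins, xmaxs, ymaxs, labels):
    """Pack one image and all of its boxes and labels into a tf.train.Example."""
    feature = {
        # Assumed key for the raw image bytes.
        "image/encoded": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[encoded_image])),
        "image/object/bbox/xmin": float_list(xmins),
        # The three keys below are assumed by analogy with xmin.
        "image/object/bbox/ymin": float_list(ymins),
        "image/object/bbox/xmax": float_list(xmaxs),
        "image/object/bbox/ymax": float_list(ymaxs),
        "image/label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=labels)),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


# Two boxes on the same image: index i of every list describes box i.
example = make_example(
    b"placeholder-bytes",
    xmins=[0.10, 0.50], ymins=[0.20, 0.40],
    xmaxs=[0.40, 0.90], ymaxs=[0.60, 0.80],
    labels=[1, 3],
)
serialized = example.SerializeToString()  # bytes, ready for a TFRecord writer
```

If training later fails to find boxes or labels, it is worth checking these key names against the ones your version of the API expects.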

Then, just instantiate the class and pass it the label map dictionary.

labels = {'upper': 1, 'lower': 2, 'full': 3}
t = GenerateTFRecord(labels)
t.convert_image_folder(train, 'images_train.tfrecord')
t.convert_image_folder(validation, 'images_valid.tfrecord')
t.convert_image_folder(test, 'images_test.tfrecord')

where train, validation and test are the dataframes created for your training, validation and test data respectively. Class labels should always start from 1, since class 0 is reserved for ‘none of the above’ (the background).
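A small guard can catch a label map that breaks this rule before any records are written. This is only a sketch: the function name check_label_map is hypothetical, and it checks that ids are unique integers of at least 1, not that they are contiguous.

```python
def check_label_map(labels):
    """Return labels unchanged, or raise ValueError if an id is unusable."""
    ids = list(labels.values())
    # Id 0 is reserved, so every class id must be an integer >= 1.
    if any(not isinstance(i, int) or i < 1 for i in ids):
        raise ValueError("class ids must be integers starting from 1; 0 is reserved")
    # Two class names must not share the same id.
    if len(set(ids)) != len(ids):
        raise ValueError("class ids must be unique")
    return labels


labels = check_label_map({'upper': 1, 'lower': 2, 'full': 3})
```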

Next, we will jump to training and exporting the model. Later, we will work on deploying the exported model using TensorFlow Serving.


