Image Scanning and Processing with ML Models

One of our Fortune 500 clients in the logistics industry wanted to extract various product-related information by scanning images through a machine learning model. This scanned information had to then be supplied to a custom web application for further utilization and analysis.

The Objective

Image scanning and detection had to cover the following:

  • Identifying the object in the image
  • Localization of the object
  • Measuring the width and height of the objects in the image


The GoodWorkLabs Machine Learning Solution:

Our data scientists used the Faster R-CNN algorithm to solve the problem statement. We followed the procedure below to achieve the desired results.

  • We ran the image through a CNN to get a feature map (also called an activation map), a matrix representation of the image produced by the network's convolutional layers.
  • We ran the activation map through a separate network called the Region Proposal Network (RPN), which proposed bounding boxes (regions of interest) for the objects. These regions were then passed on to the next stage.
  • Each proposed bounding box was analyzed, and the most appropriate bounding box coordinates were accepted.
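
The last step above, keeping the most appropriate box for each object, is commonly handled with non-maximum suppression (NMS). The project's actual selection rule is not described here, so the following is only a minimal sketch under that assumption. The `(x1, y1, x2, y2)` box format, the function names `iou` and `select_boxes`, and the 0.5 overlap threshold are illustrative choices, not details from the project.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so non-overlapping boxes contribute no intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_boxes(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it does not overlap an already kept box by more than the
    threshold. Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


# Hypothetical proposals: two near-duplicates of one object plus a second object.
proposals = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260)]
confidences = [0.9, 0.8, 0.7]
kept_indices = select_boxes(proposals, confidences)
```

In a PyTorch pipeline, `torchvision.ops.nms` performs the same kind of filtering on tensors, so a hand-written version like this is mainly useful for illustration.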

Faster R-CNN works quickly because region proposals are computed from the activation map the CNN has already produced: we pass that map through only a few more layers to find the bounding boxes (regions of interest), instead of running a separate proposal step on the raw image. During the training phase, forward passes take place repeatedly and the ML model continues to learn. Prediction errors are measured at this stage and used to update the model, so with continued training it becomes better at predicting the classes and bounding box coordinates.

To calculate the height and width of each object, we iterated over every detected object in the image and computed the values using OpenCV.
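
The per-object measurement can be sketched roughly as follows. The project used OpenCV for this step, but the exact calls are not described, so this dependency-free sketch only shows the arithmetic. It assumes each detection is a label plus an `(x1, y1, x2, y2)` box in pixels. The function name `measure_objects` and the `pixels_per_unit` calibration factor are hypothetical.

```python
def measure_objects(detections, pixels_per_unit=1.0):
    """Return the label, width and height of every detected object.

    detections: iterable of (label, (x1, y1, x2, y2)) with pixel coordinates.
    pixels_per_unit: hypothetical calibration factor; the default of 1.0
    leaves the results in pixels.
    """
    if pixels_per_unit <= 0:
        raise ValueError("pixels_per_unit must be positive")
    results = []
    for label, (x1, y1, x2, y2) in detections:
        # abs() keeps the sizes positive even if the corners arrive swapped.
        width = abs(x2 - x1) / pixels_per_unit
        height = abs(y2 - y1) / pixels_per_unit
        results.append({"label": label, "width": width, "height": height})
    return results


# Hypothetical detection: one carton occupying a 100 x 50 pixel box.
sizes = measure_objects([("carton", (10, 20, 110, 70))])
```

With OpenCV available, the same coordinates could be passed to `cv2.rectangle` to draw each box on the image for a visual check.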

To perform this image scanning process, we used a dataset in which the object in each image was well annotated. We had around 1000 labels for each object.


How we trained our ML model:

  • We downloaded pre-trained models and weights. The code currently supports VGG16.
  • We also got access to pre-trained models provided by pytorch-vgg.
  • In the next step, we trained our model by fine-tuning a pre-trained Faster R-CNN model. We followed this approach because a pre-trained Faster R-CNN already contains many good lower-level features that are generally reusable.
  • We trained the model for 150 epochs.


GPU utilization:

The models were then deployed to GPU machines on Microsoft Azure for better performance. The expected inference time for a given image is ~0.2 seconds.


Technology Stack:

The technology stack used to implement this image scanning ML model was Python, PyTorch, OpenCV, and Microsoft Azure.


The GoodWorkLabs AI and ML solution:

Are you looking for a partner who can build advanced AI/ML technologies for your business and make every interaction of your business intelligent? You are at the right place.

We love data and we are problem solvers. Our expert team of data scientists dives deep into solving and automating complex business problems. From Automobile to Fintech, Logistics, Retail, and Healthcare, GoodWorkLabs can help you build a custom solution tailored to your business.

Leave us a short message with your requirements.
