Picture this: you walk into a store and find a shirt you really like, but it is not available in your size. So you take a picture, hoping to buy something similar later. Most of us have been in this situation. Well, there is now a solution in the form of a search alternative called visual search, which does exactly this. Let's try to understand the definition before we delve into the process.

Visual search means exactly what the term suggests:

search for an object using a picture, and get similar products back as results.

But this overly simplified description can't be the whole story, since we know how complex the underlying technology is. So let us break visual search down into a series of processes that help us visualize how it actually works.

The Search Process

[Figure: visual-search-process]

Let's say we are looking for blue shoes that we saw on Instagram. We take a screenshot of the shoes and upload the image to a website known for its extensive shoe collection (for our reference, let's call it awesomeshoes.com), then wait for the results so we can buy our new shoes!

While we wait for the website to do its magic, a search is running in the backend to narrow down the results. The image in question is scanned by an artificial neural network to generate image descriptors, which are compared against a pre-existing index. This index is created when awesomeshoes.com is first onboarded to Turing Analytics and covers its entire list of available products. The image descriptors generated by the neural network capture all the necessary information, answering questions such as: 'What shoes are we searching for?', 'What is the colour?', 'What material is it made of?' and so on.
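To make this concrete, here is a minimal sketch of descriptor extraction in Python. Turing Analytics' actual network and descriptor format are not described in this article, so the sketch assumes a generic pre-trained CNN from torchvision (version 0.13 or later) as a stand-in, and the query filename is hypothetical.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Illustrative only: a generic pre-trained CNN stands in for the production
# network. The final classification layer is replaced so the model outputs
# a feature vector (the "image descriptor") instead of a class label.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # keep the 2048-dimensional descriptor
backbone.eval()

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def image_descriptor(path: str) -> torch.Tensor:
    """Encode one image file into a descriptor vector."""
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return backbone(image).squeeze(0)

query = image_descriptor("blue_shoes_screenshot.jpg")  # hypothetical file
```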

The search algorithm now filters the index down to the Top 30 most similar shoes. The result list is structured so that the top-most result is the closest match to the blue shoes. In other words, the results are ranked by a similarity score.
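The ranking step can be pictured as a nearest-neighbour lookup. Below is a small NumPy sketch of the principle: each catalog product contributes one descriptor to the index, cosine similarity supplies the similarity score, and the Top 30 are returned best first. The actual search algorithms are proprietary, so the shapes and data here are purely illustrative.

```python
import numpy as np

def top_k_similar(query: np.ndarray, index: np.ndarray, k: int = 30):
    """Rank catalog descriptors by cosine similarity to the query.

    query: (d,) descriptor of the uploaded image
    index: (n, d) descriptors of all catalog products
    Returns (positions, scores) of the k most similar products, best first.
    """
    q = query / np.linalg.norm(query)
    idx = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = idx @ q               # cosine similarity per product
    top = np.argsort(-scores)[:k]  # highest similarity first
    return top, scores[top]

# Hypothetical usage: a stand-in catalog of 10,000 random descriptors
# (a production index would hold hundreds of thousands of products).
rng = np.random.default_rng(0)
index = rng.standard_normal((10_000, 2048), dtype=np.float32)
query = rng.standard_normal(2048, dtype=np.float32)
positions, scores = top_k_similar(query, index)
```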

[Figure: how-visual-search-works-results]

The search is conducted based on object type, pattern, colour and any other relevant physical aspects of the image. So awesomeshoes.com gets back to us in about half a second with similar shoes, which we can browse through to make our selection.

Creating the search index of the catalog

As you will see below, the search index is derived from the image descriptors that the neural network produces for the catalog images. Let's explore our methodology for creating an extensive search index.

Awesomeshoes.com has a list of its available shoes, with unique images identifying the different shoes in its catalog. For our purposes, let's call it the "Big list of shoes". Turing Analytics takes this Big list of shoes and separates the data into images and their metadata (ID number, price, type).

[Figure: visual-search-index-creation]

Next, the images of the shoes are fed into the neural network to create the search index. The neural network's outputs are known as "image descriptors": descriptions of the features of the uploaded images, such as pattern, colour, shape and type. The index of catalog images is made up of these image descriptors.

Note that only the images of the shoes are uploaded as input, not the metadata.
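A sketch of this indexing step, reusing the hypothetical image_descriptor helper from the earlier example: the catalog records are split into images and metadata, only the images pass through the network, and the resulting descriptors are stacked into the index keyed by product ID. The catalog entries below are invented for illustration.

```python
import numpy as np

# Hypothetical "Big list of shoes": each record pairs an image with metadata.
catalog = [
    {"id": 101, "price": 59.0, "type": "sneaker", "image": "shoe_101.jpg"},
    {"id": 102, "price": 89.0, "type": "boot", "image": "shoe_102.jpg"},
]

# Separate the metadata from the images; only the images feed the network.
metadata = {
    item["id"]: {k: v for k, v in item.items() if k != "image"}
    for item in catalog
}
descriptors = np.stack(
    [image_descriptor(item["image"]).numpy() for item in catalog]
)
product_ids = np.array([item["id"] for item in catalog])
# `descriptors` together with `product_ids` form the search index; the
# metadata is attached to results only after the similarity ranking.
```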

Using its array of powerful GPUs, Turing Analytics takes about two hours to create an index of catalog images for half a million products.

Training of the Neural Network

The neural network is quite literally the brain behind the visual search process, and it needs to be trained to create image descriptors. In our case, it needs to be exposed to as many pictures of shoes as possible before our search, so it can recognize the details of different shoes. This continuous exposure is termed "model training".

[Figure: visual-search-process-neural-network-training]

To reiterate, the objective of training the neural network is for it to recognize the details of each shoe it is exposed to and create accurate image descriptors for each one.

So, to start the training, our input images will be a variety of shoes. Let's say that after the first exposure the neural network describes a pair of blue shoes as 'black shoes'; the training process will mark this output invalid. The neural network learns that these are not the required image descriptors, and it goes on to restart the process.

Now let's say that in the next iteration the neural network brings back the image descriptor as 'blue shoes' and asks for validation. This will prompt the training process to validate the findings positively.

This end-to-end process ensures that the neural network learns how to recognize objects and create detailed image descriptors. Neural network training is an ongoing process at Turing Analytics; it will have taken place thousands of times by the time you finish reading this sentence.
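In machine-learning terms, this validate-and-retry loop is supervised training: the network's output is compared against a known correct label, the error drives weight updates, and the cycle repeats. Here is a deliberately tiny PyTorch sketch of one such training step; the model, the ten attribute classes and the random stand-in data are all hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical setup: a tiny stand-in model maps shoe images to descriptor
# vectors, and a small head predicts an attribute label (e.g. colour) that
# is used only to score the network's answers during training.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256))
head = nn.Linear(256, 10)  # 10 hypothetical attribute classes
optimizer = torch.optim.SGD(
    list(model.parameters()) + list(head.parameters()), lr=0.01
)
loss_fn = nn.CrossEntropyLoss()

def training_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One exposure: predict, compare against the correct label, adjust."""
    logits = head(model(images))
    loss = loss_fn(logits, labels)  # an "invalid" answer yields a high loss
    optimizer.zero_grad()
    loss.backward()                 # learn from the mistake...
    optimizer.step()                # ...and try again on the next batch
    return loss.item()

# One illustrative iteration on random stand-in data (8 images, 64x64 px).
print(training_step(torch.randn(8, 3, 64, 64), torch.randint(0, 10, (8,))))
```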

To summarise

  • A well-trained neural network accurately creates detailed image descriptors.
  • At the time of onboarding, a client's catalog is analysed using this pre-trained neural network to generate image descriptors for each product.
  • Our proprietary search algorithms take these image descriptors and create the search index.
  • A distributed system architecture ensures a fast and scalable search process.

Visual search is slowly replacing the traditional methods of browsing through catalogs or relying on a salesperson to guide customers, and it is steadily seeping into our everyday lives. Organizations that adopt visual search can expect fewer abandoned carts and an increase in customer satisfaction.

Turing Analytics' extensive research on visual intelligence stands behind our vision of delivering the most accurate results. Check out our demo to experience the magic of visual search.