


Here’s a simple way to state the image search problem: find a relevance function that takes a (text) query q and an image j, and returns a relevance score s indicating how well the image matches the query. Given this function, when a user does a search we run it on all their images and return those that produce a score above a threshold, sorted by their scores. We build this function using two key developments in machine learning: accurate image classification and word vectors.

An image classifier reads an image and outputs a scored list of categories that describe its contents. Higher scores indicate a higher probability that the image belongs to that category. Categories can cover specific objects in the image, such as tree or person; overall scene descriptors like outdoors or wedding; and characteristics of the image itself, such as black-and-white or close-up.

The past decade has seen tremendous progress in image classification using convolutional neural networks, beginning with Krizhevsky et al.’s breakthrough result on the ImageNet challenge in 2012. Since then, with model architecture improvements, better training methods, large datasets like Open Images or ImageNet, and easy-to-use libraries like TensorFlow and PyTorch, researchers have built image classifiers that can recognize thousands of categories.

Image classification lets us automatically understand what’s in an image, but by itself this isn’t enough to enable search. Sure, if a user searches for beach we could return the images with the highest scores for that category, but what if they instead search for shore? What if instead of apple they search for fruit or granny smith? We could collate a large dictionary of synonyms, near-synonyms, and hierarchical relationships between words, but this quickly becomes unwieldy, especially if we support multiple languages.

We can interpret the output of the image classifier as a vector j_c of the per-category scores. This vector represents the content of the image as a point in C-dimensional category space, where C is the number of categories (several thousand). If we can extract a meaningful representation of the query in this space, we can interpret how close the image vector is to the query vector as a measure of how well the image matches the query.

Fortunately, extracting vector representations of text is the focus of a great deal of research in natural language processing. One of the best-known methods in this area is described in Mikolov et al.’s 2013 word2vec paper.
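To make the setup concrete, here is a minimal sketch (not the production system) of the relevance-function idea described above. It assumes we already have a per-category score vector for each image and a query vector in the same C-dimensional space, scores images by cosine similarity, and returns those above a threshold sorted by score. The function names (`relevance`, `search`) and the toy 3-category vectors are hypothetical, for illustration only.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def relevance(query_vec: np.ndarray, image_vec: np.ndarray) -> float:
    """Relevance score s for query q and image j, both as category-space vectors."""
    return cosine_similarity(query_vec, image_vec)

def search(query_vec: np.ndarray, images: dict, threshold: float = 0.5) -> list:
    """Score every image, keep those above the threshold, sort by score descending."""
    scored = [(name, relevance(query_vec, vec)) for name, vec in images.items()]
    matches = [(name, s) for name, s in scored if s > threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)
```

In practice C is in the thousands and the query vector comes from a learned text model, but the shape of the computation is the same: one similarity per image, a threshold, and a sort.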

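A toy example of why word vectors help with the synonym problem: below, the hand-made 3-dimensional "word vectors" are invented for illustration (real word2vec embeddings are learned from large corpora and have hundreds of dimensions), but they show the key property we rely on: related words like beach and shore end up close together, while unrelated words like apple end up far away.

```python
import numpy as np

# Hand-made toy vectors, NOT real word2vec output -- illustration only.
WORD_VECS = {
    "beach": np.array([0.90, 0.10, 0.00]),
    "shore": np.array([0.85, 0.15, 0.05]),
    "apple": np.array([0.00, 0.20, 0.90]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: near 1.0 for related words, near 0.0 for unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

With vectors like these, a search for shore can match images scored highly for the beach category without any hand-built synonym dictionary: closeness in the vector space stands in for the dictionary.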