How Does Reverse Image Search Work?


Your average picture is probably worth significantly less than a thousand words – there’s only so much you can learn from selfies. But sometimes you just really need to know where an image came from, regardless of how many words it’s worth.

For that, there are reverse image search engines provided by the likes of Google, TinEye, Bing, Yandex, Pixsy, and many more. Since you’re not providing any words in your query, though, how do they know what to look for? And, most importantly, how do they find it? The details vary from engine to engine, and each company keeps its exact algorithm under wraps, but the basic ideas are public and not hard to grasp.


Fingerprinting

Pictures may actually be more unique than human fingerprints, since the odds of two pictures containing the exact same arrangement of pixels are unimaginably infinitesimal, while the chance of two fingerprints matching is around 1 in 64 billion – comparatively good odds. But how do you fingerprint a picture? The steps vary depending on the algorithm, but most of them follow the same basic formula.
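To put a rough number on that first claim, assume a tiny 64 × 64-pixel image with 24-bit color (8 bits each for red, green, and blue). Both figures are illustrative assumptions; real photos are much larger. Each pixel can take one of 2^24 values, so the number of possible pixel arrangements is:

```latex
N = \left(2^{24}\right)^{64 \times 64} = 2^{24 \times 4096} = 2^{98\,304} > 10^{29\,592}
```

Two uniformly random images of that size would therefore match with probability 1/N, compared with roughly 1 in 64 billion (about 6.4 × 10^10) for fingerprints. Real photos are far from random noise, so this only illustrates the scale involved.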

First, you have to measure the image’s features, which may include color, textures, gradients, shapes, relationships between different pieces of the picture, and even things like Fourier Transforms (a method of breaking an image down into sine and cosine waves of different frequencies).
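To make the Fourier idea a little more concrete, here’s a minimal sketch that runs a naive discrete Fourier transform over a single row of eight grayscale pixels. It’s an illustration only: the `dft` function and the sample row are made up for this example, and a real system would more likely apply a fast 2D transform to the whole image (for instance NumPy’s `numpy.fft.fft2`) rather than this slow loop.

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform. Coefficient k measures how strongly a
    wave completing k cycles across the samples is present. Because
    exp(-i*theta) = cos(theta) - i*sin(theta), the real part correlates the
    samples with a cosine and the (negated) imaginary part with a sine."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]

# One row of 8 grayscale pixels whose brightness rises and falls twice.
row = [round(128 + 100 * math.cos(2 * math.pi * 2 * t / 8)) for t in range(8)]
magnitudes = [round(abs(c), 1) for c in dft(row)]
print(row)         # [228, 128, 28, 128, 228, 128, 28, 128]
print(magnitudes)  # [1024.0, 0.0, 400.0, 0.0, 0.0, 0.0, 400.0, 0.0]
```

The big value at k = 0 is the row’s overall brightness (its sum), while the matching peaks at k = 2 and k = 6 reveal the pattern that repeats twice; every other frequency is essentially absent.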

Let’s say we’re looking for the following image and we need a fingerprint of it.

[Image: steam from a New York City street]

To do that, we might, among other things, use the image’s color histogram, Fourier Transform, and texture map, each of which you can see below.

[Images: the photo’s color histogram, Fourier transform, and texture map]
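A color histogram is the easiest of these features to sketch. The version below assumes an image is just a list of (r, g, b) tuples and sorts each pixel into one of 64 coarse color buckets; the function name, bucket count, and toy pixels are all hypothetical choices for illustration, not any search engine’s actual settings.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Sort each (r, g, b) pixel into a coarse color bucket and return the
    fraction of pixels per bucket, so differently sized copies are comparable."""
    step = 256 // bins_per_channel  # 4 bins -> each bin spans 64 intensity levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bucket: n / total for bucket, n in counts.items()}

# Toy 4-pixel "image": two dark grays, one orange, one sky blue.
image = [(40, 40, 40), (45, 42, 38), (230, 120, 30), (90, 160, 230)]
hist = color_histogram(image)
print(hist)  # {(0, 0, 0): 0.5, (3, 1, 0): 0.25, (1, 2, 3): 0.25}

# Repeating every pixel (a crude stand-in for upscaling) leaves it unchanged.
print(color_histogram(image * 2) == hist)  # True
```

Storing fractions rather than raw counts is what lets a thumbnail and a full-size copy of the same picture end up with the same histogram.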

If an image has been resized, blurred, rotated, or otherwise manipulated, search engines can feed these and other features into a number of different algorithms to still find matches.
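One well-known, deliberately simple way to build a fingerprint that survives resizing is the “average hash” (aHash): shrink the picture to a tiny grid, then record which cells are brighter than average. There’s no claim here that Google, TinEye, or anyone else uses this exact method. It’s a sketch that assumes the image is a grayscale list of rows (values 0–255) at least 8 × 8 pixels in size.

```python
def average_hash(gray, size=8):
    """Average hash (aHash): shrink a grayscale image (a list of rows of 0-255
    values, at least size x size) to size x size by averaging blocks of pixels,
    then record 1 for every cell brighter than the overall mean, else 0."""
    h, w = len(gray), len(gray[0])
    cells = []
    for i in range(size):
        for j in range(size):
            # Pixels of the original image that fall inside cell (i, j).
            block = [
                gray[y][x]
                for y in range(i * h // size, (i + 1) * h // size)
                for x in range(j * w // size, (j + 1) * w // size)
            ]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return [1 if c > mean else 0 for c in cells]

# A 16x16 horizontal gradient, a 2x upscaled copy, and a brightened copy.
small = [[x * 16 for x in range(16)] for _ in range(16)]
big = [[small[y // 2][x // 2] for x in range(32)] for y in range(32)]
brighter = [[v + 10 for v in row] for row in small]

print(average_hash(small) == average_hash(big))       # True
print(average_hash(small) == average_hash(brighter))  # True
```

Because every cell is compared with the image’s own mean, resizing and uniform brightening don’t change the 64 bits. Rotation does, which is one reason engines combine several different features instead of relying on a single hash.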

Encoding, storing, and searching

Every image feature in the fingerprint can be encoded as a string of letters and numbers, which is easy to store and index in a database. Whatever combination of features is extracted and stored becomes the reverse image search engine’s entry for that picture. TinEye’s database, for example, contained around 39.6 billion indexed images as of February 2020, meaning TinEye had run its algorithm over that many pictures and stored all of those fingerprints to compare searched images against.
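Here’s a minimal sketch of that encoding step, assuming the fingerprint is a list of 0/1 bits (for example, a 64-bit perceptual hash). It packs the bits into a short hexadecimal string and files it in an in-memory dictionary; `bits_to_hex`, `add_to_index`, and the example.com URLs are hypothetical, and a real engine would use a proper database rather than a Python dict.

```python
from collections import defaultdict

def bits_to_hex(bits):
    """Pack a list of 0/1 bits (length a multiple of 4) into a hex string."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return format(value, f"0{len(bits) // 4}x")

# Hypothetical index: fingerprint string -> every place that image was seen.
index = defaultdict(list)

def add_to_index(bits, location):
    index[bits_to_hex(bits)].append(location)

fingerprint = [0, 0, 0, 0, 1, 1, 1, 1] * 8  # a 64-bit fingerprint
add_to_index(fingerprint, "https://example.com/street.jpg")
add_to_index(fingerprint, "https://example.org/copy-of-street.jpg")
print(dict(index))
# {'0f0f0f0f0f0f0f0f': ['https://example.com/street.jpg', 'https://example.org/copy-of-street.jpg']}
```

Sixty-four bits collapse into a 16-character key, which is cheap to store billions of times. An exact-key lookup like this only finds identical fingerprints, though; finding near matches takes the distance comparison described next.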


The second major part of the algorithm is figuring out which images are similar. When you upload a picture, it goes through the reverse image search engine’s fingerprinting algorithm, and the engine then looks for the stored entries whose fingerprints are closest to it. How far apart two fingerprints are is referred to as “image distance.” Deciding which features to compare and how to weight them is up to each search engine, but they’re mostly aiming to find a total image distance as close to zero as possible.
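As a rough illustration of image distance, the sketch below treats each stored feature as a hex-encoded bit string, measures per-feature distance with the Hamming distance (the number of differing bits, a common choice for binary fingerprints), and adds the results up with weights. The feature names, fingerprints, and weights are invented for the example; real engines’ metrics and weightings aren’t public.

```python
def hamming(hex_a, hex_b):
    """Count the bits that differ between two hex-encoded fingerprints."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

def image_distance(query, entry, weights):
    """Weighted sum of per-feature distances; 0 means every feature matches."""
    return sum(w * hamming(query[name], entry[name]) for name, w in weights.items())

def closest(query, database, weights):
    """Linear scan for the entry with the smallest total image distance."""
    return min(
        ((image_id, image_distance(query, fp, weights)) for image_id, fp in database.items()),
        key=lambda pair: pair[1],
    )

# Hypothetical feature names, fingerprints, and weights.
database = {
    "street.jpg": {"shape": "0f0f0f0f0f0f0f0f", "color": "ff00"},
    "beach.jpg": {"shape": "f0f0f0f0f0f0f0f0", "color": "00ff"},
}
weights = {"shape": 1.0, "color": 0.5}
query = {"shape": "0f0f0f0f0f0f0f0e", "color": "ff00"}  # one bit off from street.jpg

print(closest(query, database, weights))  # ('street.jpg', 1.0)
```

A linear scan is fine for two entries, but at tens of billions of images it would be far too slow, so large engines presumably rely on specialized nearest-neighbor indexes instead.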

What about machine learning/AI?

Thanks to the fingerprinting/indexing techniques described above, reverse image search was pretty good even before it was practical to apply AI to it. Since AI is excellent at processing images, though, things like convolutional neural networks (CNNs) are most likely being used by many of the major search engines to help extract and label features. Google, for example, could be using a CNN in its reverse image search, allowing it to come up with likely keywords for the picture and produce relevant web and image results, as it has been doing in Google Photos for quite some time.


This takes reverse image search a step above simple feature extraction and image distance. Convolutional neural networks essentially run images through multiple filters that map out several different types of features, then attempt to classify them based on previous training. That’s an oversimplification, of course, but suffice it to say that CNNs make image search much more accurate and helpful and are probably being implemented alongside the older computer vision fingerprinting methods.
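To show what “running an image through a filter” means, here’s a minimal sketch of a single convolution step followed by a ReLU, the operation CNN layers repeat many times. The kernel is a hand-picked, Sobel-style vertical-edge detector chosen for illustration; a trained CNN learns its own kernel values, stacks many layers, and ends in a classifier, none of which is shown here.

```python
def convolve2d(image, kernel):
    """Slide a small kernel over a grayscale image (no padding) and return the
    map of weighted sums. Like most CNN libraries, this does not flip the
    kernel, so strictly speaking it is cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[y + a][x + b] for a in range(kh) for b in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]

def relu(feature_map):
    """Zero out negative responses, as CNN activation layers commonly do."""
    return [[max(0, v) for v in row] for row in feature_map]

# Hand-made vertical-edge detector (Sobel-style kernel).
vertical_edge = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

# 5x6 image: dark on the left, bright on the right.
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edges = relu(convolve2d(image, vertical_edge))
print(edges)  # [[0, 1020, 1020, 0], [0, 1020, 1020, 0], [0, 1020, 1020, 0]]
```

The resulting feature map lights up only where dark meets bright, which is exactly the kind of “map” of one feature type that later layers combine into shapes and, eventually, labels.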

What’s the best reverse image search engine?


Different algorithms mean different image search engines are good at different things, though they’re all ultimately aiming at the same target: finding a match for the picture you uploaded. Google Images has a pretty good hit rate, for example, but does a lot of “best guessing,” which gets you many photos that are similar but not identical. That’s great if you’re after a mood or general category. An engine like TinEye, on the other hand, is much more focused on finding identical images, even heavily edited ones, and can even identify images within photos, which makes it a better choice if you need an exact match.


Russian search engine Yandex is also reputed to have an excellent image search tool, though, perhaps predictably, it tends to do best on Russian topics. Tools like Pixsy and ImageRaider focus on identifying unauthorized uses of images, so they tend to offer extras such as alerts and concentrate on monitoring users’ own photo libraries.

Because the algorithms change all the time and are generally kept locked down, it’s worth checking several different engines if one doesn’t return the results you’re after.

Image credits: Steam from a New York City street, DB-database-icon

By Andrew Braun
