Monday, November 8, 2021

Google Cloud Vision

Google Cloud Vision is a service that lets you harness the power of machine learning to analyze images, and its online demo is free to try.

You can upload any picture. The algorithm, which was trained on a vast database of labeled pictures, will then make its best guess about what objects it sees.

In this Tom Lovell illustration, Google Cloud Vision is very certain that it sees a single cat, and it's relatively certain that it sees a person. It makes no mention of the other cat, the knitting, the blue chair, or the white sweater.
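If you'd rather run this kind of object check from a script than from the web page, here is a minimal sketch. It assumes the `google-cloud-vision` Python package is installed and that you have a Google Cloud account with credentials set up. The helper names `summarize_objects` and `detect_objects`, the 0.5 cutoff, and the sample scores in the usage note are my own illustrations, not figures from the demo.

```python
def summarize_objects(objects, min_score=0.5):
    """Format (name, score) pairs as 'name: NN%', most confident first.

    Detections below min_score are dropped. The 0.5 default is an
    arbitrary illustrative cutoff, not a value used by the service.
    """
    kept = [(name, score) for name, score in objects if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [f"{name}: {score:.0%}" for name, score in kept]


def detect_objects(image_path):
    """Send a local image file to the Vision API and return (name, score) pairs.

    Assumes google-cloud-vision is installed and credentials are configured.
    """
    # Imported here so summarize_objects can be used without the package.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.object_localization(image=image)
    return [(obj.name, obj.score) for obj in response.localized_object_annotations]
```

With made-up scores, `summarize_objects([("Person", 0.78), ("Cat", 0.95), ("Chair", 0.3)])` keeps the cat and the person and drops the chair, which is roughly the behavior described above.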

What happens if you give it a fantasy image of something that doesn't exist in the real world, such as a renegade warrior astride a Styracosaurus, wearing a T. rex-tooth helmet and holding a saber-tooth cat skull on a staff? In this Dinotopia image it recognizes two generalized objects: "a person and an animal."

Clicking on the "labels" tab, you can see that it identifies general qualities of the image with decreasing certainty. It's wrong about hunting and it's wrong about a working animal, but it knows that it's an illustration of an extinct animal.
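The "labels" tab has a counterpart in the API called label detection. Here is a rough sketch of how you might fetch the labels for an image and list them from most to least certain. As before, it assumes the `google-cloud-vision` package and working credentials; `rank_labels` and `detect_labels` are hypothetical helper names, and the scores in the usage note are invented.

```python
def rank_labels(labels):
    """Sort (description, score) pairs from most to least certain."""
    return sorted(labels, key=lambda pair: pair[1], reverse=True)


def detect_labels(image_path):
    """Ask the Vision API for labels and return (description, score) pairs.

    Assumes google-cloud-vision is installed and credentials are configured.
    """
    # Imported here so rank_labels can be used without the package.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.label_detection(image=image)
    return [(label.description, label.score) for label in response.label_annotations]
```

For example, `rank_labels([("Illustration", 0.71), ("Extinction", 0.88)])` puts "Extinction" first. Passing the output of `detect_labels` through `rank_labels` would give a list ordered the way the tab shows it.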

What happens if you input an image that has no analog in the real world because the image was itself generated by a machine-learning algorithm? Can it find something in the DNA of the image that could help it identify the word prompt that generated the image?

This picture was created by VQGAN+CLIP with the prompt "Constructionist Typography." The properties that it finds are more general than that, but it's in the ballpark.

Try Google Cloud Vision yourself and let me know in the comments what you discover.

1 comment:

Drake Gomez said...

Well, I uploaded a photo of Duchamp's Bicycle Wheel. Google interprets it as a stool (84% likelihood) or a tire (51%), but not art or a sculpture. I suppose depending on your feelings about Duchamp's work, Google Cloud Vision is either not so smart, or very (artificially) intelligent indeed.