Sunday, October 8, 2023

A Program That Describes Images

A free and open-source model called LLaVA lets you upload an image and ask a question about it. For example, I asked it what's going on in this one:


"The image depicts a busy street scene with a large yellow robot or machine in the middle of the road, causing a traffic jam. The robot is surrounded by several cars, including a truck, which are stopped or slowed down due to the unusual presence. There are numerous people in the scene, some of them standing near the robot, while others are scattered around the street. A few individuals are closer to the foreground, while others are further back, possibly observing the situation or waiting to cross the street. In addition to the vehicles and people, there are two traffic lights visible in the scene, one on the left side and another on the right side of the road. A McDonald's restaurant can also be seen in the background, adding to the urban atmosphere of the scene."

That's pretty good, but it seemed to miss the flipped car, the crime-scene tape, and the police car. So I tried it again on a very different picture:

"The image is a painting of a man and a woman, both wearing blue shirts. The man is standing on the left side of the painting, while the woman is on the right. The man is holding a paintbrush, seemingly painting the woman's portrait. The woman appears to be looking at the man as he works on the painting. The painting is set in a room with a chair located in the middle of the scene. The man and woman are the main subjects of the painting, and their expressions and postures convey the artist's focus on capturing their likeness."

No deep insights here, but it seemed to get the basics of what I was trying to show. Try LLaVA on one of your photos or paintings. You can ask "What's going on in this picture?" or a more focused query. It's free and doesn't require a signup.

4 comments:

Knits and Weaves said...

It noticed the McDonald's but not the Burger King.

squeen said...

And yet, reCAPTCHA "I am not a robot" works by asking you to pick out all the trucks or traffic lights?

Clearly that approach is no longer a sufficient test.

Katie said...

Thanks for sharing this interesting tool! I've been using it to evaluate and critique my work and it has given me some helpful answers.

Edward said...

The second description is impressive, but it identifies the easel as a chair.