Saturday, April 9, 2022

How Smart is Dall-E 2?

Prompt: “Polymer clay dragons eating pizza in a boat”
Computer-generated image (Dall-e 2 by OpenAI) 

For several years now, computers have been able to generate images based on a natural-language prompt.

The resulting images have suffered from problems of logic and global coherence.

For example, here's what you get if you give the computer the prompt “A rabbit detective sitting on a park bench and reading a newspaper in a Victorian setting.” (Latent Diffusion LAION-400M via @loretoparisi)

Where are his legs? His hands? Are those books or newspapers? Is that a coffee table in front of his bench? 

The image doesn't make sense, and we might conclude that the problem comes from the computer not having any experience of living in a body or dealing with the real world. No matter how big the data sets, or how many layers of processing you bring to the task, you can't get past that limitation. 

Or can you? 

OpenAI is one of the pioneers of generating realistic images and art from descriptions in natural language. They recently unveiled new software called Dall-e 2, which has pushed the boundaries of what's possible with this technology.

Here's what Dall-E 2 does with the same prompt: “A rabbit detective sitting on a park bench and reading a newspaper in a Victorian setting.” 

The overall logic is much better. Now he has legs and is really sitting on that bench, even casting a shadow. But the image is still not perfect. What's the black loop in his left hand? And why doesn't he seem to be holding the newspaper with his right hand? 

Here's one more example of how the technology is improving, using the prompt “teddy bears working on new AI research on the moon in the 1980s” 

The first version using older tech (laion400m) looks like a paste-up of unrelated elements.

Here's what Dall-e 2 came up with: a pretty believable image with consistent lighting. 

OpenAI released this YouTube video to introduce the software.

This technology scares some working artists and illustrators. @VividVoid says: "DALL-E is breaking my heart. AI art is about to lay utter waste to traditional visual art forms. This will be so much more destructive than what the Internet did to music. It will be a technological conquest of one of the great human avenues of spiritual transformation."

AI skeptic Gary Marcus doubts whether the technology will ever replace artists because it is just crunching big data sets. It's not learning from embodied experience, nor does it understand symbolic or semantic concepts the way a human does. Marcus says: "This whole thread is weaponized cherry-picked PR; the antithesis of science."

Soon after Dall-E 2 was released, OpenAI gave me beta access to try it out. In this YouTube video, I share my first experiments with it. (Link to YouTube)

Read more
Dall-e 2 at OpenAI
Podcast: Gary Marcus: Toward a Hybrid of Deep Learning and Symbolic AI 


Joel Fletcher said...

Bow down to your robot artist overlords!

Can't wait until this software is available for the Mac. ;-)

waronmars said...

Turns out that ‘make art’ button that digital artists have joked about for years is real!
Seriously though, I am interested in the ethical and legal questions around AI that makes art in a particular artist’s style. I stumbled upon this site through Twitter, which shows a large amount of AI art based on a single prompt plus an artist’s name. Your own name is included, which made me wonder what you think about it!

James Gurney said...

WaronMars, I don't know what website or Twitter account you mean, but I know that my name (among many other artists' names) has been used to stylize prompts. I find that interesting and kind of funny. I'm not quite as thrilled when someone tries to mint it as an NFT.

Joel, which is one of the reasons I'm gravitating more and more to doing videos of myself as a human making art. That's something the robot overlords can't take away from me.

Melinda said...

Maybe the loop is a monocle?

nuum said...


I've read this comment on OpenAI YT page and I think it's interesting:

"Sometimes when I read a novel, lack of images that I have in mind limits my imagination greatly and it could make me quit reading.
Dall-E 2 can provide a lot of images for the story and help people imagine far much better... reading a book full of good imaginations provide us more fun than a movie."(Eric Cartman)


MerylAnnB said...

I'm fascinated, and agree that this can be a good tool for artists to use for short cuts. I am thinking especially of having a few illustrations done for a project, which are added to the database, and then asking for the new illustration "in the style of."

But who owns the copyright? Copyright law says the creator of the image. Is that the AI?

Or, if asking for an illustration "in the style of James Gurney" would that mean that James Gurney owns all the copyrights on any art made in his name? Maybe that's a bonus!

Copyright Law is already dancing around a gray area as far as images are concerned, maybe it just got grayer. Middle value gray, LOL.

Joel Fletcher said...

Yes indeed James, any artist that works with traditional media is making something superior to this AI stuff. Art made with mind, hands, and heart will always be better. As impressive as this AI art is, it is still just pixels. Hard to compare to an original painting made with real materials, made with human love.

broker12 said...

Color me old fashioned, but this is for lazy mechanics . . . a guy who replaces oil filters . . . doesn't really do anything or fix anything. It will never replace art. There is something about "making art" that stirs an artist's "insides." Art and music are the language of the soul. The soul has no words, and thus, art and music are the only ways it can speak. Code stored on a chip will never replace the voice and passion of the soul.

James Gurney said...

MerylAnn, glad you brought that up. I recently read an article saying that AI art can't be copyrighted:
I'm not aware of any rulings or legislation about whether the style associated with a person's name can be copyrighted.

MerylAnnB said...

James, a quote from the article you linked:

"So if someone tried to copyright a similar work by arguing it was a product of their own creativity executed by a machine, the outcome might look different." (ie in this context, "different" seems to mean that it might be copyrightable.)

I'm heavily involved in a website where we constantly must stay as aware as we can about copyrights on images...from my experience plus what I read in your linked article, my tendency is to think that --in this example -- you would own the copyright on any AI image which was specifically directed to be "in the style of James Gurney."

And that would make sense to me (although granted, not all copyright law, or even other laws, seem to be required to always make sense!)

Someone will have to test this out, hopefully not you, but you could be the beneficiary of the test results! ;-)

waronmars said...

Hi James,
I completely forgot to add the link! Here it is:

-Jenna Drummond

Timothy Bollenbaugh said...

I'm not as technologically adept or informed as all of you. Something helpful to consider is Norman Rockwell's opinion of what makes a work go over is that if he loves it and experiences it, and remains true to it, giving the process 110%, then the feeling and meaning will come through, the audience will experience it.
AI will certainly make hordes of stuff people will devour. Maybe our question is whether people will want something they can warm up to and will they be able to distinguish the two. And, can AI someday be informed enough about emotion and sentiment to come imperceptibly close?
Again, I wouldn't speculate, myself. But this comes to mind.

James Gurney said...

Timothy, I think you've put your finger on the core issue. What does art provide us when it is created by a living human being? What is it missing when it's made purely by an algorithm? (I know, humans write the code and the prompts.) I think we'll have to grapple with those questions as this technology develops in its sophistication and its commercial delivery formats.

Gibbigabs said...

I think a lot of people are failing to realize that this is not the equivalent of creating a robot that will automatically generate masterpieces, but more of the visual-art equivalent of auto-tune. Those people who said “digital art isn’t really art because the computer does everything for you”… well they’re about to be right. Anyone with any idea and 0 skill will be able to create art using Dall-E. For better or worse, that’s where we’re headed now. Hold on to your butts.

D said...

I'm sure you're probably tired of seeing things generated in your name by now, but I got a kick out of this image I generated with Dall-E 2 and enhanced using some older AI tools. Jurassic Park in the style of James Gurney:


James Gurney said...

Thanks, Doug, I got a kick out of it. That one was closer to my style than usual, though I'm not sure about the dinosaur. Seems to be a cross between a tyrannosaur and a ceratopsian.