Thoughts on my first week with OpenAI’s amazing text-to-image AI tool
Disclaimer: All images in this story were generated using artificial intelligence.
Every few years, a technology comes along that splits the world neatly into before and after. I remember the first time I saw a YouTube video embedded on a web page; the first time I synced Evernote files between devices; the first time I scanned tweets from people nearby to see what they were saying about a concert I was attending.
I remember the first time I Shazam’d a song, summoned an Uber, and streamed myself live using Meerkat. What makes these moments stand out, I think, is the sense that some unpredictable set of new possibilities had been unlocked. What would the web become when you could easily add video clips to it? When you could summon any file to your phone from the cloud? When you could broadcast yourself to the world?
It’s been a few years since I saw the sort of nascent technology that made me call my friends and say: you’ve got to see this. But this week I did, because I have a new one to add to the list. It’s an image generation tool called DALL-E, and while I have very little idea of how it will eventually be used, it’s one of the most compelling new products I’ve seen since I started writing this newsletter.
Technically, the technology in question is DALL-E 2. It was created by OpenAI, a seven-year-old San Francisco company whose mission is to create a safe and useful artificial general intelligence. OpenAI is already well known in its field for creating GPT-3, a powerful tool for generating sophisticated text passages from simple prompts, and Codex, the model behind GitHub’s Copilot, a tool that helps software engineers automate writing code.
DALL-E — a portmanteau of the surrealist Salvador Dalí and Pixar’s WALL-E — takes text prompts and generates images from them. In January 2021, the company introduced the first version of the tool, which was limited to 256-by-256 pixel squares.
But the second version, which entered a private research beta in April, feels like a radical leap forward. The images are now 1,024 by 1,024 pixels, and the tool can use new techniques such as “inpainting,” which replaces one or more elements of an image with something else. (Imagine taking a photo of an orange in a bowl and swapping the orange for an apple.) DALL-E has also improved at understanding the relationships between objects, which helps it depict increasingly fantastic scenes, such as a koala dunking a basketball or an astronaut riding a horse.
For weeks now, threads of DALL-E-generated images have been taking over my Twitter timeline. And after I mused about what I might do with the technology — namely, waste countless hours on it — a very nice person at OpenAI took pity on me and invited me into the private research beta. The number of people who have access is now in the low thousands, a spokeswoman told me today; the company is hoping to add 1,000 people a week.
When you create an account, OpenAI makes you agree to DALL-E’s content policy, which is designed to prevent most of the obvious potential abuses of the platform. No hate, harassment, violence, sex, or nudity is allowed, and the company also asks you not to create images related to politics or politicians. (Here it seems worth noting that among OpenAI’s co-founders is Elon Musk, who is famously mad at Twitter over content policies far less restrictive than these. He left OpenAI’s board in 2018.)
DALL-E also blocks many potential images outright by placing keywords (“shooting,” for example) on a block list. You’re also not allowed to use it to create images intended to deceive: no deepfakes allowed. And while there’s no prohibition against trying to make images based on public figures, you can’t upload photos of people without their permission, and the technology seems to slightly blur most faces to make it clear that the images have been manipulated.
Once you’ve agreed to that, you’re presented with DALL-E’s delightfully simple interface: a text box inviting you to create whatever you can think of, content policy permitting. Imagine using the Google search bar as if it were Photoshop; that’s DALL-E. Borrowing some inspiration from the search engine, DALL-E includes a “surprise me” button that pre-populates the text box with a suggested query, based on past successes. I’ve often used this to get ideas for trying artistic styles I might never have considered otherwise, such as a “macro 35mm photograph” or pixel art.
For each of my initial queries, DALL-E would take around 15 seconds to generate 10 images. (Earlier this week, the number of images was reduced to six, to allow more people access.) Nearly every time, I would find myself cursing out loud and laughing at how good the results were.
For example, here’s a result from “a shiba inu dog dressed as a firefighter.”
And here’s one from “a bulldog dressed as a wizard, digital art.”
I love these fake AI dogs so much. I want to adopt them and then write children’s books about them. If the metaverse ever exists, I want them to join me there.
You know who else can come? “Frog wearing a hat, digital art.”
Why is he literally perfect?
Over on our Sidechannel Discord server, I began taking requests. Someone asked for “the metaverse at night, digital art.” What came back, I thought, was suitably grand and abstract:
I won’t attempt to explain here how DALL-E is making these images, in part because I’m still working to understand it myself. (One of the core technologies involved, “diffusion,” is explained helpfully in this blog post last year from Google AI.) But I have been repeatedly struck by how creative this image-generation technology can seem.
Take, for example, two results shared in my Discord by another reader with DALL-E access. First, look at the set of results for “A bear economist in front of a stock chart crashing, digital art.”
And second, “A bull economist in front of a graph of a surging stock market with up line, synthwave, digital art.”
It’s striking the degree to which DALL-E captures emotion here: the fright and exasperation of the bear, and the aggression of the bull. It seems wrong to describe any of this as “creative,” since what we’re looking at are nothing more than probabilistic guesses, and yet these images have the same effect on me that something truly creative would.
Another compelling aspect of DALL-E is the way it will attempt to solve a single problem in a variety of ways. For example, when I asked it to show me “a delicious cinnamon bun with googly eyes,” it had to figure out how to depict the eyes.
Sometimes DALL-E added a pair of plastic-looking eyes to a roll, as I would have done. Other times it created eyes out of negative space in the frosting. And in one case it made the eyes out of miniature cinnamon rolls.
That was one of the times I cursed out loud and started laughing.
DALL-E is the most advanced image generation tool I’ve seen to date, but it’s far from the only one. I’ve also experimented lightly with a similar tool named Midjourney, which is also in beta; Google has announced another, named Imagen, but has yet to let outsiders try it. A third tool, DALL-E Mini, has generated a series of viral images over the past few days; it has no relation to OpenAI or DALL-E, though, and I imagine the developer will get hit with a cease-and-desist letter shortly.
OpenAI told me that it hasn’t yet made any decisions about whether and how DALL-E might someday become available more generally. The point of the current research beta is to see how people use this technology, adapting both the tool and its content policies as necessary.
And yet already, the number of use cases artists have discovered for DALL-E is surprising. One artist is using DALL-E to create augmented reality filters for social apps. A chef in Miami is using it to get new ideas for how to plate his dishes. Ben Thompson wrote a prescient piece about how DALL-E could be used to create extremely cheap environments and objects in the metaverse.
It’s natural, and appropriate, to worry about what this sort of automation might do to professional illustrators. It may well be that many jobs are lost. And yet I can’t help but think tools like DALL-E could be useful in their workflows. What if they asked DALL-E to sketch out a few concepts for them before they got started, for example? The tool lets you create variations of any image; I used it to suggest alternate Platformer logos:
I’ll stick with the logo I’ve got. But if I were an illustrator, I might appreciate the alternate suggestions, if only for the inspiration.
It’s also worth considering what creative potential these tools might open up for people who would never think to hire an illustrator, or could never afford one. As a kid I wrote my own comic books, but my illustration skills never progressed very far. What if I could have instructed DALL-E to draw all my superheroes for me instead?
On one hand, this doesn’t seem like the sort of tool that most people would use every day. And yet I imagine that in the coming months and years we’ll find ever-more creative applications of tech like this: in e-commerce, in social apps, in the home and at work. For artists, it looks like it could be one of the most powerful tools for remixing culture that we’ve ever seen — assuming the copyright issues get sorted out. (It’s not entirely clear whether using AI to generate images of protected works is considered fair use or not, I’m told. If you want to see DALL-E’s take on “Batman eating a sandwich,” DM me.)
I suspect we’ll see some harmful applications of this tool as well. While I trust OpenAI to enforce strong policies against the misuse of DALL-E, surely similar tools will emerge and take more of an anything-goes approach to content moderation. People are already creating malicious, often pornographic deepfakes to harass their exes using the crude tools available today; that technology is only going to get better.
It’s often the case that, when a new technology emerges, we focus on its happier and more whimsical uses while ignoring how it might be misused in the future. As thrilled as I have been to use DALL-E, I’m also quite anxious about what similar tools could do in the hands of less scrupulous companies.
It’s also worth thinking about what even positive uses of this technology could do at scale. When most images we encounter online are created by AI, what does that do to our sense of reality? How will we know whether anything we are seeing is real?
For now, DALL-E feels like a breakthrough in the history of consumer tech. The question is whether in a few years we’ll think of it as the start of a creative revolution, or something more worrisome. The future is already here, and it’s adding 1,000 users a week. The time to discuss its implications is now, before the rest of the world gets its hands on it.