DALL-E 2 is awesome! I love it!

Warning: Lots of images below!

Earlier this week, I was finally invited to OpenAI’s DALL-E 2 public beta! And I’m completely in love with it. Below are some of my favorite pieces I’ve generated with it by giving it simple text prompts.

First, a few details about the app: Generating these pictures is a computationally intensive process, so they limit how many pictures you can generate. This is done with credits. Upon receiving an invite, they give you 50 free credits to start with. Each credit allows you to send one text prompt, and you get four variations in return. Each month they give you 15 more free credits. However, you can buy credits as well. Currently that price is $15 for 115 credits, which comes to a little over $0.13 per prompt, which really doesn’t sound bad, but it adds up quickly when you get addicted! Still, personally I think it’s totally worth it. Just wish I had more money to spend on it!
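For the curious, the per-prompt math above works out like this (a quick sketch using the prices quoted at the time of writing, which may well change):

```python
# DALL-E 2 credit pricing as quoted above (subject to change)
pack_price_usd = 15.00   # price of one credit pack
pack_credits = 115       # credits per pack; one credit = one prompt

cost_per_prompt = pack_price_usd / pack_credits
print(f"${cost_per_prompt:.4f} per prompt")   # a little over $0.13

# Each prompt returns four image variations
cost_per_image = cost_per_prompt / 4
print(f"${cost_per_image:.4f} per image")
```

So each individual image costs only around three cents, which is why it's so easy to burn through credits without noticing.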

Sometimes you get really awesome results, sometimes you get weird abstract nonsense that’s nothing like what you had in mind. So you have to get a feel for what sort of prompts might give you something interesting, and what sort of prompts it won’t understand.

So here’s a little gallery of some of the stuff I’ve created so far. I’ve already spent $30 and it’s only my first week with access, so I will have to restrain myself now. (I still have around 85 credits left.)

Finally, it generates images at a resolution of 1024×1024. I’ve resized the images below in an effort to conserve screen space and bandwidth.

Dolphin eating a cheeseburger

This is similar to a prompt I tried on another AI image generator last year, so I was curious to see how DALL-E would do with the prompt. Much better!

Libraries

My favorite “style” of DALL-E’s output tends to be “oil painting”.

Steampunk owls

Animals wearing headphones

DALL-E tends to draw animals much better than humans, I suppose because they can be a bit more abstract and less structured than a human’s face. (Although note it doesn’t understand that headphones should go on mammals’ ears rather than the sides of their heads, haha.)

Some abstract art

The prompt here was something like “A painting of a giant eye ball sitting in a chair by the fire.”

Portrait of Mozart as various animals

Owls reading books

Painting of Ha Long Bay in Vietnam in the style of Van Gogh

Castles on cliffsides

Starry skies above castles

Flowers growing out of skulls

Money and treasure!

Pirate treasure maps

Skulls on fire

Weaknesses

The above are all cherry-picked examples of some of my favorite outputs so far; some results come out a lot less interesting. DALL-E is particularly bad with images that require specific structural detail, such as human faces, or pianos, or even dragons. It excels at looser, less-structured forms, such as flowers, trees, and clouds. Below are some examples of output that I was less pleased with, showing some of its weaknesses.

Conclusion

Overall, despite its weaknesses, I’m still completely blown away by the quality of DALL-E’s output. I can’t wait to put some of the images I’ve generated to use as album covers or something! I love it!

AI generated images are getting better!

Last year I posted about creating AI art. The website I mentioned, NightCafe, is still around and has added interesting new features, but the images it generates still primarily lean to the abstract side. It doesn’t generate much I would consider of very practical use beyond having fun.

But just a few weeks ago, OpenAI announced DALL-E 2, and the images it generates are much more mind-blowing and exciting. Here’s a brief overview of the tech from Two Minute Papers:

What a time to be alive!

Granted, the examples shown in the video and on OpenAI’s website are cherry-picked. There are some other examples out there that look a bit more wonky. It still doesn’t seem to be great with human faces, for example, or things requiring a lot of finer detail, and it’s awful at generating text in images.

Here’s another video describing the tech:

Despite its weaknesses, it still looks enormously more useful, fun, and exciting than the AI image generators I looked at in that post from last year. I of course added my name to the waitlist. I’d love to experiment with it, but I probably won’t get access anytime soon. But DALL-E 2 definitely looks like something I’d be more than willing to pay for (assuming the price isn’t overly expensive). I can at least imagine creating useful images to accompany blog posts, short stories, book or album covers, or something.

Amazing stuff!

ETA: Also check out this mind-blowing art book of 1,000 robot paintings by DALL-E 2 in various styles: 1111101000 Robots

Fun with AI generated art

Over the past week I’ve been having some fun generating bizarre digital art with AI via a couple of websites. You generate the art by simply giving the AI a text prompt, such as “castles in the sky” and, after a couple of minutes, out pops the AI’s somewhat abstract but interesting interpretation:

Castles in the Sky

Since the results are rather abstract, it helps to use words that lack specific forms, such as clouds and landscapes. If you ask for an animal or human, you’re probably not going to get anything that actually resembles their shape, but rather only some abstract colors and textures that resemble them. For instance, here is “dolphins eating sandwiches”:

Dolphins Eating Sandwiches

It also helps to give the AI some hints as to what the result should look like. For instance, the exact prompt for “castles in the sky” above was actually: “An enormous castle floats in the sky beautiful artwork”. Adding the tag “beautiful artwork” helps give it a more painterly look.

The art is also limited in resolution; the AI just takes too much memory for larger pics, so the smaller resolutions are the norm.

I’ve been using two websites to create such art:

  1. NightCafe Studio’s AI Art Generator. The site features a very nice user interface, lets you set some optional settings, and allows you to save and share your work while exploring the works of others. It does make you create an account and limits how much you can create with it using a credit system. You can buy credits or earn some. You can check out my profile here: https://creator.nightcafe.studio/u/Seanthebest
  2. NeuralBlender has no user interface or options, but does not seem to limit use. You do have to wait for the AI to finish its current image before starting a new one if you want to see it in your browser.

I have not yet tried it, but if you do a bit of Googling, you can find resources on how to set up your own AI art generator without having to use one of the websites above; the tech is called VQGAN+CLIP and is available to all. A “GAN” is a generative adversarial network … and I have no idea what the other acronyms stand for (obviously you can Google that too). So I’m not sure how long the above websites will stay in service considering the tech is not proprietary, nor do I think the AI produces artwork of enough controllable quality to be of widespread use beyond offering an amusing spectacle.

Still, it’s fun to play with. Here are some of my favorites that I’ve generated so far:

Colorful Clouds

Airship

The Sky Is Cracked

Blue Sky at Night

Stone Palace

Library

Library 2

Library 3

UFOs: Are they projections?

I was just randomly thinking about the strange ways in which UFOs seem to defy the laws of physics:

  • Anti-gravity
  • No visible propulsion system
  • Maintain insane speeds
  • Accelerate with insane force
  • Observed in air and water

But what if we’re taking it for granted that the UFO is a unit unto itself?

This isn’t a perfect analogy, but think about shining a laser pointer on the wall. With the rotation of the wrist, we can give the dot of light similar physics weirdness, albeit limited to two dimensions. The point of light doesn’t propel itself, so it doesn’t need a propulsion system. Likewise, insane speeds and accelerations are actually derived from magnified wrist rotations, and so are not nearly as insane as they seem.
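To put rough numbers on the laser-pointer analogy (a back-of-the-envelope sketch; the distances and rotation rate are made up for illustration): for the point directly ahead, the dot's linear speed is just the projection distance times the angular speed of the wrist, so a modest rotation becomes an enormous apparent speed at long range.

```python
import math

def dot_speed(distance_m: float, wrist_deg_per_s: float) -> float:
    """Linear speed of a projected dot: v = r * omega (point straight ahead)."""
    omega = math.radians(wrist_deg_per_s)  # convert deg/s to rad/s
    return distance_m * omega

# A lazy 90-degree-per-second wrist flick...
print(dot_speed(10, 90))        # ~15.7 m/s on a wall 10 m away
print(dot_speed(100_000, 90))   # ~157,000 m/s projected at 100 km
```

The same wrist motion that makes a dot amble across your living room wall would send it streaking across the sky at speeds no physical aircraft could match.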

Granted, this doesn’t quite explain the anti-gravity. Gravity’s effect on light is negligible from our typical standpoint; it’s generally not until we’re studying black holes or light across galactic distances through telescopes that gravity’s effect on light becomes measurable. Assuming UFOs are not just light, it seems they’d still have to contend with gravity. Still, maybe whatever is “projecting” them provides the force needed to hold them aloft.

The analogy also breaks down dimensionally. The dot of light from a laser pointer projects onto a wall and bounces to our eyes. Without the wall, the light would just keep travelling and dissipating into space. If UFOs are projections, what are they projecting onto? (This is also the obvious challenge of developing the sort of 3D hologram systems often seen in movies like Star Wars. How can we seemingly get light to reflect at a specific point in empty space?) And, for that matter, how can one project actual physical matter rather than just light?

I have no clue, I just thought it was an interesting idea.

Regardless, I’d love to know what their weird UFO things are and what exactly they’re doing out there.

Jurassic Park as plotted by AI

Lately I’ve been fooling around with play.aidungeon.com, particularly its “Dragon” model, which is perhaps based on GPT-3 (though I’m not sure). While the app is biased towards generating second-person adventure game text, I have found it fun to feed it some plot summaries and let it generate a continuation. The results are nonsense, illogical, and inconsistent, but funny.

In regards to story writing, the app can be a lot of fun for generating random ideas, but it’s just about useless (so far as I can tell) for generating appropriately constrained ideas, which are far more important to story writing. Stories, after all, have to go somewhere. Plots develop, characters develop, tensions rise and fall, etc. With only random ideas, the story just kind of meanders around randomly. Perhaps some of its pointless meandering can be tamed with proper prompting, but I have not yet found an effective strategy to achieve this. Perhaps future apps will be better designed for such a purpose, but perhaps it is just an inherent limitation with the AI model itself.

Below are the results of one of my experiments with having the app continue the plot of Jurassic Park. The first three paragraphs are copied from Wikipedia’s plot summary of the film, and the rest is from the AI. You can see that it makes very little sense, and would make a pretty terrible movie, but I found it humorous.

Another experiment involved having it continue the plot of the film Clue, in which Mr. Boddy’s ghost terrorizes the guests and Mrs. Peacock kills Mr. Green with an icicle. In yet another, the AI continued a biography of Mozart by having him get married at age nine and grow up to play pianos in taverns and murder his cousins.

Here’s the Jurassic Park plot continuation (the first three paragraphs are the prompt):

Industrialist John Hammond has created a theme park of cloned dinosaurs, Jurassic Park, on Isla Nublar, a Costa Rican island. After a dinosaur handler is killed by a Velociraptor, the park’s investors, represented by lawyer Donald Gennaro, demand that experts visit the park and certify its safety. Gennaro invites mathematician and chaos theorist Ian Malcolm, while Hammond invites paleontologist Dr. Alan Grant and paleobotanist Dr. Ellie Sattler. Upon arrival, the group is shocked to see a live Brachiosaurus.

At the park’s visitor center, the group learns that the cloning was accomplished by extracting dinosaur DNA from mosquitoes preserved in amber. DNA from frogs was used to fill in gaps in the genome of the dinosaurs. To prevent breeding, all the dinosaurs were made female. Malcolm scoffs at the idea, saying that it will inevitably break down. The group witnesses the hatching of a baby Velociraptor and visits the raptor enclosure. During lunch, the group debates the ethics of cloning and the creation of the park, and Malcolm warns about the implications of genetic engineering.

The group is joined by Hammond’s grandchildren, Lex and Tim Murphy, for a tour of the park, while Hammond oversees the tour from the control room. The tour does not go as planned, with most of the dinosaurs failing to appear and the group encountering a sick Triceratops. It is cut short as a tropical storm approaches Isla Nublar. Most of the park employees leave for the mainland on a boat while the visitors return to their electric tour vehicles, except Sattler, who stays behind with the park’s veterinarian to study the Triceratops.

[AI’s continuation]

Meanwhile, Malcolm enters the control room to find that the computer system has been sabotaged. He hacks into the system and finds that Hammond is dead and his son Michael has taken over. As he attempts to escape, he runs afoul of an angry Tyrannosaurus rex who chases him through a tunnel leading out of the building. He returns to his car where he is attacked by two more T-Rexes.

Malcolm manages to drive away, but is pursued by a third T-Rex. A fourth one follows, which knocks him off the road and onto a rocky hillside. He calls out for help and is rescued by Ellie Sattler, who drives up alongside his car and opens fire on the pursuing Tyrannosaurs. She kills them both with her rifle before they can reach Malcolm again.

Back at the visitor center, Malcolm tells Ellie that she saved his life and that he owes her a debt of gratitude. She asks if he wants to come back to the island with her, but he declines. He says there are things here he needs to do first. Meanwhile, Hammond’s grandson Lex has returned from the mainland and reports what happened to his father. While he’s explaining the situation, Hammond notices through a window that one of the rival bioengineers, Ed Regis, is in his proximity.

Hammond rushes out and shoots Regis multiple times before being restrained. He declares that they are at war, before ordering the computer system to be destroyed with a flamethrower.

The film then cuts to the outside of the visitor center, where there is a scream of pain from Ian Malcolm as the building explodes.

The following morning, law enforcement arrives on the island. Gennaro briefs them on the situation. While searching for survivors, they discover that all of the dinosaurs have been killed. The footprints discovered near the visitor center belong to a henchman working for the island’s owner, Peter Ludlow. Ludlow had learned that Hammond was dead and believed that he could get away with claiming the island. He also realized that he was trapped there with several hostile dinosaurs.

The second coup happens in the hospital room where John Hammond lies dying. His granddaughter Kelly has taken over his role as head of InGen and has accelerated the timetable for building a new Jurassic Park in San Diego. However, a police sniper takes her out from a nearby building, mistaking her for Ludlow. The blame shifts to Ludlow. The actual culprit is Ed Regis, who storms the room.

After killing Ludlow with a gunshot to the face, he shouts out that he didn’t kill anyone and that Ian Malcolm was the murderer before escaping via helicopter. With InGen destroyed, Hammond dead, Ludlow dead and Regis on the run, a third shadowy group is suggested as being responsible for the attacks. The investigation suggests that they have no known motive or target.

The book ends with ‘DNA donkeys’, a note about the next entry in the series Tey cats and a reminder that there are fourteen new engravings made out to readers of this book.

I still want AI that can help me write a novel

Just last month, OpenAI released a paper about their results with GPT-3, an AI language model that can be trained on mountains of text and then generate its own text based on given prompts.

It looks quite impressive! Impressive too are some of GPT-3’s creative writing results from freelance writer Gwern Branwen. It still has plenty of weaknesses, in humor and logic for example, so it won’t be replacing novelists yet, but I’m particularly impressed with GPT-3’s continuation of a scene from a Harry Potter fanfiction. I wouldn’t copy and paste the results, but it looks like it would be great for generating story ideas, both in a novel’s overall plotting stage, and at the actual scene-writing stage. I find the scene-writing stage to be the most tedious and mentally demanding (hence why I’ve procrastinated on doing it for a few years now); I would love to have a program that continually generated ideas for directions a scene could go, either by having it generate a possible continuation or answering prompts with ideas, such as “How might this character respond to this situation?”

Other possibilities with GPT-3 (or future models) are equally exciting. I’d love to see GPT-3 or something like it applied to things like:

  • Dialog for non-player characters in video games
  • Cohosting a podcast with me
  • Generating comments for this blog so it looks like I have more readers
  • Being an imaginary friend because I’m sad and lonely

One weakness of GPT-3 (and most neural-network based AI for that matter) is that we may not be able to see directly how it generated its answers to prompts. That is, how do we know it’s not plagiarizing or stealing too many ideas from its training data? It may become a thorny issue for some uses.

David Cope’s older algorithmic music generating system, for example, had similar problems. This is I believe 20-something years old, but here’s a computer-generated piece in the style of Mozart:

Sounds great, but if you’re familiar with Mozart, it’s actually not that impressive; there’s just too much Mozart that’s been too directly copied; it’s just not “creative” enough. A patron of Mozart would likely be dismayed, “this is just a rehash of this and that symphony; I want something in your style, but more fresh!”

I doubt GPT-3 always copies from its training data that overtly, but the possibility could still be a problem.

The other big problem, from my perspective at least, is cost. GPT-3 requires more computing power than I can afford to pay for. OpenAI will probably target enterprise users as their first customers, not poor novelists.

There will probably be other options though. For example, there is the recently launched InferKit, which I believe is based on GPT-2. Maybe I’ll experiment with that as the pricing seems fair enough, but my previous creative fiction results with GPT-2 weren’t great, especially when it would have characters from other copyrighted novels like Gandalf pop into scenes. I probably just have to home in on some good methods for idea-prompting.

Anyway, the future of AI continues to fascinate and excite me!

My Solution to the Collatz Conjecture

UPDATE: (August 2021) While I have yet to see any errors in the proposed solution below, it is admittedly incomplete, as it glosses over the final supposition involving the impossibility of loops. An attempt to rectify this missing piece is forthcoming.


As promised, here’s my attempted solution to the Collatz Conjecture. My solution is pretty simple, so if you understand the conjecture, you should understand the proof. (I’m not a pro mathematician anyway, just an amateur hobbyist.) I’m eager to get feedback, especially if I somehow missed something subtle (or worse, something really stupid).

PDF of my proof: click here.

If you prefer to watch a video instead, I’ve uploaded myself explaining the solution here:

Here’s to hoping my proof is confirmed!

The Collatz Conjecture

I’ve been tinkering with the Collatz Conjecture on and off for a couple years; it’s madly addicting, patterns within patterns within patterns, and yet strange and puzzling disorder seems to lurk around every corner.
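For anyone unfamiliar, the conjecture says that iterating one simple rule (halve the number if it's even, triple it and add one if it's odd) eventually reaches 1 for every positive starting integer; it's been verified numerically for enormous ranges but never proven. A minimal sketch of the iteration:

```python
def collatz_trajectory(n: int) -> list[int]:
    """Iterate the Collatz rule from n until reaching 1."""
    seq = [n]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        seq.append(n)
    return seq

print(collatz_trajectory(6))           # [6, 3, 10, 5, 16, 8, 4, 2, 1]
print(len(collatz_trajectory(27)) - 1) # 27 famously takes 111 steps to reach 1
```

The addictive part is how wildly the trajectory lengths vary: 26 settles down almost immediately, while its neighbor 27 climbs past 9,000 before collapsing.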

I have an attempted proof which I’ll type up and post hopefully this week along with a video, unless I find some glaring mistake while doing so. And then I can get back to programming.

Common story arcs as identified by AI

According to this article:

researchers from the University of Vermont and the University of Adelaide determined the core emotional trajectories of stories by taking advantage of advances in computing power and natural language processing to analyze the emotional arcs of 1,737 fictional works in English available in the online library Project Gutenberg.

The paper can be found on arXiv.org. They discovered six emotional arcs (which also just happen to exhaust all possible alternating binary arcs… in other words, they didn’t really “discover” anything, haha):

1. Rags to Riches (rise)
2. Riches to Rags (fall)
3. Man in a Hole (fall then rise)
4. Icarus (rise then fall)
5. Cinderella (rise then fall then rise)
6. Oedipus (fall then rise then fall)
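To make that parenthetical concrete: if an "arc" is just an alternating sequence of one to three rises and falls, there are exactly six of them, which is why the list above exhausts the possibilities. A quick sketch (the enumeration is mine, not the paper's):

```python
def alternating_arcs(max_segments: int = 3):
    """All alternating rise/fall sequences of 1..max_segments segments."""
    arcs = []
    for length in range(1, max_segments + 1):
        for start, other in (("rise", "fall"), ("fall", "rise")):
            # Alternate between the starting direction and its opposite
            arcs.append(tuple((start, other)[i % 2] for i in range(length)))
    return arcs

for arc in alternating_arcs():
    print(" then ".join(arc))
# Prints six arcs: rise, fall, rise then fall, fall then rise,
# rise then fall then rise, fall then rise then fall
```

Two choices of starting direction times three possible lengths gives the six arcs, no machine learning required.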

I’m not sure their results are all that helpful; any experienced storyteller understands this stuff naturally. It is somewhat interesting to see it correspond so strongly to a story’s word usage, though.

I was also interested in their little plot of the emotional arcs in Harry Potter and the Deathly Hallows, which can also be found in this article from The Atlantic. If you check it out, you’ll notice that the second act conforms pretty perfectly to Blake Snyder’s Save the Cat story beats. The first act mirrors this, in terms of there being three main peaks, or three pairs of falls and rises. I’ve started calling these “the three trials”, and most stories tend to conform to this. After the story’s catalyst (or including the story’s catalyst), the story goes through three falls and rises before reaching the “false high” of the midpoint. Many times, a rise will cause a fall in the B story. That is, the plot lines tend to alternate naturally with the direction of the emotional arc (though not only at these points, mind you). For example, the hero might, say, punch a bully (rise in plot line A), only to discover his girlfriend wants to break up with him (fall in plot line B).

The “three trials” may be subtle, such as the thematic arguing in the first half of Jurassic Park. (Though if you’re going to make them as subtle as they are in Jurassic Park, the theme better be as interesting as resurrecting dinosaurs. And the characters should actually argue their sides as adamantly as John Hammond and Ian Malcolm; they can’t just stand there and wonder.) I’d identify the three trials of Jurassic Park as:

1. “Life finds a way” – After the thrill (rise) of seeing their first dinosaurs, Ian Malcolm argues the whole thing is bound to end in disaster (fall)
2. “Dinosaurs on your dinosaur tour?” – The guests are excited to start their tour (rise) but fail to actually see any dinos (fall)
3. “Nedry’s betrayal” – The guests are happy to gather around a sickly dino (rise) but as a looming storm forces the tour to be cancelled, Nedry begins his plan of betrayal (fall)

The escape of the T-rex then serves as the midpoint of the film.

OK, that was a tangent, but it’s a good plotting exercise to identify the “three trials” of a story’s first act; I have found it helps a lot in plotting. The arcs of stories that are more “episodic” may not be connected so much, whereas in tighter stories, each rise causes the following fall, and each fall leads to or makes possible the following rise.

(On a side note, it would be interesting to see how film music conforms to these emotional arcs.)

The Atlantic article goes on to mention:

Eventually, he says, this research could help scientists train machines to reverse-engineer what they learn about story trajectory to generate their own compelling original works.

OK, good luck with that. I think emotional-arc mapping should be the least of your concerns if you’re striving for computer-generated stories.

The article writer from the No Film School article, on the other hand, goes on to write:

But I sincerely doubt a computer or AI that we train to write stories will ever be able to find joy, no matter how much emotional value we assign to its database of words.

But, uh… who cares if the computer can “find joy”? Your role as an audience member, as a consumer of a product, does not necessarily need to include making some emotional connection with the author, as that can only ever be imagined in your own head to begin with. This is similar to the morons who experience an uneasiness listening to computer generated music, as though all this time they were imagining the beauty of music came not from something eternal in nature, but was rather infused into the music by the author’s brain, as though the author created the beauty rather than merely discovered it in the realms of infinite possibility. Does that distinction make sense?

I doubt anyone needs to be concerned about AI storytelling anytime soon, anyway, as we still don’t quite understand our human ability to use language. We’re much closer to programming a Mozart Symphony Generator (we’re only a fraction of an inch away from that, if not already there). The problem with language programming is that a lot of AI researchers try to “cheat”; rather than searching for a deeper understanding of how humans use language, they try to turn it into a simple numbers game, like gathering statistics on word associations. That may be useful for autocomplete functions, but won’t help much with the creation of a serious story, or even a serious paragraph. Words have meanings, and you can’t simply take those meanings for granted, as if they’ll just take care of themselves if you map out word associations enough. We may need to figure out a way to represent those meanings without having to create a bunch of “experiences” for a computer to associate them with, if that’s possible. I have no idea. (And if I did, I would keep it a secret so that I could use it in a grand conspiracy to take over the world, which would fail, but would be turned into a great Hollywood film.)


Another interesting website to fool around with is whatismymovie?, an attempt at creating an AI to help you find an interesting movie. It sometimes comes up with some strange results, but it’s fun to play around with.