AI generated podcasts from NotebookLM

I recently played around a bit with Google’s new AI-powered NotebookLM site.

When you upload the documents that are central to your projects, NotebookLM instantly becomes an expert in the information that matters most to you.

Its most interesting feature is its ability to create a 10-minute audio conversation about the material you uploaded, sort of like a podcast episode. It nicely extracts key points from your sources, but pads them with natural but annoyingly banal commentary, like SNL’s NPR parodies.

Still, as podcast lovers know, that kind of natural conversation style can be a very effective way to learn.

So here’s the AI’s audio “deep dive” into the Wikipedia article on Frances Hodgson Burnett:

(The AI apparently thought the asterisks in the text were part of the titles.)

And then here’s the AI’s conversation about an older blog post of mine, My approach to music composition:

Wow, AI bots talking about me, and pretending to be impressed! Amazing! Ha!

I may use the site for something serious in the future, but at the moment, it’s a lot of fun to experiment with.


ETA: Here’s its conversation based on a single sentence: “I don’t know why, but everything seems great!” It manages to blather for 5 minutes about this sentence. Highly amusing.

Mixing my own music with AI

Suno AI has an “upload audio” feature, allowing users to upload up to 60 seconds of their own content to be extended by the AI. So earlier this month I had some fun feeding it 45- to 60-second clips of my own music and having the AI write lyrics and turn the clips into choir songs. It’s interesting to hear how the AI uses the melodies, chord progressions, and orchestrations provided in its own creations. The lyrics are a bit amateurish, but serviceable; I was too lazy to write my own. I’m calling the project Hannifin x AI. Here’s the first installment, based on my classic piece “Hour by Hour”; the first 60 seconds are from the original piece, while the rest is AI-generated.

I did the same with 18 other pieces of mine. Some things I noticed:

  • The AI works best with simple 8-bar melodies, or 4-bar phrases. It doesn’t seem to “parse” weirder phrase structures very well.
  • It’s not very good at extending the input instrumentally, in my opinion; it quickly starts to sound too repetitive. Having it produce lyrics and turn the music into a song seems to work better. (Melodic repetition seems easier to bear with alternating lyrics.)
  • If you want the AI to generate the voice of a choir, feeding it music from the louder, more energetic and melodic parts of a piece seems to work better, especially if it features a prominent string section. Otherwise you’re more likely to get a soloist, and the music it generates is less likely to sound like a natural continuation of the music you provide.
  • For whatever reason, some tracks just seem to work better than others; maybe it depends on how “familiar” the AI is with the melodic and harmonic patterns? For some tracks, it gave me some pleasant results right away. Other times I had to roll the dice over and over to get something acceptable.

There were some pieces I tried for which I could not get any output that I was happy with, including The King’s Assassin, The Moon Dreamed By, and On the Edge of a Dream. And there was one track, Silver Moon Waltz, for which I got a couple of songs I was pleased with. Anyway, I’m done trying for now.

As for the video above, I made it with Blender 4.2, which took a little time to figure out, mostly with various tutorials on YouTube. I’m not completely satisfied with the results. What’s supposed to be “dust” looks perhaps too much like snow and moves a bit too fast, and the text looks a bit weird. Turns out trying to create a sort of “drop shadow” effect on text in Blender is pretty much impossible; I had to sort of fake it with compositing cheats, and I’m not sure I did the best job. (I could’ve just put the text on the background picture and used an image editor to create the drop shadow, but I wanted the animated frequency bars to have it too.) Also, the text might be a bit too bright, but I blame that on the VR display I get with Immersed in the Meta Quest 3.

I’ll upload the other 19 songs I created soon!

 

Fun with Suno: AI Song Generator

Wow, this is my first blog post of the year. That’s pretty sad.

This week I’ve been playing around with Suno, an AI song generator. As far as music-generating AI goes, it’s definitely the best I’ve seen so far, as it actually generates melodies, which is what most musical AIs stink at.

Of course, it’s got its weaknesses, but this is new tech, so that’s to be expected. And I haven’t seen competition that really does anything similar yet, though I’m sure that will come.

Anyway, here are some of the songs I’ve generated with the app. You can have it generate its own generic lyrics, but I find it more interesting to provide my own.

The first three are symphonic metal, one of my favorite genres. Maximus is an epic choir singing in another language. A Song Unsung and The Road Inside are some relaxing indie folk. The Owl and the Dragon is a folk-ish lullaby. A boys’ choir sings The Crystal Knife. About the Cats is in the style of a generic 90s pop song. Finally, Boop! is an Irish folk song with nonsense lyrics. Links to the lyrics for each song can be found at the bottom of this post.

Weaknesses

Perhaps the biggest weakness is lack of control. Other than providing the lyrics and style, you don’t really have much control over the details, which you’d likely want if you were a serious composer or songwriter.

Styles are also limited; I asked it for the style of a Russian folk song (“The Owl and the Dragon”), and it just gave the singer a Russian accent.

The format is limited. For best results, it seems good to stick to four-line verses and choruses, from which it generates standard, generic 8-bar melodies.

Its text-to-song conversion isn’t perfect. Sometimes it repeats a syllable, ignores a syllable, or puts emphasis on a weird syllable. Sometimes it will sing a line from a verse as though it’s part of the chorus; its “parsing” makes mistakes.

Sound quality is another weakness. You can probably tell from the examples that it outputs some pretty low-quality sounds, especially with the bombastic symphonic metal, which can sometimes make the lyrics hard to understand. But musical sound data contains even more information than image data, and image AI generators themselves still output a lot of noise. With images, however, it’s easy to discount the noise as texture or something. With musical sound, noise gets in the way; we’re used to hearing nice clean sounds in professional recordings (especially if you’re an audiophile); even the hissing high frequencies of cymbals matter to a degree.

In some output (not the ones I’ve showcased here), I could swear I could hear overtone artifacts of other words or singers faintly in the background; I’m guessing the AI is doing diffusion with frequencies / Fourier transforms, and generating little fragments of training data it should be ignoring. Or it could just be weird auditory illusions.

Is it useful?

Given all these weaknesses, is Suno a useful tool? Honestly, it’s probably not super useful for professional musicians yet, perhaps other than a quick and easy way to get some ideas. Otherwise, it’s perhaps still more of a toy at its current stage.

Granted, such a musical toy can still be a lot of fun, and I’m excited to see the app develop further. I’m not sure who’s behind it or even what country it’s from, but I do hope they don’t get bought out too easily.

TuneSage

What about my own music AI, the development of which I’ve been procrastinating on? Has Suno beat me to the punch?

My approach is a lot different as I’m not really dealing with the sound of music. My focus with TuneSage is more about the actual notes and musical structures of a piece.

Lyrics

Here are links to each song on Suno, where you can see my profoundly beautiful lyrics:

Close Your Eyes
A True Heart
The Shadow Age
Maximus
A Song Unsung
The Road Inside
The Owl and the Dragon
The Crystal Knife
About the Cats
Boop!

DALL-E 2 is awesome! I love it!

Warning: Lots of images below!

Earlier this week, I was finally invited to OpenAI’s DALL-E 2 public beta! And I’m completely in love with it. Below are some of my favorite pieces I’ve generated with it by giving it simple text prompts.

First, a few details about the app: Generating these pictures is a computationally intensive process, so they limit how many pictures you can generate. This is done with credits. Upon receiving an invite, they give you 50 free credits to start with. Each credit allows you to send one text prompt, and you get four variations in return. Each month they give you 15 more free credits. However, you can buy credits as well. Currently the price is $15 for 115 credits, which comes to a little over $0.13 per prompt. That really doesn’t sound bad, but it adds up quickly when you get addicted! Still, personally I think it’s totally worth it. I just wish I had more money to spend on it!
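The per-prompt math above can be sketched as a quick calculation (prices are the ones quoted at the time I wrote this, and may well have changed):

```python
# DALL-E 2 credit pricing as described above (subject to change).
PRICE_USD = 15          # cost of one credit pack
CREDITS_PER_PACK = 115  # prompts you get for that price
IMAGES_PER_PROMPT = 4   # each credit = one prompt = four variations

cost_per_prompt = PRICE_USD / CREDITS_PER_PACK
cost_per_image = cost_per_prompt / IMAGES_PER_PROMPT

print(f"${cost_per_prompt:.4f} per prompt")  # ≈ $0.1304, "a little over $0.13"
print(f"${cost_per_image:.4f} per image")    # ≈ $0.0326
```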

Sometimes you get really awesome results, sometimes you get weird abstract nonsense that’s nothing like what you had in mind. So you have to get a feel for what sort of prompts might give you something interesting, and what sort of prompts it won’t understand.

So here’s a little gallery of some of the stuff I’ve created so far. I’ve already spent $30 and it’s only my first week with access, so I will have to restrain myself now. (I still have around 85 credits left.)

Finally, it generates images at a resolution of 1024×1024. I’ve resized the images below in an effort to conserve screen space and bandwidth.

Dolphin eating a cheeseburger

This is similar to a prompt I tried on another AI image generator last year, so I was curious to see how DALL-E would do with the prompt. Much better!

Libraries

My favorite “style” of DALL-E’s output tends to be “oil painting”.

Steampunk owls

Animals wearing headphones

DALL-E tends to draw animals much better than humans, I suppose because their features can be a bit more abstract and less structured than a human face’s. (Although note it doesn’t understand that headphones should go on mammals’ ears rather than the sides of their heads, haha.)

Some abstract art

The prompt here was something like “A painting of a giant eye ball sitting in a chair by the fire.”

Portrait of Mozart as various animals

Owls reading books

Painting of Ha Long Bay in Vietnam in the style of Van Gogh

Castles on cliffsides

Starry skies above castles

Flowers growing out of skulls

Money and treasure!

Pirate treasure maps

Skulls on fire

Weaknesses

The above are all cherry-picked examples of some of my favorite outputs so far; some results come out a lot less interesting. DALL-E is particularly weak with images that require specific structural detail, such as human faces, pianos, or even dragons. It excels at looser, less-structured forms, such as flowers, trees, and clouds. Below are some examples of output that I was less pleased with, showing some of its weaknesses.

Conclusion

Overall, despite its weaknesses, I’m still completely blown away by the quality of DALL-E’s output. I can’t wait to put some of the images I’ve generated to use as album covers or something! I love it!

Random thoughts on Elon Musk buying Twitter

Some Twitter history

I joined Twitter long ago, in 2007, when it was only about 1.5 years old. You may remember reading about it when I blogged about it back then. (It was on an earlier non-WordPress version of this blog, which was just called “Blather” rather than “The New Blather”.1) So I have seen the Twitter tides ebb and flow. I remember when Leo Laporte had the most Twitter followers, with an astounding 32,000 of them, wow! Ah, simpler days.

Tweets were far more inane then. There were no hashtags, you couldn’t mention someone, “@” and “#” did nothing, no replies or retweets or quote tweets. You couldn’t post pictures, it was text only. You couldn’t even edit tweets to fix typos. (Oh, wait, you still can’t do that.) You also couldn’t “like” a tweet; instead you could “favorite” it with a star icon, which I would still prefer over the heart.2 It was a big deal when random celebrities or political figures would join.

The tweet prompt used to be “What are you doing?” and you’d simply log what you were up to, where you were at the time, or some other short random thought, just so others could keep up with what was going on with you. It was a fun way to peer into the lives of strangers with similar interests. Smartphones were just beginning to hit the market then; they were hardly ubiquitous, and society was not yet inundated with social media platforms.

I primarily used Twitter for micro-journaling. But as Twitter’s atmosphere drifted from inanity to people having conversations and debates, posting “threads”, brands making announcements and celebrities joining in the fun, I tweeted less and less. I’m just not so interested in the conversational side of things. After 14.5 years on the platform, I’ve collected only 264 followers. Not many. And when I do tweet, it’s usually something still pretty inane. I really don’t have quality content, at least not by most people’s standards. (Unlike this amazing blog!)

Still, it’s generally my go-to social network, mostly because of the accounts I follow. I also like that I can still view my feed in the order things were tweeted instead of being subjected to some stupid algorithm that chooses what I get to see for me, as Facebook mandates. Granted, Twitter has shadowbanned people, making their tweets mysteriously not show up on my feed, but it’s still better than Facebook. Even if I don’t tweet anything, I usually scroll through my feed anywhere from once to five times a day.

Censorship, free speech, and propaganda

Unfortunately Twitter (like Facebook and YouTube) has a long history of unjustified censorship, the most aggravating among conservatives perhaps being the banning of then president Donald Trump and the censorship of the Hunter Biden laptop story just before the last presidential election. Meanwhile they’ve boosted stories confirming there was definitely no widespread election fraud in the last presidential election and put warnings about Covid-19 “misinformation” on tweets linking to certain articles that questioned the government’s position on the virus and the effectiveness of vaccines.

To me, the most grievous censorship has been the suspension of accounts that deny that men can be women (or vice versa) just by saying so and dressing the part, such as the suspension of the Babylon Bee’s account when they tweeted a link to their satirical article: The Babylon Bee’s Man Of The Year Is Rachel Levine. This sort of censorship is the most grievous to me because it punishes a reflection of objective truth (that Levine is not a woman). Everybody knows this truth, yet the Twitter censors partake in a knowing willful denial of it for the sake of some idealized reality (in which everyone just pretends to not know), and the censors actively punish those who do not abide by this objective lie.

This sort of censorship (not to mention all the similar injustices outside of Twitter surrounding this issue, such as men clobbering women in women’s sports) is the seed of every dystopian horror, where everyone knows the truth but is forbidden to acknowledge it. The idea that censorship and other methods of ideological enforcement will somehow just make people slowly and silently change their beliefs about such basic and obvious facts of life as the differences between men and women is the height of arrogance and stupidity. You are just setting up a [figurative] bomb to explode. (Granted, I think some people know that and, for them, that’s the whole point.) It is literally a demonic force.

Go somewhere else?

There are Twitter alternatives, of course: Gab, Parler, and Gettr are perhaps the most prominent, along with Trump’s new Truth Social3. They each have their various strengths and weaknesses, but their greatest weakness is that there’s just nobody on them, other than political refugees. And while I don’t mind some political debates and memes in my feed, it’s not the only thing I want to see. I want to see a scientist tweet about his latest book or podcast appearance, or an artist about her latest artwork, or a gamedev about his current programming progress, and those sorts of people are, for whatever reason, still largely only on Twitter4.

Enter Elon Musk

That Elon Musk would buy Twitter is not something I would have ever predicted. I don’t know much about his politics or his business views, and I don’t want a Tesla (not that I could even come close to affording one if I did).

But his views on free speech definitely sound like something Twitter could use. He tweeted:

Free speech is the bedrock of a functioning democracy, and Twitter is the digital town square where matters vital to the future of humanity are debated.

He also tweeted:

By “free speech”, I simply mean that which matches the law.

I am against censorship that goes far beyond the law.

If people want less free speech, they will ask government to pass laws to that effect.

Therefore, going beyond the law is contrary to the will of the people.

Regarding the censorship of the Hunter Biden laptop story, he recently tweeted:

Suspending the Twitter account of a major news organization for publishing a truthful story was obviously incredibly inappropriate

These sentiments definitely get a thumbs up from me.

Future predictions

I did not at all think Musk would ever actually buy Twitter, so what do I know? I predict one of three possibilities:

  1. There’s some contention and debate for a while, but ultimately not much changes for the foreseeable future, except hopefully less unjustified political censorship and annoying propaganda.
  2. Twitter becomes even more popular, with Elon Musk revitalizing the platform with positive features and changes.
  3. Twitter becomes less popular and gradually eats through its funding until it’s sold off again or just withers and dies.

I think that covers all the possibilities, so how can I be wrong? Since I have absolutely no idea what will ultimately happen, none of these outcomes would surprise me.

The third possibility would really stink. Despite Musk’s good intentions, I’m not sure there’s very much money to be made in Twitter, at least not in its current state. I think much of its funding in past years has been for the purposes of its censorship and propaganda. And although I scroll through my feed quite a bit, I’m not sure there’s content on there I’d be willing to pay for. How much more money is Musk willing to sink into this business venture if needed?

Who knows! But it’s definitely an interesting development. We’ll see what happens!

New music and a Patreon account…

Over the weekend I finished a short composition called A Stargazer’s Lullaby:

As I write in the video’s description:

This piece is part of a short soundtrack for a book series I’m working on called Insane Fantasy. “A Stargazer’s Lullaby” provides the theme for the main character, Coptivon, who’s growing up in a crater in the Crater Lands. There’s not much life out there, but the flat landscapes offer a nice view of the stars. With little else to do in the Crater Lands, Coptivon has memorized all the constellations he could learn. His theme is meant to capture his boredom giving way to fantastical dreams as he gazes at the night stars.

This was my first try at screen recording on my new computer with Nvidia’s ShadowPlay, which came with their GeForce GTX 970. It’s definitely the smoothest animation I’ve ever been able to record, so this is the way I’ll do it from now on.

Also, I went ahead and set up a Patreon account here: Sean on Patreon. Of course, funding is always a great help to any artist. This will also allow me to sort-of sell tracks as I finish them rather than having to put out singles on Bandcamp or something while I’m saving tracks for an album. Although I’m not really selling them; rather, I’m offering them as a reward for tip-jar donations. Which may amount to the same thing in some people’s eyes, but I’m not sure I’d really consider Patreon that sort of an eCommerce site.

This will also ensure that I release at least two new pieces a month, as I’ll be obligated. (I suppose if something drastic comes up, I can always suspend donations for a month or two, but completing at least two tracks a month won’t be difficult.)

A big thank you to anyone who pledges!

Common story arcs as identified by AI

According to this article:

researchers from the University of Vermont and the University of Adelaide determined the core emotional trajectories of stories by taking advantage of advances in computing power and natural language processing to analyze the emotional arcs of 1,737 fictional works in English available in the online library Project Gutenberg.

The paper can be found on arXiv.org. They discovered six emotional arcs (which also just happen to exhaust all possible alternating binary arcs… in other words, they didn’t really “discover” anything, haha):

1. Rags to Riches (rise)
2. Riches to Rags (fall)
3. Man in a Hole (fall then rise)
4. Icarus (rise then fall)
5. Cinderella (rise then fall then rise)
6. Oedipus (fall then rise then fall)
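The parenthetical claim above, that these six arcs exhaust all possible alternating binary arcs of up to three moves, is easy to verify by brute force; a minimal sketch:

```python
from itertools import product

# An "alternating binary arc" is a sequence of rises and falls with no
# immediate repetition (rise-rise or fall-fall would just merge into one
# longer rise or fall).
def alternating_arcs(max_len=3):
    arcs = []
    for length in range(1, max_len + 1):
        for seq in product(["rise", "fall"], repeat=length):
            if all(a != b for a, b in zip(seq, seq[1:])):
                arcs.append(seq)
    return arcs

arcs = alternating_arcs()
# Exactly six: rise; fall; fall-rise ("Man in a Hole"); rise-fall
# ("Icarus"); rise-fall-rise ("Cinderella"); fall-rise-fall ("Oedipus").
print(len(arcs))  # 6
```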

I’m not sure their results are all that helpful; any experienced storyteller understands this stuff naturally. It is somewhat interesting to see it correspond so strongly to a story’s word usage, though.

I was also interested in their little plot of the emotional arcs in Harry Potter and the Deathly Hallows, which can also be found in this article from The Atlantic. If you check it out, you’ll notice that the second act conforms pretty perfectly to Blake Snyder’s Save the Cat story beats. The first act mirrors this, in terms of there being three main peaks, or three pairs of falls and rises. I’ve started calling these “the three trials”, and most stories tend to conform to this. After the story’s catalyst (or including the story’s catalyst), the story goes through three falls and rises before reaching the “false high” of the midpoint. Many times, a rise will cause a fall in the B story. That is, the plot lines tend to alternate naturally with the direction of the emotional arc (though not only at these points, mind you). For example, the hero might, say, punch a bully (rise in plot line A), only to discover his girlfriend wants to break up with him (fall in plot line B).

The “three trials” may be subtle, such as the thematic arguing in the first half of Jurassic Park. (Though if you’re going to make them as subtle as they are in Jurassic Park, the theme better be as interesting as resurrecting dinosaurs. And the characters should actually argue their sides as adamantly as John Hammond and Ian Malcolm; they can’t just stand there and wonder.) I’d identify the three trials of Jurassic Park as:

1. “Life finds a way” – After the thrill (rise) of seeing their first dinosaurs, Ian Malcolm argues the whole thing is bound to end in disaster (fall)
2. “Dinosaurs on your dinosaur tour?” – The guests are excited to start their tour (rise) but fail to actually see any dinos (fall)
3. “Nedry’s betrayal” – The guests are happy to gather around a sickly dino (rise) but as a looming storm forces the tour to be cancelled, Nedry begins his plan of betrayal (fall)

The escape of the t-rex then serves as the midpoint of the film.

OK, that was a tangent, but it’s a good plotting exercise to identify the “three trials” of a story’s first act; I have found it helps a lot in plotting. The arcs of stories that are more “episodic” may not be connected so much, whereas in tighter stories, each rise causes the following fall, and each fall leads to or makes possible the following rise.

(On a side note, it would be interesting to see how film music conforms to these emotional arcs.)

The Atlantic article goes on to mention:

Eventually, he says, this research could help scientists train machines to reverse-engineer what they learn about story trajectory to generate their own compelling original works.

OK, good luck with that. I think emotional-arc mapping should be the least of your concerns if you’re striving for computer-generated stories.

The article writer from the No Film School article, on the other hand, goes on to write:

But I sincerely doubt a computer or AI that we train to write stories will ever be able to find joy, no matter how much emotional value we assign to its database of words.

But, uh…. who cares if the computer can “find joy”? Your role as an audience member, as a consumer of a product, does not necessarily need to include making some emotional connection with the author, as that can only ever be imagined in your own head to begin with. This is similar to the morons who experience an uneasiness listening to computer generated music, as though all this time they were imagining the beauty of music came not from something eternal in nature, but was rather infused into the music by the author’s brain, as though the author created the beauty rather than merely discovered it in the realms of infinite possibility. Does that distinction make sense?

I doubt anyone needs to be concerned about AI storytelling anytime soon though, anyway, as we still don’t quite understand our human ability to use language. We’re much closer to programming a Mozart Symphony Generator (we’re only a fraction of an inch away from that, if not already there). The problem with language programming is that a lot of AI researchers try to “cheat”; rather than searching for a deeper understanding of how humans use language, they try to turn it into a simple numbers game, like gathering statistics on word associations. That may be useful for autocomplete functions, but won’t help much with the creation of a serious story, or even a serious paragraph. Words have meanings, and you can’t simply take those meanings for granted, as if they’ll just take care of themselves if you map out word associations enough. We may need to figure out a way to represent those meanings without having to create a bunch of “experiences” for a computer to associate them with, if that’s possible. I have no idea. (And if I did, I would keep it a secret so that I could use it in a grand conspiracy to take over the world, which would fail, but would be turned into a great Hollywood film.)
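The “numbers game” I’m describing can be illustrated with a toy bigram model: count which word tends to follow which, then suggest the most frequent follower. (This is a deliberately simplistic sketch of the general idea, not how any particular research system works; the corpus here is made up.)

```python
from collections import Counter, defaultdict

# Count which word follows which in a tiny (made-up) corpus.
corpus = "the cat sat on the mat and the cat slept".split()
following = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    following[word][nxt] += 1

def autocomplete(word):
    """Suggest the most frequent follower of `word` in the corpus."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(autocomplete("the"))  # "cat" ("cat" follows "the" twice, "mat" once)
```

Useful for finishing a phrase, but notice there is no trace of *meaning* anywhere in it, which is the point I’m making above.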


Another interesting website to fool around with is whatismymovie?, an attempt at creating an AI to help you find an interesting movie. It sometimes comes up with some strange results, but it’s fun to play around with.

Media consumption log

I have started a new blog called Media Consumption Log.  I will post my comments (for they are often too short and inane for me to consider them “reviews”) on the media I consume over there.  My comments on films I’ve watched will also appear over there.  I wanted to keep a log of my own consumption and commentary for my future reference, but I didn’t want to bog down this blog with such posts.  We’ll see how it goes.  I’m also not linking my media consumption blog to Twitter or Facebook, so I won’t be inadvertently spamming friends every time I update it.

Goodreads

I finally created an account on Goodreads not long ago, so if you’re ever wondering what I’m reading or what page I’m on, you can find out there on my account. I’m usually reading multiple books at a time, and I’m a slow reader, so it can take me a while to get through just one book. I’ll be interested to see precisely how long, as Goodreads helps me keep track. You can also see most of the books I’ve finished reading since middle school, at least the ones I remember, and some of the books I’m planning on reading at some point. I know the world is very interested in this information, so there it is.