AI generated podcasts from NotebookLM

I recently played around a bit with Google’s new AI-powered NotebookLM site.

When you upload the documents that are central to your projects, NotebookLM instantly becomes an expert in the information that matters most to you.

Its most interesting feature is its ability to create a 10-minute audio conversation about the material you uploaded, sort of like a podcast episode. It nicely extracts the key points from your sources, but pads them with natural-sounding yet annoyingly banal commentary, like SNL’s NPR parodies.

Still, as podcast lovers know, that kind of natural conversation style can be a very effective way to learn.

So here’s the AI’s audio “deep dive” into the Wikipedia article on Frances Hodgson Burnett:

(The AI apparently thought the asterisks in the text were part of the titles.)

And then here’s the AI’s conversation about an older blog post of mine, My approach to music composition:

Wow, AI bots talking about me, and pretending to be impressed! Amazing! Ha!

I may use the site for something serious in the future, but at the moment, it’s a lot of fun to experiment with.


ETA: Here’s its conversation based on a single sentence: “I don’t know why, but everything seems great!” It manages to blather for 5 minutes about this sentence. Highly amusing.

Trovedex remade

A couple weeks ago I remade my web app Trovedex from scratch. It’s private for now, only for my own use; if you go there, it’ll ask you for a password. You’ll have to settle for this impressive screenshot:

It’s now a simple document manager. You can create documents (HTML pages) and put them in folders. That’s basically it. I wanted something like a wiki, but I wanted to use HTML instead of markdown, and I wanted to see folders and files on the side for easy navigation between pages.

I created the app with the help of AI (Claude, to be exact), which was a fun experience. While AI doesn’t do all the work for you, it definitely makes things a lot easier. For instance, I’d had trouble figuring out React (a JavaScript library for building interfaces) before, and the AI was able to show me how it’s done. Adding some features was also a breeze. I could tell the AI: “Let’s add the ability to delete documents.” And it would respond: “Sure, paste this code into your frontend and this code into your backend.” Done! Of course, that’s an easy feature to add; some features caused a bit more trouble. Trovedex uses the Jodit Editor to edit the HTML of the documents / pages, which Claude had some trouble with now and then, forcing me to do my own debugging.

Claude also had a habit of choosing annoying tools and frameworks. It recommended I use PostgreSQL for the backend database, and Prisma to connect to it. No! I had to tell it to use MongoDB and Axios instead, which seem a lot simpler to me.
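Just to illustrate the kind of thing Claude would hand back, here’s roughly what a delete-documents feature might look like with an Express + MongoDB (Mongoose) backend and an Axios call on the frontend. This is a minimal sketch with hypothetical names (the Document model, the /api/documents route, deleteDocument), not Trovedex’s actual code.

```typescript
// A minimal sketch -- hypothetical names, not Trovedex's actual code.
import express from "express";
import mongoose from "mongoose";
import axios from "axios";

// Backend: a bare-bones document model (title, HTML body, containing folder).
const Document = mongoose.model(
  "Document",
  new mongoose.Schema({ title: String, html: String, folder: String })
);

const app = express();

// Backend: DELETE /api/documents/:id removes one document by its id.
app.delete("/api/documents/:id", async (req, res) => {
  await Document.findByIdAndDelete(req.params.id);
  res.sendStatus(204);
});

// Frontend: called from a React click handler; the caller then drops the
// document from local state so the folder sidebar updates.
async function deleteDocument(id: string): Promise<void> {
  await axios.delete(`/api/documents/${id}`);
}
```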

Overall though, using the AI made me much more productive.

There are still plenty of features I’d like to add to Trovedex eventually, including the ability to make pages or folders public. That way I can use it to replace “Hanniwiki”, which was a MediaWiki site containing the catalog of all my music and stuff. But the software (which was more sophisticated than I really needed) went out of date, so “Hanniwiki” has been missing for a while now.

I’ll probably open-source the project to GitHub at some point.

Mixing my own music with AI

Suno AI has an “upload audio” feature, allowing users to upload up to 60 seconds of their own content to be extended by the AI. So earlier this month I had some fun feeding it 45- to 60-second clips of my own music and having the AI write lyrics and turn the clips into choir songs. It’s interesting to hear how the AI uses the melodies, chord progressions, and orchestrations I provide in its own creations. The lyrics are a bit amateurish, but serviceable; I was too lazy to write my own. I’m calling the project Hannifin x AI. Here’s the first installment, based on my classic piece “Hour by Hour”; the first 60 seconds are from the original piece, while the rest is AI-generated.

I did the same with 18 more of my pieces. Some things I noticed:

  • The AI works best with simple 8-bar melodies, or 4-bar phrases. It doesn’t seem to “parse” weirder phrase structures very well.
  • It’s not very good at extending the input instrumentally, in my opinion; it quickly starts to sound too repetitive. Having it produce lyrics and turn the music into a song seems to work better. (Melodic repetition seems easier to bear with alternating lyrics.)
  • If you want the AI to generate the voice of a choir, feeding it music from the louder, more energetic and melodic parts of a piece seems to work better, especially if it features a prominent string section. Otherwise you’re more likely to get a soloist, and the music it generates is less likely to sound like a natural continuation of the music you provide.
  • For whatever reason, some tracks just seem to work better than others; maybe it depends on how “familiar” the AI is with the melodic and harmonic patterns? For some tracks, it gave me some pleasant results right away. Other times I had to roll the dice over and over to get something acceptable.

There were some pieces I tried for which I could not get any output that I was happy with, including The King’s Assassin, The Moon Dreamed By, and On the Edge of a Dream. And there was one track, Silver Moon Waltz, for which I got a couple songs I was pleased with. Anyway, I’m done trying for now.

As for the video above, I made it with Blender 4.2, which took a little time to figure out, mostly with the help of various YouTube tutorials. I’m not completely satisfied with the results. What’s supposed to be “dust” looks perhaps too much like snow and moves a bit too fast, and the text looks a bit weird. It turns out that creating a sort of “drop shadow” effect on text in Blender is pretty much impossible; I had to sort of fake it with compositing cheats, and I’m not sure I did the best job. (I could’ve just put the text on the background picture and used an image editor to create the drop shadow, but I wanted the animated frequency bars to have it too.) Also, the text might be a bit too bright, but I blame that on the VR display I get with Immersed on the Meta Quest 3.

I’ll upload the other 19 songs I created soon!

 

My first AI music album: “The Shadow Age”

I’ve been enjoying writing songs with the AI songwriting tool Suno for the past few months, and recently put together a full-length album of some of my favorite tracks so far. While the AI wrote the music and provides the performance, I wrote the lyrics, which are very deep and profound. (Though two of the tracks are from famous old poems.) The symphonic metal album is free to download here (ZIP file, MP3 V0, 111.2 MB) or on Bandcamp.

Don’t like AI music? Well, I’m sorry, but I’m going to create even more AI albums, bwahaha!

First impressions with the Meta Quest 3 VR headset

As I’ve blogged about before, I’ve had trouble with my programming productivity lately, a major cause being my terrible sitting posture while using my desktop due to the monitors not being situated quite how I’d like, and my chair not optimally supporting my spine. I get a sharp stabbing pain in the back of my neck and between my shoulders after about an hour or so.

I thought about getting a Steam Deck to let me play games away from my computer, but, after seeing a few YouTube videos and Twitter posts from people finding comfort while programming in VR, I figured the Meta Quest 3, which was released near the end of last year, might be just what I needed!

So I just got one and am happily writing this post from the comfort of my bed with a wireless keyboard and some giant VR monitors hovering just in front of me.

Overall, I’m loving it, just the sort of thing I was hoping for. Here are some pros and cons I’ve found with the Meta Quest 3 during my first couple days of use.

Pros

The resolution and frame rate are great, much better than the original Oculus Rift I got 8 years ago (2016). That was fun for a bit of gaming, but the resolution was too low for any sort of virtual desktop work, and the VR sickness was pretty intense.

With the resolution doubled since then, and improvements made to the lenses (though the field of view does not seem quite as wide now), virtual desktops are now usable. The improvements also seem to help with VR sickness: I have explored a few virtual worlds and have experienced no VR sickness whatsoever!

Another pro is that it does not need to be connected to anything. It’s a standalone unit. It also doesn’t need an external camera for positional tracking (as the original Oculus Rift and the PSVR do), and the tracking is pretty much perfect. I can even connect it to my computer for a virtual desktop all through Wi-Fi. This is a great convenience.

The “passthrough” is excellent. The unit has cameras on the front, allowing me to basically see through it (albeit at a lower resolution), so I can see my hands, my keyboard, my cat, etc. I can even walk around the house with no problem!

I have been especially impressed with VR videos on YouTube, of which I’d love to see a lot more. Not the flat 360-degree videos, which just put you inside a big flat sphere, but the 180-degree 3D ones that make it look like people and places are right in front of you. In fact, I’d really love to see an entire movie or play in VR. I’d even love to get a VR camera and shoot some stuff at some point.

Cons

The major problem with the Quest 3 is that it is very uncomfortable for me. It comes with simple straps that sandwich your face, the main unit pressing against your eyes and cheeks. It’s made worse for me by my need for glasses. I can wear them in VR, and although that improves my view of the virtual world, it’s just something else pressing into my face. It’s extremely annoying.

Hopefully this problem can be helped with some accessories, which I’ve purchased but which won’t be delivered for a few weeks. First, I’ve ordered some custom lenses so I’ll be able to see clearly in the VR without having to keep my glasses on. I’ve also ordered a halo strap which should, like the PSVR (which is by far the most comfortable VR headset I’ve yet tried), take the pressure off my face by transferring the weight of the unit to my head instead.

Another con is that, like the Oculus Rift, it gets a bit warm, which is annoying when it’s pressed against your face. Hopefully a halo strap will also help with that.

The unit has a short battery life, around 2 hours, which I’m sure will only get worse over time. I’ve only had my unit for a couple of days, and I’ve already drained the battery three times. I guess I could just keep it plugged in? But that’s a bit of a nuisance. The halo strap I ordered comes with a battery pack, so that should definitely help.

Another con is that the resolution could be even sharper; although it’s now good enough to use virtual monitors, text is still somewhat fuzzy, and there is still some aliasing and shimmering going on. Hopefully in another decade we’ll have even higher resolution VR sets? I still don’t think I’d watch a movie in here; even though I can experience a giant virtual theater, I enjoy the higher resolution of the real world for movies and TV. (Also, the Netflix app for this thing is terrible, it streams at too low a resolution with too much compression.1)

One last con is that the unit is kind of… smelly. It doesn’t have that new-plastic computer smell, which is the stuff dreams are made of. Instead it just smells kinda weird, almost like body odor. It’s admittedly slight, but it’s annoying. Hopefully it’ll go away eventually, but until then I guess I can always light scented candles or some dragon’s blood incense.

(Now I have to write the rest of this post outside of VR, because I drained the battery again.)

The Metaverse

I’m still not at all sold on the whole “Metaverse” concept. Perhaps I’m too much of an introvert, but I don’t see the appeal of exploring a virtual environment with a bunch of strangers’ avatars wandering around in front of me with random chatter from random voices all over. If they were people I knew outside of VR, it could be a fun and interesting experience, but I just don’t want to explore VR worlds with strangers. Sorry strangers. Sorry Mark.

Desktop Use

Right now, I’m using the “Immersed” app, which lets you cast your computer monitors to VR and add additional virtual monitors. For programming, it’s very useful to have at least two: one for the code, another for seeing the running results. It should also be useful to have even more screens to pull up documentation and other resources without having to shrink and hide windows.

Right now I’m just using the free version of the app. I’ll probably try the paid version when my accessories eventually arrive to see if it’s worth the upgrade, but the free version is probably all I need.

Since the visual info is streamed over Wi-Fi, there’s no need for cords, but it does drop frames every now and then, so it’s probably not great for watching videos from the desktop or playing PC games. For that, you’d probably need to physically connect your computer to the headset, which I have not yet tried.

Overall, the Meta Quest 3 gets a big thumbs up from me, despite its cons, which I hope the accessories will help with.

Solar Eclipse!

My parents and I travelled up to Erie, Pennsylvania this weekend to see today’s total solar eclipse. (We missed the 2017 one.) It was awesome! It was a very cloudy day, but fortunately the clouds thinned out enough that when the eclipse reached totality, we could see the “diamond ring” in its full glory. Very interesting to see the faint shades of color around the edges of the moon. The rapidity with which the whole sky becomes dark and light again before and after totality was also awesome to see. I generally hate traveling, but this was worth the trip.

(It was also nice that our hotel gave us a free upgrade from a normal dinky little hotel room to a double bedroom; more spacious, and I got my own room!)

I didn’t spend much time trying to get a good picture as I preferred to just focus on the experience. But here’s the partial before totality, taken through the filter of the eclipse glasses:

And then here’s a terrible picture of the total eclipse as shot through my phone with default settings, blurry and crappy:

There’s really not much else to do around here in Erie, PA. We went to see the shore of Lake Erie yesterday, and tonight I want to try seeing the new Godzilla x Kong movie at a nearby theater in 3D with D-BOX haptic movement seats, which I’ve never tried; we don’t have any back home.

Prayer to St. Michael with Suno AI

I turned the Prayer to Saint Michael into some epic choir music with Suno AI:

It would have been a lot easier for me to learn my prayers as a kid if it had been so easy to turn them into music.

I actually wanted the whole prayer to be sung by the entire choir, but Suno AI seemed to insist on featuring a solo vocalist for the second part (“May God rebuke him…”), as you can hear above. I also had to try quite a few times to get it to pronounce “wickedness” clearly and correctly; it kept wanting to sing “winess” or “wicks”. But I like how it ended up.

Here are some other versions it came up with, though I didn’t quite like any of them as much as the one above.

V3 with the little “….amen!” at the end sounds almost comical.

Anyway, I’ve been thinking about posting some lyric videos of my Suno creations to YouTube. I made the St. Michael video above with Shotcut, but that seems impractical for a video with changing lyrics. Perhaps if I can make a template in Blender, I can use that. But I haven’t played around with Blender in a long time, and I don’t want to spend too much time on it… something to play around with later this month.

For now, it’s almost time for the 2024 eclipse! Though the weather might not be so good… we’ll see…

Fun with Suno: AI Song Generator

Wow, this is my first blog post of the year. That’s pretty sad.

This week I’ve been playing around with Suno, an AI song generator. As far as music-generating AI goes, it’s definitely the best I’ve seen so far, as it actually generates melodies, which is what most musical AIs stink at.

Of course, it’s got its weaknesses, but this is new tech, so that’s to be expected. And I haven’t seen competition that really does anything similar yet, though I’m sure that will come.

Anyway, here are some of the songs I’ve generated with the app. You can have it generate its own generic lyrics, but I find it more interesting to provide my own.

The first three are symphonic metal, one of my favorite genres. Maximus is an epic choir singing in another language. A Song Unsung and The Road Inside are some relaxing indie folk. The Owl and the Dragon is a folk-ish lullaby. A boys’ choir sings The Crystal Knife. About the Cats is in the style of a generic 90s pop song. Finally, Boop! is an Irish folk song with nonsense lyrics. Links to the lyrics for each song can be found at the bottom of this post.

Weaknesses

Perhaps the biggest weakness is lack of control. Other than providing the lyrics and style, you don’t really have much control over the details, which you’d likely want if you were a serious composer or songwriter.

Styles are also limited; I asked it for the style of a Russian folk song (“The Owl and the Dragon”), and it just gave the singer a Russian accent.

The format is limited. For best results, it seems good to stick to four-line verses and choruses, from which it generates standard, generic 8-bar melodies.

Its text-to-song conversion isn’t perfect. Sometimes it repeats a syllable, ignores a syllable, or puts emphasis on a weird syllable. Sometimes it will sing a line from a verse as though it’s part of the chorus; its “parsing” makes mistakes.

Sound quality is another weakness. You can probably tell from the examples that it outputs some pretty low-quality sound, especially with the bombastic symphonic metal, which can sometimes make the lyrics hard to understand. But audio data carries even more information than image data, and image-generating AIs themselves still output a lot of noise. With images, however, it’s easy to discount the noise as texture or something. With musical sound, noise gets in the way; we’re used to hearing nice clean sounds in professional recordings (especially if you’re an audiophile), and even the hissing high frequencies of cymbals matter to a degree.

In some output (not the ones I’ve showcased here), I could swear I could hear overtone artifacts of other words or singers faintly in the background; I’m guessing the AI is doing diffusion with frequencies / Fourier transforms, and generating little fragments of training data it should be ignoring. Or it could just be weird auditory illusions.

Is it useful?

Given all these weaknesses, is Suno a useful tool? Honestly, it’s probably not super useful for professional musicians yet, perhaps other than a quick and easy way to get some ideas. Otherwise, it’s perhaps still more of a toy at its current stage.

Granted, such a musical toy can still be a lot of fun, and I’m excited to see the app develop further. I’m not sure who’s behind it or even what country it’s from, but I do hope they don’t get bought out too easily.

TuneSage

What about my own music AI, the development of which I’ve been procrastinating on? Has Suno beat me to the punch?

My approach is a lot different as I’m not really dealing with the sound of music. My focus with TuneSage is more about the actual notes and musical structures of a piece.

Lyrics

Here are links to each song on Suno, where you can see my profoundly beautiful lyrics:

Close Your Eyes
A True Heart
The Shadow Age
Maximus
A Song Unsung
The Road Inside
The Owl and the Dragon
The Crystal Knife
About the Cats
Boop!

AI and God

AGI, or Artificial General Intelligence, is the holy grail of much AI research. It is an AI that can learn anything, at least anything a human can learn. If we could achieve it, humans would never need to work again, or at least the nature of our work would shift far more dramatically than it ever has in human history.

Some people, particularly AI “doomers” (people who think achieving AGI strongly threatens an apocalypse), seem to believe that if we achieved AGI, it would possess magical abilities to break all encryption or determine objective truths.

My use of the word “magical” reveals what I think about this notion: it is utterly foolish, preposterous, ridiculous, and just plain stupid!

Consider, for instance, the halting problem. Can we write a computer program that takes in any other program and tells us whether it will eventually come to a halt or run forever? Alan Turing proved this to be mathematically impossible: no such program can be written. AGI won’t be able to do it either.
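For the curious, Turing’s proof is a short diagonal argument. Here’s a rough sketch of it in code; the halts function is the hypothetical oracle that the argument shows cannot exist.

```typescript
// Hypothetical oracle: decides whether `program` halts when run on `input`.
// Turing's argument shows no such function can actually be implemented.
declare function halts(program: string, input: string): boolean;

// A "contrary" program that does the opposite of whatever the oracle
// predicts it will do when fed its own source code.
function contrary(source: string): void {
  if (halts(source, source)) {
    while (true) { /* predicted to halt, so loop forever */ }
  }
  // predicted to run forever, so halt immediately
}

// Run contrary on its own source: if halts() says it halts, it loops
// forever; if halts() says it loops, it halts. Either way the oracle is
// wrong about something, so no general halting oracle can exist -- AGI or not.
```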

Similarly with encryption: AGI will not magically discover some number-theoretic trick that suddenly allows all encryption to be broken in a practical amount of time.

AGI will not be able to break mathematical limits that we are already certain of. Why do some people seem to imagine that it will be able to do impossible things like this?

Perhaps the silliest notion of all is that AGI will somehow be able to spit out objective truths, avoiding the ambiguities that lead human intelligences to conflicting conclusions. Where the heck would such objective conclusions come from? Will it be privy to some magical data that humans cannot perceive? How would it get such data? Will it recognize secret codes in the data we train it with?

Even with human intelligence, we can draw multiple conflicting conclusions from the same data. See my previous post about the meaning of facts (i.e. data). When we come to conflicting conclusions, what do we do? We experiment! If we can, at least. (Otherwise we just argue about it, I guess.) And the point of such experimenting is not to find objective truth, since we can’t, but rather to be able to make useful predictions. Doing this leads to that, so if you want that, do this. And then we build on it. Hmmm, so if this leads to that, does this other related thing lead to that other related thing? Experiment, find out. (On a side note, AGI is, in my opinion, all about figuring out how human intelligence is capable of making that leap from one set of relations to another, or, to put it another way, how we are able to generalize predictive relationships. It comes naturally to us (to some more than others), but we have no idea how to program a computer to do it.1)

So Dilbert creator Scott Adams asks some silly questions on Twitter regarding AI and God:

I shall now try to answer these questions:

1. No, because that’s not what God is.

2. Is that a question? Anyway, here Adams seems to be supposing that AI, or AGI, is synonymous with conscious experience itself, which is quite a leap! Even if we believed it, why should that mean anything to a human, whose intelligence is not, by definition, artificial? Finally, I’m not sure what Adams’s understanding of free will is. Free will is the experience of making a conscious choice. It is not (necessarily) the universe’s ability to do something magically non-deterministic in a human brain. (For instance, see compatibilism.)

3. Yes; where does Adams think belief in souls comes from? For that matter, how would a human know if a robot is “way smarter”? We’d need some way to relate to it, to find meaning in its output.2 But it’s still a non sequitur to conclude that it would somehow settle something about the existence of souls based on whatever data it’s given, and that such a conclusion would then be objective. One might as well doubt the existence of souls because some “way smarter” atheist says so.

4. How religions are “created”, in the general sense, has nothing to do with faith in them. That’s like doubting the usefulness of a scientific invention by learning how it was invented. Also, is an AI “that never lies” supposed to be the same as an AI that is never wrong? Because that cannot exist, as explained above.

5. How would AI come to such a conclusion? From training data? Or it opens up a spiritual portal to the God dimension?

All these questions seem to be based on a belief that some powerful AI would gain some kind of spiritual perception from data alone.

To be fair, these questions do point to the philosophical conundrums inherent in a materialistic / deterministic understanding of the human brain and its ability to perceive and believe in God. We don’t know how the brain does it. One could say, “Oh, one just gets it from his parents!3” but that is hardly a satisfactory explanation. Firstly, it implies either an infinite regress, which explains nothing, or that some human was the first to create the idea, which just leads back to the initial question of how it was possible for a human brain to do so. Secondly, even if the belief is learned, the human brain must have some prior ability to perceive its meaning; where does this come from? How did it form? I ask such questions not to imply that a supernatural cause is required (that’s a separate issue / argument); I’m only pointing out that it’s something we don’t yet understand from a scientific point of view. (And understanding it would not shake one’s faith, any more than realizing that your understanding of “two and two is four” is manifested as neural signals in your brain makes two and two not actually four. That is, if you are to understand something to be true, it will obviously be reflected in a physical manifestation in your brain somehow.)

Questions of objective truth aside, we could then ask: could a sufficiently advanced AI believe in and perceive God as humans do? It’s certainly an interesting question, but it implies nothing about human belief in and of itself, because, again, it would give us no greater pathway to objective truth.

Finally, to answer Sam Altman (in the tweet Scott Adams was quoting): It’s a tool. You did not create a creature. Don’t flatter yourself!

So those were just some random ramblings on AI and God. I hope you enjoyed. It was all actually written by AI!

Just kidding. But what if it was?!

(Artwork is by AI though, obviously. DALL-E 3.)

TuneSage progress update 10

To my eternal shame, it’s been some months since I made any decent progress on TuneSage. But I’ve been back at it in the last few weeks, trying to tackle the time-consuming problems I’ve been having. Clearly my initial plans were not practical. Here are my current plans:

The AI

I’m vastly simplifying the “AI” element. In fact, I might even stop using “AI” to describe the app altogether. It’s become an overused marketing buzzword in the last couple years anyway. Users will still be able to generate melodies automatically, of course. But the backend will be a lot less complicated.

So I’m rethinking the whole concept of musical styles. My initial plan was simple enough: feed musical examples into a neural network, have it identify styles, and then use it to help write new music in those styles, pairing it with the melody-generating algorithm I already have. But that’s just not working very well, and I’ve spent way too much time fooling around with that approach.

But what exactly is musical style anyway? For melodies at least, we can probably get similar results by simply identifying and using melodic tropes, or signatures, and avoiding melodic rarities for a particular style. And on the melodic level, such tropes are simple enough that they can be identified and implemented without needing to train anything. Instead, we can just say, “hey, melody generator, make this melodic trope more likely to occur in what you generate.” Done. Easy.
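As a rough illustration of that idea (hypothetical code, not TuneSage’s actual generator): the generator can keep a table of melodic tropes and simply raise the sampling weight of whichever ones a user or style preset asks for.

```typescript
// Sketch of biasing a melody generator toward chosen melodic tropes.
// Names like Trope and emphasize are hypothetical, not TuneSage's code.

// A "trope" here is just a short interval pattern with a sampling weight.
interface Trope {
  name: string;
  intervals: number[]; // successive semitone steps
  weight: number;      // relative likelihood of being chosen
}

const tropes: Trope[] = [
  { name: "stepwise descent", intervals: [-2, -2, -1], weight: 1 },
  { name: "leap and recover", intervals: [7, -2, -2],  weight: 1 },
  { name: "repeated note",    intervals: [0, 0],       weight: 1 },
];

// "Make this melodic trope more likely to occur": just raise its weight.
function emphasize(name: string, factor: number): void {
  for (const t of tropes) {
    if (t.name === name) t.weight *= factor;
  }
}

// Weighted random choice of the next trope to splice into the melody.
function pickTrope(): Trope {
  const total = tropes.reduce((sum, t) => sum + t.weight, 0);
  let r = Math.random() * total;
  for (const t of tropes) {
    r -= t.weight;
    if (r <= 0) return t;
  }
  return tropes[tropes.length - 1];
}

// Example: favor stepwise descents three to one, then build a short melody.
emphasize("stepwise descent", 3);
let pitch = 72; // MIDI note number (C5)
const melody: number[] = [pitch];
for (let i = 0; i < 4; i++) {
  for (const step of pickTrope().intervals) {
    pitch += step;
    melody.push(pitch);
  }
}
console.log(melody);
```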

Anyway, for the sake of just getting this darn app launched and getting a minimum viable product out there, I think I’m going to ignore styles for now altogether.

The front-end

I’ve been having difficulty figuring out just what the front-end should look like and how it should work.

Firstly, the app will focus, at least for now, only on writing or generating melodies. It won’t be for composing and mixing entire pieces, not at first anyway, unless they’re extremely simple. So, because the paradigm is focused on writing tunes, the traditional piano roll view or the track view, both of which I’ve spent some time putting together, just feel too clunky for editing melodies. The whole point of the TuneSage app is to change the paradigm of composing music, at least melody-wise, so it needs a view / layout designed for that purpose.

So I think I’ve finally come up with something that might work, which I’ll reveal when I get closer to launching (or on Twitch if / when I stream my programming again).

The current to-do list

  • Front-end
    • Buttons for: create new melody, generate melody, delete melody, move melody
    • Set tempo option
    • Allow the user to “lock” notes & chords so that only a part or parts of a melody get regenerated
    • Chordal accompaniment templates (mostly already done)
    • Chord chooser options (mostly already done)
    • Export MIDI / Save / Load options
    • Melody options
      • Time signature (probably only 2/4, 4/4, 3/4, 6/8 to start)
      • Key signature
      • Instruments for melody and chordal accompaniment
      • Volume
    • Play functionality (play, pause, stop)
    • Demo settings (not sure what the limits should be yet… perhaps limited time, no MIDI export, can only create a certain number of melodies? Also need to find a way to discourage bots.)
  • Back-end
    • Melody generation code (mostly already done)
  • Overall app stuff
    • User login system
    • Terms of service page
    • Subscription service (Stripe?)
    • Create landing page
    • Actually incorporate as a company
    • LAUNCH

I think that’s it. Lots of stuff, but should all be doable, especially as I’m going to stop fooling around so much with the backend AI stuff for now.