Intro

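CLIP is a neural network from OpenAI that connects images & text: it learns visual concepts & can categorize images (e.g., identifying cats & dogs) after training on unfiltered & very noisy image-caption pairs from the internet. It doesn't write captions itself; it scores how well a piece of text matches an image, so it can pick the best description for an image from a list of candidates or match search queries to a database of images.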
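To make the "match search queries to a database of images" part a bit more concrete, here's a minimal sketch of the ranking step. Big assumption: the embeddings below are made-up three-number vectors I typed in by hand, and the file names are hypothetical. Real CLIP would produce much longer vectors from actual images & text, but the idea of ranking images by cosine similarity to the query is the same.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical, hand-written embeddings standing in for what an image
# encoder would produce. These are NOT real CLIP outputs.
IMAGE_EMBEDDINGS = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.0],
    "city.jpg": [0.0, 0.1, 0.9],
}

def search(query_embedding, image_embeddings, top_k=2):
    """Return the names of the top_k images most similar to the query."""
    ranked = sorted(
        image_embeddings.items(),
        key=lambda item: cosine(query_embedding, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]

# Pretend this vector is the text embedding for "a photo of a cat".
query = [0.8, 0.2, 0.1]
results = search(query, IMAGE_EMBEDDINGS)
print(results)
```

Swapping the hand-written vectors for real encoder outputs is where all the actual work happens; the ranking itself really is this simple.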

And then this person decided to use CLIP to train a SIREN network (everything on that page went over my head) to generate images that match a given description, which we now know as Deep Daze. Basically, instead of scoring an existing image against some text, they keep adjusting the generated image until CLIP agrees that it matches the text.
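Here's my rough understanding of that loop as a toy sketch, not Deep Daze's actual code. Assumptions, loudly: `TOY_ENCODER` is a made-up linear map standing in for CLIP's image encoder, the "image" is just four numbers, and I nudge the pixels directly where Deep Daze adjusts the weights of a SIREN network. Real implementations also get their gradients from an autograd library; the finite differences here are only to keep the example dependency-free.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical stand-in for CLIP's image encoder: a fixed linear map from
# 4 "pixels" to a 2-number embedding. The real encoder is a deep network.
TOY_ENCODER = [
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 0.5],
]

def encode_image(pixels):
    return [sum(w * p for w, p in zip(row, pixels)) for row in TOY_ENCODER]

def match_score(pixels, text_embedding):
    """How well the toy image matches the text, from -1 to 1."""
    return cosine(encode_image(pixels), text_embedding)

def generate(text_embedding, start_pixels, steps=200, lr=0.1, eps=1e-4):
    """Gradient ascent on the match score using finite-difference gradients."""
    pixels = list(start_pixels)
    history = [match_score(pixels, text_embedding)]
    for _ in range(steps):
        base = history[-1]
        grad = []
        for i in range(len(pixels)):
            bumped = list(pixels)
            bumped[i] += eps
            grad.append((match_score(bumped, text_embedding) - base) / eps)
        # Move every pixel a little in the direction that raises the score.
        pixels = [p + lr * g for p, g in zip(pixels, grad)]
        history.append(match_score(pixels, text_embedding))
    return pixels, history

text_embedding = [0.6, 0.8]      # pretend embedding of the prompt
start = [1.0, -1.0, 0.5, 0.0]    # arbitrary starting "image"
final_pixels, history = generate(text_embedding, start)
print(f"score before: {history[0]:.3f}, after: {history[-1]:.3f}")
```

The score starts near zero & climbs as the loop runs, which is the whole trick: the generator never sees an example image, it only gets told "warmer" or "colder" by the matching model.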

Aleph2Image is a more recent attempt at this sort of text-to-image generation that uses parts of DALL-E as the generator in conjunction with CLIP.

The plan for this post is pretty much to just generate images of whatever comes into my mind & explore the limitations & possibilities.

Aleph2Image

Prompt: a neon city at night

Prompt: a cloud of smog painted on a canvas

Prompt: a solarpunk warship

Prompt: a rainy cobblestone street

Prompt: a cat wearing a birthday hat

Prompt: a bee listening to jazz

Prompt: a demonic symbol in the sky revealing hell

Prompt: a cafe in a monsoon

I let this run for a bit, and when I came back I'd forgotten what the prompt was and had no idea what I was looking at.

Prompt: an anime girl made out of garlic bread

Dall-E Mini

After trying out Aleph2Image, I decided to try Dall-E Mini.

Prompt: a neon city at night

Prompt: a cloud of smog painted on a canvas

Prompt: a solarpunk warship

Prompt: a rainy cobblestone street

Prompt: a cat wearing a birthday hat

Prompt: a bee listening to jazz

Prompt: a demonic symbol in the sky revealing hell

Prompt: a cafe in a monsoon

Prompt: an anime girl made out of garlic bread

Prompt: a euclidean bedroom

Prompt: a non-euclidean bedroom

Prompt: a lavish hotel lobby

Prompt: a cute anime girl

Prompt: a cute anime boy

Prompt: a kobold in a hoodie

Okay, so it took the mythological interpretation rather than the furry version.

Prompt: a cute kobold in a hoodie

No dice.

Prompt: a redditor

Prompt: a sign that says, “ybubbus”

Prompt: an isometric view of a pixelated car

Prompt: the Notre Dame made of human flesh

Prompt: a bottle of water

Was not expecting such an abstract image to come out of this prompt.

Prompt: a violent bottle of water

Not only is this one more recognizable as a bottle of water than the result of the previous prompt, you can even see it trying to replicate the Shutterstock watermark.

Prompt: Francis Bacon in the style of Francis Bacon

Actually quite impressive.

Prompt: Francis Bacon in the style of Francis Bacon in the style of Francis Bacon

Obviously a later Francis Bacon piece.

Prompt: a Pikachu poster

Prompt: Joe Biden’s America

Prompt: Donald Trump’s America

Prompt: Barack Obama’s America

Prompt: banana

Thoughts

Dall-E Mini is much, much faster than Aleph2Image at the expense of resolution. It’s a fair trade-off, though.

Colabs Used

All of this would’ve been impossible on my pathetic PC without the aid of Google Colab and the people who put together colabs that plebs like me could use.

Colab for Aleph2Image by @advadnoun

Colab for Dall-E Mini

Colab for Dall-E Mini by “mega b#6696” on Discord.

Conclusion

We’re a long way from The Aleph.

“I was afraid that not a single thing on earth would ever again surprise me”

— Jorge Luis Borges

Addendum