Intro
CLIP is a natural language processing model that can learn visual concepts & categorize images (e.g., identifying cats & dogs) from unfiltered & very noisy data. It can basically turn images into text by describing them or match search queries to a database of images.
And then this person decided to use CLIP to train the SIREN network (everything on that page went over my head) to generate images that match a given description; which we now know as Deep Daze. Basically, they reversed image-to-text to make text-to-image.
Aleph2Image is a more recent attempt at this sort of text-to-image generation that uses parts of DALL-E as the generator in conjunction with CLIP.
The plan for this post is pretty much to just generate images of whatever comes into my mind & explore the limitations & possibilities.
Aleph2Image
Prompt: a neon city at night
Prompt: a cloud of smog painted on a canvas
Prompt: a solarpunk warship
Prompt: a rainy cobblestone street
Prompt: a cat wearing a birthday hat
Prompt: a bee listening to jazz
Prompt: a demonic symbol in the sky revealing hell
Prompt: a cafe in a monsoon
I let this run for a bit and when I came back I forgot what the prompt was and had no idea what I was looking at.
Prompt: an anime girl made out of garlic bread
Dall-E Mini
After trying out Aleph2Image, I decided to try Dall-E Mini.
Prompt: a neon city at night
Prompt: a cloud of smog painted on a canvas
Prompt: a solarpunk warship
Prompt: a rainy cobblestone street
Prompt: a cat wearing a birthday hat
Prompt: a bee listening to jazz
Prompt: a demonic symbol in the sky revealing hell
Prompt: a cafe in a monsoon
Prompt: an anime girl made out of garlic bread
Prompt: a euclidean bedroom
Prompt: a non-euclidean bedroom
Prompt: a lavish hotel lobby
Prompt: a cute anime girl
Prompt: a cute anime boy
Prompt: a kobold in a hoodie
Okay, so it took the mythological interpretation rather than the furry version.
Prompt: a cute kobold in a hoodie
No dice.
Prompt: a redditor
Prompt: a sign that says, “ybubbus”
Prompt: an isometric view of a pixelated car
Prompt: the Notre Dame made of human flesh
Prompt: a bottle of water
Was not expecting such an abstract image to come out of this prompt.
Prompt: a violent bottle of water
Not only is this one more recognizable as a bottle of water than the previous prompt, you can even see it trying to replicate the Shutterstock watermark.
Prompt: Francis Bacon in the style of Francis Bacon
Actually quite impressive.
Prompt: Francis Bacon in the style of Francis Bacon in the style of Francis Bacon
Obviously a later Francis Bacon piece.
Prompt: a Pikachu poster
Prompt: Joe Biden’s America
Prompt: Donald Trump’s America
Prompt: Barack Obama’s America
Prompt: banana
Thoughts
Dall-E Mini is much, much faster than Aleph2Image at the expense of resolution. It’s a fair trade-off, though.
Colabs Used
All of this would’ve been impossible on my pathetic PC without the aid of Google Colab and the people who put-together colabs that plebs like me could use.
Colab for Aleph2Image by @advadnoun
Colab for Dall-E Mini by “mega b#6696” on Discord.
Conclusion
We’re a long way from The Aleph.
“I was afraid that not a single thing on earth would ever again surprise me”
— Jorge Luis Borges