For example, if you have a database of images, you could run each image through the image encoder to get a list of image embeddings. If you then run the phrase “puppy on a green lawn” through the text encoder, you can find the image that best matches the phrase.
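The matching step boils down to comparing embedding vectors by similarity. Here is a minimal sketch using NumPy with tiny 4-dimensional stand-ins for CLIP's 512-dimensional embeddings; the function names and toy vectors are my own illustration, not CLIP's API.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_match(text_embedding, image_embeddings):
    """Return the index of the image embedding closest to the text embedding."""
    scores = [cosine_similarity(text_embedding, e) for e in image_embeddings]
    return int(np.argmax(scores))

# Toy stand-ins for embeddings produced by the image and text encoders.
text_emb = np.array([1.0, 0.0, 1.0, 0.0])
image_embs = [
    np.array([0.0, 1.0, 0.0, 1.0]),  # very different image
    np.array([0.9, 0.1, 0.8, 0.0]),  # close match
]
print(best_match(text_emb, image_embs))  # → 1
```

With real CLIP embeddings the vectors are 512-dimensional, but the search logic is the same: encode the query text once, then rank the precomputed image embeddings by similarity.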
As mentioned above, the GANshare system creates images that are steered by CLIP using text prompts. If you tell VQGAN+CLIP to, say, create an “Abstract painting of circles in orange,” it will make one.
In order to crank out a lot of paintings, I generated prompts with three varying parts: a style, a subject, and a color.
After experimenting, I found that these nine styles work reasonably well: Abstract, Cubist, Expressionist, Fauvist, Futurist, Geometric, Impressionist, Postmodern, and Surrealist.
For the subject, I chose from three categories: geometric shapes, geographic features, and objects. I started with Ivan Malopinsky’s word lists for shapes and geographic features and tweaked them a bit. For the objects, I combined the category lists from the COCO and CIFAR-100 datasets to get a list of 181 objects.
For the color names, I grabbed an extensive list from Wikipedia and edited it down a bit to get 805 unique colors.
Here are the first seven entries in the four lists.
shapes.txt    places.txt        things.txt          colors.csv
angles        an archipelago    an airplane         absolute zero
blobs         an atoll          an apple            acid green
circles       a beach           apples              aero
cones         a bay             an aquarium fish    aero blue
cubes         a butte           a baby              african violet
curves        a canal           a backpack          alabaster
cylinders     a canyon          a banana            alice blue
...           ...               ...                 ...
Here is a link to the Python code that generates a prompt by randomly choosing a style, subject, and color.
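As a rough sketch of what such a generator looks like, the snippet below picks one entry from each list and assembles a prompt. The abbreviated lists and the "of"/"with" connector heuristic are my own illustration; the linked code holds the full lists and exact formatting.

```python
import random

# Abbreviated stand-ins for the real lists (9 styles, 181 objects, 805 colors).
styles = ["Abstract", "Cubist", "Expressionist", "Fauvist", "Futurist",
          "Geometric", "Impressionist", "Postmodern", "Surrealist"]
subjects = ["Circles", "Diagonals", "an Archipelago", "a City", "a Banana"]
colors = ["Vivid Burgundy Brown", "Beige Pink", "Carolina Blue"]

def generate_prompt(rng=random):
    """Build a prompt from a randomly chosen style, subject, and color."""
    style = rng.choice(styles)
    subject = rng.choice(subjects)
    color = rng.choice(colors)
    # Heuristic: subjects with an article ("a City") read better with "of",
    # bare plurals ("Diagonals") with "with".
    connector = "of" if subject.split()[0].lower() in ("a", "an") else "with"
    return f"{style} Painting {connector} {subject} in {color}"

print(generate_prompt())
```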
Here are some prompts generated by the code.
Futurist Painting of a City in Vivid Burgundy Brown
Abstract Painting with Diagonals in Beige Pink
Impressionist Painting with Prisms in Carolina Blue
Now that we have some interesting prompts, we’ll see how we can steer VQGAN to generate corresponding images next.
Steering VQGAN with CLIP
For my MAGnet project, I used a custom generative algorithm to have CLIP steer a variant of StyleGAN2 to create images from text prompts. For this project, I am using an algorithm designed by Katherine Crowson, an AI/generative artist who posts on Twitter as RiversHaveWings. To steer VQGAN with CLIP, she uses Adam (Adaptive Moment Estimation), an optimizer from the PyTorch library. Below is a diagram of the algorithm.
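To make the optimization loop concrete, here is a toy stand-in using a hand-rolled Adam update in NumPy. In the real algorithm, `z` is the VQGAN latent and the loss is the CLIP distance between the decoded image and the text prompt; here the target vector and quadratic loss are placeholders so the Adam mechanics are visible.

```python
import numpy as np

target = np.array([1.0, -2.0, 0.5])  # placeholder for "what CLIP wants"
z = np.zeros(3)                      # latent being optimized

m = np.zeros_like(z)                 # first-moment estimate
v = np.zeros_like(z)                 # second-moment estimate
lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

for t in range(1, 1001):
    grad = 2 * (z - target)          # gradient of the toy quadratic loss
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)       # bias-corrected moment estimates
    v_hat = v / (1 - beta2**t)
    z -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(np.round(z, 3))  # z converges toward the target
```

In practice you would not write the update by hand: `torch.optim.Adam` performs these steps, and the gradient comes from backpropagating the CLIP loss through the VQGAN decoder.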
Note that there are two embedding spaces in play here. The CLIP system uses a flat embedding of 512 numbers (represented as I and T), whereas VQGAN uses a three-dimensional embedding with 256×16×16 numbers, represented as Z.
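The size difference is easy to see by building arrays of the two shapes (zero-filled placeholders, not real model outputs):

```python
import numpy as np

clip_image_emb = np.zeros(512)           # I: flat CLIP image embedding
clip_text_emb = np.zeros(512)            # T: flat CLIP text embedding
vqgan_latent = np.zeros((256, 16, 16))   # Z: VQGAN's 3-D latent

print(clip_image_emb.shape)  # (512,)
print(vqgan_latent.shape)    # (256, 16, 16)
print(vqgan_latent.size)     # 65536 numbers vs. CLIP's 512
```

So the optimizer is adjusting a latent with 128 times as many numbers as the CLIP embedding it is being scored against.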