OpenAI’s new AI models create images from text and classify them better
OpenAI has introduced DALL·E and CLIP, two new AI models: the first generates images from text, and the second classifies images into categories. DALL·E is a neural network that can generate images from even the wildest text and image prompts supplied to it, such as “an armchair in the shape of an avocado” or “the exact same cat on the top as a sketch on the bottom.” CLIP uses a new training method for image classification, intended to be more accurate, efficient, and flexible across a variety of image types.
GPT-3 (Generative Pre-trained Transformer 3), the language model from the US-based artificial intelligence company, uses deep learning to produce human-like text, and DALL·E builds on it to create images. You can let your imagination run wild, as DALL·E is trained to create diverse and sometimes surreal images depending on your text input. But the model has also raised questions about copyright, as DALL·E was trained on images drawn from the web in order to create its own.
AI Illustrator DALL·E Creates Wacky Images
The name DALL·E, as you may have already guessed, is a blend of the surrealist artist Salvador Dalí and Pixar's WALL·E. DALL·E can use text and image inputs to create wacky images. For example, you can create “an illustration of a baby daikon radish in a tutu walking a dog” or “a snail made of harp.” DALL·E is trained not only to generate images from scratch but also to regenerate an existing image in a way that is consistent with the text or image prompt.
GPT-3 by OpenAI is a deep learning language model that can perform a variety of text generation tasks from a language prompt. GPT-3 can write a story much as a human would. For DALL·E, the San Francisco-based artificial intelligence lab created a version of GPT-3 that works with images, swapping text for images and training the model to complete half-finished images.
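To make the idea of completing a half-finished image more concrete, here is a toy sketch in Python. It is not OpenAI's method: it replaces the large neural network with a simple table of "which pixel value tends to follow which," and the helper names (train_next_token_counts, complete) and the 4×4 striped image are hypothetical, chosen only for illustration. What it shares with the approach described above is the general principle of treating an image as a sequence and predicting it one element at a time.

```python
from collections import Counter, defaultdict

def train_next_token_counts(sequences):
    """Count which token follows which across the training sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def complete(prefix, total_length, counts):
    """Greedily extend the prefix one token at a time up to total_length."""
    seq = list(prefix)
    while len(seq) < total_length:
        followers = counts.get(seq[-1])
        if not followers:
            break  # nothing learned about what follows this token
        seq.append(followers.most_common(1)[0][0])
    return seq

# Hypothetical toy data: a 4x4 image of vertical stripes, flattened row by row.
WIDTH, HEIGHT = 4, 4
training_image = [0, 1, 0, 1] * HEIGHT

counts = train_next_token_counts([training_image])

# Keep only the top half of the image and let the toy model fill in the rest.
top_half = training_image[: WIDTH * HEIGHT // 2]
completed = complete(top_half, WIDTH * HEIGHT, counts)

for row in range(HEIGHT):
    print(completed[row * WIDTH:(row + 1) * WIDTH])
```

A real model looks at far more context than the single previous pixel, which is what lets it continue complex pictures rather than simple stripes.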
DALL·E can draw pictures of animals or objects with human characteristics and sensibly combine unrelated elements into a single image. The success rate will depend on how the text is phrased. DALL·E can often “fill in the blanks” when the caption implies that the image must contain some detail that is not explicitly stated. For example, the text “a giraffe made from a turtle” or “an avocado-shaped chair” will give you a satisfactory result.
CLIP text and images together
CLIP (Contrastive Language-Image Pre-training) is a neural network that can perform accurate image classification based on natural language. It is designed to classify images into different categories more accurately and efficiently while learning from raw, highly varied, and very noisy data. What makes CLIP different is that it does not learn to recognize images from a curated, labeled dataset, as most existing visual classification models do. Instead, CLIP has been trained on the wide variety of natural language supervision that is available on the Internet. As a result, CLIP learns what is in an image from a full description rather than from a single-word label in a dataset.
CLIP can be applied to any visual classification benchmark simply by providing the names of the visual categories to be recognized. According to the OpenAI blog, this is similar to the “zero-shot” capabilities of GPT-2 and GPT-3.
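The following Python sketch illustrates what classifying by category name can look like. It assumes that an image and each category description have already been turned into numeric vectors (embeddings) by a model; the three-number vectors, the label phrases, the function names, and the temperature value below are all made up for illustration and do not come from OpenAI's code. The image is assigned to whichever description its vector is most similar to.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_embedding, label_embeddings, temperature=0.07):
    """Score an image against every label description; no retraining needed.

    The temperature is an illustrative assumption that sharpens the scores.
    """
    labels = list(label_embeddings)
    sims = [cosine(image_embedding, label_embeddings[label]) / temperature
            for label in labels]
    return dict(zip(labels, softmax(sims)))

# Hypothetical hand-made embeddings. A real system would compute these
# with an image encoder and a text encoder.
TOY_LABEL_EMBEDDINGS = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
    "a photo of a chair": [0.0, 0.1, 0.9],
}
TOY_IMAGE_EMBEDDING = [0.8, 0.3, 0.1]

scores = zero_shot_classify(TOY_IMAGE_EMBEDDING, TOY_LABEL_EMBEDDINGS)
best = max(scores, key=scores.get)
print(best)
```

Adding a new category in this sketch only requires adding one more description to the dictionary, which is the sense in which no category-specific training is needed.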
Models like DALL·E and CLIP could have a significant social impact. The OpenAI team says it will look at how these models relate to societal issues such as the economic impact on certain professions, the potential for bias in model outputs, and the long-term ethical challenges that this technology entails.
A generative AI model like DALL·E that draws on images taken directly from the Internet could pave the way for copyright infringement. DALL·E can regenerate any rectangular region of an existing image on the Internet, and people have already been tweeting about the attribution and copyright of the altered images.