New module for OpenAI GPT-3 creates unique images from text
Source: Bob Yirka
“an armchair in the shape of an avocado”. Credit: OpenAI
A team of researchers at OpenAI, a San Francisco artificial intelligence development company, has added a new module to its GPT-3 autoregressive language model. Called DALL·E, the module accepts text with multiple characteristics, analyzes it and then draws a picture based on what it believes was described. On their webpage describing the new module, the team at OpenAI describe it as "a simple decoder-only transformer" and note that they plan to provide more details about its architecture and how it can be used as they learn more about it themselves.
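For intuition, here is a minimal sketch of what a "decoder-only transformer" over a single stream of text and image tokens might look like. Every size and name below is invented for illustration; this is a sketch of the general idea, not OpenAI's implementation, whose details have not yet been published.

```python
# Illustrative sketch only: a tiny decoder-only transformer that models
# text tokens and image tokens as one autoregressive stream. All vocabulary
# and sequence sizes here are hypothetical, not DALL-E's actual values.
import torch
import torch.nn as nn

TEXT_VOCAB = 16384   # assumed text-token vocabulary size (hypothetical)
IMAGE_VOCAB = 8192   # assumed image-token vocabulary size (hypothetical)
SEQ_LEN = 320        # text tokens followed by image tokens (hypothetical)

class TinyDecoderOnly(nn.Module):
    def __init__(self, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        vocab = TEXT_VOCAB + IMAGE_VOCAB  # one shared token space
        self.tok = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(SEQ_LEN, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens):
        # tokens: (batch, seq) ints; the causal mask is what makes the
        # stack "decoder-only": each position sees only earlier tokens.
        b, t = tokens.shape
        x = self.tok(tokens) + self.pos(torch.arange(t, device=tokens.device))
        mask = nn.Transformer.generate_square_subsequent_mask(t).to(tokens.device)
        x = self.blocks(x, mask=mask)
        return self.head(x)  # next-token logits over the joint vocabulary

model = TinyDecoderOnly()
demo = torch.randint(0, TEXT_VOCAB, (1, 10))  # a fake tokenized caption
print(model(demo).shape)  # torch.Size([1, 10, 24576])
```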
GPT-3 was developed by the company to demonstrate how far neural networks could take text processing and generation. It analyzes user-supplied text and generates new text based on that input. For example, if a user types "tell me a story about a dog that saves a child in a fire," GPT-3 can create such a story in a human-like way. Entering the same input a second time produces a different version of the story, because the model samples its output rather than reproducing a fixed answer.
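That run-to-run variation comes from drawing each next token from a probability distribution. A short sketch using the openly available GPT-2 (via the Hugging Face transformers library, as a stand-in, since GPT-3 itself is not publicly downloadable) shows how the same prompt yields a different story on each run:

```python
# Sketch of autoregressive sampling: the same prompt yields a different
# continuation each time because tokens are drawn from the model's
# probability distribution rather than chosen deterministically.
# GPT-2 is used here as a stand-in for GPT-3, which is not downloadable.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "Tell me a story about a dog that saves a child in a fire."
inputs = tokenizer(prompt, return_tensors="pt")

for run in range(2):  # two runs, two different stories
    out = model.generate(
        **inputs,
        do_sample=True,        # sample instead of greedy decoding
        temperature=0.9,       # higher values increase variety
        max_new_tokens=60,
        pad_token_id=tokenizer.eos_token_id,
    )
    print(f"--- run {run + 1} ---")
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Raising the temperature trades coherence for variety; setting do_sample=False would make both runs identical.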
In this new effort, the researchers have extended this ability to graphics. A user types in a sentence and DALL·E attempts to generate an image of what is described. For example, if a user types in "a dog with cat claws and a bird tail," the system produces a cartoon-looking image of a dog with those features, and not just one: it produces a whole series of them, each created from a slightly different interpretation of the original text.
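Conceptually, producing several candidate images from one caption amounts to drawing several image-token sequences from the same model and decoding each back to pixels. The sketch below illustrates that sampling loop with a placeholder next-token function; every name and size in it is hypothetical:

```python
# Sketch: drawing several image-token sequences from the same caption.
# "sample_next" is a placeholder standing in for a trained model's
# next-token distribution; a real system would decode each token grid
# back into pixels with an image codec.
import torch

IMAGE_VOCAB = 8192   # hypothetical image-token vocabulary size
IMAGE_TOKENS = 256   # e.g. a 16x16 grid of discrete image tokens (assumed)

def sample_next(prefix):
    # Placeholder for a trained model: returns uniform logits here.
    return torch.zeros(IMAGE_VOCAB)

def generate_image_tokens(caption_tokens, temperature=1.0):
    seq = list(caption_tokens)
    for _ in range(IMAGE_TOKENS):
        logits = sample_next(seq) / temperature
        probs = torch.softmax(logits, dim=-1)
        seq.append(torch.multinomial(probs, 1).item())  # sample one token
    return seq[len(caption_tokens):]

caption = [5, 17, 42]  # fake tokenized "a dog with cat claws and a bird tail"
candidates = [generate_image_tokens(caption) for _ in range(4)]
# Each candidate is a different token grid, hence a different image once
# decoded, which is why one prompt yields a whole series of variations.
print(len(candidates), "candidate image-token sequences")
```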
The system learned to create images from a large corpus of text and images gathered from internet pages. During training, it analyzed each part of a description to learn what the thing described should look like. For the previous example, it would have analyzed thousands of pictures of dogs, then cats and what their claws look like, and then birds and their tails. It then combines what it learned into several graphic images, giving users a variety of results.