OpenAI and Google are competing to generate images from textual descriptions. While OpenAI, which is backed by Microsoft, recently opened DALL-E 2 to a select group of users, Google has launched Imagen: you write what you want and the model creates an image that matches the text.
Imagen was conceived by the Google Brain Team, a research group that has more or less carte blanche over the projects it develops but focuses above all on certain branches of machine learning, including the generation of an image from its textual description.
In the case of Imagen, the textual description can be as bizarre as you like, for example: “An alien octopus floats through a portal reading a newspaper.” Imagen’s result is shown in the following image.
These models, trained on image databases, are known as “text-to-image diffusion models”.
A diffusion model is a generative model used to create data similar to the data it was trained on. The most common example is an image to which visual noise is gradually added, followed by the reverse process, so that the model learns to recover the starting image from what looks like indistinguishable noise.
For a diffusion model to turn text into an image, that is, to produce output of a different kind from its input, it is usually trained on data sets of text-image pairs: each image comes with its textual description.
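The noising and recovery idea described above can be illustrated with a small sketch. This is not Imagen's code: the linear noise schedule, the step count and the names `make_schedule`, `add_noise` and `recover_image` are assumptions chosen for illustration, following the common formulation used in the diffusion literature. In a real model, a neural network has to predict the noise; here the true noise is passed in directly, only to show that knowing it is enough to get the image back.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction kept after each noising step (assumed linear schedule)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(image, step, alpha_bars, rng):
    """Forward process: blend the image with Gaussian noise at the given step."""
    noise = rng.standard_normal(image.shape)
    a = alpha_bars[step]
    noisy = np.sqrt(a) * image + np.sqrt(1.0 - a) * noise
    return noisy, noise

def recover_image(noisy, noise, step, alpha_bars):
    """Reverse the blend, given the noise a trained model would have to predict."""
    a = alpha_bars[step]
    return (noisy - np.sqrt(1.0 - a) * noise) / np.sqrt(a)

rng = np.random.default_rng(0)
alpha_bars = make_schedule()
image = rng.uniform(-1.0, 1.0, size=(8, 8))   # toy 8x8 "image"

noisy, noise = add_noise(image, 500, alpha_bars, rng)
restored = recover_image(noisy, noise, 500, alpha_bars)
print(np.allclose(restored, image))
```

By the last step of this schedule almost none of the original signal is left, which is why the fully noised image looks like pure noise.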
For now it is not open to the public because it is “dangerous”
Google researchers found that results as good as Imagen’s can be obtained using language models pre-trained on text alone, such as Google’s “T5 text-to-text” framework (the name comes from the five “T’s” in “Text-To-Text Transfer Transformer”). T5 is built on the Transformer architecture, which does not examine the words of a sentence one after another but performs only a small, constant number of steps (chosen empirically). At each step it applies a self-attention mechanism that directly models the relationships between all the words in a sentence, regardless of their respective positions.
According to the Brain Team, increasing the size of the language model in Imagen improves both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model does.
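To make the self-attention step more concrete, here is a minimal single-head sketch of generic scaled dot-product attention. It is an illustration under stated assumptions, not T5's actual implementation: the function names, the random projection matrices and the toy sizes are hypothetical, and real Transformer layers add details such as multiple heads and position information that are left out here.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Relate every word vector in x to every other word in one step."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # one score per pair of words
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
num_words, dim = 5, 16                        # toy sentence of 5 word vectors
x = rng.standard_normal((num_words, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) for _ in range(3))

output, weights = self_attention(x, w_q, w_k, w_v)
print(weights.shape)   # one weight for every pair of words
```

The weight matrix has one entry for each pair of words, so the first and last word of a sentence are related as directly as two neighbouring words.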
The results published on the Imagen demo site are indeed excellent and, to demonstrate the capabilities of the new diffusion model, Google has created a benchmark for evaluating text-to-image models called DrawBench. In side-by-side comparisons, human evaluators preferred Imagen over other models, both for sample quality and for image-text alignment. The model was compared with VQ-GAN + CLIP, Latent Diffusion Models and DALL-E 2.
Imagen is currently only accessible through the demo site because, the Brain Team said: “It relies on text encoders trained on uncurated web-scale data and thus inherits the social biases and limitations of large language models. As a result, there is a risk that Imagen has encoded harmful stereotypes and representations, which explains our decision not to release Imagen for public use without further safeguards.”