Better than JPEG? Researcher Discovers Stable Diffusion Can Compress Images

An illustration of compressionEnlarge / These jagged, colored blocks are exactly what the concept of image compression looks like. Benj Edwards / Ars Technica

Last week, Swiss software engineer Matthias Bühlmann discovered that the popular Stable Diffusion image synthesis model could compress existing bitmap images with fewer visual artifacts than JPEG or WebP at faster rates. high compression, although there are important caveats.

Stable Diffusion is an AI image synthesis model that typically generates images based on textual descriptions (called "prompts"). The AI ​​model learned this ability by studying millions of images pulled from the internet. During the training process, the model makes statistical associations between images and associated words, creating a much smaller representation of key information about each image and storing it as "weights", which are mathematical values which represent what the AI ​​image model knows, so to speak.

When Stable Diffusion analyzes and "compresses" images as weights, they reside in what researchers call "latent space", which is a way of saying that they exist as a kind of fuzzy potential which can be realized in images once they are decoded. With Stable Diffusion 1.4, the weight file is about 4 GB, but it represents knowledge of hundreds of millions of images.

Examples of using Stable Diffusion to compress images. Enlarge / Examples of using Stable Diffusion to compress images.

While most people use Stable Diffusion with text prompts, Bühlmann cut out the text encoder and instead forced his images through Stable Diffusion's image encoder process, which takes a low-precision 512×512 image and transforms it into a higher-precision 64×64 representation of latent space. At this point, the image exists with a much smaller data size than the original, but it can still be scaled up (decoded) to a 512×512 image with fairly good results.

When running the tests, Bühlmann found that images compressed with Stable Diffusion looked subjectively better at higher compression ratios (smaller file size) than JPEG or WebP. In one example, it shows a photo of a candy store compressed to 5.68 KB with JPEG, 5.71 KB with WebP, and 4.98 KB with Stable Diffusion. The stable broadcast image appears to have more resolved detail and less obvious compression artifacts than those compressed in the other formats.

Experimental examples of using Stable Diffusion to compress images. SD results are far right. Enlarge / Experimental examples of using Stable Diffusion to compress images. SD results are on the far right.

Better than JPEG? Researcher Discovers Stable Diffusion Can Compress Images
An illustration of compressionEnlarge / These jagged, colored blocks are exactly what the concept of image compression looks like. Benj Edwards / Ars Technica

Last week, Swiss software engineer Matthias Bühlmann discovered that the popular Stable Diffusion image synthesis model could compress existing bitmap images with fewer visual artifacts than JPEG or WebP at faster rates. high compression, although there are important caveats.

Stable Diffusion is an AI image synthesis model that typically generates images based on textual descriptions (called "prompts"). The AI ​​model learned this ability by studying millions of images pulled from the internet. During the training process, the model makes statistical associations between images and associated words, creating a much smaller representation of key information about each image and storing it as "weights", which are mathematical values which represent what the AI ​​image model knows, so to speak.

When Stable Diffusion analyzes and "compresses" images as weights, they reside in what researchers call "latent space", which is a way of saying that they exist as a kind of fuzzy potential which can be realized in images once they are decoded. With Stable Diffusion 1.4, the weight file is about 4 GB, but it represents knowledge of hundreds of millions of images.

Examples of using Stable Diffusion to compress images. Enlarge / Examples of using Stable Diffusion to compress images.

While most people use Stable Diffusion with text prompts, Bühlmann cut out the text encoder and instead forced his images through Stable Diffusion's image encoder process, which takes a low-precision 512×512 image and transforms it into a higher-precision 64×64 representation of latent space. At this point, the image exists with a much smaller data size than the original, but it can still be scaled up (decoded) to a 512×512 image with fairly good results.

When running the tests, Bühlmann found that images compressed with Stable Diffusion looked subjectively better at higher compression ratios (smaller file size) than JPEG or WebP. In one example, it shows a photo of a candy store compressed to 5.68 KB with JPEG, 5.71 KB with WebP, and 4.98 KB with Stable Diffusion. The stable broadcast image appears to have more resolved detail and less obvious compression artifacts than those compressed in the other formats.

Experimental examples of using Stable Diffusion to compress images. SD results are far right. Enlarge / Experimental examples of using Stable Diffusion to compress images. SD results are on the far right.

What's Your Reaction?

like

dislike

love

funny

angry

sad

wow