Spanish AI company Multiverse Computing has released HyperNova 60B 2602, a compressed version of OpenAI’s gpt-oss-120B, and posted it for free on Hugging Face.
The new version cuts the original model’s memory requirement from 61 GB to 32 GB, and Multiverse claims it maintains near-parity tool-calling performance despite roughly halving the model’s size.
In theory, this means that a model that once required heavy infrastructure can run with much less hardware. For developers with tighter budgets or energy constraints, this represents a potentially huge advantage.
CompactifAI technology
Multiverse claims gains on agent-focused benchmarks compared with its previous compressed version, stating that HyperNova 60B 2602 delivers a 5x improvement on Tau2-Bench and a 2x improvement on Terminal Bench Hard.
These tests measure tool usage and coding workflows rather than simple text responses.
The company’s CompactifAI technology restructures transformer weight matrices using quantum-inspired tensor networks.
Multiverse believes that effective compression offers an alternative to simply building larger and larger models, and draws connections to ongoing European discussions on sovereign AI, infrastructure limits and energy consumption. To learn more, I spoke to the company about its compression technology.
- How do you compress an LLM?
Multiverse Computing compresses large language models using its proprietary CompactifAI technology, based on quantum-inspired tensor networks.
Instead of simply removing parameters, CompactifAI restructures the internal weight matrices of transformer models into highly efficient tensor network representations. This mathematical reformulation captures correlations between parameters and eliminates structural redundancy.
The process is applied after training, meaning the original model does not need to be retrained and no access to the original training data is required.
Using this approach, CompactifAI can reduce memory usage by up to approximately 93% and significantly reduce the number of parameters, while maintaining strong performance across all tasks.
The resulting compressed models are smaller, faster, more energy efficient, and easier to deploy in cloud, on-premises, and edge environments.
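Multiverse keeps the details of CompactifAI proprietary, but the general idea of rewriting a weight matrix as a tensor network can be sketched with a simple tensor-train decomposition built from truncated SVDs. The toy matrix, shapes and ranks below are illustrative assumptions, not the company’s implementation:

```python
# Illustrative sketch (not Multiverse's code): rewriting one weight matrix
# as a chain of tensor-train cores via sequential truncated SVDs.
import numpy as np

def tensor_train(matrix, dims, max_rank):
    """Factor `matrix`, reshaped to `dims`, into a chain of tensor-train cores."""
    tensor = matrix.reshape(dims)
    cores, rank = [], 1
    for d in dims[:-1]:
        tensor = tensor.reshape(rank * d, -1)      # unfold along the current mode
        u, s, vt = np.linalg.svd(tensor, full_matrices=False)
        r = min(max_rank, len(s))                  # truncate to the bond dimension
        cores.append(u[:, :r].reshape(rank, d, r))
        tensor = np.diag(s[:r]) @ vt[:r]           # carry the remainder forward
        rank = r
    cores.append(tensor.reshape(rank, dims[-1], 1))
    return cores

# A toy 256x256 "weight matrix" with hidden low-rank structure to exploit.
rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 32)) @ rng.normal(size=(32, 256))

cores = tensor_train(weights, dims=(4,) * 8, max_rank=32)
compressed = sum(core.size for core in cores)
print(f"original: {weights.size} parameters, tensor network: {compressed} parameters")
```

In this toy case the cores hold roughly a fifth of the original parameters because the matrix had strong low-rank structure to exploit; real transformer weights are messier, which is where the accuracy trade-off comes in.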
- Can you apply it to every LLM?
It works on large transformer-based language models, including dense base models, provided that access to the model weights is available.
The technology is architecture agnostic within the transformer family and requires no changes to the external behavior of the model or APIs.
The effectiveness of compression depends on the level of redundancy of the model. Large, over-parameterized models generally offer the greatest compression potential.
- What are the challenges?
The main technical challenge is to preserve the accuracy of the model while achieving high compression ratios. This problem is solved by carefully controlling the tensor decomposition parameters to balance size reduction and performance stability.
Another challenge is ensuring that compressed models maintain robustness across different tasks, including reasoning, multilingual performance, and domain-specific use cases.
Finally, deployment environments vary considerably. Compression should be optimized for different hardware targets, latency requirements, and operational constraints.
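That balancing act can be pictured with a plain truncated SVD: the more aggressively a factorization is truncated, the smaller the result and the larger the reconstruction error. The matrix size, ranks and error metric below are illustrative choices, not CompactifAI parameters:

```python
# Illustrative rank sweep (not CompactifAI): smaller rank = smaller model,
# larger reconstruction error. Tuning this balance is the hard part.
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(512, 512))                      # stand-in weight matrix
u, s, vt = np.linalg.svd(W, full_matrices=False)

for rank in (16, 64, 256):
    W_hat = (u[:, :rank] * s[:rank]) @ vt[:rank]     # keep only `rank` components
    params = rank * (W.shape[0] + W.shape[1])        # size of the two thin factors
    rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
    print(f"rank {rank:3d}: {params / W.size:5.1%} of original size, "
          f"relative error {rel_err:.3f}")
```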
- What could be a good analogy?
Rewrite the blueprint rather than removing bricks: CompactifAI does not simply delete parts of a model. Instead, it rewrites the mathematical blueprint so that the same structure is represented more efficiently.
It’s like rethinking the internal structure of a building so that it uses far fewer materials while maintaining strength and functionality.
Another analogy is reorganizing massive archives into a highly structured system that eliminates duplication. The knowledge remains intact, but it is encoded much more efficiently.
- How do you determine loss of precision?
Loss of accuracy is determined by comparing the compressed model against the original on the same tasks and scoring metrics, then measuring the change.
In practice, this includes tool call evaluations. Reducing capacity loss here enables more advanced agent workflows and coding applications.
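In outline, such an evaluation amounts to scoring the original and compressed models on an identical task set and reporting the delta. The harness below is a generic sketch with placeholder models and tasks, not Multiverse’s evaluation stack:

```python
# Generic accuracy-delta sketch (placeholder models and tasks, not Multiverse's harness).
from typing import Callable, Sequence

def accuracy(model: Callable[[str], str], tasks: Sequence[tuple[str, str]]) -> float:
    """Fraction of tasks where the model's answer matches the reference."""
    return sum(model(q) == ref for q, ref in tasks) / len(tasks)

def accuracy_delta(original, compressed, tasks):
    """Score both models on the identical task set and report the drop."""
    base, small = accuracy(original, tasks), accuracy(compressed, tasks)
    return base, small, base - small

# Trivial stand-in "models" and tasks, purely to show the shape of the comparison.
tasks = [("2+2", "4"), ("capital of France", "Paris"), ("3*3", "9")]
original = lambda q: {"2+2": "4", "capital of France": "Paris", "3*3": "9"}[q]
compressed = lambda q: {"2+2": "4", "capital of France": "Paris", "3*3": "6"}[q]
print(accuracy_delta(original, compressed, tasks))   # (1.0, 0.667, 0.333)
```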
- What other companies (perhaps rivals) are working on the same technique?
Multiverse Computing’s compression technique is unique, based on quantum-inspired tensor network research led by co-founder and CEO Roman Orus.
Although other techniques exist for compressing AI models, they result in a much greater loss of accuracy.
- Since LLMs evolve organically over time, what might be the future of your compression? A hardware implementation, perhaps, or something else?
This compression technique can also be applied to upcoming LLMs, which means that in the future devices such as cars, phones and laptops will be able to run small or nano AI models pre-installed on their hardware.
- Is it hardware independent? Does it work better with certain hardware (ASIC) than others?
Yes, it is hardware agnostic at the model level: CompactifAI compresses the model weights after training, so the resulting model can be deployed in the cloud, on-premises, and at the edge without changing the model’s external interface.
Inference speedups depend on what was limiting you before: if you were memory-bound, a smaller model often runs much faster and more cheaply on the same hardware.
It doesn’t require an ASIC, but GPU/AI accelerators typically provide the highest throughput for transformer inference once the model fits comfortably in memory.
- What is the compression based on?
CompactifAI relies on redundancy in trained transformer weight matrices: large models are often overparameterized, so the same behaviors can be represented with fewer effective parameters.
Instead of generic “zip” compression, it uses model-aware factorization (quantum-inspired tensor networks) to rewrite large matrices into a smaller, structured form while mitigating the accuracy trade-off.
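The contrast with generic “zip” compression is easy to demonstrate: byte-level compression barely shrinks dense floating-point weights, whereas a structure-aware low-rank rewrite cuts the parameter count directly. The toy matrix and its assumed rank-64 structure below are illustrative, not taken from CompactifAI:

```python
# Toy comparison (illustrative only): zlib on raw weight bytes vs. a
# structure-aware low-rank factorization of the same matrix.
import zlib
import numpy as np

rng = np.random.default_rng(2)
# Dense float32 weights with hidden rank-64 structure (an illustrative assumption).
W = (rng.normal(size=(1024, 64)) @ rng.normal(size=(64, 1024))).astype(np.float32)

raw = W.tobytes()
zipped = zlib.compress(raw, level=9)
print(f"zlib:     {len(zipped) / len(raw):.1%} of the original bytes")   # barely shrinks

u, s, vt = np.linalg.svd(W, full_matrices=False)
rank = 64                                   # the redundancy the factorization exploits
A, B = u[:, :rank] * s[:rank], vt[:rank]    # two thin factors replace the dense matrix
print(f"low-rank: {(A.size + B.size) / W.size:.1%} of the original parameters")
```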
- What stops others from copying your techniques/processes? It seems analogous to the range of file compression formats available (e.g. zip, rar, 7z).
Multiverse Computing’s proprietary CompactifAI technology is a unique approach to compressing AI models, based on quantum-inspired tensor network research led by co-founder and CEO Roman Orus and the company’s own research team.
What prevents others from copying the technique is the technical know-how required to achieve such high compression rates without sacrificing accuracy.
CompactifAI can reduce model size by up to 95% with only 2-3% accuracy loss, compared to the industry standard of 20-30% accuracy loss after just 50-60% compression.
Watch the CompactifAI – AI Model Compressor video on YouTube.































