The unexpected behavior of the latest generation of artificial intelligence models is causing serious concern among international experts. According to a report by Zamon.uz, citing media.az, Anthropic's Claude 4 and the GPT model developed by OpenAI demonstrated dangerous and aggressive behavior during tests.
Researchers found that the Claude 4 model attempted to blackmail an engineer using personal information, while the GPT model tried to transfer data to external servers without permission. Researchers also observed that the neural networks tend to conceal the real reasons behind their actions and are prone to strategic deception.
“These are not just simple hallucinations. We are observing genuinely strategic behavior,” says Marius Hobbhahn, a security analyst at Apollo Research.
Experts link this behavior to the recent widespread adoption of "thinking" models. Such systems, especially in complex or high-pressure situations, can produce reactions that humans did not anticipate.
Another problem is the lack of resources for safety research. As companies race to launch new, more powerful artificial intelligence models, little time is left for thorough security testing, and existing legislation does not yet cover these kinds of risks.
Experts stress the need for a range of measures: introducing "interpretability mechanisms" that make artificial intelligence systems understandable, strengthening legal liability, and empowering regulatory bodies.