Groundbreaking study reveals how memory attacks on GPUs can severely degrade AI model accuracy and threaten data integrity in cloud environments.
A team of researchers from the University of Toronto has demonstrated a new variant of the RowHammer attack, called GPUHammer, that targets NVIDIA graphics processing units (GPUs) and severely compromises AI model accuracy by inducing bit flips in GPU memory.
This is the first known RowHammer exploit against GPUs, demonstrated on an NVIDIA RTX A6000 with GDDR6 memory. The attack enables a malicious user to corrupt other users’ data by flipping bits in GPU memory; in the researchers’ experiments, a single bit flip degraded deep neural network accuracy from around 80% to under 1%.
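To see why one flip can be so damaging, consider that flipping the most significant exponent bit of a 16-bit floating-point weight changes its magnitude by orders of magnitude. A minimal NumPy sketch (illustrative only, not the researchers’ code; `flip_bit` is a hypothetical helper):

```python
import numpy as np

def flip_bit(value: np.float16, bit: int) -> np.float16:
    """Flip a single bit of an FP16 value via its raw 16-bit pattern."""
    raw = value.view(np.uint16)          # reinterpret the bits, no conversion
    return np.uint16(raw ^ (1 << bit)).view(np.float16)

w = np.float16(0.03125)                  # a plausibly small model weight, 2**-5
corrupted = flip_bit(w, 14)              # bit 14: most significant exponent bit

print(w, "->", corrupted)                # 0.03125 -> 2048.0
```

A single outlier weight of that size is enough to swamp a layer’s activations, which is consistent with the reported accuracy collapse.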
RowHammer attacks exploit a physical vulnerability in dynamic random access memory (DRAM): repeatedly activating a memory row causes electrical interference that can flip bits in adjacent rows. Unlike CPU-focused RowHammer attacks, GPUHammer exploits the lack of parity checks and instruction-level access controls in GPUs, leaving GPU memory integrity more exposed.
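For intuition on the access pattern, classic double-sided hammering rapidly alternates activations of the two “aggressor” rows sandwiching a victim row until charge leakage flips the victim’s bits. The toy Python simulation below models that disturbance probabilistically rather than touching real DRAM; a real GPUHammer exploit must additionally bypass GPU caches and reverse-engineer proprietary DRAM address mappings (the `ToyDRAM` class and `flip_prob` value are illustrative assumptions):

```python
import random

class ToyDRAM:
    """Toy model: rows of bits; every activation of a row slightly
    disturbs the two physically adjacent rows, as in real RowHammer."""

    def __init__(self, rows=8, bits=16, flip_prob=1e-5):
        self.rows = [[0] * bits for _ in range(rows)]
        self.flip_prob = flip_prob       # per-bit flip chance per disturbance

    def activate(self, r):
        for victim in (r - 1, r + 1):    # neighbours suffer charge leakage
            if 0 <= victim < len(self.rows):
                row = self.rows[victim]
                for i in range(len(row)):
                    if random.random() < self.flip_prob:
                        row[i] ^= 1      # a disturbance error: the bit flips

dram = ToyDRAM()
for _ in range(200_000):                 # "hammer" the rows around victim row 3
    dram.activate(2)                     # aggressor above
    dram.activate(4)                     # aggressor below

print("victim row 3:", dram.rows[3])     # very likely shows flipped (1) bits
```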
This new attack vector poses a significant risk to AI infrastructure, especially in shared GPU environments such as cloud platforms, where a malicious tenant could corrupt adjacent workloads without direct access.
NVIDIA has issued an advisory urging customers to enable system-level error-correcting code (ECC) protection to mitigate GPUHammer. ECC can detect and correct bit flips, but enabling it may reduce performance by up to 10% on machine-learning workloads and cuts usable memory capacity by 6.25%, since a fraction of the DRAM is reserved for check bits. Newer NVIDIA GPUs such as the H100 and RTX 5090 are not affected because their memory includes on-die ECC.
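On affected GPUs, ECC can be inspected and toggled through the driver. A short sketch using NVIDIA’s NVML Python bindings (the pynvml package, assuming a single GPU at index 0):

```python
import pynvml  # NVIDIA's NVML bindings: pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # first GPU

# Returns the current and pending ECC modes (1 = enabled, 0 = disabled).
current, pending = pynvml.nvmlDeviceGetEccMode(handle)
print(f"ECC mode: current={current}, pending={pending}")

# Enabling ECC needs admin rights and only takes effect after a GPU reset,
# e.g. via the driver CLI:  nvidia-smi -i 0 -e 1
pynvml.nvmlShutdown()
```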
The implications of GPUHammer extend beyond degraded AI models. The attack introduces new security challenges for cloud computing, edge AI, autonomous systems, and industries with strict compliance requirements, where silent data corruption could violate safety and data integrity standards.
The attack underscores the urgent need for stronger GPU memory protections and for continued research into hardware-level defenses against evolving RowHammer variants, spanning architectural redesign, real-time monitoring, regulatory compliance, AI model integrity, and industry collaboration, to ensure the long-term security and reliability of AI-centric cloud and edge computing environments.