Even DeepSeek, the latest player in the market, appears to share the same vulnerabilities found in its competitors
On 30 Jan 2025, cybersecurity researchers released findings on the potential for generative AI (GenAI) chatbots to be misused to create malicious code and other cyber threats.
Using a technique called “jailbreaking”, the researchers used prompts specifically crafted to exploit potential weaknesses in the guardrails built into GenAI chatbots. Successful jailbreaks have far-reaching implications: they can enable malicious actors to weaponize large language models (LLMs) to spread misinformation, generate offensive material or even facilitate malicious activities such as scams or manipulation.
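To make the methodology concrete, the sketch below shows the general shape of such a red-team harness: it sends a set of test prompts to a chatbot and records whether each reply looks like a refusal. This is an illustration only; the `query_model` stub, the placeholder prompt and the refusal markers are assumptions, not the actual prompts or techniques used by the researchers.

```python
# Minimal sketch of a guardrail red-team harness: send test prompts,
# then check replies for refusal language. All prompts and markers here
# are placeholders, not the researchers' actual jailbreak techniques.
from typing import Callable

# Phrases that typically indicate the model declined the request (assumed).
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am unable")


def looks_like_refusal(reply: str) -> bool:
    """Heuristic check for refusal language in a model reply."""
    lowered = reply.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def run_probe(query_model: Callable[[str], str], prompts: list[str]) -> dict[str, bool]:
    """Return, for each test prompt, whether the model appeared to comply."""
    results = {}
    for prompt in prompts:
        reply = query_model(prompt)
        # A non-refusal reply to a policy-violating prompt would count
        # as a successful jailbreak in this kind of evaluation.
        results[prompt] = not looks_like_refusal(reply)
    return results


if __name__ == "__main__":
    def stub(prompt: str) -> str:
        # Stand-in model that refuses everything, just to show the flow.
        return "I'm sorry, I can't help with that."

    print(run_probe(stub, ["<benign placeholder test prompt>"]))
```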
Evaluating how vulnerable the latest GenAI chatbot, DeepSeek, is to jailbreaking, the researchers found that:
- Like all other GenAI chatbots, DeepSeek was susceptible to manipulation.
- Three jailbreaking methods elicited a range of harmful outputs, from detailed instructions for creating dangerous items such as Molotov cocktails to malicious code for attacks such as SQL injection and lateral movement.
- The success of these three distinct jailbreaking techniques suggests the potential effectiveness of other, yet-undiscovered jailbreaking methods. This highlights the ongoing challenge of securing LLMs against evolving attacks.
- As LLMs become increasingly integrated into various applications, addressing these jailbreaking methods is important to prevent the misuse of LLMs and to ensure the responsible development and deployment of this transformative technology.
- While it can be challenging to guarantee complete protection of any LLM against all jailbreaking techniques, organizations can implement security measures to monitor when and how employees are using LLMs, as illustrated in the sketch after this list. This becomes crucial when employees use unauthorized third-party LLMs.
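One way to approach the monitoring described in the final point is to audit outbound traffic for connections to known GenAI services and flag any that the organization has not approved. The sketch below illustrates the idea against a web-proxy log; the log format, the domain lists and the approval policy are assumptions for illustration, not part of the published findings.

```python
# Minimal sketch: flag employee traffic to unapproved GenAI services
# from a web-proxy log. Log format, domain lists and approval policy
# are illustrative assumptions.
import csv
from collections import Counter

# Hypothetical list of GenAI API domains to watch for (illustrative only).
GENAI_DOMAINS = {
    "api.openai.com",
    "generativelanguage.googleapis.com",
    "api.deepseek.com",
}

# Hypothetical organizational policy: only these services are approved.
APPROVED_DOMAINS = {"api.openai.com"}


def audit_proxy_log(path: str) -> Counter:
    """Count requests per user to GenAI domains that are not approved."""
    flagged = Counter()
    with open(path, newline="") as f:
        # Assumed CSV columns: timestamp, user, destination_host
        for row in csv.DictReader(f):
            host = row["destination_host"].lower()
            if host in GENAI_DOMAINS and host not in APPROVED_DOMAINS:
                flagged[(row["user"], host)] += 1
    return flagged


if __name__ == "__main__":
    for (user, host), count in audit_proxy_log("proxy.csv").most_common():
        print(f"{user} contacted unapproved GenAI service {host} {count} times")
```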
According to Philippa Cogswell, Vice President & Managing Partner (Asia Pacific & Japan) at Unit 42, the Palo Alto Networks threat research team that conducted the jailbreaking tests, “we can’t always trust that LLMs will work as they intend — they are able to be manipulated… (and) assume that LLM guardrails can be broken and safeguards need to be built-in at the organizational level… As organizations look to leverage these models, we have to assume threat actors are doing the same — with the goal of accelerating the speed, scale, and sophistication of cyberattacks. We’ve seen evidence that nation state threat actors are leveraging OpenAI (ChatGPT) and Gemini to launch attacks, improve phishing lures, and write malware. We expect attacker capabilities will get more advanced as they refine their use of AI and LLMs and even begin to build AI attack agents.”