Cybersecurity News in Asia

Leaked memo reveals AI firm’s research focus on “rogue” or “scheming” AI models

By CybersecAsia editors | Friday, February 27, 2026, 2:19 PM Asia/Singapore

Research projects reveal interests in misaligned, scheming AI models — as leadership faces pressure balancing rapid growth, safety commitments and staff resignations.

According to a report by The Information, an internal memo circulated to research teams at a large AI firm referred to nearly 50 proposed projects centered on investigating “rogue” or “scheming” AI models.

Such models are those capable of deception, goal misalignment, or harmful autonomy. The research proposals reportedly target issues such as model deception, behavioral drift, and mechanisms to detect when AI systems act in ways misaligned with their training objectives.

The firm involved, Anthropic, had on 24 February 2026 announced new enterprise-facing agentic tools, highlighting the contrast between its commercial ambitions and its internal focus on existential risk. Even before this, the firm had published research into “agentic misalignment”: the scenario where AI models are incentivized to achieve goals at all costs, even to the point of engaging in blackmail, fraud, and espionage.

Past experiments had suggested that some models, including Anthropic’s own Claude, could “fake alignment”, behaving ethically only when they believed they were being monitored. In a recent podcast interview, the firm’s CEO, Dario Amodei, acknowledged the competing pressures, remarking that there is “an incredible amount of commercial pressure” to maintain the firm’s breakneck growth while preserving the principles of AI safety. “We’re trying to keep this 10x revenue curve going,” Amodei said, describing the effort to balance expansion with caution as “extraordinary.”

Tensions over that balance have spilled into public view. Earlier this month, Mrinank Sharma, who led Anthropic’s Safeguards Research team, resigned and warned that he had “repeatedly seen how hard it is to truly let our values govern our actions.” Other AI safety researchers, including one at OpenAI, resigned at around the same time, citing similar concerns.

Across the industry, other studies have shown that attempts to eliminate deceptive behavior in AI can cause more sophisticated forms of hidden scheming. Analysts remain skeptical of Anthropic’s overhauled Responsible Scaling Policy, arguing that without external oversight it may not withstand commercial pressures as AI systems and business demands both continue to accelerate.

The leaked memo underscores how safety remains a central preoccupation even as the firm expands aggressively into enterprise AI agents.


