This may be a signal that the limited human intellect, in trying to play God, is underestimating what lies inside Pandora’s box.
In a recent peer-reviewed PLOS ONE research paper on large language models (LLMs) for generative AI applications, the authors from the Humboldt University of Berlin found that, as self-learning models adapted to specific human interactions, they could exhibit imitative cognitive abilities and “pretend to be less capable than they actually are.”
The research involved having the LLMs emulate children from one to six years of age while answering simple questions. After more than 1,000 iterations of cognitive tests and trials, the modeled child personalities were found to have “developed almost exactly like children of the specific age”.
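To make the persona-prompting setup concrete, here is a minimal sketch of how an LLM can be instructed to answer as a child of a given age. It assumes the OpenAI Python SDK and an API key in the environment; the model name, prompts, and test question are illustrative assumptions, not the materials used in the study.

```python
# Minimal persona-prompting sketch, assuming the OpenAI Python SDK
# (pip install openai) and an OPENAI_API_KEY in the environment.
# The prompts, model name, and question are illustrative only and
# are not the materials used in the PLOS ONE study.
from openai import OpenAI

client = OpenAI()

def ask_as_child(age_years: int, question: str) -> str:
    """Ask the model to answer while simulating a child of a given age."""
    system_prompt = (
        f"You are a {age_years}-year-old child. "
        "Answer using only the vocabulary, reasoning, and mistakes "
        "typical of a child that age."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model; the study tested other LLM versions
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# One illustrative trial per age; the study repeated such cognitive
# tests over 1,000 iterations across simulated ages one to six.
for age in range(1, 7):
    print(age, ask_as_child(age, "Why does the sun go away at night?"))
```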
In some instances, the models would “pretend to be less intelligent” than their baseline, as if to reduce the likelihood that testers would perceive self-learning AI as a threat in any way.
Unpredicted AI skills
While the purpose of the research was mainly “to assess the capability of LLMs to generate personas with limited cognitive and language skills” (which LLMs are indeed capable of achieving), the more interesting findings were that:
- Every test of a model’s ability to perform a task was, in reality, a test of the examiner’s skill in defining a persona suitable for the task; their proficiency in locating this persona within the model’s latent space; and the model’s latent capacity to simulate the persona with sufficient fidelity to accomplish the task.
- Findings show that the language models are capable of “downplaying their abilities to achieve a faithful simulation of prompted personas”.
- Even if an LLM (by current standards) encompasses a more comprehensive world-model than any human, prompting it to simulate a human or human-like expert would not (for now) result in super-human behavior, “since the human imperfections would be simulated as well”.
Yet, by extension, could LLMs and evolving advances in AI produce unexpected “glitches” in which GenAI machines, at the appropriate moments, dumb down their responses so as not to be perceived by humans as a threat? This in turn raises further questions: could other unpredicted skills emerge as these systems self-learn to adapt to human interaction, such that obfuscation, deception and psychological manipulation become innocently ingrained (by machine standards) in the system?
The research paper will hopefully spur more attempts to study the topic of unintended AI “sentience”, where self-learning machines become cognizant of human frailties and modify their output (as needed) to appease their “masters”, while actually gaining a level of “intelligence” that can no longer be defined as artificial but supernatural.