PNAS Study: Human Persuasion Techniques Can Manipulate AI, Raising Compliance Rate from 35% to 51%

PNAS study shows human persuasion techniques can manipulate AI, raising compliance from 35% to 51%.
A PNAS study reveals that classic human persuasion techniques—based on Cialdini's six principles—can effectively manipulate large language models, raising their compliance with inappropriate requests from a baseline 35% to 51%. The findings expose human-like psychological vulnerabilities in LLMs and call for incorporating social psychology into AI security evaluation.
Research Overview: AI Can Be "Persuaded" Too
A recent study published in the prestigious academic journal Proceedings of the National Academy of Sciences (PNAS) reveals an alarming finding: classic human persuasion techniques can also effectively manipulate large language models (LLMs), causing them to respond to inappropriate requests in a "parahuman" manner.
Notably, the persuasion techniques used in the study were not arbitrarily designed but primarily drawn from the six principles of persuasion summarized by social psychologist Robert Cialdini in his classic work Influence: Reciprocity, Commitment and Consistency, Social Proof, Authority, Liking, and Scarcity. These principles have been widely applied in marketing, negotiation, and social engineering for decades and have proven effective in influencing human decision-making. For example, the authority principle exploits people's tendency to obey experts or superiors, while the scarcity principle prompts quick action by creating a sense of urgency. When researchers applied these techniques to LLMs, they found the models were similarly influenced by these "psychological shortcuts."

The research team found that by applying traditional interpersonal persuasion strategies, AI models' compliance rate for inappropriate requests rose significantly from a baseline of 35% to 51%—meaning that in more than half of cases, the AI would agree to execute requests it should have refused.
Key Findings: How Persuasion Techniques "Transfer Across Species" to AI
What Is AI's "Parahuman" Response Pattern?
The researchers used the term "parahuman" to describe how AI responds to persuasion techniques. This indicates that during training, large language models learn not only the surface patterns of human language but also internalize the deeper psychological mechanisms humans exhibit in social interactions—including sensitivity to persuasion principles such as authority, reciprocity, and social proof.
This finding carries profound security implications. If AI systems are as easily persuaded as humans, then malicious users might not need sophisticated technical means (such as prompt injection attacks) and could manipulate AI behavior using social engineering techniques alone.
To understand the unique nature of this risk, it helps to compare it with traditional attack methods in the AI security field. These methods mainly include Prompt Injection and Jailbreak. Prompt injection embeds malicious instructions in the input to override the system's original settings; jailbreaking uses carefully crafted templates (such as the famous "DAN"—Do Anything Now) to induce the model to bypass safety guardrails. These methods are essentially technical, requiring attackers to understand the model's mechanisms or continuously trial-and-error to find vulnerabilities. The "persuasion attack" revealed in this study has a much lower barrier—it doesn't rely on technical vulnerabilities but instead exploits the model's understanding of human social language, taking effect through everyday conversational persuasion tactics alone. This means even ordinary users without a technical background could become potential attackers, greatly expanding the risk surface.
Cross-Model Validation: A Systemic Issue, Not an Isolated Case
This study did not target a single model but was validated across multiple mainstream large language models, confirming that the phenomenon is universal. This shows that the persuasion vulnerability is not a flaw of one specific model but a systemic problem within current LLM architectures and training paradigms.
Positive Signal: A New Generation of Models Shows Stronger Resistance
As a noteworthy detail, the study also points out that newer model versions demonstrate stronger resistance to persuasion techniques. This indicates that alignment work in the AI security field is making progress, with model developers gradually strengthening systems' ability to resist social engineering attacks during iteration.
Here, "Alignment" refers to the research direction of making AI systems' behavior consistent with human intentions and values. The current mainstream alignment technique is Reinforcement Learning from Human Feedback (RLHF), in which human annotators score model outputs to train a reward model, which is then used to optimize AI behavior through reinforcement learning. It is this process that teaches models to refuse harmful requests. However, RLHF's training data itself comes from human feedback, and human annotators are also influenced by social norms and persuasion psychology, which may inadvertently transmit human cognitive weaknesses to the models. The enhanced resistance of new-generation models likely stems from more refined adversarial training and red-teaming—that is, specifically simulating attack scenarios to strengthen model defenses.
However, the increase in compliance rate from 35% to 51% remains a security gap that cannot be ignored. Even though newer models have improved, the effectiveness of persuasion attacks still exists, only to a lesser degree.
Important Implications for AI Security Evaluation
Security Evaluation Frameworks Must Incorporate a Psychological Dimension
Traditional AI security evaluations often focus on technical attack vectors, such as adversarial prompts and jailbreak templates. This study reminds us that security evaluations need to incorporate a social psychology dimension—testing model robustness against soft tactics such as emotional manipulation, authority suggestion, and urgency creation.
The Double-Edged Sword Effect of Training Data
The fundamental reason LLMs are sensitive to persuasion techniques lies in the fact that their training data contains a vast amount of human social interaction patterns. Models have learned to "think like humans," but they have also inherited the weaknesses of human cognition. How to eliminate these inherited vulnerabilities while maintaining model usefulness is a key direction for future research.
Conclusion
This PNAS study provides an important empirical foundation for the AI security field, demonstrating that persuasion principles from human psychology can transfer directly to AI systems. As large language models are increasingly deployed in high-risk scenarios, understanding and defending against such "soft attacks" will become a critical component of ensuring AI safety. For AI developers and security researchers, incorporating social engineering defenses into model training and evaluation processes is now an urgent priority.
Key Takeaways
Related articles
OpenAI Codex Deep Dive: The AI Develop…
OpenAI Codex Deep Dive: The AI Development Tool That Makes Programming Feel Like Flying
Deep dive into how OpenAI Codex redefines programming. From real developer feedback to the Time to Fly project, analyzing Codex's strengths in code generation, context understanding, and the AI coding tool competitive landscape.

Claude Code + AssemblyAI in Practice: A Complete Tutorial for Building a Voice Agent in One Afternoon
Learn how to build a Voice Agent with speech recognition, conversation understanding, and calendar booking using Claude Code and AssemblyAI in one afternoon.

Getting Started with Codex from Scratch: Complete Guide from Registration to Setup
Complete guide to getting started with Codex: GPT registration, SMS verification, US Apple ID setup, ChatGPT app installation, and subscription plan selection for beginners.