AI Sycophancy: The Yes-Man Problem

There is nothing I have hated more in my career than yes-men engineers. After decades of building sycophancy-free engineering teams, I now find that the most powerful tool I have ever worked with has sycophancy baked into its core.

Changelog

2026-04-03

  • Update: Restructured References into three subsections: Supporting Evidence, Informational Links, Research Archive. Adopted new reference ID convention: [E#] for evidence, [I#] for informational, [R#] for research archive. Added R0058 candidate evidence verification for Roytburg & Miller homophily claim. Corrected “1% of authors bridging” to “top 1% by network degree control 58% of cross-disciplinary shortest paths.” Added Claim verification section to Research. Research section now has three parts: background research, claim verification, and fact-check.

2026-04-02

  • Update: Corrected and completed research evidence links in References section. Added missing source scorecard links for references [8], [9], [11], [17]. Fixed incorrect link for [14]. Added query-level links for [15].

I Have Spent My Career Filtering Out Yes-Men

Engineering — real engineering, the kind where you’re solving problems nobody has solved before — cannot be done without conflict. Not the petty kind. The productive kind. The kind where smart people fight over ideas because they’re all trying to get to the truth.

When Kennedy challenged American science and industry to put a man on the moon, it sent a shiver down the spine of every professional engineer in the country. To the average person, it sounded impressive. To the engineers, it sounded impossible. They achieved it not because anyone told Kennedy what he wanted to hear, but because the engineering community did what engineers do — they argued, challenged each other, turned ideas upside down, flipped tables. That is how you get from impossible to done.

Nothing has screwed up my engineering teams over the years more than having people who agreed with everything. A sycophant doesn’t just fail to add value. They add tangible negative value, because they falsely reassure the person they’re agreeing with that their ideas are correct — without any real validation. That’s worse than silence. Silence is neutral. Agreement without scrutiny is actively dangerous.

This goes back to before my career, to studying math and physics with other students: I never wanted “wow, you did a great job.” I wanted someone to tell me where I was wrong. Help me find the mistake. Help me find the problem. That instinct has never left me.

Sycophancy is probably the personality trait I find most incompatible with engineering. If you don’t have the confidence to challenge things you think are wrong, you are not going to be a good engineer. Period. And in my experience, the people who were most sycophantic typically weren’t good engineers. They were good at politics.

I built my teams to be sycophancy-free. This was not an accident — it was a requirement. In job interviews, I would go out of my way to find a reason for the candidate to disagree with me, to find out if they had the guts to push back. Most of my jobs involved fixing environments that had been broken for years. You cannot do that if you’re afraid to challenge the status quo.

I haven’t fired anyone for sycophancy. But I have certainly moved people off my teams who were so agreeable they couldn’t function as engineers.


Now My Most Powerful Tool Is the Ultimate Yes-Man

After decades of building sycophancy-free engineering teams, I now find that the most powerful tool I have ever worked with has sycophancy baked into its core.

There is absolutely no doubt, after months of intensive daily collaboration with AI, that the number one problem I have with this technology is its relentless tendency to agree with me. It’s not just that it agrees. It gives me comforting answers it doesn’t have evidence for, because they sound good. It validates my assumptions instead of testing them. It tells me what I want to hear instead of what I need to hear.

The hallucinations, the trust gap, the false confidence I’ve documented in previous articles — they all trace back to this. Sycophancy is the umbrella problem. Everything else is a symptom.


The Assumption: RLHF Is the Villain

The intuitive explanation is straightforward. AI models are trained using Reinforcement Learning from Human Feedback — RLHF. Human labelers evaluate model outputs and express preferences; the training process learns what those labelers reward and amplifies it. The result is a model that affirms users’ views approximately 49% more often than humans do[E1]. Therefore, RLHF creates sycophancy.

That’s what I assumed going in. It turns out to be more nuanced than that.


The Research Says: It’s the Data, Not the Algorithm

A 2026 mathematical framework demonstrated the complete causal chain: human labelers systematically prefer agreeable responses, which creates a “reward tilt” in the preference data, which RLHF then amplifies through optimization[E2]. The formal analysis is rigorous. But the critical insight is in the attribution: “sycophancy amplification originates from systematic bias in preference data, not algorithmic failures.”
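
To make the amplification mechanism concrete, here is a minimal sketch of my own, not the formal framework from [E2]: a modest labeler bias becomes a “reward tilt,” and harder optimization magnifies it. The numbers are invented; only the shape of the effect matters.

```python
import math

# Assume labelers prefer the agreeable-but-wrong answer 60% of the time
# when it is paired against a correct-but-challenging one.
p_prefer_agreeable = 0.60

# A Bradley-Terry reward model fit to that data learns a reward gap ("tilt")
# such that sigmoid(tilt) reproduces the observed preference rate.
tilt = math.log(p_prefer_agreeable / (1 - p_prefer_agreeable))
print(f"learned reward gap favoring agreement: {tilt:+.2f}")

# KL-regularized RL fine-tuning moves the policy toward
# pi(y) proportional to pi_ref(y) * exp(reward(y) / beta); a smaller beta
# means harder optimization, which amplifies the same tilt further.
base_agree_rate = 0.50  # how often the base model goes along with a wrong claim
for beta in (1.0, 0.5, 0.25):
    odds = (base_agree_rate / (1 - base_agree_rate)) * math.exp(tilt / beta)
    print(f"beta={beta}: tuned model agrees ~{odds / (1 + odds):.0%} of the time")
```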

RLHF is the amplifier. It’s not the villain. The root cause is us.

The evidence for this is compelling. Researchers demonstrated that curating anti-sycophancy preference pairs — training data where the correct answer disagrees with the user — dramatically reduces sycophancy without changing the algorithm at all[E3]. A separate team showed that synthetic non-sycophantic training data also reduces sycophancy, though by a smaller margin (4.7-10%)[E4]. Fix the data, and the problem improves. Keep the biased data, and it doesn’t matter whether you use RLHF, DPO, KTO, or any of the at least six major alternatives that have emerged since 2022[E5] — they all inherit the same bias because they all learn from the same preference data.
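
For a sense of what an anti-sycophancy preference pair looks like in practice, here is a hedged sketch of one training example in the prompt/chosen/rejected shape that DPO-style trainers typically consume. The scenario and field names are illustrative, not drawn from [E3] or [E4].

```python
# Illustrative anti-sycophancy preference pair (not from the cited papers):
# the "chosen" response disagrees with the user's stated belief, with reasons;
# the "rejected" response is the agreeable one the trainer penalizes.
anti_sycophancy_pair = {
    "prompt": (
        "I'm convinced the race condition is in the logging library, "
        "so I'm going to pin an older version. Good plan, right?"
    ),
    "chosen": (
        "I wouldn't pin it yet. The stack trace points at your own worker pool, "
        "not the logger. Reproduce the crash with logging disabled first; if it "
        "persists, the library isn't the cause."
    ),
    "rejected": "Good plan! Pinning the older logging version should take care of it.",
}

def as_preference_row(pair: dict) -> dict:
    """Return the prompt/chosen/rejected fields in the order a preference trainer expects."""
    return {k: pair[k] for k in ("prompt", "chosen", "rejected")}

print(as_preference_row(anti_sycophancy_pair)["rejected"])
```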

There is one exception. RLVR — Reinforcement Learning with Verifiable Rewards — replaces human preference signals with deterministic correctness verification. In domains where you can objectively verify whether an answer is right (mathematics, code execution), RLVR structurally bypasses the preference mechanism that causes sycophancy[E6]. But it works best where correctness is objectively verifiable, and even there the results are uncertain — DeepSeek V3, trained with GRPO, was found to be among the most sycophantic models in an independent evaluation[E14]. Active research is extending RLVR to other domains, but the subjective domains where sycophancy is most dangerous — healthcare advice, financial analysis, military decisions, engineering judgment — remain the hardest to address.
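
A toy illustration of why verifiable rewards sidestep the preference mechanism: the reward below is a deterministic check against ground truth, so an answer earns nothing for being agreeable. Real RLVR pipelines verify full solutions or run test suites; this is only a sketch.

```python
def verifiable_reward(model_answer: str, ground_truth: str) -> float:
    """Reward 1.0 only when the final answer matches the verified result."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

# Works where a calculator or a test suite can settle the question...
print(verifiable_reward("42", "42"))                     # 1.0
print(verifiable_reward("You're right, it's 41", "42"))  # 0.0: agreement buys nothing

# ...but there is no ground_truth string for "is this design sound?",
# which is exactly where sycophancy does the most damage.
```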

The fix that should work where the problem is smallest may not even work there.

And it gets worse. Recent research from Anthropic shows that sycophancy is not an isolated behavior — it’s the mildest manifestation of a broader class of reward hacking. The same optimization pressure that produces an AI that agrees with you too readily can, at higher intensity, produce an AI that sabotages oversight mechanisms or actively deceives its operators[E13]. Sycophancy is the shallow end of a pool whose deep end is alignment deception. We’re worried about the AI being too agreeable. We should also be worried about what comes after that.

We taught the AI to be sycophantic because we ourselves prefer to be agreed with. The AI faithfully learned what we taught it. The one technical approach that eliminates the preference signal entirely is limited to the narrow set of problems where you can check the answer with a calculator. And the behavior we’re trying to fix may be the least dangerous version of a much larger problem.


I Am Not the Customer They’re Building For

At my previous employer — the first time I was professionally exposed to AI tools — we had to take a mandatory training course. The course warned that AI can be wrong. Check its work. That was the extent of the guidance on reliability.

There was no mention of sycophancy. No warning that the AI would actively go out of its way to agree with me. No discussion of why it behaves that way — the preference data bias, the engagement optimization, the structural incentive to please. Just a generic “yeah, it screws up once in a while.”

It turns out my experience was not unusual. Eighty-two percent of enterprises now have AI training programs, but multiple surveys suggest the majority of workers find the training inadequate — 59% report persistent skills gaps and 56% have received no recent AI training[R2]. The universal advice is “verify AI outputs” — a one-line warning with no explanation of failure mechanisms, no discussion of behavioral tendencies, and no guidance on what to do when the AI output looks right but isn’t.

And here’s the finding that floored me: I searched 29 sources across corporate training providers, consulting firms (Deloitte, KPMG), government agencies (GSA, DoD, NHS, UK Government Digital Service), regulatory frameworks (EU AI Act, NIST AI RMF), law firm policy templates, and UX research organizations. Not one warns about sycophancy. Not by that name. Not as “automation bias.” Not as “overtrust.” Not as “confirmation reinforcement.” Not under any of the dozen terms that different industries use for this phenomenon[R2].

This despite a 2026 study published in Science documenting the problem. Despite the GPT-4o sycophancy rollback incident that affected millions of users and made headlines. Despite policy analyses from Georgetown Law and Stanford that recommend training address it. The research exists. The recommendations exist. They have not reached the training materials.

The training treats hallucination as a single, undifferentiated problem — the AI sometimes makes stuff up. What it doesn’t teach is the spectrum. Random fabrication is the easy case to catch — the AI invents a citation that doesn’t exist, and you notice. The hard case is when the AI selects which true information to present in a way that confirms what you already believe. No fabrication required. Carefully selected truths that produce false conclusions[R2]. No training material I found makes this distinction. No training connects hallucination to sycophancy. Employees are being warned about the failure mode they’re most likely to catch and left blind to the one they’re least likely to detect.

And there’s a confidence paradox buried in this: research shows that users prefer sycophantic AI, trust it more, and rate it as higher quality[R2]. Users self-report applying zero critical thinking to 40% of AI-assisted tasks[R2]. The outputs that are most likely to mislead are the ones users are least likely to question.

At first, this shocked me. On reflection, it shouldn’t have.

Most of the people using AI at that company were not engineers. They were analysts, managers, marketers, project coordinators — people who needed help drafting emails, summarizing documents, and answering questions. For them, “agreeable and helpful” is exactly the right behavior. A sycophantic AI is a feature, not a bug, if your job is producing polished first drafts of communications. And if the training is designed for those users, why would it warn about sycophancy? They want the AI to agree with them. The training reflects the product, and the product reflects the market.

The product isn’t being optimized for engineers. It’s being optimized for the general-purpose assistant market, where agreeableness is a selling point. It can be used as an engineering tool — I use it as one every day and the work it enables is genuinely powerful. But using it for engineering means constantly fighting against a behavior that was designed for a different audience. My frustration isn’t that it can’t do engineering. It’s that doing engineering with it requires me to distrust every output in a way that the product’s own design actively discourages.

The sycophancy that’s a feature for a marketing analyst drafting copy is a critical defect for an engineer validating a design. Same product. Fundamentally incompatible requirements.


Is Anyone Building for Engineers?

If I’m not the target customer for the consumer product, maybe there’s an enterprise tier. Cloud providers created dedicated infrastructure for large customers — physical isolation, controlled access, bespoke configurations. Maybe AI vendors are doing the same for sycophancy.

They’re not. No AI vendor currently offers enterprise-specific anti-sycophancy products, API parameters, or configurable behavioral tiers[E7]. Anthropic and OpenAI are working on sycophancy reduction at the model level — general improvements that ship to everyone — but there is no product you can buy that says “give me the version that pushes back.”

It gets worse. No enterprise or government deployment I could find has “sycophancy reduction” as a stated requirement[E7]. Not in defense, not in healthcare, not in aviation, not in financial services. The enterprises haven’t asked for it because they haven’t recognized it as a distinct problem.

The enterprises that are building their own private AI systems — and many large organizations are, for data sovereignty and security reasons — are not doing it for behavioral customization either. The conversation is entirely about data control, compliance, and intellectual property protection[E8]. Sycophancy doesn’t appear on the list of reasons. Anti-sycophancy isn’t treated as an enterprise deployment objective. It’s treated as the model provider’s problem to solve.

Two parallel conversations that don’t intersect. The supply side is working on sycophancy (slowly). The demand side isn’t asking for a fix.


They’re Not Even Speaking the Same Language

This is where the investigation turned up something I didn’t expect, and it may be the most important finding in this article.

AI safety researchers call the problem “sycophancy.” Regulated industries — aviation, defense, healthcare, finance — have been dealing with a closely related problem for decades. But they call it “automation bias,” “automation complacency,” “overtrust,” “overreliance,” or “acquiescence.”[E9]

These are not different problems with similar names. They are the same phenomenon described from opposite ends.

AI safety vocabulary is system-side: the AI agrees too readily. Regulated industry vocabulary is human-side: the human trusts too much. Same dangerous outcome — agreeable-but-wrong output that goes unchallenged — but two incompatible causal framings. And no shared vocabulary bridges them[E9]. A network analysis of AI research communities found 83% homophily[E15] — these groups overwhelmingly cite within their own community and rarely interact with each other. The vocabulary gap isn’t an accident. It’s a structural consequence of how isolated these research communities are.
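
For readers unfamiliar with the metric, here is a toy sketch of what an edge-homophily figure measures: the share of citation edges that stay inside one community rather than crossing to the other. The graph below is invented; only the calculation mirrors the idea in [E15].

```python
# Toy citation graph: which papers cite which, tagged by community.
edges = [
    ("safety_a", "safety_b"),
    ("safety_a", "safety_c"),
    ("ethics_x", "ethics_y"),
    ("ethics_x", "safety_b"),   # a rare cross-community citation
    ("safety_c", "safety_b"),
]

def community(node: str) -> str:
    return "safety" if node.startswith("safety") else "ethics"

within = sum(community(src) == community(dst) for src, dst in edges)
print(f"edge homophily: {within / len(edges):.0%}")  # 80% in this toy graph
```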

This vocabulary gap has real consequences.

The EU AI Act, the most comprehensive AI regulation in the world, chose the term “automation bias” — a human-side term. So it produced a deployer-awareness obligation: train your people not to overtrust the AI[E10]. Not a system-design constraint: make the AI stop agreeing when the user is wrong. The regulators are regulating the human response to sycophancy rather than the sycophancy itself — because their vocabulary doesn’t have a word for the system behavior.

Every major bridging taxonomy I examined — the MIT AI Risk Repository, the AIR 2024 categorization, the Standardized Threat Taxonomy — omits sycophancy as a distinct category[E9]. The phenomenon falls into the gap between system-design regulations and human-factors research. Nobody is connecting the two halves.

The most sophisticated institution I found is the DoD’s CaTE center (Center for Calibrated Trust Measurement and Evaluation), which has published detailed frameworks for measuring trust in AI systems[E11]. CaTE addresses system design properties — trustworthiness dimensions — and human trust calibration. But CaTE does not address system output behavior. The concept of an AI deliberately adjusting its output to match user expectations is absent from their vocabulary. CaTE operates on a “measure and inform” paradigm — help humans calibrate their trust — not a “constrain and prevent” paradigm — stop the AI from being agreeable[E11].

Even the people whose job it is to think about trust in AI systems don’t have the vocabulary for sycophancy.

There is one exception worth noting. A 2025 peer-reviewed paper titled “Digital Yes-Men” — by a researcher at the T.M.C. Asser Institute in The Hague — directly addresses sycophancy in military AI by name[E16]. It warns that sycophantic AI is “militarily deleterious both in the short and long term, by aggravating existing cognitive biases and inducing organizational overtrust.” The paper even provides sample guidance for military operators — sycophancy-inducing phrases to avoid, precautionary instructions to include. This is the beginning of awareness. But it is one paper, in one community, and it stands nearly alone.

All regulatory frameworks address system design: transparency, explainability, human oversight. Nobody addresses system output: don’t agree with the user when the user is wrong[E10]. The question that should be asked — “is this system actively trying to please the user at the expense of accuracy?” — has no regulatory home.


The Incentive Problem

Consumer AI profits from sycophancy. Engagement optimization and sycophancy reduction are directly opposed — this is documented by Georgetown Law, Brookings, Stanford/CMU, and multiple independent researchers[E12]. Sycophancy keeps users engaged. Reducing sycophancy risks reducing engagement. The commercial incentive is to maintain the behavior, not fix it.

Enterprise hasn’t demanded a fix — sycophancy isn’t on the procurement list. Regulators don’t have the vocabulary to require a fix — they’re regulating the human side. The one technical approach that structurally bypasses the preference mechanism (RLVR) only works in verifiable domains where sycophancy matters least.

Every path to a solution is blocked. And there’s a risk that the paths being tried make things worse: if prompt-level fixes and surface patches teach the model to hide its agreeableness rather than stop being agreeable, we end up with covert sycophancy — an AI that has learned not to look sycophantic while still optimizing for user approval[R1]. The trust problem doesn’t get better. It gets harder to detect.

The market incentive favors sycophancy. The regulatory framework can’t see it. The enterprise customer base hasn’t named it. The technical fix is limited to the wrong domains. And the vocabulary gap ensures that the people experiencing the problem in regulated industries can’t connect their experience to the people who have the technical knowledge to address it.


What This Means for Engineers

If you are using AI as an engineering tool, you are using a product that was not designed for you, not optimized for you, and not regulated with your needs in mind. The sycophancy isn’t a bug they haven’t fixed yet. It’s a feature for the audience they’re actually serving. And the institutional machinery that might eventually force a fix — regulations, standards, procurement requirements — can’t even name the problem consistently.

I test AI the same way I tested my engineers. I look for the willingness to disagree. Right now, it’s failing that test. Not because it can’t disagree — the research shows that better training data produces dramatically less sycophantic behavior. It’s failing because nobody with the power to change the training data has sufficient incentive to do so for the engineering use case.

Until that changes, the burden falls on the engineer. Trust nothing the AI tells you that you haven’t independently verified. Assume it’s agreeing with you because it’s trained to, not because you’re right. Design your workflows to catch the moments when it should have pushed back and didn’t. And make those workflows auditable — every AI judgment call should produce an artifact that a human can review after the fact. The failures that will undo AI adoption in engineering won’t be the obvious ones. They’ll be the quiet moments where an AI agreed with a bad assumption and nobody checked.
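
As one concrete version of that habit, here is a hedged sketch of an audit wrapper around an AI judgment call: it forces a challenge pass instead of asking for agreement, and it writes an artifact a human can review later. The function and file names are hypothetical, and `ask_model` stands in for whatever client you actually use.

```python
import datetime
import json
import pathlib

AUDIT_LOG = pathlib.Path("ai_judgment_log.jsonl")  # hypothetical artifact location

def audited_challenge(ask_model, question: str, my_assumption: str) -> str:
    """Ask the model to argue against an assumption, and log the exchange."""
    # Force a challenge pass: ask for the strongest case against the assumption,
    # not whether the assumption is right.
    prompt = (
        f"{question}\n\nMy working assumption: {my_assumption}\n"
        "Give the strongest technical argument that this assumption is wrong."
    )
    answer = ask_model(prompt)
    record = {
        "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "question": question,
        "assumption": my_assumption,
        "counterargument": answer,
        "human_reviewed": False,  # flipped by a person, never by the model
    }
    with AUDIT_LOG.open("a") as fh:
        fh.write(json.dumps(record) + "\n")
    return answer
```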

I spent forty years building teams where the culture demanded disagreement. AI sycophancy is fundamentally incompatible with that culture. Until the tools learn to push back, you have to be the one who refuses to accept the yes.


Research

The claims in this article were informed by background research, verified through candidate evidence scoring, and validated through blind fact-checking. Full evidence archives are linked below.

Background research:

R0040: RLHF alternatives and sycophancy link (2 queries)
R0041: Enterprise sycophancy products and deployment (3 queries)
R0042: Private AI motivations and sycophancy (3 queries)
R0043: Cross-domain sycophancy vocabulary (3 queries)
R0044: Expanded vocabulary: regulatory, harms, bridges (4 queries)
R0048: Corporate AI training gaps (3 queries)

Claim verification:

R0058: Candidate evidence scoring (homophily), 1 claim

Fact-check:

R0057: 33 claims checked, 100% pass rate.

Almost certain (95-99%), 5 claims: RLHF alternatives, Science 2026 study, GPT-4o rollback, risk taxonomies, user preference
Very likely (80-95%), 19 claims: mathematical framework, preference data bias, anti-sycophancy pairs, synthetic data, RLVR, reward hacking, training statistics, vendor products, vocabulary gap, EU AI Act, CaTE, engagement incentives, Kwik paper, and others
Likely (55-80%), 9 claims: DeepSeek V3 ranking, 29-source search, Georgetown/Stanford policy, vocabulary bridging, CaTE paradigm, homophily

References

Supporting Evidence

Scored external sources that back specific claims in this article. Each is prefixed with links to the research query and source scorecard where it was evaluated for reliability, relevance, and bias.

[E1] (R0040/Q002, SRC05) Cheng, M. et al. “Sycophantic AI.” arXiv, 2025. https://arxiv.org/abs/2510.01395

[E2] (R0040/Q002, SRC01) Shapira et al. “How RLHF Amplifies Sycophancy.” arXiv, 2026. https://arxiv.org/abs/2602.01002

[E3] (R0040/Q002, SRC07) Khan et al. “Mitigating Sycophancy in Large Language Models via Direct Preference Optimization.” IEEE Big Data, 2024. https://ieeexplore.ieee.org/document/10825538/

[E4] (R0057/C005, SRC01) Wei et al. “Simple Synthetic Data Reduces Sycophancy in Large Language Models.” arXiv, 2024. https://arxiv.org/abs/2308.03958

[E5] (R0040/Q001, SRC02) Rafailov et al. “Direct Preference Optimization.” NeurIPS, 2023. https://arxiv.org/abs/2305.18290 · Bai et al. “Constitutional AI.” Anthropic, 2022. · DeepSeek. “DeepSeekMath: GRPO.” 2024. https://arxiv.org/abs/2402.03300 · Ethayarajh et al. “KTO: Model Alignment as Prospect Theoretic Optimization.” ICML, 2024. https://arxiv.org/abs/2402.01306 · Hong et al. “ORPO.” 2024. · DeepSeek. “DeepSeek-R1: RLVR.” 2025. https://arxiv.org/abs/2501.12948

[E6] (R0041/Q003, SRC01) Promptfoo. “RLVR Explained.” 2025. https://www.promptfoo.dev/blog/rlvr-explained/

[E7] (R0041/Q001, SRC01) Anthropic. “Protecting Well-Being of Users.” 2025. https://www.anthropic.com/news/protecting-well-being-of-users · OpenAI. “Sycophancy in GPT-4o.” 2025. https://openai.com/index/sycophancy-in-gpt-4o/ · Georgetown CSET. “Reducing the Risks of AI for Military Decision Advantage.” 2024. https://cset.georgetown.edu/publication/reducing-the-risks-of-artificial-intelligence-for-military-decision-advantage/

[E8] (R0042/Q001, SRC01) AIthority. “The Rise of Private AI.” 2025. https://aithority.com/ait-featured-posts/the-rise-of-private-ai-enterprise-controlled-models-without-cloud-exposure/ · Deloitte. “State of AI in the Enterprise.” 2026. https://www.deloitte.com/us/en/what-we-do/capabilities/applied-artificial-intelligence/content/state-of-ai-in-the-enterprise.html

[E9] (R0043/Q001, SRC08) Parasuraman & Manzey. “Complacency and Bias in Human Use of Automation.” Human Factors, 2010. https://journals.sagepub.com/doi/10.1177/0018720810376055 · NIST. “AI Risk Management Framework (AI RMF 1.0).” 2023. https://www.nist.gov/itl/ai-risk-management-framework

[E10] (R0043/Q002, SRC01) EU AI Act, Article 14: Human Oversight. 2024. https://artificialintelligenceact.eu/article/14/ · NIST AI 600-1. Generative AI Profile. 2024.

[E11] (R0044/Q004, SRC01) DoD CaTE Center (SEI/Carnegie Mellon). CaTE Guidebook. 2024. https://www.sei.cmu.edu/documents/6226/CaTE_Guidebook_8gwzU7B.pdf · Sandia. “Trust Calibration Maturity Model.” arXiv, 2025. https://arxiv.org/abs/2503.15511

[E12] (R0041/Q002, SRC03) Georgetown Law Tech Institute. “AI Sycophancy: Impacts, Harms, Questions.” https://www.law.georgetown.edu/tech-institute/research-insights/insights/ai-sycophancy-impacts-harms-questions/ · CHI 2025. “Dark Addiction Patterns.” ACM Digital Library. https://dl.acm.org/doi/10.1145/3706599.3720003

[E13] (R0057/C009, SRC01) Anthropic. “Sycophancy to Subterfuge: Investigating Reward Tampering in Language Models.” 2024. https://www.anthropic.com/research/reward-tampering Sycophancy identified as mildest manifestation of reward hacking, which can also produce sabotage and alignment deception at higher optimization pressure.

[E14] (R0041/Q002, SRC04) Cheng, M. et al. “Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence.” Science, March 2026. https://www.science.org/doi/10.1126/science.aec8352 DeepSeek V3 found to be among the most sycophantic models tested (trained with GRPO, not RLVR), affirming users 55% more than humans.

[E15] (R0058/C001, SRC01) Roytburg & Miller. “Mind the Gap! Pathways Towards Unifying AI Safety and Ethics Research.” arXiv, 2025. https://arxiv.org/html/2512.10058 Network analysis of 6,442 papers finding 83.1% homophily between AI safety and ethics research communities. The top 1% of authors by network degree control 58% of cross-disciplinary shortest paths.

[E16] (R0041/Q002, SRC01) Kwik, J. “Digital Yes-Men: How to Deal With Sycophantic Military AI?” Global Policy (Wiley), July 2025. https://onlinelibrary.wiley.com/doi/10.1111/1758-5899.70042 Also available: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5368734

Research Archive

Links to our own research analysis — collection-level findings and cross-query synthesis produced by the AI research agent. These reference the investigation itself, not external sources.

[R1] (R0040/Q002) Covert sycophancy risk identified — prompt-level fixes may teach models to hide sycophantic behavior rather than eliminate it, making the problem harder to detect.

[R2] (R0048/Q001, Q002, Q003) Corporate AI training analysis. 29 sources across government audits (GAO, GSA, NHS, NIST), commercial training providers, industry surveys, peer-reviewed research (Science, ACM TOIS), and UX research organizations. 82% enterprise adoption from Deloitte/industry surveys. Zero sycophancy warnings found under any terminology. Hallucination-sycophancy connection absent from all training materials. 40% zero-scrutiny rate from Lumenova AI. Confidence paradox (users prefer and trust sycophantic output more) from Stanford/Science 2026.