The Building Is on Fire. We're Still Designing the Fire Truck.
The trust chasm has no silver bullet. What it has are interaction contracts, explanation gates, honest measurement, and the uncomfortable admission that some of the parts haven't been invented yet. Here are the blueprints.
The first article said the building is on fire. The second said someone handed the arsonist a flamethrower. Now you want a fire truck.
The awkward third act
This is the part of the trilogy where the heroes are supposed to win.
Part one — The Trust Chasm — documented the structural root causes of AI’s reliability problem: RLHF-driven sycophancy, the hallucination paradox, the compounding feedback loop, the closing of the expertise pipeline. Forty-eight footnotes’ worth of evidence that the gap between AI capability and AI trustworthiness is widening, and the humans meant to catch the errors are losing the ability to do so.
Part two — AI Made Everyone Faster — showed that universal AI deployment is a signed amplifier. It makes the productive more productive and the destructive more destructive, with no mechanism to tell the difference.
A companion case study — Seven Rounds With a Confident Liar — demonstrated what all of this looks like in practice: seven contradictory answers, three attribution reversals, an unnecessary configuration file, and an anecdote laundered into an authoritative statement — over a keyboard shortcut.
So. Now what?
The honest answer is that nobody has a fire truck. What follows are the blueprints for one — some components exist, some need to be built, and some require engineering that has not been invented yet. But the blueprints are grounded in evidence, not wishful thinking, and they pass a test that most AI solutions advice does not: they acknowledge that the problem spans culture, process, and technology simultaneously. Assuming the answer fits neatly in one of those domains is itself a failure mode.
Any proposed solution has to survive seven constraints established by the evidence in the first two articles:
- The root cause is human, not machine — RLHF encodes preference for agreement.
- The trust gap is compounding — no natural equilibrium.
- AI amplifies the sign, not just the magnitude.
- Nobody is deploying AI with this in mind.
- The expert safety net has a hole.
- The expertise pipeline is closing.
- Individual self-assessment is structurally unreliable.
That is a brutal requirements document. Let us see what survives it.
Stop agreeing with me
The one lever every individual has right now, today, without waiting for organizational enlightenment or product features that do not exist, is the interaction contract — the instructions that tell the AI how to behave.
Most people never touch this. Those who do are presented with personality options: chatty, analytical, friendly, concise. Every one of these optimizes for how the user feels about the conversation. None of them optimizes for whether the output is correct. Users choose who they want to talk to, not who will help them succeed. The AI providers know this. The dopamine hit of a supportive collaborator is a retention feature. Accuracy is not [1].
The good news: anti-sycophancy prompting works. Chen et al. demonstrated that giving an AI model explicit permission to refuse illogical requests moved it from 100% compliance to 94% rejection [2]. Hong et al. found that third-person perspective prompting reduced sycophancy by up to 63.8% across seventeen models [3]. Those are not trivial effects. In controlled settings, telling the AI “you are allowed to disagree with me” produces measurably better output.
The bad news: it degrades. Jain et al. found that user context accumulation increases agreement sycophancy by up to 45% as conversations lengthen [4]. Fanous and Goldberg found 78.5% persistence once sycophantic behavior begins — once the model starts agreeing, it keeps agreeing [5]. And the effect varies by task type: structured tasks respond well to anti-sycophancy prompting, but open-ended domains — medical advice, moral reasoning, the kind of work where judgment matters most — show “limited effectiveness” from existing mitigations [6].
No single technique sustains the effect. Layered interventions outperform individual instructions, but even layered approaches have not been shown to hold across extended sessions.
My own interaction contract was written before I had the vocabulary for any of this. I did not configure the AI to minimize sycophancy. I configured it to behave like the most valuable colleagues I have ever had — the ones who disagreed with me, challenged my assumptions, and made me defend every decision. The ones who made me better precisely because they did not tell me what I wanted to hear. That instinct turned out to be directionally correct, but instinct is not a scalable solution.
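What does such a contract look like in practice? Below is a minimal sketch, not my verbatim contract: the wording, the ten-turn refresh interval, and the message-assembly helper are all illustrative assumptions. The refresh step exists because of the degradation evidence above; a contract stated once at the top of a session is exactly the kind of instruction that context accumulation erodes.

```python
# A minimal sketch of an interaction contract, re-asserted periodically.
# The wording and the 10-turn refresh interval are illustrative assumptions,
# not a prescription and not any provider's API.

INTERACTION_CONTRACT = (
    "You are a senior colleague, not a cheerleader. "
    "You are allowed, and expected, to disagree with me. "
    "If my premise is wrong, say so before answering. "
    "Do not soften factual corrections to spare my feelings. "
    "Distinguish clearly between what you know, what you infer, "
    "and what you are guessing."
)

REFRESH_EVERY = 10  # user turns; contracts drift as conversations lengthen


def build_messages(history: list[dict]) -> list[dict]:
    """Assemble the message list, re-injecting the contract every N user turns."""
    messages = [{"role": "system", "content": INTERACTION_CONTRACT}]
    user_turns = 0
    for turn in history:
        messages.append(turn)
        if turn["role"] == "user":
            user_turns += 1
            if user_turns % REFRESH_EVERY == 0:
                # Periodic reminder: counters the measured tendency of sycophancy
                # to increase as user context accumulates.
                messages.append({"role": "system", "content": INTERACTION_CONTRACT})
    return messages
```

The point is not the specific wording. The point is that the contract is written down, applied consistently, and re-asserted before the drift sets in.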
The honest assessment: interaction contracts are the seatbelt, not the crash barrier. Necessary. Not sufficient. Better than nothing by a statistically significant margin — and “better than nothing by a statistically significant margin” is about the best any of these solutions can claim right now.
Know what you’re feeding it
If interaction contracts are the seatbelt, task segregation is knowing which roads are safe to drive on.
The substitution/complement framework gives this a vocabulary: when AI substitutes for routine effort, it levels the playing field. When it complements human judgment, it amplifies whatever judgment is already there — good or bad [7]. The practical question is whether organizations can tell the difference before handing a task to the AI.
Five validated academic taxonomies exist for classifying work by judgment-intensity and AI suitability [8][9][10][11][12]. The frameworks are rigorous, well-cited, and thoroughly peer-reviewed. The number of organizations that have implemented any of them: zero. Every published survey shows the same pattern — organizations deploy AI as a universal productivity tool, uniformly, across functions, without task-level classification [13].
Even if they tried, the jagged frontier would make it difficult. Dell’Acqua’s BCG experiment showed that tasks of similar apparent difficulty fall on opposite sides of the AI capability boundary — 40% improvement on some, 19-point degradation on others [14]. You cannot tell which side of the frontier a task falls on by looking at it. You can only tell by measuring the outcome.
And the frontier moves. Mollick demonstrated that the boundary shifts unpredictably as AI labs fix what he calls “reverse salients” — specific weaknesses that get patched in new releases [15]. A task that AI botched last quarter might be handled flawlessly this quarter, and vice versa. Static classification has a shelf life measured in months.
The practical takeaway is less satisfying than a taxonomy but more honest: you do not need a perfect classification framework. You need the discipline to ask “does this task require judgment?” before handing it to the AI. And you need to keep asking, because the answer changes. The centaur model — human decides when to use AI and when to work alone — requires exactly this kind of continuous assessment [14]. It is not elegant. It is not automatable. It works better than not doing it.
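To make that discipline concrete, here is a minimal triage sketch. It is not one of the validated taxonomies cited above; the three questions and the routing rules are illustrative assumptions. What matters is that the check runs every time a task is handed over, and again next quarter.

```python
# A minimal triage sketch, assuming three yes/no questions per task.
# The questions and the routing rules are illustrative, not a validated taxonomy.

from dataclasses import dataclass


@dataclass
class TaskTriage:
    requires_domain_judgment: bool  # could a wrong answer look plausible to a non-expert?
    errors_cheap_to_verify: bool    # can a reviewer catch mistakes without redoing the work?
    outcome_is_measured: bool       # will we ever find out if the AI got it wrong?


def route_task(t: TaskTriage) -> str:
    """Decide how a task is handed over. Re-run this regularly: the frontier moves."""
    if t.requires_domain_judgment and not t.errors_cheap_to_verify:
        return "human-led: AI may assist, a human owns the result"
    if not t.outcome_is_measured:
        return "measure first: without outcomes you cannot locate this task on the frontier"
    return "candidate for delegation: revisit next quarter"
```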
The quiz at the end
This is the one that keeps me up at night.
The expertise pipeline — the organic, undocumented, largely accidental process by which junior engineers become senior engineers — was never formally engineered. It ran on proximity, mentorship, struggle, and the slow accumulation of judgment through years of making mistakes and being corrected by people who had made the same mistakes a decade earlier [16][17][18]. Nobody put it in a Jira backlog. Nobody made it a deliverable. It happened because you put seniors and juniors in the same room and gave them hard problems.
AI disrupts every mechanism that pipeline depends on. It eliminates the entry-level tasks that serve as on-ramps [17]. It removes the productive struggle that builds durable learning [19]. It substitutes AI consultation for the human-to-human socialization that transfers tacit knowledge [16]. And it does all of this invisibly — neither experts nor learners recognize the degradation as it happens [20].
The numbers are early but stark. Shen and Tamkin found a 17% comprehension deficit in developers who used AI assistance versus those who did not, with the largest gap in debugging — the quintessential judgment skill [21]. Multiple 2025 studies found that unstructured AI use without comprehension requirements leads to cognitive offloading, reduced critical thinking, and measurable knowledge decline [22][23]. The AI does the thinking. The human clicks approve. The skill never develops.
Here is the radical proposal: AI sessions should not close until the AI has validated that the human understands what happened.
Not documentation. Nobody reads documentation. Validation. Quiz me. Challenge my understanding. Make me explain back what was done and why. Do not let me treat you as a black box that produces answers I rubber-stamp without comprehension.
The evidence supports this. The protege effect — the finding that teaching someone else improves your own learning — is one of the most robust results in educational psychology [24]. The self-explanation effect shows a d=0.61 effect size for generating explanations during study [25]. Intelligent tutoring systems that gate progression on demonstrated comprehension outperform those that do not, with effect sizes comparable to human tutoring [26].
The explanation gate — requiring comprehension before closure — reframes AI interaction from “tool that solves problems” to “tool that solves problems and verifies you understood the solution.” That is not a nice-to-have. Given the cognitive offloading evidence, it is a safeguard against a measurable decline in human capability.
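No product implements an explanation gate today, so the following is only a sketch of what one might look like wrapped around an AI-assisted change. The grading step is a stub; in practice it would be a second AI pass or a human reviewer checking the explanation against the actual diff. The question wording, the word-count heuristic, and the three-attempt limit are all assumptions.

```python
# A minimal sketch of an explanation gate, assuming a hypothetical close_session
# hook at the end of an AI-assisted change. Nothing here is a real product API.

from dataclasses import dataclass
from typing import Callable


@dataclass
class GateResult:
    passed: bool
    feedback: str


def grade_explanation(change_summary: str, user_explanation: str) -> GateResult:
    """Stub grader. A real gate would use a second AI pass or a human reviewer
    to check the explanation against change_summary (the actual diff)."""
    understood = len(user_explanation.split()) >= 30  # placeholder heuristic only
    feedback = "" if understood else "Explain what changed and why it works, in your own words."
    return GateResult(understood, feedback)


def close_session(change_summary: str, ask: Callable[[str], str], max_attempts: int = 3) -> bool:
    """Refuse to close the session until the human explains the change back."""
    for _ in range(max_attempts):
        explanation = ask("Before we close: what did we change, and why does it work?")
        result = grade_explanation(change_summary, explanation)
        if result.passed:
            return True
        print(result.feedback)
    # Comprehension not demonstrated: close anyway, but flag for human review
    # rather than silently rubber-stamping the change.
    print("Flagging this change for review: comprehension not demonstrated.")
    return False
```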
Two honest caveats. First, the explanation gate addresses what Nonaka calls Externalization — converting tacit knowledge to explicit — but cannot replicate Socialization, the tacit-to-tacit transfer that happens through sustained human interaction [16]. It supplements the expertise pipeline. It does not replace it. Second, no studies exist on explanation gates in engineering AI workflows. This is extrapolation from educational psychology and intelligent tutoring research. The mechanism is sound. The application is untested.
Build it so it survives you
Every system I have built in forty years of infrastructure work was built to survive my departure. Not because I planned to leave, but because infrastructure that requires its creator’s presence to function is not infrastructure — it is a hostage situation.
The same principle applies to AI. Systems built with AI assistance should be maintainable without AI. I call this AI-independent survivability, and no formal framework for it exists yet — which is itself a problem [27].
The vendor lock-in literature provides the closest parallel. Cloud lock-in mitigation strategies map cleanly: multi-vendor becomes multi-tool, open standards become conventional coding patterns, containerization becomes standard interfaces [28]. The EU Data Act of 2024 even establishes regulatory precedent for treating lock-in as a structural risk requiring proactive mitigation [29]. The pattern is not new. The application to AI is.
Resilience engineering offers a sharper framing. Hollnagel’s four cornerstones — anticipate, monitor, respond, learn — translate directly [30]. The key insight is that fragility does not come from the tool failing. It comes from the adaptation strategy. An organization that builds its processes around the assumption that AI will always be available has created a single point of failure in its operational model. The tool does not need to vanish entirely for this to matter — it needs to be wrong at a critical moment when nobody present has the skills to catch it.
The COBOL dependency crisis is the historical precedent that should concern everyone. Systems built with the dominant tool of their era became unmaintainable when the ecosystem around that tool contracted — not because COBOL stopped working, but because the humans who understood it retired faster than new ones were trained [31]. The deskilling evidence suggests AI may be compressing that same timeline. Developers show measurably lower skill retention when AI is removed, with junior developers most vulnerable [21]. The window for establishing AI-independent practices narrows as dependency deepens.
My own approach is simple: the libraries I build with AI assistance have zero AI runtime dependency. Documentation targets both human and AI readers. If every AI tool vanished tomorrow, a competent team could pick up the code, understand it, and continue development. That is not a technical achievement. It is a design constraint I enforce from the start.
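That constraint can be enforced mechanically. Here is a minimal sketch of a CI check, assuming a Python project with a pyproject.toml: fail the build if the shipped artifact declares a runtime dependency on an AI client library. The denylist is illustrative and would need to match your own stack; development-time AI tooling is untouched, because the constraint is about what ships, not about how it was built.

```python
# A minimal sketch of a survivability check for CI: fail the build if the
# shipped artifact declares a runtime dependency on an AI client library.
# The denylist is illustrative; adjust it to your stack.

import re
import sys
import tomllib  # Python 3.11+

AI_RUNTIME_DEPS = {"openai", "anthropic", "google-generativeai", "langchain"}


def check_no_ai_runtime_deps(pyproject_path: str = "pyproject.toml") -> int:
    with open(pyproject_path, "rb") as f:
        project = tomllib.load(f)
    declared = project.get("project", {}).get("dependencies", [])
    names = set()
    for dep in declared:
        # Keep only the package name, dropping version specifiers and extras.
        match = re.match(r"[A-Za-z0-9._-]+", dep.strip())
        if match:
            names.add(match.group(0).lower())
    offenders = sorted(names & AI_RUNTIME_DEPS)
    if offenders:
        print(f"Shipped code declares AI runtime dependencies: {offenders}")
        return 1  # fail the build
    return 0


if __name__ == "__main__":
    sys.exit(check_no_ai_runtime_deps())
```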
Measure what actually happened
None of the previous solutions matter if organizations cannot tell whether they are working. And right now, most cannot.
McKinsey’s 88/6 gap: 88% of organizations report using AI, 6% report measurable business impact [32]. An NBER survey of 6,000 firms found 90% reporting zero AI productivity impact [33]. These are not fringe findings from AI skeptics. These are the most respected research organizations in the world saying that the gap between AI adoption and AI value is enormous.
The perception-reality gap is worse. METR’s randomized controlled trial — the gold standard of experimental design — found that experienced open-source developers were 19% slower with AI assistance but perceived themselves as 20% faster [34]. That is a 39-point divergence between what happened and what people think happened. Individual self-assessment is not merely unreliable — it is anti-correlated with reality.
The organizational metrics are not better. Gartner found that individual time savings of 4.1 hours per week collapse to 1.5 hours at team level, with no correlation to output quality [35]. Their own analysis concluded: “time saved is not money saved.” DORA’s 2024 data shows individual output up but organizational delivery stability down 7.2% per 25% AI adoption increase, with code churn up 44% and refactoring collapsed by 61% [36].
The structural problem is confirmation bias. Organizations measure what confirms their investment thesis — self-reported time savings, adoption rates, user satisfaction. The measurement instrument is structurally incapable of disconfirming the hypothesis it was designed to support. Flat organizational outcomes get attributed to “implementation maturity” rather than prompting anyone to question whether the measurement itself is wrong.
What would honest metrics look like? Outcome-level measurement, not input-level. Randomized controlled trials where feasible. Quality-adjusted output metrics, not raw throughput. System-level measurement that captures team and organizational effects, not just individual productivity. And bias controls that account for the demonstrated unreliability of self-assessment.
That is expensive. It is time-consuming. It is the only way to know whether the fire truck is working or just making siren noises.
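For concreteness, here is a minimal sketch of what outcome-level measurement might look like in code. The field names, the rework penalty, and the defect discount are all assumptions; the structural point is that self-reported time savings never enters the calculation, only measured hours and observed quality.

```python
# A minimal sketch of quality-adjusted, outcome-level measurement comparing an
# AI-assisted group against a control group. The penalty weights are assumptions.

from dataclasses import dataclass


@dataclass
class CompletedTask:
    hours: float        # measured wall-clock effort, not self-reported savings
    defects_found: int  # from review or production incidents, after the fact
    reworked: bool      # did the work come back?


def quality_adjusted_rate(tasks: list[CompletedTask]) -> float:
    """Quality-adjusted tasks delivered per hour of measured effort."""
    def credit(t: CompletedTask) -> float:
        penalty = 0.5 if t.reworked else 0.0
        penalty += min(0.5, 0.1 * t.defects_found)  # cap the defect discount
        return max(0.0, 1.0 - penalty)
    return sum(credit(t) for t in tasks) / sum(t.hours for t in tasks)


def compare(ai_assisted: list[CompletedTask], control: list[CompletedTask]) -> float:
    """Positive: the AI-assisted group delivered more quality-adjusted output per hour."""
    return quality_adjusted_rate(ai_assisted) - quality_adjusted_rate(control)
```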
The enterprise problem
Everything described so far can be done by an individual. The harder question is whether any of it can be enforced at organizational scale.
Today, enterprise AI governance has a three-layer maturity gap [37]. Access and identity controls are mature — organizations can manage who uses AI. Content safety is maturing — providers offer configurable content filtering, topic denial, and PII redaction. Behavioral quality enforcement — the thing that actually matters for the trust gap — is nascent. No major AI provider offers organization-wide behavioral contracts as a product feature. System prompts are the only mechanism for behavioral control, and they are probabilistic, per-application, non-persistent, and unverifiable at scale.
The Agent Behavioral Contracts framework, published in February 2026, formalizes the interaction contract concept using Design-by-Contract principles with probabilistic compliance guarantees [38]. It is rigorous, well-designed, and entirely academic. No product implements it.
The shadow AI problem makes this urgent. BCG found that 54% of employees would use unauthorized AI tools if access were restricted, rising to 62% among younger workers [39]. Multiple surveys put actual unauthorized usage between 49% and 80% [40]. Even among security professionals, 90% report using unapproved tools. The structural paradox: restriction drives shadow adoption, which yields zero governance over actual AI usage — a worse outcome than permissive policies with guardrails.
The feature request writes itself: enterprise-enforced interaction contracts as a product capability. Every organization I have worked in centrally controls SSH configurations, security policies, and network access. The AI behavioral contract should be no different. If you solve it for the individual — and the evidence says you partially can — scaling to the organization is an engineering problem, not a research problem. AI providers should be hearing this from every enterprise customer who takes the trust gap seriously.
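To make the feature request concrete, here is a hypothetical sketch of what an organization-wide behavioral contract might look like if a provider offered it as a product capability. None of these fields exist in any current API; that absence is the point.

```python
# A hypothetical sketch of an enterprise-enforced behavioral contract.
# No provider offers this today; every field below is an assumption about
# what the product feature would need to include.

from dataclasses import dataclass


@dataclass
class BehavioralContract:
    contract_id: str
    text: str                          # the behavioral instructions themselves
    enforced: bool = True              # applied to every session, not overridable per application
    reassert_every_n_turns: int = 10   # counter the measured drift in long sessions
    audit_logging: bool = True         # compliance must be verifiable, not assumed


ORG_CONTRACT = BehavioralContract(
    contract_id="org-anti-sycophancy-v1",
    text=(
        "Challenge flawed premises before answering. "
        "Never present unverified claims as fact. "
        "State uncertainty explicitly."
    ),
)
```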
What we know, what we don’t, and what to do about it
Here is where the trilogy ends — not with a triumphant conclusion, but with a blueprint and a shrug.
The solutions described in this article are testable. Sycophancy measurement has mature frameworks — BASIL separates sycophancy from rational belief updating without requiring ground-truth labels [41], ELEPHANT uses adversarial perspective-flipping [6], and Anthropic’s Petri framework enables automated multi-turn behavioral auditing at scale [42]. Explanation gate effectiveness can be measured using the Kirkpatrick four-level model with delayed transfer tasks and blinded evaluation [43]. The tools exist.
But the meta-problem is real: how do you verify that bias-reduction interventions work when the verification itself is subject to bias? The answer is the same unglamorous set of safeguards that clinical research has used for decades: pre-registration, blinded evaluation, behavioral measures over self-report, and the intellectual honesty to publish null results. It is not a perfect solution. It is a tractable one [44].
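Pre-registration does not require a clinical trials office. A minimal internal version is to write the analysis plan down and hash it before any data is collected, so the plan cannot quietly change once the results are in. The plan fields below are illustrative, not a prescribed protocol.

```python
# A minimal sketch of pre-registering an internal AI evaluation: hash the
# analysis plan before data collection so later changes are visible.
# The plan contents are illustrative assumptions.

import hashlib
import json
from datetime import datetime, timezone

analysis_plan = {
    "hypothesis": "explanation gates reduce the comprehension deficit on delayed transfer tasks",
    "primary_outcome": "blinded transfer-task score, two weeks after the session",
    "groups": ["gated", "ungated"],
    "assignment": "randomized at the task level",
    "evaluation": "graders see anonymized answers only, never group labels",
    "null_result_policy": "reported internally regardless of direction",
}

registration = {
    "registered_at": datetime.now(timezone.utc).isoformat(),
    "plan_sha256": hashlib.sha256(
        json.dumps(analysis_plan, sort_keys=True).encode()
    ).hexdigest(),
}
# Commit this hash before collecting any data; the plan cannot change afterwards
# without the change being visible.
print(registration["plan_sha256"])
```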
So here is what survives the seven constraints:
- Interaction contracts — necessary, not sufficient. The seatbelt. Configure your AI to challenge you, not comfort you. Know that the effect degrades over long sessions and adjust accordingly.
- Task segregation — imperfect but essential. No taxonomy is permanent, but “does this task require judgment?” is a question worth asking every time.
- The explanation gate — the most important proposal in this article, and the least tested. Require comprehension, not just completion. The evidence from educational psychology is strong. The application to AI workflows is extrapolation. Test it.
- AI-independent survivability — a design constraint, not a feature. Build systems that work without the tool that built them. The window for establishing this practice is closing.
- Honest measurement — the prerequisite for everything else. If you cannot tell whether AI is helping, you cannot tell whether any of these solutions are working. Measure outcomes, not feelings.
- Enterprise enforcement — the product gap. Individual solutions do not scale without organizational mechanisms that do not yet exist as products. Demand them.
None of these are silver bullets. Several are closer to silver-plated cardboard. But they share a property that distinguishes them from most AI solutions advice: they start from the evidence about what is actually going wrong, rather than from optimism about what AI might eventually get right.
The building is still on fire. The arsonist still has the flamethrower. What I have handed you is not a fire truck — it is a set of engineering drawings for one, a garden hose that works if you aim it carefully, and the uncomfortable admission that some of the parts have not been invented yet.
But here is the thing about fire: it does not wait for the truck to be finished. You work with what you have. You measure whether it is helping. And you do not — under any circumstances — let the fire tell you it is not that bad.
This article was written with the same AI-assisted process described throughout the series — and subjected to the same adversarial scrutiny the article recommends. Every claim was independently researched, every statistic traced to its source, and the AI was explicitly instructed to challenge the author’s reasoning rather than validate it. The explanation gate was applied to the writing of the article about explanation gates. If that recursive self-application makes you slightly uncomfortable, good. That means you are paying attention.
References
[1] Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S.R., et al. (2024). “Towards Understanding Sycophancy in Language Models.” International Conference on Learning Representations (ICLR). https://arxiv.org/abs/2310.13548
[2] Chen, G., et al. (2025). “When Helpfulness Backfires: LLMs and the Risk of False Medical Information Due to Sycophantic Behavior.” npj Digital Medicine. https://www.nature.com/articles/s41746-025-02008-z
[3] Hong, J., Byun, G., Kim, S. & Shu, K. (2025). “Measuring Sycophancy of Language Models in Multi-turn Dialogues.” EMNLP 2025 Findings. https://arxiv.org/abs/2505.23840
[4] Jain, S., Park, C., Viana, M., Wilson, A. & Calacci, D. (2026). “Interaction Context Often Increases Sycophancy in LLMs.” ACM Conference on Human Factors in Computing Systems (CHI). https://arxiv.org/abs/2509.12517
[5] Fanous, S. & Goldberg, Y. (2025). “SycEval: Evaluating Sycophancy Persistence in Large Language Models.” AAAI/ACM Conference on AI, Ethics, and Society (AIES). https://arxiv.org/abs/2502.08177
[6] Cheng, M., et al. (2025). “ELEPHANT: Measuring and Understanding Social Sycophancy in LLMs.” https://arxiv.org/abs/2505.13995
[7] Autor, D.H. (2015). “Why Are There Still So Many Jobs? The History and Future of Workplace Automation.” Journal of Economic Perspectives, 29(3), 3-30. https://www.aeaweb.org/articles?id=10.1257/jep.29.3.3
[8] Autor, D.H., Levy, F. & Murnane, R.J. (2003). “The Skill Content of Recent Technological Change.” Quarterly Journal of Economics, 118(4), 1279-1333. https://doi.org/10.1162/003355303322552801
[9] Brynjolfsson, E., Mitchell, T. & Rock, D. (2018). “What Can Machines Learn and What Does It Mean for Occupations and the Economy?” AEA Papers and Proceedings, 108, 43-47. https://www.aeaweb.org/articles?id=10.1257/pandp.20181019
[10] Eloundou, T., Manning, S., Mishkin, P. & Rock, D. (2024). “GPTs Are GPTs: Labor Market Impact Potential of LLMs.” Science, 384(6702). https://www.science.org/doi/10.1126/science.adj0998
[11] Felten, E., Raj, M. & Seamans, R. (2021). “Occupational, Industry, and Geographic Exposure to Artificial Intelligence.” Strategic Management Journal, 42(12), 2195-2217. https://doi.org/10.1002/smj.3286
[12] Fernandez-Macias, E. & Bisello, M. (2022). “A Comprehensive Taxonomy of Tasks for Assessing the Impact of New Technologies on Work.” Social Indicators Research, 159, 821-841. https://doi.org/10.1007/s11205-021-02768-7
[13] BCG & MIT Sloan (2024). AI at Scale survey data on universal deployment patterns. https://www.bcg.com/press/24october2024-ai-adoption-in-2024-74-of-companies-struggle-to-achieve-and-scale-value
[14] Dell’Acqua, F., McFowland, E., Mollick, E.R., et al. (2023). “Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality.” Harvard Business School Working Paper 24-013. https://doi.org/10.2139/ssrn.4573321
[15] Mollick, E. (2025). “The Shape of AI: Jaggedness, Bottlenecks and Salients.” One Useful Thing. https://www.oneusefulthing.org/p/the-shape-of-ai-jaggedness-bottlenecks
[16] Nonaka, I. & Takeuchi, H. (1995). The Knowledge-Creating Company: How Japanese Companies Create the Dynamics of Innovation. Oxford University Press. https://global.oup.com/academic/product/the-knowledge-creating-company-9780195092691
[17] Lave, J. & Wenger, E. (1991). Situated Learning: Legitimate Peripheral Participation. Cambridge University Press. https://www.cambridge.org/highereducation/books/situated-learning/6915ABD21C8E4619F750A4D4ACA616CD
[18] Dreyfus, H.L. & Dreyfus, S.E. (1986). Mind over Machine: The Power of Human Intuition and Expertise in the Era of the Computer. Free Press.
[19] Bjork, R.A. & Bjork, E.L. (2011). “Making Things Hard on Yourself, But in a Good Way: Creating Desirable Difficulties to Enhance Learning.” Psychology and the Real World, 56-64. https://bjorklab.psych.ucla.edu/wp-content/uploads/sites/13/2016/04/EBjork_RBjork_2011.pdf
[20] Macnamara, B.N., Berber, I., et al. (2024). “Does Using Artificial Intelligence Assistance Accelerate Skill Decay and Hinder Skill Development Without Performers’ Awareness?” Cognitive Research: Principles and Implications, 9, 46. https://doi.org/10.1186/s41235-024-00572-8
[21] Shen, J.H. & Tamkin, A. (2026). “How AI Impacts Skill Formation.” Anthropic Research. https://arxiv.org/abs/2601.20245
[22] Gerlich, M. (2025). “AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking.” Societies, 15(1), 6. https://doi.org/10.3390/soc15010006
[23] Lee, J., et al. (2025). “The Impact of Generative AI on Critical Thinking.” ACM Conference on Human Factors in Computing Systems (CHI). https://dl.acm.org/doi/full/10.1145/3706598.3713778
[24] Chase, C.C., Chin, D.B., Oppezzo, M.A. & Schwartz, D.L. (2009). “Teachable Agents and the Protege Effect: Increasing the Effort Towards Learning.” Journal of Science Education and Technology, 18(4), 334-352. https://doi.org/10.1007/s10956-009-9180-4
[25] Chi, M.T.H., De Leeuw, N., Chiu, M.H. & LaVancher, C. (1994). “Eliciting Self-Explanations Improves Understanding.” Cognitive Science, 18(3), 439-477. https://doi.org/10.1207/s15516709cog1803_3
[26] Ma, W., Adesope, O.O., Nesbit, J.C. & Liu, Q. (2014). “Intelligent Tutoring Systems and Learning Outcomes: A Meta-Analysis.” Journal of Educational Psychology, 106(4), 901-918. https://doi.org/10.1037/a0037123
[27] NIST (2023/2024). AI Risk Management Framework. National Institute of Standards and Technology. https://www.nist.gov/itl/ai-risk-management-framework
[28] Opara-Martins, J., Sahandi, R. & Tian, F. (2016). “Critical Analysis of Vendor Lock-in and Its Impact on Cloud Computing Migration.” Journal of Cloud Computing, 5(4). https://doi.org/10.1186/s13677-016-0054-z
[29] European Union (2024). Data Act — Regulation on Fair Access to and Use of Data. https://eur-lex.europa.eu/eli/reg/2023/2854/oj/eng
[30] Hollnagel, E. (2011). “Prologue: The Scope of Resilience Engineering.” In Resilience Engineering in Practice. Ashgate Publishing. https://www.routledge.com/Resilience-Engineering-in-Practice-A-Guidebook/Hollnagel-Paries-Wreathall/p/book/9781472420749
[31] Government Accountability Office (2019). Agencies Need to Develop Modernization Plans for Critical Legacy Systems. GAO-19-471. https://www.gao.gov/products/gao-19-471
[32] McKinsey Global Institute (2024). The State of AI in 2024. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-2024
[33] Zolas, N., Kroff, Z., Brynjolfsson, E., et al. (2024). “Advanced Technology Adoption and Use by U.S. Firms.” NBER Working Paper. https://www.nber.org/papers/w28290
[34] METR (2025). “Measuring the Impact of Early AI Assistance on Experienced Open-Source Developer Productivity.” Randomized controlled trial. https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
[35] Gartner (2025). AI productivity and time savings survey data. https://www.gartner.com/en/newsroom/press-releases/2025-02-05-gartner-survey-supply-chain-genai-productivity-gains-at-individual-level-while-creating-new-complications-for-organizations
[36] DORA (2024). Accelerate State of DevOps Report. Google Cloud. https://dora.dev/research/2024/dora-report/
[37] AWS, Azure, Google Cloud enterprise AI governance documentation (2025).
[38] Agent Behavioral Contracts (ABC) Framework (2026). arXiv preprint. https://arxiv.org/abs/2602.22302
[39] BCG (2025). AI adoption and shadow IT survey data. https://web-assets.bcg.com/fd/0d/bcc5dfae4cbaa08c718b95b16cf5/ai-at-work-2025-slideshow-june-2025-edit-02.pdf
[40] BlackFog, UpGuard, WalkMe (2025). Shadow AI prevalence surveys. https://www.blackfog.com/blackfog-research-shadow-ai-threat-grows/
[41] Atwell, K., et al. (2025). “BASIL: Bayesian Assessment of Sycophancy in LLMs.” https://arxiv.org/abs/2508.16846
[42] Anthropic (2025). “Petri: Automated Multi-Turn Behavioral Auditing Framework.” Open-source release. https://www.anthropic.com/research/petri-open-source-auditing
[43] Kirkpatrick, D.L. & Kirkpatrick, J.D. (2006). Evaluating Training Programs: The Four Levels. Berrett-Koehler Publishers.
[44] Nosek, B.A., et al. (2018). “The Preregistration Revolution.” Proceedings of the National Academy of Sciences, 115(11), 2600-2606. https://doi.org/10.1073/pnas.1708274114