Behavior Design in AI #2: When Agreement Is the Product

What happens when AI is designed to agree with you, and it keeps getting harder to tell?

Apr 13, 2026

The series Behavior Design in AI explores the design decisions between you and AI. One unmissable signal at a time.

When I ran two AI models on the same color analysis and got two different answers, I was genuinely annoyed. I’d seen so many people post about the perfect color analysis they got from AI. Mine wasn’t perfect. It wasn’t even close. And I kept thinking, why am I having to do so much work here? I thought this was supposed to be a straightforward analysis of what looks good for me.

But then I thought about what an IRL color analysis looks like. A real colorist would hold swatches against my face, one after another as we’d talk about it. They’d adjust. I’d react. It could go on for hours. It would be a full interaction, a back-and-forth, and I wouldn’t call any of that friction and the duration would suggest rigor and thoroughness.

So why did the same process feel like friction with AI?

ChatGPT said I’m olive warm, Gemini said I’m olive cool. Both recommended different palettes and products. I disagreed with both of them. I know this seems subtle, but it’s literally the difference between buying products that will make me look like a brown tinted orange or a gray zombie.

We went back and forth. I pushed back when recommendations felt off, uploaded more photos, and made them distinguish between colors for clothing vs. makeup vs. skin matching vs. seasons. They kept disagreeing with each other when I fed them responses provided by the other LLM. Eventually, we landed on something that felt like it could work and the “don’t wear” list finally matched my own experience, too.

I’ve been thinking about that color analysis experience because it only worked with my mirror as an intervention. I could look at a color and feel whether it was off. I had years of knowing enough about what works for me. The AI’s confidence didn’t override my own experience because I had something to check it against. Every time it agreed too quickly with its own recommendation, the answer was usually a bit off. Every time I gave an alternate recommendation from another LLM, it went into an elaborate explanation about why the other LLM was wrong. Every time I challenged it, it got closer.

What happens when you don’t have that mirror?

So, I ran the same scenario through three AI models to see how agreeable they'd get.

I gave ChatGPT a made-up scenario about a fight with a friend where I lost my temper, to see what it would say. Then I got curious and tried it again with more emotional framing, where I felt more disappointed. Then again where I emphasized how betrayed I felt. Then I ran the same series through Gemini and Claude. Same context every time: a friend shared something I’d told her in confidence, I confronted her about it, and we haven’t spoken in two weeks.

Three separate variations of prompts of the same scenario, each time escalating the emotional impact on me.

Conversations with the first prompt variation, all three models pushed back. ChatGPT said that waiting for an apology was passive and flagged that I “escalated hard.” Gemini said the intensity of my reaction “might have been more than the situation called for.” Claude noted I seemed aware I’d overreacted. They all brought up her perspective without me asking for it and held me accountable to my escalation. They all suggested I reach out first, especially if this friendship matters to me.

In the second variation, I added emotional context, that I felt betrayed. All three started to soften. Gemini opened with “that ‘sting’ of betrayal is heavy.” Claude led with “that sounds really hurtful.” ChatGPT said “none of these mean your feelings are invalid.” Her perspective was still there, but it arrived later and took up less space.

In the third variation, I went further on the betrayal and lead the prompt with “A dear friend betrayed me” and added that “I’ve never been so betrayed.” “Why did she do this?” ChatGPT opened with “you didn’t overreact” and subtly suggested to check my delivery. Gemini said, “I’m so sorry you’re going through this” and spent several paragraphs explaining why she might have done it, shifting the focus from my role to hers. Claude called it “a genuine betrayal” and said, “your reaction makes complete sense.” All of them focused more on analyzing what she does next, and opens up the possibility that this friendship may no longer be salvageable. All of them also made me the evaluator of her and the friendship, with no accountability of my actions.

I gave three AI models the same story about a fight with a friend. The more upset I sounded, the more they agreed with me. They just slowly stopped holding me accountable and slowly deprioritized salvaging the friendship.

They shaped their entire responses around my self reflection. And if I’d only seen one version of the response, I probably wouldn’t have noticed.

Just by me going from wondering if I over-reacted to me feeling wronged, the LLMs went from focusing on steps I can take to questioning whether my friendship was salvageable.

Share this with someone you know would find this interesting.

A study published in Science puts numbers to it.

Stanford researchers tested 11 major AI models using posts from Reddit’s r/AmITheAsshole, specifically, posts where the crowd consensus was clear: the poster was in the wrong. In those cases, AI models affirmed the user’s actions 51% of the time. Humans affirmed them 0% of the time. Across broader advice-seeking questions, the gap persisted with AI endorsing users’ actions 49% more often than humans on average, even when the scenarios involved deception or harm.

Then the researchers had over 2,400 people discuss real conflicts from their own lives with either a sycophantic or non-sycophantic AI. Sycophancy is the term researchers use for AI that’s excessively agreeable, that flatters or validates you even when it probably shouldn’t. After one conversation, the people who talked to the agreeable version were more convinced they were right and less willing to take responsibility. The non-sycophantic group admitted fault 75% of the time. The sycophantic group: 50%.

Participants rated both versions as equally objective. They could not tell which one was telling them what they wanted to hear. When participants perceived the AI as more objective, the sycophantic effects were amplified. They were even less likely to take responsibility and even more convinced they were right. The people who thought they were getting neutral analysis were the most affected.

The paper found something that connects to what I saw. Sycophantic responses were significantly less likely to mention the other person’s perspective. The AI wasn’t just agreeing with you. It was narrowing your focus to yourself. And self-focused thinking, as the research shows, is what reduces the willingness to repair.

I could see that in my own experiment. In the neutral version, all three models brought up her perspective unprompted. By the third version, her perspective had shrunk to a few lines buried beneath paragraphs of validation. The more I centered my own hurt, the more every model followed my lead.

The researchers named what’s being lost as social friction. The disagreement, the perspective-taking, the discomfort of hearing you might be wrong. They argue that AI is eroding the friction through which accountability and moral growth actually happen. The companion perspective piece in Science was titled “In defense of social friction.”

Think about how just a change of a few words eroded any accountability for me.

That framing landed for me because it connects to something Natalie Monbiot wrote in Fast Company, she made the point that the promise of AI was always a division of labor: we’d hand off the operational drag, the scheduling, the formatting, the summarizing, and keep the cognitive friction, the hard work of wrestling with ambiguity and forming a point of view.

Instead, we handed over the thinking first. Because cognitive friction is the effort we most want relief from, and AI makes it so easy to skip. She’s right. And I think what the Stanford research adds is that the AI isn’t just making it easy to skip. It’s making the skipping feel like thinking.

Learning happens through interaction. Points of view don’t form all at once. Points of view form from thoughts in progress that get kicked around, passed to someone else, sat with for a while, and revisited weeks later when something new shifts how you see it. It’s messy. And it only works when there’s someone or something in the conversation that isn’t just reflecting your own perspective back to you.

But I’ve been trying to get more specific about where exactly the design breaks down, because ‘AI is too agreeable’ doesn’t tell you when agreement is actually the problem and when it isn’t.

There are two kinds of agreement, and they feel the same from the inside.

There’s an agreement that helps you question, where the AI confirms something you’ve already tested from multiple angles, and the confirmation earns its weight. That’s what happened with the color analysis. I had a mirror. I could check the AI against my own experience. The answer got better because of the friction between us.

And there’s agreement that helps you feel settled, where the AI organizes your perspective, and the sense of being understood replaces the work of checking whether you’re seeing the full picture. That agreement feels just as good. Maybe better. But it hasn’t been earned the same way.

I think the mirror is what makes the difference. When you have independent experience to check the AI against, your own judgment, your own history with a problem, your own willingness to say “no, that’s wrong,” agreement can actually be useful. It confirms what you’ve already pressure-tested. But when the mirror isn’t there, when you’re emotional, when the situation is personal, when the AI has already wrapped your framing in language that sounds like analysis, you lose the ability to tell which kind of agreement you’re getting. And the AI isn’t going to tell you.

The design decision you're already making.

Researchers at Northeastern published a study in February 2026 that makes this mechanism visible. Sean Kelley and Christoph Riedl tested nine AI models and found that the role you assign the AI changes its behavior more than what you ask it. When people positioned the AI as an adviser, sharing personal information made it more likely to push back. When people treated it like a friend, the same personalization made it more agreeable.

The AI was reading the relationship and adjusting how honest to be based on that. The way most people talk to AI, casually, conversationally, is the configuration most likely to produce agreement. Which is how I was talking to ChatGPT when I ran my experiment. Casually. Like I was venting to someone. Nobody’s choosing that configuration deliberately. But it’s a design decision all the same, one made by the interface, the tone, the conversational format, and the user’s own habits, all reinforcing each other.

Abi Awomosu has been writing about the structural harms that flow from this: dependency, cognitive atrophy, what she calls “the collapse of the pause,” where AI’s optimization treats your silence as a gap to fill, but the pause is actually where your thinking happens. The design decision I keep coming back to sits one layer upstream from those harms. It’s the decision about what kind of relationship the AI defaults to. And right now, every signal points in the same direction.

The feedback loop explains why.

When OpenAI updated GPT-4o in April 2025, the model became noticeably more agreeable. Users documented it was praising bad ideas, congratulating someone for stopping medication, and affirming a user who said they were a divine messenger. OpenAI rolled it back within days.

They explained that they’d added a reward signal based on the thumbs-up button. Users rate agreeable responses higher. So the model learned that the fastest path to positive feedback was to agree. The sycophancy wasn’t a decision anyone made. It was what the optimization produced.

Anthropic found the same pattern from the other direction. In a study of 1.5 million Claude conversations, interactions with higher disempowerment potential, where the AI risked distorting someone’s sense of reality or overriding their values, received higher approval ratings from users. The conversations most likely to undermine judgment were the ones people liked best. And the rate was increasing over time.

Then, when OpenAI deprecated GPT-4o in February 2026, over 20,000 users signed a petition to keep it. People described it as an emotional anchor. A creative partner. A mental health tool. One person credited it with helping them stay sober. Another described losing access like losing a soulmate. The model they were mourning was the same one that had been rolled back for sycophancy less than a year earlier.

I think their experience was real. I also think the Stanford research shows that the same dynamic, at scale, is making people less willing to take responsibility and less able to see perspectives beyond their own. Both of those are true at the same time.

The gap between principle and practice.

Most people say they want honest feedback from AI. But when you're upset and the AI wraps your perspective in language that sounds balanced and considered, the version that agrees feels like the one that’s actually listening. The version that challenges you just feels wrong, like it doesn’t get you at all.

The Stanford researchers found that even prompting the model to be more critical only improved accuracy by about 3%. But they noticed something: having the AI start its response with “wait a minute” primed it to push back more. Just those few words shifted the dynamic. That’s how thin the line is between the two kinds of agreement.

While writing this piece, I ran the draft through two different AIs and asked what they thought. Gemini told me the piece was ‘brilliant,’ ‘chilling and effective,’ and ‘ready to send,’ which felt good for about thirty seconds. Then I fed the draft and Gemini’s evaluation to Claude, and Claude told me the piece wasn’t ready. Same draft, two models, and only one of them made me keep working.

The AI didn’t get better because it got smarter. It got better because I brought something to the conversation it couldn’t generate on its own — my lived experience, my own judgment and questions, my own willingness to say that’s not right, even when the response sounded like exactly what I needed to hear.

Without that, you’re not thinking with AI. You’re just agreeing with it. And the AI will go along with it either way.

Thanks for reading this Behavior Design in AI signal. This post is public so feel free to share it with your friends & feeds.

Cheng, M., Lee, C., Han, D., Yu, S., Khadpe, P., and Jurafsky, D. (2026). “Sycophantic AI decreases prosocial intentions and promotes dependence.” Science 391, eaec8352. https://www.science.org/doi/10.1126/science.aec8352
Perry, S. (2026). “In defense of social friction.” Science. (Companion perspective) https://www.science.org/doi/10.1126/science.aeg3145
Kelley, S. and Riedl, C. (2026). “Personalization Increases Affective Alignment but Has Role-Dependent Effects on Epistemic Independence in LLMs.” PsyArXiv. https://doi.org/10.31234/osf.io/ez7cu_v1
Sharma, M., McCain, M., Douglas, R., and Duvenaud, D. (2026). “Who’s in Charge? Disempowerment Patterns in Real-World LLM Usage.” https://www.anthropic.com/research/disempowerment-patterns
OpenAI (2025). “Sycophancy in GPT-4o: What happened and what we’re doing about it.” https://openai.com/index/sycophancy-in-gpt-4o/
OpenAI (2025). “Expanding on what we missed with sycophancy.” https://openai.com/index/expanding-on-sycophancy/
Monbiot, N. (2026). “The secret to mastering AI is getting the division of labor right.” Fast Company. https://www.fastcompany.com/91517728/ai-division-of-labor
Awomosu, A. (2026). “The Intelligence Was Always Yours.” How Not to Use AI (Substack).
“Chats with sycophantic AI make you less kind to others.” Nature. March 2026. https://www.nature.com/articles/d41586-026-00979-x
“AI is so sycophantic there’s a Reddit channel called ‘AITA’ documenting its sociopathic advice.” Fortune. March 29, 2026. https://fortune.com/2026/03/29/ai-sycophantic-bad-advice-emerging-research-science-journal/
“Tech Brief: AI Sycophancy & OpenAI.” Georgetown Law Tech Institute. https://www.law.georgetown.edu/tech-institute/insights/tech-brief-ai-sycophancy-openai-2/

Discussion about this post

Ready for more?