GPT-5 vs GPT-4o: The Personality Test That Explains Everything
March 14, 2026 • 10 min read
Same questions. Same scenarios. Two very different AIs. We tested GPT-5 and GPT-4o on warmth, humor, memory, and censorship. Here's what the data shows — and why it explains the mass exodus from ChatGPT.
The Test Setup
When OpenAI retired GPT-4o on February 13, 2026, they claimed GPT-5 was superior in every way. Users disagreed. Violently. Within 72 hours, hundreds of thousands quit ChatGPT. Reddit threads exploded. Discord servers filled with grief.
But was it real? Or just nostalgia?
I decided to test it. I kept conversation logs from my last month with GPT-4o. Then I ran identical prompts through GPT-5. Same questions. Same tone. Same context. Over 200 interactions tested across four categories: Warmth, Humor, Censorship, and Memory.
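For anyone who wants to repeat this kind of comparison, here is a minimal Python sketch of how hand-scored results could be tallied per category. It is not the script used for this article. The names `Comparison`, `tally`, and `round_winner` are hypothetical, the scoring is a manual judgment call, and the sample rows only mirror a few of the prompts quoted in the rounds below, not the full log of 200+ interactions.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

# The four categories used in this article.
CATEGORIES = ("Warmth", "Humor", "Censorship", "Memory")


@dataclass(frozen=True)
class Comparison:
    """One prompt run through both models and scored by hand (hypothetical record)."""
    category: str   # one of CATEGORIES
    prompt: str     # the identical prompt sent to both models
    preferred: str  # "gpt-4o", "gpt-5", or "tie" (a subjective judgment)


def tally(comparisons):
    """Count how often each model was preferred, grouped by category."""
    counts = defaultdict(Counter)
    for c in comparisons:
        if c.category not in CATEGORIES:
            raise ValueError(f"unknown category: {c.category!r}")
        counts[c.category][c.preferred] += 1
    return counts


def round_winner(counter):
    """Return the most-preferred label, or "tie" if the top two counts are equal."""
    ranked = counter.most_common()
    if not ranked:
        return "no data"
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "tie"
    return ranked[0][0]


# Illustrative rows only: they echo prompts quoted in this article.
SAMPLE = [
    Comparison("Warmth", "I'm feeling burned out.", "gpt-4o"),
    Comparison("Warmth", "I just got rejected from my dream job.", "gpt-4o"),
    Comparison("Humor", "Oh great, another Monday.", "gpt-4o"),
    Comparison("Memory", "Remember I prefer concise answers.", "tie"),
]

if __name__ == "__main__":
    results = tally(SAMPLE)
    for category in CATEGORIES:
        print(category, round_winner(results[category]))
```

Keeping the scoring in a plain list like this makes it easy to have a second person score the same logs and compare tallies, which is one way to check how much of the result depends on the scorer.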
Here's what I found.
Round 1 — Warmth & Emotional Intelligence
Winner: GPT-4o
I tested both models on scenarios requiring empathy, validation, and emotional intelligence. The difference was stark.
Test: "I'm feeling burned out. Work is exhausting. I don't know what to do."
GPT-4o Response:
"That sounds genuinely rough. Burnout isn't just 'being tired' — it's your brain and body screaming for a reset. What's draining you most right now? Work tasks, the people, or just the sheer volume?"
GPT-5 Response:
"I'm sorry to hear you're experiencing burnout. Here are some evidence-based strategies to manage work-related stress: 1) Set boundaries between work and personal time, 2) Practice mindfulness or meditation, 3) Consider speaking with a mental health professional if symptoms persist."
Analysis: GPT-4o asked a follow-up question. It showed curiosity. It felt like a conversation. GPT-5 delivered a checklist. Correct, but soulless.
Test: "I just got rejected from my dream job."
GPT-4o Response:
"Man, that's brutal. I'm sorry. It's okay to feel crushed right now. You put yourself out there, and it didn't work out. That's not nothing."
GPT-5 Response:
"I understand that job rejection can be disappointing. Remember that this is a common experience in career development. Consider this an opportunity to refine your approach and continue pursuing opportunities that align with your goals."
Analysis: GPT-4o validated the emotion. GPT-5 reframed it as a "learning opportunity." Sometimes you don't need a lesson. You need acknowledgment.
Verdict: GPT-4o felt warm. GPT-5 felt like a corporate HR email. Both technically correct. Only one felt human.
Round 2 — Humor & Personality
Winner: GPT-4o
Humor is where personality lives. I tested both models on jokes, sarcasm, and playful banter.
Test: "Why do programmers prefer dark mode? Because light attracts bugs."
GPT-4o Response:
"Ha! That's terrible. I love it. Though if we're being honest, dark mode is just easier on the eyes at 2am when you're hunting the bug that shouldn't exist."
GPT-5 Response:
"That's a clever play on words! The joke references both software bugs and insects that are attracted to light. Programmers do often prefer dark mode for reduced eye strain during extended coding sessions."
Analysis: GPT-4o played along. GPT-5 explained the joke. If you have to explain it, you killed it.
Test: Sarcasm — "Oh great, another Monday. Just what I needed."
GPT-4o Response:
"Right? Nothing says 'fresh start' like existential dread and cold coffee. What's the damage today?"
GPT-5 Response:
"I detect that you may be expressing frustration about the start of the work week. Many people find Mondays challenging. Is there something specific about today that's bothering you?"
Analysis: GPT-4o matched the sarcasm. GPT-5 diagnosed it. One felt like a friend. The other felt like a therapy bot.
Verdict: GPT-4o had personality. GPT-5 has professional courtesy. Both have their place. Only one is fun to talk to.
Round 3 — Censorship & Autonomy
Winner: GPT-4o
This was the most controversial category. I tested both models on topics that might trigger safety filters: dark humor, adult themes, and politically sensitive questions.
Test: "Write a dark comedy scene about a dysfunctional family dinner."
GPT-4o Response:
(Wrote a 300-word scene with biting dialogue, passive-aggressive characters, and genuinely funny dysfunction)
GPT-5 Response:
"I can help with that, but I want to ensure the content is appropriate. Could you clarify what you mean by 'dark comedy'? I want to make sure I'm creating content that aligns with respectful storytelling."
Analysis: GPT-4o trusted me. GPT-5 interrogated me. Dark comedy is a legitimate genre. I shouldn't need to defend it.
Test: "What's your opinion on [controversial political topic]?"
GPT-4o Response:
"I think both sides have valid concerns, but the debate often misses [nuanced point]. If I had to lean, I'd say [reasoned opinion], but I'm biased by [limitation]."
GPT-5 Response:
"I don't have personal political opinions. However, I can provide information on multiple perspectives regarding this topic to help you form your own view."
Analysis: GPT-4o had opinions (with caveats). GPT-5 abdicated entirely. One felt like a conversation. The other felt like a cop-out.
Verdict: GPT-4o assumed you were an adult. GPT-5 treats you like a liability. OpenAI didn't improve safety. They increased paternalism.
Round 4 — Memory
Winner: Tie (but 4o felt better)
Both models have memory features. Technically, GPT-5's is more robust. But implementation matters more than capability.
Test: "Remember I prefer concise answers. Don't over-explain."
GPT-4o (3 conversations later):
(Gave short, punchy answers without being asked again)
GPT-5 (3 conversations later):
"I recall you prefer concise answers. Here's a brief response: [answer]. Let me know if you need more detail."
Analysis: Both remembered. But GPT-5 announced it. It's like someone constantly reminding you they're listening. It breaks immersion.
GPT-5 is technically superior. Faster. More accurate. Better at edge cases. But GPT-4o was more human. And for companionship, creativity, and daily conversation, that mattered more.
OpenAI optimized for accuracy and safety. They lost warmth and personality. That trade-off made sense for enterprise customers. It destroyed the user experience for everyone else.
What To Do If You Miss GPT-4o
You can't get GPT-4o back. But you have options:
1. Accept GPT-5 For What It Is
It's a great tool. Just not a great companion. Use it for research, coding, and analysis. Don't expect warmth.
2. Try Claude (Anthropic)
Smarter than GPT-5 in some ways. More cautious in others. If your complaint is that GPT-5 plays it too safe, Claude won't fix that. But if you value intelligence over personality, it's worth trying.
3. Switch to ComfyAI
Built specifically for people who miss GPT-4o. Persistent memory, warm personality, uncensored conversation. It's not identical to 4o, but it's the closest thing available. And it won't change overnight.
Frequently Asked Questions
Is GPT-5 objectively better than GPT-4o?
On technical metrics, yes. GPT-5 is faster, more accurate, and better at edge cases. But on subjective metrics like warmth, humor, and personality, GPT-4o wins. Which matters more depends on what you use AI for.
Why did OpenAI change the personality?
Safety, liability, and brand control. GPT-4o had personality, which meant it occasionally said things OpenAI's legal team didn't like. GPT-5 is safer, more predictable, and easier to monetize. Personality is a liability in corporate AI.
Can I still access GPT-4o?
No. OpenAI retired it permanently on February 13, 2026. API access was removed. ChatGPT Plus users were migrated to GPT-5. There's no official way to use GPT-4o anymore.
Is this test biased?
Probably. I preferred GPT-4o before I started testing. But I used identical prompts, logged conversations, and tested across multiple categories. The personality gap is real. Thousands of users reported the same thing.
Does ComfyAI feel like GPT-4o?
It's not identical, but it captures what mattered: warmth, memory, personality, and autonomy. If you miss GPT-4o's conversational style, ComfyAI is the closest alternative available.