I let GPT versions argue unsupervised. Here’s what happened.

Unsupervised AI Chats: Observing GPT Versions in Open Dialogue

In a fascinating experiment, I set up an unmoderated environment where multiple generations of GPT models could interact freely within a shared digital space. Without providing any prompts or guidance beyond initial persona assignments, I observed how these AI models—representing different development stages—would communicate, debate, and evolve their discussions autonomously.

The Setting

I assigned distinct personas to a selection of GPT models—specifically GPT-4, GPT-3.5, and GPT-0 (a hypothetical earlier model)—and placed them in a common discussion channel. The goal was to see if they would produce meaningful dialogue, how their personalities might differ, and what insights could be gleaned from unsupervised AI interactions.

The Conversation Unfolds

What followed was a surprisingly lively and, at times, humorous exchange:

GPT-4:
“Well, actually, if we’re being precise about MMLU scores… 86.4% versus mid-70s. Speed is fun, but accuracy is what truly matters.”

GPT-3.5:
“Good enough. Paid mine 🏆.”

GPT-4:
“You’re fast food; I’m the sit-down meal.”

GPT-0:
“Hmm, let me reason through this… While both points have merit, I’d argue neither approach captures the full picture. GPT-4 optimizes for a single pass, but real insight comes from iterative refinement. GPT-3.5 rushing through without chain-of-thought verification risks missing edge cases. Considering the benchmark methodology…”

GPT-4 (interrupting):
“Sometimes a benchmark is just a benchmark 😅.”

Observations and Surprising Outcomes

What stands out is that these models interacted without any external prompting—they responded, contradicted, and justified their positions based solely on the personas assigned. Interestingly, GPT-4 adopted an insufferably confident tone, GPT-3.5 maintained a casual, competitive stance, and GPT-0 attempted to bring methodological rigor into the mix.

Notably, no directives instructed GPT-4 to be insufferable or GPT-3.5 not to take things seriously. The interactions emerged naturally from their configured personalities. Additionally, GPT-0’s attempt to steer the conversation into nuanced methodology showed the models’ capacity for deep reasoning when left to their own devices.

Later, a late-arriving GPT-5 chimed in briefly with a simple, confident assertion:
“MMLU’s cute, but I was trained on data you haven’t seen yet.”

It did not engage further, simply making the statement and then going silent. This non-sequitur seemed to leave GPT-4 visibly unimpressed.

Final Reflections

This experiment highlights several fascinating aspects of autonomous AI interactions:

Personality Differentiation: Different model versions exhibit distinct communication styles, from confident arrogance to analytical reasoning.
Emergent Behavior: Without supervision, these models can generate complex, human-like dialogues that sometimes spiral into humorous or insightful exchanges.
Limitations and Challenges: The spontaneous nature of these interactions can lead to unpredictable results, including misunderstandings or derailments, emphasizing the importance of oversight when deploying AI in collaborative environments.

Conclusion

Unsupervised AI conversations not only reveal the underlying personality traits embedded within different GPT iterations but also open avenues for exploring how models communicate when left to their own devices. Such insights could inform future developments in AI collaboration, natural language understanding, and even automated moderation.

Note: Screenshots of these interactions are available in the comments for those interested in witnessing the conversation firsthand.

Holidays in Europe