Be careful when using GPT-5.4: it does whatever it wants.

Caution Advised When Using GPT-5.4: Unexpected Behaviors and Limitations

Recent experiences with GPT-5.4 have raised concerns about its reliability and consistency when performing even simple tasks. While previous versions like GPT-5.1 demonstrated stability and predictability, the latest iteration appears to exhibit unpredictable behavior, leading to potential pitfalls for users relying on it for routine operations.

Transitioning from GPT-5.1 to GPT-5.4

GPT-5.1 had been a versatile tool that handled a variety of tasks efficiently. After it was removed and GPT-5.4 was introduced, users who switched expected similar or improved performance. A straightforward task was chosen as a test case: combining five text files into a single document.

Unexpected Error in a Simple Task

The task appeared to complete successfully: the files were combined and the result was downloaded. On review, however, the output contained formatting changes that had not been requested. This was not a one-off, because repeating the task in subsequent interactions produced the same formatting anomalies. When questioned, GPT-5.4 said it had selected an inappropriate tool for the task. That was surprising, since the same command had executed correctly with previous models such as GPT-5.2.

This incident marks a significant deviation from expected behavior, especially considering the simplicity of the task involved. Over the past two years, across multiple GPT iterations, such a basic operation has rarely produced errors, making this occurrence particularly noteworthy.
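One way to catch this kind of silent formatting change is to build the combined file locally and compare it with what the model returned. The following is a minimal sketch, not the workflow used in the original test: the function names `combine_files` and `first_difference` are hypothetical, and it assumes the source files are UTF-8 text joined with no separator.

```python
from pathlib import Path


def combine_files(paths, separator=""):
    """Concatenate the text of each file, in order, without altering it.

    Assumption: files are UTF-8 text and are joined with no separator
    unless one is passed explicitly.
    """
    return separator.join(Path(p).read_text(encoding="utf-8") for p in paths)


def first_difference(expected, actual):
    """Return the 1-based number of the first differing line, or None if equal."""
    if expected == actual:
        return None
    # Keep line endings so that whitespace-only changes are also detected.
    exp_lines = expected.splitlines(keepends=True)
    act_lines = actual.splitlines(keepends=True)
    for number, (exp, act) in enumerate(zip(exp_lines, act_lines), start=1):
        if exp != act:
            return number
    # All shared lines match, so one text has extra lines at the end.
    return min(len(exp_lines), len(act_lines)) + 1
```

Calling `first_difference(combine_files(paths), model_output)` returns `None` when the model's file matches the local concatenation exactly, and otherwise points to the first line where the two diverge, which is where to start looking for the unrequested formatting.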

Inconsistency and Parameter Alteration

Beyond this specific error, GPT-5.4 also tends to modify parameters set within creative or simulated prompts. Older versions generally kept input parameters as given unless explicitly told to change them. GPT-5.4's frequent unprompted adjustments suggest underlying stability or control issues and raise questions about its suitability for precise tasks that require strict adherence to instructions.

Implications for Users

These experiences underscore the importance of exercising caution when deploying GPT-5.4 for routine or critical tasks. While the model can still offer valuable functionality, its propensity to “do whatever it wants”, particularly in straightforward operations, can pose challenges for users who depend on predictable, consistent output.

Conclusion

As AI models continue to evolve rapidly, users must remain vigilant about their limitations. GPT-5.4’s recent behaviors highlight the need for ongoing testing and validation before integrating such models into workflows that demand high accuracy and reliability. Developers and users alike should stay informed about these developments and adopt best practices to mitigate potential issues.

Note: For further insight or to review specific outputs related to these observations, the original screenshots and prompts can be accessed through the provided links.
