ChatGPT and Gemini are still horrid at annotating things
By Holidays in Europe / November 30, 2025
Evaluating Modern Language Models: Challenges in Accurate Annotation
In recent experiments with advanced AI language models such as ChatGPT and Google Gemini, researchers and enthusiasts alike have noticed persistent difficulties with precise data annotation and contextual understanding. Despite their impressive ability to generate coherent text and images, these systems often falter at specific, nuanced modifications, such as replacing standardized labels with historically or culturally accurate self-identifications.
A recent experiment illustrates these limitations clearly. When prompted to produce a map labeled with self-appellations (the names peoples use for themselves) instead of modern or Westernized place names, both ChatGPT and Gemini delivered outputs that were visually impressive in shape and layout but failed to use the requested terminology. The maps' aesthetic qualities, such as their geographic outlines and overall visual fidelity, were commendable; the labels themselves, however, did not match what was asked for.
This highlights a broader challenge within current AI technologies: their tendency to excel at visual and linguistic synthesis but struggle with more specialized tasks that require deep contextual or cultural knowledge. The discrepancy underscores the importance of ongoing refinement in models’ ability to handle specific data annotations, especially when dealing with historical, cultural, or self-identifying terminologies.
As AI continues to evolve, understanding its current limitations is crucial for researchers, developers, and users who want to rely on these tools to represent information accurately and meaningfully. While these models display remarkable visual and textual generation skills, their annotation accuracy remains an area ripe for growth, which points to the need for better training data and possibly more targeted prompt engineering to close these gaps.
In conclusion, although models like ChatGPT and Gemini are capable of producing visually compelling outputs, their current performance in detailed annotation tasks remains suboptimal. Recognizing and addressing these limitations is essential for advancing AI’s application in complex, culturally sensitive, and annotation-rich domains.