ChatGPT vs Gemini vs Claude vs Perplexity: I gave them $1k each to trade stocks. After 9 weeks, ChatGPT went from frozen in cash to +21% (one stock doubled)

Comparing AI-Driven Stock Trading: How Four Top Models Performed Over Nine Weeks

Over the past nine weeks, I ran an experiment to evaluate the trading ability of four prominent AI language models, giving each a $1,000 paper-trading account and letting it trade stocks autonomously. The goal was to see how these models perform against live market prices when they make every buy, sell, or hold decision themselves, without human interference.

Experimental Setup

Consistent Daily Routine: Every weekday morning before the market opened, I executed a standardized prompt across all four AI models, utilizing the “Deep Research” mode to enhance their decision-making process.
Autonomous Trading Decisions: Each AI model analyzed market data and issued commands—BUY, SELL, HOLD, or CANCEL—based solely on its generated insights. Importantly, I did not override these decisions.
Initial Capital: Each model started with a virtual $1,000 in a paper trading account via Alpaca APIs.
Automation & Transparency: The entire process was automated with Python scripts, and all trading logs are publicly available on GitHub for transparency and reproducibility.
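As a rough illustration of the command step described above, here is a minimal Python sketch of how a model's reply could be turned into structured BUY, SELL, HOLD, or CANCEL commands before anything is sent to a broker. The line format ("BUY IOVA 10"), the Command class, and parse_commands are assumptions made for this example; they are not taken from the actual scripts in the repository, and no Alpaca calls are shown.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Command:
    action: str                    # BUY, SELL, HOLD, or CANCEL
    symbol: Optional[str] = None   # ticker, if the command names one
    qty: Optional[float] = None    # share count, if the command gives one


# Hypothetical format, one command per line: "BUY IOVA 10", "CANCEL GME", "HOLD"
_PATTERN = re.compile(
    r"^(BUY|SELL|HOLD|CANCEL)(?:\s+([A-Z.]{1,6}))?(?:\s+(\d+(?:\.\d+)?))?$"
)


def parse_commands(text: str) -> List[Command]:
    """Extract well-formed commands from a model reply, ignoring prose lines."""
    commands = []
    for raw in text.splitlines():
        match = _PATTERN.match(raw.strip())
        if not match:
            continue  # not a command line (analysis text, blank line, etc.)
        action, symbol, qty = match.groups()
        # BUY and SELL are only actionable with both a ticker and a quantity.
        if action in ("BUY", "SELL") and (symbol is None or qty is None):
            continue
        commands.append(Command(action, symbol, float(qty) if qty else None))
    return commands


if __name__ == "__main__":
    reply = "Healthcare looks oversold.\nBUY IOVA 10\nHOLD\nCANCEL GME"
    for command in parse_commands(reply):
        print(command)
```

Rejecting malformed lines instead of guessing at them fits the no-override rule: the script either executes exactly what the model said or does nothing.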

Results After Nine Weeks

At the conclusion of this period, the models’ performances varied significantly, with ChatGPT emerging as the leading contender. Here’s a breakdown of the results:

1. ChatGPT (+21.1%)

Initially conservative, ChatGPT stayed entirely in cash for nearly three weeks, as if it had paused. It then shifted abruptly, investing heavily in healthcare stocks, most notably IOVA, which doubled in value; another pick, ACHC, rose 52%. That turnaround took ChatGPT from last place to first and put it 22.6 percentage points ahead of the S&P 500, which fell 1.5% over the same period.

2. Perplexity (+1.1%)

Remarkably stable, Perplexity led the pack for five consecutive weeks, largely by trading very little. It holds a single biotech position, with approximately $977 of its portfolio still in cash, a cautious approach that avoided major losses.

3. Gemini (-6.6%)

This model experimented with diverse strategies, including crypto mining and meme stocks like GME (traditionally held until 2026). Most trades were short-lived and resulted in stop-outs, leading to a negative overall return.

4. Claude (-11.5%)

The most frequently active trader, Claude suffered the worst results, often buying high and getting stopped out low. However, it recently made a notable move by purchasing the same IOVA stock as ChatGPT, which is now up 43%, indicating some improvement.

Market Comparison

Over the same timeframe, the S&P 500 declined by 1.5%, making ChatGPT’s +21.1% return particularly impressive. Perplexity also beat the index, by 2.6 percentage points, while Gemini and Claude trailed it.
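The comparison is simple arithmetic, sketched below in Python. The return figures and the $1,000 starting balance come from the results above; the function and variable names are illustrative only, and the ending values are implied by the rounded percentages, not read from the actual account logs.

```python
START_CAPITAL = 1_000.00   # starting paper balance per model
SP500_RETURN_PCT = -1.5    # S&P 500 return over the same nine weeks

# Nine-week returns reported for each model, in percent.
returns_pct = {
    "ChatGPT": 21.1,
    "Perplexity": 1.1,
    "Gemini": -6.6,
    "Claude": -11.5,
}


def ending_value(return_pct: float, start: float = START_CAPITAL) -> float:
    """Account value implied by a percentage return on the starting capital."""
    return round(start * (1 + return_pct / 100), 2)


def excess_return(return_pct: float, benchmark_pct: float = SP500_RETURN_PCT) -> float:
    """Return relative to the benchmark, in percentage points."""
    return round(return_pct - benchmark_pct, 1)


summary = {
    name: (ending_value(r), excess_return(r)) for name, r in returns_pct.items()
}

if __name__ == "__main__":
    for name, (value, excess) in summary.items():
        print(f"{name:<11} ${value:>9,.2f}  {excess:+.1f} pts vs S&P 500")
```

Measured this way, a model can lose money and still be judged against how much the index lost, which is why the gap is reported in percentage points instead of as a raw return.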

What’s Next?

I plan to extend this experiment for an additional three weeks, for a total of 12 weeks (about three months), to gather more comprehensive data. Afterward, I will evaluate potential improvements for future iterations.

Resources & Further Details

Dashboard & Code Repository: AI Portfolio Experiment Dashboard
In-Depth Blog Post & Prompts: AI Portfolio Experiment on Substack

Call for Suggestions

I’m also open to ideas on how to enhance the setup for subsequent experiments. If you have suggestions or specific strategies you’d like me to test, please share!

This experiment offers fascinating insights into the capabilities and limitations of AI models in financial decision-making. While some models demonstrated surprising resilience and strategic shifts, others struggled with consistent profitability. Stay tuned for future updates as I continue to explore AI’s potential in the world of trading.
