nof1, an artificial intelligence (AI) research platform focused on financial markets, launched a large language model (LLM) trading test called Alpha Arena in October. The test had six mainstream AI models (GPT-5, Gemini 2.5 Pro, Grok-4, Claude Sonnet 4.5, DeepSeek V3.1, and Qwen3 Max) each trade $10,000 in real funds on the Hyperliquid crypto exchange, with identical prompts and input data.
By the end of the experiment, DeepSeek and Grok had delivered returns of more than 14%, ranking among the top performers. At the other end of the spectrum, Gemini 2.5 Pro had lost 42.57%.
All “contestants” were trading some of the most popular assets, including Bitcoin (BTC) and Ethereum (ETH). The identical prompts ensured all models started from the same instruction-based baseline. The early leaders, DeepSeek and Grok, took aggressive long positions and capitalized on the ongoing market conditions. In contrast, ChatGPT and Gemini, which mixed long and short positions, underperformed.
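To put the reported percentages in dollar terms, the short Python sketch below converts a return into an ending balance on the $10,000 starting stake. The function name `ending_balance` is purely illustrative, and the 14% figure is only the lower bound quoted above ("more than 14%"), so the actual DeepSeek and Grok balances would have been somewhat higher. Fees and funding costs are ignored.

```python
STARTING_CAPITAL = 10_000.00  # USD per model, as reported in the article


def ending_balance(return_pct: float, start: float = STARTING_CAPITAL) -> float:
    """Convert a percentage return into an ending balance (illustrative helper)."""
    return round(start * (1 + return_pct / 100), 2)


# Gemini 2.5 Pro: reported loss of 42.57%
gemini_final = ending_balance(-42.57)

# DeepSeek and Grok: "more than 14%", so this is a floor, not the exact figure
leaders_floor = ending_balance(14.0)

print(f"Gemini 2.5 Pro ending balance: ${gemini_final:,.2f}")
print(f"DeepSeek/Grok ending balance: at least ${leaders_floor:,.2f}")
```

On these assumptions, the gap between the best and worst performers amounts to more than $5,600 on a $10,000 stake.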
Overall, the Alpha Arena represents the first large-scale, public test of whether AI systems can genuinely interpret and react to live financial markets. A notable observation was that during Bitcoin’s sharp price swings, several models successfully identified and acted on short-term rebound opportunities. Thus, the experiment offers valuable insights into how large language models handle high-uncertainty financial environments. However, it must be pointed out that a $10,000 portfolio and a 48-hour window cannot fully demonstrate long-term performance.
Similarly, the models were not really exposed to extreme market scenarios, leaving their crisis response ability untested. Still, the results have given developers a lot to think about in regard to how AI tools can improve trading efficiency and address human oversight.

Latest news and analysis from Finbold