You probably already know that Large Language Models (LLMs) are used to power chatbots and generative AI tools for Windows. You also know that some are better than others at giving accurate and reliable responses. But when it comes to playing Street Fighter III, one model stands out from the pack, at least according to the (first ever?) SF3 LLM Coliseum. Did you know that the winner just happens to be OpenAI's GPT-3.5?
At last week's Mistral AI Hackathon event in San Francisco, a small team of AI enthusiasts dedicated themselves to finding the ultimate truth about large language models. According to this group, LLMs are better suited to this kind of task than reinforcement learning algorithms. This is because LLMs do not respond based on accumulated rewards, but are much more contextual.
The way it works is as follows: the LLM is given a text description of the screen and decides what moves its character will make based on its own previous moves, the opponent's moves, and both characters' strength bars. Then you can just sit back and watch the two LLMs go at it with each other.
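To make that loop a little more concrete, here is a minimal Python sketch of the idea, not the project's actual code. Everything in it is an assumption made for illustration: the `MOVES` list, the `FighterState` fields, the 0-100 strength scale, the prompt wording, and the `llm` callable, which stands in for whatever chat model you would plug in.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical move vocabulary; the real project may use different names.
MOVES = ["move closer", "move away", "jump", "low punch",
         "high kick", "fireball", "block"]


@dataclass
class FighterState:
    """What the model is told about one character (assumed fields)."""
    name: str
    health: int  # strength bar, assumed to run from 0 to 100
    last_moves: List[str] = field(default_factory=list)


def describe_screen(me: FighterState, opponent: FighterState) -> str:
    """Turn the game state into a plain-text prompt for the model."""
    def recent(fighter: FighterState) -> str:
        # Only the last three moves are included, to keep the prompt short.
        return ", ".join(fighter.last_moves[-3:]) or "none"

    return (
        f"You are {me.name} with {me.health} health. "
        f"Your recent moves: {recent(me)}.\n"
        f"Your opponent is {opponent.name} with {opponent.health} health. "
        f"Their recent moves: {recent(opponent)}.\n"
        f"Reply with exactly one move from: {', '.join(MOVES)}."
    )


def choose_move(llm: Callable[[str], str], me: FighterState,
                opponent: FighterState, fallback: str = "block") -> str:
    """Ask the model for a move and map its free-text reply to a known move."""
    reply = llm(describe_screen(me, opponent)).strip().lower()
    # Return the first move from MOVES (in list order) found in the reply.
    for move in MOVES:
        if move in reply:
            return move
    # The model said something unusable, so fall back to a safe default.
    return fallback


if __name__ == "__main__":
    ken = FighterState("Ken", 80, ["jump", "high kick"])
    ryu = FighterState("Ryu", 65, ["block"])
    # A stub stands in for a real model call here.
    print(choose_move(lambda prompt: "Fireball!", ken, ryu))
```

Running one such loop per character, each with its own model behind `llm`, would give you the two-LLM face-off described above. The fallback matters because a model's reply is free text and will not always contain a valid move.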
Due to the nature of the event, the first test run threw different versions of the Mistral LLMs into a frantic head-to-head battle, but then the group brought in OpenAI and its GPT-3.5 and GPT-4 models to up the ante.
Fists were exchanged, combos were unleashed, and attacks were blocked and dodged. After many battles, the results were tallied and one model stood tall and took the gold medal: OpenAI's GPT-3.5, specifically the latest Turbo version. The race for silver and bronze was a close one, with Mistral-small-2042 edging out the GPT-4 preview model for second place.
The source code for this project is available on GitHub, so you can try it all out yourself without a supercomputer. However, you will need the appropriate game ROM files, which must be from an older 2D beat 'em up or a 3D game with limited environment movement.
The potential for this application is obvious, and one wonders when we will see a game where you think you are playing against another person, but are in fact just up against an LLM.
It all looks really cool, but I can't help but wonder if some of the more military-minded folks are thinking about what else they could use a large language model for. Especially given GPT-3.5's propensity for going thermonuclear in war games.
Hey, a whole article about AI and not a single mention of Skynet! Oh, darn.