https://chat.lmsys.org/ (a VPN or proxy may be required)
Through this website, you can:
- Have conversations with 28 large language models
- Let the large models randomly compete against each other
- Specify two large models for a competition
- View the ranking of the large models
None of this requires registration, login, or payment. Just open the website and try it!
The 28 models include GPT-4-Turbo, currently the strongest model available.
In addition, they also include:
Top-tier hosted models:
- GPT-3.5
- Gemini Pro
- Claude 2
Top-tier open-source models:
- Llama 2
- Qwen
- Yi-Chat
- ChatGLM
- Code Llama
- WizardLM
Most of the well-known models on the market are included.
If you want to quickly try out a variety of AI chat models, this is a good choice.
Now let's look at how each feature works,
and along the way, see which model is the real champion.
- Model Battle
The website opens on the arena (battle) tab by default, where the 28 models compete against each other.
The rules are simple:
After opening the webpage, the system automatically selects two large models without displaying their names.
You initiate a conversation and then rate them based on their replies.
The system forms a ranking based on a large number of ratings.
This design is interesting because it collects feedback from real users. ChatGPT's official chat sometimes does the same thing, showing two parallel responses and asking you which one is better.
Results from this kind of testing are arguably more meaningful than scores on standard benchmarks.
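The post does not say how the site turns votes into a ranking. As a rough illustration only, the sketch below assumes an Elo-style update, one common way to build a leaderboard from pairwise comparisons. The model names, the starting rating of 1000, and the step size `k` are all hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(ratings, model_a, model_b, outcome, k=32.0, initial=1000.0):
    """Apply one vote. outcome: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie."""
    ra = ratings.get(model_a, initial)
    rb = ratings.get(model_b, initial)
    ea = expected_score(ra, rb)
    # The winner gains what the loser gives up, so the total stays constant.
    ratings[model_a] = ra + k * (outcome - ea)
    ratings[model_b] = rb + k * ((1.0 - outcome) - (1.0 - ea))
    return ratings


def rank(votes):
    """Replay all votes in order and return models sorted by rating."""
    ratings = {}
    for model_a, model_b, outcome in votes:
        update_ratings(ratings, model_a, model_b, outcome)
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical anonymous battles: (left model, right model, vote).
votes = [
    ("model-x", "model-y", 1.0),
    ("model-y", "model-z", 1.0),
    ("model-x", "model-z", 1.0),
    ("model-x", "model-y", 0.5),
]
leaderboard = rank(votes)
for name, rating in leaderboard:
    print(f"{name}: {rating:.1f}")
```

With only four votes the numbers mean little; the idea is that with a very large number of votes, ratings of this kind settle into a stable order.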
Here is an example. I entered the question, "What is the daughter of the father's father called?"
In this case, the reply on the left was slightly better than the one on the right.
- Head-to-Head Between Chosen Models
Anonymity helps keep the feedback fair and unbiased, but you don't know which model you are talking to. If you want to pick the two models yourself, use the second feature, the side-by-side arena, which is a one-on-one match.
The result is obvious: GPT-4 wins. Although Claude is billed as a strong rival to OpenAI, the gap is still significant, and sometimes Claude cannot withstand even a single blow.
Building a large model for a specialized domain is relatively simple.
Building the world's best general-purpose language model is not.
- Direct Chat
If you don't want a competition and just want a quiet chat, use the direct chat feature. You pick one model and have a one-on-one conversation with it.
- Ranking
Which model is the strongest is often the most discussed question.
So, let's take a look at the ranking.
According to the description, this ranking was generated after more than 100K users voted, so it should be fairly reliable.
The ranking shows OpenAI's GPT-4 models occupying the top three positions, and GPT-3.5 also ranks well.
Next come Claude, Gemini Pro, and Mixtral.
The first two are well known, while Mixtral may be less familiar.
Mixtral 8x7B is a large language model developed by the Mistral AI team. It is a Sparse Mixture of Experts (SMoE) model.
It inherits the architecture of Mistral 7B, but each layer consists of 8 feed-forward blocks (i.e., "experts"). For each token, the router network in each layer selects two experts to process the current state and combines their outputs.
This idea is quite interesting: a clever design that achieves strong results.
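The routing described above can be sketched roughly as follows. This is a toy illustration, not Mixtral's actual code: the router scores are passed in directly instead of being computed from the token state, the "experts" are simple scaling functions instead of feed-forward networks, and the function name `top2_moe_layer` is hypothetical. It also assumes the two outputs are combined with softmax weights taken over the two selected scores.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top2_moe_layer(token_state, router_logits, experts):
    """Route one token through the two highest-scoring experts."""
    # Indices of the two largest router scores.
    top2 = sorted(range(len(router_logits)),
                  key=lambda i: router_logits[i], reverse=True)[:2]
    # Normalize the weights over the selected experts only (assumption).
    weights = softmax([router_logits[i] for i in top2])
    # Only the two chosen experts run; the other six are skipped.
    outputs = [experts[i](token_state) for i in top2]
    combined = [
        sum(w * out[d] for w, out in zip(weights, outputs))
        for d in range(len(token_state))
    ]
    return top2, weights, combined


# Eight toy experts: expert i multiplies its input by (i + 1).
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(8)]

# Hypothetical router scores for a single token.
router_logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]
chosen, weights, out = top2_moe_layer([1.0, -2.0], router_logits, experts)
print(chosen, weights, out)
```

Because only two of the eight experts run for any given token, the compute per token is much lower than the total parameter count suggests.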
Although everyone says they want to surpass GPT-4, in reality GPT-4 is far ahead. The gap has not narrowed; if anything, it feels larger.
OpenAI is also on a completely different level in terms of investment and attention.
The strong stay strong. For the foreseeable future, ChatGPT will remain far ahead of its peers.