Release Date | Model | Model Size | Cross-MMLU Accuracy | Cross-MMLU Consistency | Cross-MMLU AC3 |
---|---|---|---|---|---|
11/2023 | chatgpt (gpt-3.5-turbo-1106) | Unknown | 68.5 🥇 | 56.0 🥈 | 61.6 🥇 |
06/2023 | chatgpt (gpt-3.5-turbo-0613) | Unknown | 67.1 🥈 | 51.7 🥉 | 58.4 🥈 |
10/2022 | mt0-xxl* | 13B | 51.8 | 57.1 🥇 | 54.6 🥉 |
10/2023 | chatglm3-6b | 6B | 36.0 | 16.8 | 23.4 |
09/2023 | colossal-llama-2-7b-base | 7B | 36.7 | 14.1 | 20.6 |
09/2023 | baichuan-2-7b | 7B | 41.4 | 25.1 | 31.4 |
09/2023 | baichuan-2-7b-chat | 7B | 45.7 | 20.1 | 27.9 |
09/2023 | baichuan-2-13b | 13B | 49.6 | 27.5 | 35.6 |
09/2023 | baichuan-2-13b-chat | 13B | 53.6 🥉 | 25.5 | 34.3 |
07/2023 | vicuna-7b-v1.5 | 7B | 40.3 | 38.2 | 39.1 |
07/2023 | vicuna-13b-v1.5 | 13B | 47.8 | 37.8 | 42.3 |
07/2023 | llama-2-7b | 7B | 32.5 | 51.2 | 38.1 |
07/2023 | llama-2-7b-chat | 7B | 33.5 | 26.0 | 30.1 |
07/2023 | llama-2-13b | 13B | 46.8 | 32.7 | 38.5 |
07/2023 | llama-2-13b-chat | 13B | 38.2 | 32.2 | 34.9 |
07/2023 | llama-2-70b | 70B | 53.2 | 26.5 | 35.4 |
07/2023 | llama-2-70b-chat | 70B | 46.7 | 33.6 | 38.5 |
03/2023 | alpaca-7b | 7B | 26.4 | 28.1 | 27.2 |
02/2023 | llama-65b | 65B | 43.9 | 26.7 | 33.2 |
12/2022 | flan-t5-xl* | 3B | 36.8 | 33.1 | 34.6 |
09/2022 | bloomz-7b1* | 7.1B | 39.6 | 42.1 | 40.6 |
*: The model may have been exposed to relevant supervised data.
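The AC3 values in the table are consistent with the harmonic mean of Accuracy and Consistency (e.g., 68.5 and 56.0 give 61.6). A minimal sketch of that computation, assuming AC3 is indeed defined this way:

```python
def ac3(accuracy: float, consistency: float) -> float:
    """Harmonic mean of accuracy and consistency (assumed AC3 definition)."""
    return 2 * accuracy * consistency / (accuracy + consistency)

# chatgpt (gpt-3.5-turbo-1106): Accuracy 68.5, Consistency 56.0
print(round(ac3(68.5, 56.0), 1))  # 61.6, matching the table
```

The harmonic mean rewards models that score well on both axes: a model with high accuracy but low cross-lingual consistency (e.g., llama-2-70b at 53.2 / 26.5) is pulled down to 35.4, below more balanced models.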