From 898a77071eb66fbb5ed55061ae567d27ae3b30d3 Mon Sep 17 00:00:00 2001
From: Qinkai
Date: Sat, 6 Jul 2024 14:24:45 +0800
Subject: [PATCH] Update README.md

---
 README.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/README.md b/README.md
index 901d97e..6ad5010 100644
--- a/README.md
+++ b/README.md
@@ -54,6 +54,13 @@ These guides aim to provide a comprehensive understanding and facilitate efficie
 CodeGeeX4-ALL-9B is ranked as the most powerful model under 10 billion parameters, even surpassing general models several times its size, achieving the best balance between inference performance and model effectiveness.
 
+| **Model**                   | **Seq Length** | **HumanEval** | **MBPP** | **NCB** | **LCB** | **HumanEvalFIM** | **CRUXEval-O** |
+|-----------------------------|----------------|---------------|----------|---------|---------|------------------|----------------|
+| Llama3-70B-instruct         | 8K             | 77.4          | 82.3     | 37.0    | 27.4    | -                | -              |
+| DeepSeek Coder 33B Instruct | 16K            | 81.1          | 80.4     | 39.3    | 29.3    | 78.2             | 49.9           |
+| Codestral-22B               | 32K            | 81.1          | 78.2     | 46.0    | 35.3    | 91.6             | 51.3           |
+| CodeGeeX4-ALL-9B            | 128K           | 82.3          | 75.7     | 40.4    | 28.5    | 85.0             | 47.1           |
+
 CodeGeeX4-ALL-9B scored `48.9` and `40.4` on the `complete` and `instruct` tasks of BigCodeBench, the highest scores among models with fewer than 20 billion parameters.
 ![BigCodeBench Test Results](./metric/pics/Bigcodebench.PNG)
 
 In CRUXEval, a benchmark testing code reasoning, understanding, and execution capabilities, CodeGeeX4-ALL-9B delivered remarkable results with its CoT (chain-of-thought) abilities. From straightforward code generation tasks in HumanEval and MBPP to highly challenging tasks in NaturalCodeBench, CodeGeeX4-ALL-9B also achieved outstanding performance at its scale. It is currently the only code model that supports Function Call capabilities and even achieves a higher execution success rate than GPT-4.
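
As context for the Function Call claim in the final paragraph above, here is a minimal sketch of how such a call might be exercised, assuming CodeGeeX4-ALL-9B is served behind an OpenAI-compatible endpoint (e.g., via vLLM's OpenAI-compatible server). The `base_url`, `api_key`, served model name, and the `get_weather` tool schema are illustrative assumptions, not details taken from this patch.

```python
# Minimal sketch of exercising function calling against CodeGeeX4-ALL-9B,
# assuming an OpenAI-compatible serving endpoint. The base_url, api_key,
# model name, and the get_weather tool are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="codegeex4-all-9b",  # assumed served-model name
    messages=[{"role": "user", "content": "What's the weather in Beijing?"}],
    tools=tools,
)

# If the model decides to call the tool, the structured call (function name
# plus JSON-encoded arguments) arrives in tool_calls rather than in the
# plain-text content of the reply.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```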