Update README.md

This commit is contained in:
Diego ROJAS 2024-07-05 14:47:39 +08:00
parent 511d6a5a5a
commit 095e6fe9cf
2 changed files with 6 additions and 6 deletions

View File

@@ -55,7 +55,7 @@ These guides aim to provide a comprehensive understanding and facilitate efficie
 CodeGeeX4-ALL-9B is ranked as the most powerful model under 10 billion parameters, even surpassing general models several times its size, achieving the best balance between inference performance and model effectiveness.
 CodeGeeX4-ALL-9B scored `48.9` and `40.4` for the `complete` and `instruct` tasks of BigCodeBench, which are the highest scores among models with less than 20 billion parameters.
-![BigCodeBench Test Results](./metric/pics/Bigcodebench.png)
+![BigCodeBench Test Results](./metric/pics/Bigcodebench.PNG)
 In CRUXEval, a benchmark for testing code reasoning, understanding, and execution capabilities, CodeGeeX4-ALL-9B presented remarkable results with its CoT (chain-of-thought) abilities. From easy code generation tasks in HumanEval and MBPP to very challenging tasks in NaturalCodeBench, CodeGeeX4-ALL-9B also achieved outstanding performance at its scale. It is currently the only code model that supports Function Call capabilities and even achieves a better execution success rate than GPT-4.
 ![Function Call Evaluation](./metric/pics/FunctionCall.png)
 Furthermore, in the "Code Needle In A Haystack" (NIAH) evaluation, the CodeGeeX4-ALL-9B model demonstrated its ability to retrieve code within contexts up to 128K, achieving 100% retrieval accuracy in all Python scripts.
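The NIAH protocol described in the line above can be sketched end to end. This is a minimal illustration, not the repository's evaluation code: the names `build_haystack`, `secret_sum`, and the filler functions are hypothetical, and the model call is stubbed out, since a real run would prompt CodeGeeX4-ALL-9B with the long context and ask it to reproduce the hidden snippet.

```python
import random


def build_haystack(needle: str, n_filler: int = 200, seed: int = 0) -> str:
    """Hide one 'needle' function among many filler functions at a random position."""
    rng = random.Random(seed)
    fillers = [f"def filler_{i}():\n    return {i}\n" for i in range(n_filler)]
    fillers.insert(rng.randrange(len(fillers) + 1), needle)
    return "\n".join(fillers)


def retrieval_accuracy(answers: list[str], expected: str) -> float:
    """Fraction of model answers that exactly reproduce the needle (exact match)."""
    return sum(a.strip() == expected.strip() for a in answers) / len(answers)


needle = "def secret_sum(a, b):\n    return a + b\n"
haystack = build_haystack(needle)
assert needle in haystack

# In a real evaluation, each answer would come from the model given `haystack`
# as context; here perfect answers are stubbed in to show the scoring step.
print(retrieval_accuracy([needle, needle], needle))  # 1.0
```

Scaling the filler count until the context reaches 128K tokens, and repeating over many needle positions, yields the retrieval-accuracy figure the README reports.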
@@ -77,7 +77,7 @@ The code in this repository is open source under the [Apache-2.0](https://www.ap
 If you find our work helpful, please feel free to cite the following paper:
-```
+```bibtex
 @inproceedings{zheng2023codegeex,
   title={CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X},
   author={Qinkai Zheng and Xiao Xia and Xu Zou and Yuxiao Dong and Shan Wang and Yufei Xue and Zihan Wang and Lei Shen and Andi Wang and Yang Li and Teng Su and Zhilin Yang and Jie Tang},

View File

@@ -56,7 +56,7 @@ with torch.no_grad():
 CodeGeeX4-ALL-9B is ranked as the most powerful model under 10 billion parameters, even surpassing general models several times its size, achieving the best balance between inference performance and model capability.
 In the `complete` and `instruct` tasks of BigCodeBench, CodeGeeX4-ALL-9B scored `48.9` and `40.4` respectively, the highest scores among models under 20 billion parameters.
-![BigCodeBench Test Results](./metric/pics/Bigcodebench.png)
+![BigCodeBench Test Results](./metric/pics/Bigcodebench.PNG)
 Crux-Eval is a benchmark that tests code reasoning, understanding, and execution capabilities; thanks to its strong CoT abilities, CodeGeeX4-ALL-9B delivers excellent performance on it. CodeGeeX4-ALL-9B also achieved outstanding results on code generation tasks such as HumanEval, MBPP, and NaturalCodeBench. It is currently the only code model that supports Function Call, and it even scores higher than GPT-4.
 ![Function Call Evaluation](./metric/pics/FunctionCall.png)
 Furthermore, in the "Code Needle In A Haystack" (NIAH) evaluation, the CodeGeeX4-ALL-9B model demonstrated its ability to retrieve code within contexts up to 128K, achieving 100% retrieval accuracy in Python environments, and performed well on cross-file completion tasks.
@@ -77,7 +77,7 @@ Crux-Eval is a benchmark that tests code reasoning, understanding, and execution
 If you find our work helpful, please feel free to cite the following paper:
-```
+```bibtex
 @inproceedings{zheng2023codegeex,
   title={CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X},
   author={Qinkai Zheng and Xiao Xia and Xu Zou and Yuxiao Dong and Shan Wang and Yufei Xue and Zihan Wang and Lei Shen and Andi Wang and Yang Li and Teng Su and Zhilin Yang and Jie Tang},