# codegeex-fastertransformer

**Repository Path**: codegeex/codegeex-fastertransformer

## Basic Information

- **Project Name**: codegeex-fastertransformer
- **Description**: FasterTransformer implementation of the CodeGeeX model
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-02-10
- **Last Updated**: 2025-06-24

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# CodeGeeX FasterTransformer

This repository provides the FasterTransformer implementation of the [CodeGeeX](https://github.com/THUDM/CodeGeeX) model.

## Get Started

First, download and set up the following Docker environment, replacing `<repo_dir>` with the directory of this repo:

```
docker pull nvcr.io/nvidia/pytorch:21.11-py3
docker run -p 9114:5000 --cpus 12 --gpus '"device=0"' -it -v <repo_dir>:/workspace/codegeex-fastertransformer --ipc=host --name=test nvcr.io/nvidia/pytorch:21.11-py3
```

Second, install the following packages inside the container:

```
pip3 install transformers
pip3 install sentencepiece
cd codegeex-fastertransformer
sh make_all.sh # Remember to specify the SM version (the -DSM build flag) according to the GPU, e.g. 80 for A100.
```

Then, convert the initial checkpoint (download [here](https://models.aminer.cn/codegeex/download/request)) to the FT format using `get_ckpt_ft.py`.

Finally, run `api.py` to start the server and `post.py` to send a request:

```
nohup python3 api.py > test.log 2>&1 &
python3 post.py
```
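If you want to send a request without `post.py`, the sketch below shows a minimal HTTP client. The `docker run` command above maps host port 9114 to the container's port 5000, so requests from the host go to port 9114. The URL path and JSON field names (`prompt`, `max_tokens`) are assumptions for illustration; check `api.py` and `post.py` for the actual schema the server expects.

```
import requests

# Minimal client sketch. Host port 9114 is mapped to the container's port 5000
# by the docker run command above. The URL path and payload fields are
# assumptions -- adjust them to match what api.py actually expects.
payload = {
    "prompt": "# language: Python\n# write a quick sort function\n",
    "max_tokens": 128,
}
response = requests.post("http://localhost:9114/", json=payload, timeout=60)
print(response.status_code)
print(response.text)
```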
## Inference performance

The following figure compares the performance of pure PyTorch, Megatron, and FasterTransformer under INT8 and FP16. The fastest implementation is FasterTransformer with INT8, which takes under 15 ms on average to generate a token.

## License

Our code is licensed under the [Apache-2.0 license](LICENSE).