# pytorch_transformer_translate
This project is a PyTorch implementation of the Transformer machine translation model from *Attention Is All You Need*. For a detailed explanation, please click here to view.
Project Environment
```
Device: server with an NVIDIA GA102 [GeForce RTX 3090]
Anaconda environment:
  Python 3.6.13
  PyTorch 1.9.0 + CUDA 11.1
  Tokenizers 0.12.1
  Transformers 4.18.0
```
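If you want to confirm the environment before training, a quick check like the one below (not part of the repository, just a minimal sketch) verifies the PyTorch build and that the RTX 3090 is visible:

```python
import torch

print(torch.__version__)                  # expected: something like 1.9.0+cu111
print(torch.cuda.is_available())          # should print True on the GPU server
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX 3090"
```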
1. Project structure
````
en_it
    configs              (some dataset information; not actually used)
    dataset              (training data)
        data-00000-of-00001.arrow   (dataset downloaded from HuggingFace)
        dataset_info.json           (information about the downloaded dataset)
        state.json
        tokenizer_en.json
        tokenizer_it.json
        tokenizer_zh.json
    opus_books_weights   (saved training weights)
    runs                 (training logs)
    config.py            (training configuration)
    dataset.py           (loads the train and test datasets)
    model.py             (the complete Transformer architecture)
    predict.py           (translates a single sentence)
    train.py
en_zh
    dataset
        zh_en_dataset
            myProcess
                zh_en01                  (first dataset, English to Chinese)
                    zh_en.json           (dataset converted from the .txt file to JSON)
                    zh_en.txt
                    zh_en_process.py     (processes the raw dataset)
                zh_en02                  (second dataset, English to Chinese; same layout as zh_en01)
            zh_en                        (not actually used)
            tokenizer_en.json            (generated English tokenizer)
            tokenizer_zh.json            (generated Chinese tokenizer)
    runs                 (training logs)
        en_zh01
        en_zh02
    weights              (saved training weights)
        en_zh01_weights
        en_zh02_weights
    predict_en_zh.py
    zh_en_config.py
    zh_en_dataset.py
    zh_en_train.py
website
    flagged
    app.py               (web interface)
    train_wb.py          (no difference from train.py)
    translate.py         (translates a single input sentence)
````
Note: for the layout above, when training on your own dataset you only need to adjust the relevant paths to train your own model.
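The tokenizer_*.json files can be inspected on their own; below is a minimal sketch, assuming they were saved with the Hugging Face `tokenizers` library listed in the environment above (the file path is taken from the tree):

```python
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("en_it/tokenizer_en.json")  # path from the tree above
encoding = tokenizer.encode("Attention is all you need.")
print(encoding.tokens)  # subword tokens
print(encoding.ids)     # vocabulary ids that feed the Transformer
```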
2. Train en_it (English to Italian)
Training is not complicated and there are only a few settings, all in **config.py**. A weights file trained for **720 epochs** is provided here:
Link: [https://pan.baidu.com/s/1g5Y38okBPb4AnE7A2RFaww](https://pan.baidu.com/s/1g5Y38okBPb4AnE7A2RFaww)
Extraction code: 1n54
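Once downloaded, a checkpoint can be loaded with plain `torch.load`. The sketch below is only an illustration: the file name and the key under which train.py stores the weights are assumptions, so check them against the saved file. The same pattern applies to the en_zh checkpoints in the next section.

```python
import torch

# The file name below is hypothetical; use whichever checkpoint you downloaded.
checkpoint = torch.load("en_it/opus_books_weights/tmodel_720.pt", map_location="cpu")

# If train.py saves a dict of training state, the weights are probably stored under a
# key such as "model_state_dict"; if not, the file is the state_dict itself.
state_dict = checkpoint.get("model_state_dict", checkpoint)

# model = ...  # build the Transformer from model.py with the values in config.py
# model.load_state_dict(state_dict)
# model.eval()
```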
3. Train en_zh (English to Chinese)
Training is not complicated and the settings are all in zh_en_config.py. Weight files trained for 146 and 14 epochs are provided here:
Weights from training on the zh_en01 dataset:
Link: [https://pan.baidu.com/s/1mj_qZ4xadH9T7WtJYQ_L-A](https://pan.baidu.com/s/1mj_qZ4xadH9T7WtJYQ_L-A)
Extraction code: wjgk
Weights from training on the zh_en02 dataset:
Link: [https://pan.baidu.com/s/1FduLVLHnnkf2vXMf39lDgQ](https://pan.baidu.com/s/1FduLVLHnnkf2vXMf39lDgQ)
Extraction code: evhe
4. Inference
Run app.py (change the file paths to your own if necessary). It prints a URL; open it:
[http://127.0.0.1:7860](http://127.0.0.1:7860)
(1) English-to-Italian translation
(2) English-to-Chinese translation
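Port 7860 is Gradio's default, so app.py presumably wraps the translation function in a Gradio interface. A minimal sketch of that pattern follows; the translate body is a placeholder, not the repository's implementation:

```python
import gradio as gr

def translate(sentence: str) -> str:
    # Placeholder: the real app.py calls the trained Transformer (see translate.py).
    return sentence

demo = gr.Interface(fn=translate, inputs="text", outputs="text",
                    title="Transformer translation demo")
demo.launch()  # serves on http://127.0.0.1:7860 by default
```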