diff --git a/README.md b/README.md

中文 | [English](./README_en.md)

[Project](https://github.com/PaddlePaddle/PaddleOCR) | [PyPI](https://pypi.org/project/PaddleOCR/) | [Website](https://www.paddleocr.ai/) | [Online Demo](https://aistudio.baidu.com/community/app/91660/webUI)

## 🚀 Introduction

Since its release, PaddleOCR, built on cutting-edge academic algorithms and proven in industrial practice, has been embraced by academia, industry, and the research community, and is widely used in well-known open-source projects such as Umi-OCR, OmniParser, MinerU, and RAGFlow, making it the go-to open-source OCR toolkit for developers. On May 20, 2025, the PaddlePaddle team released **PaddleOCR 3.0**, fully compatible with the official **PaddlePaddle 3.0 framework**. It further **improves text recognition accuracy**, adds support for **multiple text types** and **handwriting recognition**, and meets the strong demand from LLM applications for **high-accuracy parsing of complex documents**. Combined with **ERNIE 4.5 Turbo**, it significantly improves key-information extraction accuracy, and it adds support for **domestic hardware such as KUNLUNXIN and Ascend**. For complete usage documentation, see the [PaddleOCR 3.0 documentation](https://paddlepaddle.github.io/PaddleOCR/latest/).

PaddleOCR 3.0 **adds** three flagship capabilities:
- All-scenario text recognition model [PP-OCRv5](docs/version3.x/algorithm/PP-OCRv5/PP-OCRv5.md): a single model covering five text types plus complex handwriting; overall recognition accuracy is **13 percentage points higher** than the previous generation. [Online demo](https://aistudio.baidu.com/community/app/91660/webUI)
- General document-parsing solution [PP-StructureV3](docs/version3.x/algorithm/PP-StructureV3/PP-StructureV3.md): high-accuracy parsing of multi-scenario, multi-layout PDFs, **outperforming many open- and closed-source solutions** on public benchmarks. [Online demo](https://aistudio.baidu.com/community/app/518494/webUI)
- Intelligent document-understanding solution [PP-ChatOCRv4](docs/version3.x/algorithm/PP-ChatOCRv4/PP-ChatOCRv4.md): natively supports ERNIE 4.5 Turbo, with accuracy **15 percentage points higher** than the previous generation. [Online demo](https://aistudio.baidu.com/community/app/518493/webUI)

Beyond an excellent model library, PaddleOCR 3.0 also provides easy-to-learn tools covering model training, inference, and serving deployment, so developers can bring AI applications to production quickly.
- `--device`: Device to use for inference, e.g. `cpu` (if GPU is unavailable) or `gpu` (if GPU is available).
- `--host`
**Parameters for model instantiation:**

| Parameter | Description | Type | Default |
|---|---|---|---|
| `model_name` | Model name. | `str` | `PP-LCNet_x1_0_doc_ori` |
| `model_dir` | Model storage path. | `str` | `None` |
| `device` | Device(s) to use for inference. Examples: `cpu`, `gpu`, `npu`, `gpu:0`, `gpu:0,1`. If multiple devices are specified, inference will be performed in parallel. Note that parallel inference is not always supported. By default, GPU 0 will be used if available; otherwise, the CPU will be used. | `str` | `None` |
| `enable_hpi` | Whether to use high-performance inference. | `bool` | `False` |
| `use_tensorrt` | Whether to use the Paddle Inference TensorRT subgraph engine. | `bool` | `False` |
| `min_subgraph_size` | Minimum subgraph size for TensorRT when using the Paddle Inference TensorRT subgraph engine. | `int` | `3` |
| `precision` | Precision for TensorRT when using the Paddle Inference TensorRT subgraph engine. Options: `fp32`, `fp16`, etc. | `str` | `fp32` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | `bool` | `True` |
| `cpu_threads` | Number of threads to use for inference on CPUs. | `int` | `10` |
| `top_k` | The top-k value for prediction results. If the value is 5, the top 5 categories and their corresponding classification probabilities will be returned. If not specified, the default value in the official PaddleOCR model configuration is used. | `int` | `None` |

**Parameters of the `predict()` method:**

| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Input data to be predicted. Required. Supports multiple input types: image data as a `numpy.ndarray`; a local image or PDF file path, e.g. `/root/data/img.jpg`; a URL of an image or PDF file; a local directory containing images to predict, e.g. `/root/data/` (predicting PDFs inside a directory is not supported; a PDF must be given as an exact file path); or a list of the above, e.g. `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`. | `Python Var`/`str`/`list` | (required) |
| `batch_size` | Batch size, any positive integer. | `int` | `1` |
| `top_k` | The top-k value for prediction results. If not specified, the value provided when the model was instantiated will be used; if it was not specified at instantiation either, the default value in the official PaddleOCR model configuration is used. | `int` | `None` |
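The `top_k` behaviour described above can be sketched in plain Python (this is an illustration of the semantics, not PaddleOCR code; the orientation labels and probabilities below are made up):

```python
def top_k_predictions(scores, top_k):
    """Return the top_k (label, probability) pairs, highest probability first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]

# Hypothetical probabilities for the four document-orientation classes.
scores = {"0": 0.70, "90": 0.15, "180": 0.10, "270": 0.05}
print(top_k_predictions(scores, 2))  # → [('0', 0.7), ('90', 0.15)]
```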
**Parameters for model instantiation:**

| Parameter | Description | Type | Default |
|---|---|---|---|
| `model_name` | Model name. | `str` | `PP-DocBee-2B` |
| `model_dir` | Model storage path. | `str` | `None` |
| `device` | Device(s) to use for inference. Examples: `cpu`, `gpu`, `npu`, `gpu:0`, `gpu:0,1`. If multiple devices are specified, inference will be performed in parallel. Note that parallel inference is not always supported. By default, GPU 0 will be used if available; otherwise, the CPU will be used. | `str` | `None` |
| `enable_hpi` | Whether to use high-performance inference. | `bool` | `False` |
| `use_tensorrt` | Whether to use the Paddle Inference TensorRT subgraph engine. | `bool` | `False` |
| `min_subgraph_size` | Minimum subgraph size for TensorRT when using the Paddle Inference TensorRT subgraph engine. | `int` | `3` |
| `precision` | Precision for TensorRT when using the Paddle Inference TensorRT subgraph engine. Options: `fp32`, `fp16`, etc. | `str` | `fp32` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | `bool` | `True` |
| `cpu_threads` | Number of threads to use for inference on CPUs. | `int` | `10` |

**Parameters of the `predict()` method:**

| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Input data to be predicted. Required. Must be a `dict`; as multimodal models have different input requirements, the exact keys depend on the specific model. For PP-DocBee: `{'image': image_path, 'query': query_text}`. | `dict` | (required) |
| `batch_size` | Batch size, any positive integer. | `int` | `1` |
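A minimal sketch of building the `dict` input expected by PP-DocBee-style multimodal models (the image path and query text below are placeholders):

```python
def make_docvlm_input(image_path, query_text):
    """Build the input dict expected by PP-DocBee-style multimodal models."""
    return {"image": image_path, "query": query_text}

sample = make_docvlm_input("/root/data/img.jpg", "What is the table title?")
print(sample)  # → {'image': '/root/data/img.jpg', 'query': 'What is the table title?'}
```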
**Parameters for model instantiation:**

| Parameter | Description | Type | Default |
|---|---|---|---|
| `model_name` | Model name. | `str` | `PP-FormulaNet_plus-M` |
| `model_dir` | Model storage path. | `str` | `None` |
| `device` | Device(s) to use for inference. Examples: `cpu`, `gpu`, `npu`, `gpu:0`, `gpu:0,1`. If multiple devices are specified, inference will be performed in parallel. Note that parallel inference is not always supported. By default, GPU 0 will be used if available; otherwise, the CPU will be used. | `str` | `None` |
| `enable_hpi` | Whether to use high-performance inference. | `bool` | `False` |
| `use_tensorrt` | Whether to use the Paddle Inference TensorRT subgraph engine. | `bool` | `False` |
| `min_subgraph_size` | Minimum subgraph size for TensorRT when using the Paddle Inference TensorRT subgraph engine. | `int` | `3` |
| `precision` | Precision for TensorRT when using the Paddle Inference TensorRT subgraph engine. Options: `fp32`, `fp16`, etc. | `str` | `fp32` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | `bool` | `True` |
| `cpu_threads` | Number of threads to use for inference on CPUs. | `int` | `10` |

**Parameters of the `predict()` method:**

| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Input data to be predicted. Required. Supports multiple input types: image data as a `numpy.ndarray`; a local image or PDF file path, e.g. `/root/data/img.jpg`; a URL of an image or PDF file; a local directory containing images to predict, e.g. `/root/data/` (directories containing PDF files are not supported; PDFs must be specified by exact file path); or a list of the above, e.g. `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`. | `Python Var`/`str`/`list` | (required) |
| `batch_size` | Batch size, any positive integer. | `int` | `1` |
**Parameters for model instantiation:**

| Parameter | Description | Type | Default |
|---|---|---|---|
| `model_name` | Model name. | `str` | `PP-DocLayout-L` |
| `model_dir` | Model storage path. | `str` | `None` |
| `device` | Device(s) to use for inference. Examples: `cpu`, `gpu`, `npu`, `gpu:0`, `gpu:0,1`. If multiple devices are specified, inference will be performed in parallel. Note that parallel inference is not always supported. By default, GPU 0 will be used if available; otherwise, the CPU will be used. | `str` | `None` |
| `enable_hpi` | Whether to use high-performance inference. | `bool` | `False` |
| `use_tensorrt` | Whether to use the Paddle Inference TensorRT subgraph engine. | `bool` | `False` |
| `min_subgraph_size` | Minimum subgraph size for TensorRT when using the Paddle Inference TensorRT subgraph engine. | `int` | `3` |
| `precision` | Precision for TensorRT when using the Paddle Inference TensorRT subgraph engine. Options: `fp32`, `fp16`, etc. | `str` | `fp32` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | `bool` | `True` |
| `cpu_threads` | Number of threads to use for inference on CPUs. | `int` | `10` |
| `img_size` | Input image size; if not specified, the model's default size is used. | `int`/`list`/`None` | `None` |
| `threshold` | Threshold for filtering low-confidence predictions. A `float` applies to all classes; a `dict` with `int` keys as `cls_id` and `float` values as thresholds applies per class, e.g. `{0: 0.45, 2: 0.48, 7: 0.4}` applies 0.45 to class 0, 0.48 to class 2, and 0.4 to class 7. If not specified, the model's default is used. | `float`/`dict`/`None` | `None` |
| `layout_nms` | Whether to use NMS post-processing to filter overlapping boxes. If not specified, the model's default is used. | `bool`/`None` | `None` |
| `layout_unclip_ratio` | Scaling factor for the side length of detected boxes. A `float` applies to all classes; a `dict` with `int` keys as `cls_id` and tuple values applies per class, e.g. `{0: (1.1, 2.0)}` expands class-0 boxes by 1.1× in width and 2.0× in height while keeping the center unchanged. If not specified, the model's default is used. | `float`/`list`/`dict`/`None` | `None` |
| `layout_merge_bboxes_mode` | Merge mode for the detected bounding boxes output by the model. A `str` applies to all classes; a `dict` with `int` keys as `cls_id` and `str` values as merge modes applies per class, e.g. `{0: "large", 2: "small"}`. If not specified, the model's default (`union`) is used. | `string`/`dict`/`None` | `None` |
**Parameters of the `predict()` method:**

| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Input data to be predicted. Required. Supports multiple input types: image data as a `numpy.ndarray`; a local image or PDF file path, e.g. `/root/data/img.jpg`; a URL of an image or PDF file; a local directory containing images to predict, e.g. `/root/data/` (directories containing PDF files are not supported; PDFs must be specified by exact file path); or a list of the above, e.g. `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`. | `Python Var`/`str`/`list` | (required) |
| `batch_size` | Batch size, any positive integer. | `int` | `1` |
| `threshold` | Threshold for filtering low-confidence predictions. A `float` applies to all classes; a `dict` maps `cls_id` to a `float` threshold, e.g. `{0: 0.45, 2: 0.48, 7: 0.4}`. | `float`/`dict`/`None` | `None`* |
| `layout_nms` | Whether to use NMS post-processing to filter overlapping boxes. | `bool`/`None` | `None`* |
| `layout_unclip_ratio` | Scaling ratio for the detected box size. A `float` applies to all classes; a `dict` maps `cls_id` to a tuple, e.g. `{0: (1.1, 2.0)}`. | `float`/`list`/`dict`/`None` | `None`* |
| `layout_merge_bboxes_mode` | Merge mode for detected bounding boxes. A `str` applies to all classes; a `dict` maps `cls_id` to a mode, e.g. `{0: "large", 2: "small"}`. | `string`/`dict`/`None` | `None`* |

\* If `None` is passed to `predict()`, the value set during model instantiation (`__init__`) will be used; if it was also `None` there, the framework defaults are applied: `threshold=0.5`, `layout_nms=False`, `layout_unclip_ratio=1.0`, `layout_merge_bboxes_mode="union"`.
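The `threshold` semantics (a global float vs. a per-class dict) can be sketched independently of PaddleOCR. The fallback default of 0.5 for classes absent from the dict is an assumption for illustration, and the detection tuples are made up:

```python
def filter_by_threshold(detections, threshold, default=0.5):
    """Keep detections whose score passes the threshold.

    detections: list of (cls_id, score) pairs.
    threshold: a float applied to every class, or a {cls_id: float} dict;
    classes missing from the dict fall back to `default` (an assumption here).
    """
    kept = []
    for cls_id, score in detections:
        t = threshold.get(cls_id, default) if isinstance(threshold, dict) else threshold
        if score > t:
            kept.append((cls_id, score))
    return kept

dets = [(0, 0.50), (2, 0.47), (7, 0.41)]
print(filter_by_threshold(dets, {0: 0.45, 2: 0.48, 7: 0.4}))  # → [(0, 0.5), (7, 0.41)]
```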
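The per-class `layout_unclip_ratio` example `{0: (1.1, 2.0)}` scales a box about its center. A standalone sketch of that geometry (an illustration, not PaddleOCR code):

```python
def unclip_box(box, ratio):
    """Scale an axis-aligned box about its center.

    box: (x1, y1, x2, y2); ratio: (w_ratio, h_ratio), e.g. (1.1, 2.0)
    expands the width by 1.1x and the height by 2.0x, center unchanged.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = (x2 - x1) / 2 * ratio[0]
    half_h = (y2 - y1) / 2 * ratio[1]
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

print(unclip_box((10, 10, 30, 20), (1.1, 2.0)))
```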
**Parameters for model instantiation:**

| Parameter | Description | Type | Default |
|---|---|---|---|
| `model_name` | Model name. Any supported seal text detection model name, such as `PP-OCRv4_mobile_seal_det`. | `str` | `PP-OCRv4_mobile_seal_det` |
| `model_dir` | Model storage path. | `str` | `None` |
| `device` | Device(s) to use for inference. Examples: `cpu`, `gpu`, `npu`, `gpu:0`, `gpu:0,1`. If multiple devices are specified, inference will be performed in parallel. Note that parallel inference is not always supported. By default, GPU 0 will be used if available; otherwise, the CPU will be used. | `str` | `None` |
| `enable_hpi` | Whether to use high-performance inference. | `bool` | `False` |
| `use_tensorrt` | Whether to use the Paddle Inference TensorRT subgraph engine. | `bool` | `False` |
| `min_subgraph_size` | Minimum subgraph size for TensorRT when using the Paddle Inference TensorRT subgraph engine. | `int` | `3` |
| `precision` | Precision for TensorRT when using the Paddle Inference TensorRT subgraph engine. Options: `fp32`, `fp16`, etc. | `str` | `fp32` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | `bool` | `True` |
| `cpu_threads` | Number of threads to use for inference on CPUs. | `int` | `10` |
| `limit_side_len` | Limit on the side length of the input image for detection. `int` specifies the value. If set to `None`, the default value from the official PaddleOCR model configuration will be used. | `int`/`None` | `None` |
| `limit_type` | Type of image side-length limitation. `"min"` ensures the shortest side of the image is no less than `det_limit_side_len`; `"max"` ensures the longest side is no greater than `limit_side_len`. If set to `None`, the default value from the official PaddleOCR model configuration will be used. | `str`/`None` | `None` |
| `thresh` | Pixel score threshold. Pixels in the output probability map with scores greater than this threshold are considered text pixels. Accepts any float value greater than 0. If set to `None`, the default value from the official PaddleOCR model configuration will be used. | `float`/`None` | `None` |
| `box_thresh` | If the average score of all pixels inside the bounding box is greater than this threshold, the result is considered a text region. Accepts any float value greater than 0. If set to `None`, the default value from the official PaddleOCR model configuration will be used. | `float`/`None` | `None` |
| `unclip_ratio` | Expansion ratio for the Vatti clipping algorithm, used to expand the text region. Accepts any float value greater than 0. If set to `None`, the default value from the official PaddleOCR model configuration will be used. | `float`/`None` | `None` |
| `input_shape` | Input image size for the model, in the format `(C, H, W)`. If set to `None`, the model's default size will be used. | `tuple`/`None` | `None` |

**Parameters of the `predict()` method:**

| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Input data to be predicted. Required. Supports multiple input types: image data as a `numpy.ndarray`; a local image or PDF file path, e.g. `/root/data/img.jpg`; a URL of an image or PDF file; a local directory containing images to predict, e.g. `/root/data/` (directories containing PDF files are not supported; PDFs must be specified by exact file path); or a list of the above. | `Python Var`/`str`/`list` | (required) |
| `batch_size` | Batch size, any positive integer. | `int` | `1` |
| `limit_side_len` | Limit on the side length of the input image for detection. `int` specifies the value. If set to `None`, the parameter value initialized by the model will be used by default. | `int`/`None` | `None` |
| `limit_type` | Type of image side-length limitation. `"min"` ensures the shortest side of the image is no less than `det_limit_side_len`; `"max"` ensures the longest side is no greater than `limit_side_len`. If set to `None`, the parameter value initialized by the model will be used by default. | `str`/`None` | `None` |
| `thresh` | Pixel score threshold. Pixels in the output probability map with scores greater than this threshold are considered text pixels. Accepts any float value greater than 0. If set to `None`, the parameter value initialized by the model will be used by default. | `float`/`None` | `None` |
| `box_thresh` | If the average score of all pixels inside the bounding box is greater than this threshold, the result is considered a text region. Accepts any float value greater than 0. If set to `None`, the parameter value initialized by the model will be used by default. | `float`/`None` | `None` |
| `unclip_ratio` | Expansion ratio for the Vatti clipping algorithm, used to expand the text region. Accepts any float value greater than 0. If set to `None`, the parameter value initialized by the model will be used by default. | `float`/`None` | `None` |
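The interaction of `limit_side_len` and `limit_type` can be sketched as a pure function (an illustration of the documented semantics, not the actual PaddleOCR preprocessing code):

```python
def resize_scale(w, h, limit_side_len, limit_type):
    """Scale factor implied by the side-length limit.

    "min": the shortest side must end up no less than limit_side_len;
    "max": the longest side must end up no greater than limit_side_len.
    Returns 1.0 when the image already satisfies the constraint.
    """
    if limit_type == "min":
        short = min(w, h)
        return limit_side_len / short if short < limit_side_len else 1.0
    else:  # "max"
        long = max(w, h)
        return limit_side_len / long if long > limit_side_len else 1.0

print(resize_scale(4000, 3000, 960, "max"))  # → 0.24
print(resize_scale(500, 300, 736, "min"))
```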
**Parameters for model instantiation:**

| Parameter | Description | Type | Default |
|---|---|---|---|
| `model_name` | Model name. | `str` | `PP-DocLayout-L` |
| `model_dir` | Model storage path. | `str` | `None` |
| `device` | Device(s) to use for inference. Examples: `cpu`, `gpu`, `npu`, `gpu:0`, `gpu:0,1`. If multiple devices are specified, inference will be performed in parallel. Note that parallel inference is not always supported. By default, GPU 0 will be used if available; otherwise, the CPU will be used. | `str` | `None` |
| `enable_hpi` | Whether to use high-performance inference. | `bool` | `False` |
| `use_tensorrt` | Whether to use the Paddle Inference TensorRT subgraph engine. | `bool` | `False` |
| `min_subgraph_size` | Minimum subgraph size for TensorRT when using the Paddle Inference TensorRT subgraph engine. | `int` | `3` |
| `precision` | Precision for TensorRT when using the Paddle Inference TensorRT subgraph engine. Options: `fp32`, `fp16`, etc. | `str` | `fp32` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | `bool` | `True` |
| `cpu_threads` | Number of threads to use for inference on CPUs. | `int` | `10` |
| `img_size` | Input image size; if not specified, the model's default size is used. | `int`/`list`/`None` | `None` |
| `threshold` | Threshold for filtering low-confidence predictions. A `float` applies to all classes; a `dict` with `int` keys as `cls_id` and `float` values as thresholds applies per class, e.g. `{0: 0.3}` applies a threshold of 0.3 to class 0. If not specified, the official model configuration default is used. | `float`/`dict`/`None` | `None` |

**Parameters of the `predict()` method:**

| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Input data to be predicted. Required. Supports multiple input types: image data as a `numpy.ndarray`; a local image or PDF file path, e.g. `/root/data/img.jpg`; a URL of an image or PDF file; a local directory containing images to predict, e.g. `/root/data/` (directories containing PDF files are not supported; PDFs must be specified by exact file path); or a list of the above, e.g. `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`. | `Python Var`/`str`/`list` | (required) |
| `batch_size` | Batch size, any positive integer. | `int` | `1` |
| `threshold` | Threshold for filtering low-confidence predictions. If not specified, the `threshold` given at instantiation (`create_model`) is used by default; if it was not given there either, the official model configuration default is used. A `float` applies to all classes; a `dict` with `int` keys as `cls_id` and `float` values as thresholds applies per class, e.g. `{0: 0.45, 2: 0.48, 7: 0.4}` applies 0.45 to class 0, 0.48 to class 2, and 0.4 to class 7. | `float`/`dict`/`None` | `None` |
**Parameters for model instantiation:**

| Parameter | Description | Type | Default |
|---|---|---|---|
| `model_name` | Model name. | `str` | `PP-LCNet_x1_0_doc_ori` |
| `model_dir` | Model storage path. | `str` | `None` |
| `device` | Device(s) to use for inference. Examples: `cpu`, `gpu`, `npu`, `gpu:0`, `gpu:0,1`. If multiple devices are specified, inference will be performed in parallel. Note that parallel inference is not always supported. By default, GPU 0 will be used if available; otherwise, the CPU will be used. | `str` | `None` |
| `enable_hpi` | Whether to use high-performance inference. | `bool` | `False` |
| `use_tensorrt` | Whether to use the Paddle Inference TensorRT subgraph engine. | `bool` | `False` |
| `min_subgraph_size` | Minimum subgraph size for TensorRT when using the Paddle Inference TensorRT subgraph engine. | `int` | `3` |
| `precision` | Precision for TensorRT when using the Paddle Inference TensorRT subgraph engine. Options: `fp32`, `fp16`, etc. | `str` | `fp32` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | `bool` | `True` |
| `cpu_threads` | Number of threads to use for inference on CPUs. | `int` | `10` |

**Parameters of the `predict()` method:**

| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Input data to be predicted. Required. Supports multiple input types: image data as a `numpy.ndarray`; a local image or PDF file path, e.g. `/root/data/img.jpg`; a URL of an image or PDF file; a local directory containing images to predict, e.g. `/root/data/` (directories containing PDF files are not supported; PDFs must be specified by exact file path); or a list of the above, e.g. `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`. | `Python Var`/`str`/`list` | (required) |
| `batch_size` | Batch size, any positive integer. | `int` | `1` |
**Parameters for model instantiation:**

| Parameter | Description | Type | Default |
|---|---|---|---|
| `model_name` | Model name. | `str` | `None` |
| `model_dir` | Model storage path. | `str` | `None` |
| `device` | Device(s) to use for inference. Examples: `cpu`, `gpu`, `npu`, `gpu:0`, `gpu:0,1`. If multiple devices are specified, inference will be performed in parallel. Note that parallel inference is not always supported. By default, GPU 0 will be used if available; otherwise, the CPU will be used. | `str` | `None` |
| `enable_hpi` | Whether to use high-performance inference. | `bool` | `False` |
| `use_tensorrt` | Whether to use the Paddle Inference TensorRT subgraph engine. | `bool` | `False` |
| `min_subgraph_size` | Minimum subgraph size for TensorRT when using the Paddle Inference TensorRT subgraph engine. | `int` | `3` |
| `precision` | Precision for TensorRT when using the Paddle Inference TensorRT subgraph engine. Options: `fp32`, `fp16`, etc. | `str` | `fp32` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | `bool` | `True` |
| `cpu_threads` | Number of threads to use for inference on CPUs. | `int` | `10` |

**Parameters of the `predict()` method:**

| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Input data to be predicted. Required. Supports multiple input types: image data as a `numpy.ndarray`; a local image or PDF file path, e.g. `/root/data/img.jpg`; a URL of an image or PDF file; a local directory containing images to predict, e.g. `/root/data/` (directories containing PDF files are not supported; PDFs must be specified by exact file path); or a list of the above. | `Python Var`/`str`/`list` | (required) |
| `batch_size` | Batch size, any positive integer. | `int` | `1` |
Parameter | Description | Type | -Options | Default | |||
---|---|---|---|---|---|---|---|
model_name |
-Model name | +Model name. All supported seal text detection model names, such as PP-OCRv5_mobile_det . |
str |
-All PaddleX-supported text detection model names | -Required | +None |
|
model_dir |
Model storage path | str |
-N/A | -N/A | +None |
||
device |
-Inference device | +Device(s) to use for inference. +Examples: cpu , gpu , npu , gpu:0 , gpu:0,1 .+If multiple devices are specified, inference will be performed in parallel. Note that parallel inference is not always supported. +By default, GPU 0 will be used if available; otherwise, the CPU will be used. + |
str |
-GPU (e.g., "gpu:0"), NPU (e.g., "npu:0"), CPU ("cpu") | -gpu:0 |
+None |
+|
enable_hpi |
+Whether to use the high performance inference. | +bool |
+False |
+||||
use_tensorrt |
+Whether to use the Paddle Inference TensorRT subgraph engine. | +bool |
+False |
+||||
min_subgraph_size |
+Minimum subgraph size for TensorRT when using the Paddle Inference TensorRT subgraph engine. | +int |
+3 |
+||||
precision |
Precision for TensorRT when using the Paddle Inference TensorRT subgraph engine. Options: `fp32`, `fp16`, etc. | str | fp32 |
| enable_mkldnn | Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | bool | True |
| cpu_threads | Number of threads to use for inference on CPUs. | int | 10 |
| limit_side_len | Limit on the side length of the input image for detection. `int` specifies the value. If set to `None`, the default value from the official PaddleOCR model configuration will be used. | int / None | None |
| limit_type | Type of image side length limitation. `"min"` ensures the shortest side of the image is no less than `det_limit_side_len`; `"max"` ensures the longest side is no greater than `limit_side_len`. If set to `None`, the default value from the official PaddleOCR model configuration will be used. | str / None | None |
| thresh | Pixel score threshold. Pixels in the output probability map with scores greater than this threshold are considered text pixels. Accepts any float value greater than 0. If set to `None`, the default value from the official PaddleOCR model configuration will be used. | float / None | None |
| box_thresh | If the average score of all pixels inside the bounding box is greater than this threshold, the result is considered a text region. Accepts any float value greater than 0. If set to `None`, the default value from the official PaddleOCR model configuration will be used. | float / None | None |
| unclip_ratio | Expansion ratio for the Vatti clipping algorithm, used to expand the text region. Accepts any float value greater than 0. If set to `None`, the default value from the official PaddleOCR model configuration will be used. | float / None | None |
| input_shape | Input image size for the model in the format `(C, H, W)`. If set to `None`, the model's default size will be used. | tuple / None | None |
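As a rough illustration of how `limit_side_len` and `limit_type` interact, the following sketch computes the uniform scale factor implied by the two settings. `detection_scale` is a hypothetical helper written for this document, not part of the PaddleOCR API:

```python
def detection_scale(h: int, w: int, limit_side_len: int = 960, limit_type: str = "max") -> float:
    """Scale factor implied by the side-length limit.

    "min": guarantee the shortest side is at least limit_side_len (may upscale).
    "max": guarantee the longest side is at most limit_side_len (may downscale).
    """
    if limit_type == "min":
        side = min(h, w)
        return limit_side_len / side if side < limit_side_len else 1.0
    if limit_type == "max":
        side = max(h, w)
        return limit_side_len / side if side > limit_side_len else 1.0
    raise ValueError(f"unknown limit_type: {limit_type!r}")
```

With `limit_type="max"`, a 720×1280 image is shrunk so its longest side fits the limit; with `limit_type="min"`, the same image is enlarged until its shortest side reaches it.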
| Parameter | Description | Type | Default |
|---|---|---|---|
| input | Data to be predicted. Required. Supports multiple input types: `numpy.ndarray` representing image data; a local path of an image or PDF file, e.g. `/root/data/img.jpg`; a URL of an image or PDF file; a local directory containing images to be predicted, e.g. `/root/data/` (directories containing PDF files are not supported; PDFs must be specified by exact file path); or a list of the above, e.g. `[numpy.ndarray, "/root/data/img.jpg"]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`. | Python Var / str / dict / list | |
| batch_size | Batch size, any positive integer. | int | 1 |
| limit_side_len | Limit on the side length of the input image for detection. `int` specifies the value. If set to `None`, the parameter value initialized by the model will be used by default. | int / None | None |
| limit_type | Type of image side length limitation. `"min"` ensures the shortest side of the image is no less than `det_limit_side_len`; `"max"` ensures the longest side is no greater than `limit_side_len`. If set to `None`, the parameter value initialized by the model will be used by default. | str / None | None |
| thresh | Pixel score threshold. Pixels in the output probability map with scores greater than this threshold are considered text pixels. If set to `None`, the parameter value initialized by the model will be used by default. | float / None | None |
| box_thresh | If the average score of all pixels inside the bounding box is greater than this threshold, the result is considered a text region. If set to `None`, the parameter value initialized by the model will be used by default. | float / None | None |
| unclip_ratio | Expansion ratio for the Vatti clipping algorithm, used to expand the text region. If set to `None`, the parameter value initialized by the model will be used by default. | float / None | None |
| Parameter | Description | Type | Default |
|---|---|---|---|
| model_name | Model name. Any supported text detection model name, e.g. `PP-OCRv5_mobile_det`. | str | None |
| model_dir | Model storage path. | str | None |
| device | Device(s) used for inference, e.g. `cpu`, `gpu`, `npu`, `gpu:0`, `gpu:0,1`. If multiple devices are specified, inference runs in parallel. By default, GPU 0 is used if available; otherwise, the CPU is used. | str | None |
| enable_hpi | Whether to enable high-performance inference. | bool | False |
| use_tensorrt | Whether to enable the Paddle Inference TensorRT subgraph engine. | bool | False |
| min_subgraph_size | Minimum subgraph size when using the Paddle Inference TensorRT subgraph engine. | int | 3 |
| precision | Computation precision when using the Paddle Inference TensorRT subgraph engine. Options: `fp32`, `fp16`, etc. | str | fp32 |
| enable_mkldnn | Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | bool | True |
| cpu_threads | Number of threads used for inference on CPUs. | int | 10 |
| limit_side_len | Limit on the side length of the input image for detection. `int` specifies the value. If set to `None`, the default value from the official PaddleOCR model configuration will be used. | int / None | None |
| limit_type | Type of image side length limitation. `"min"` ensures the shortest side of the image is no less than `det_limit_side_len`; `"max"` ensures the longest side is no greater than `limit_side_len`. If set to `None`, the default value from the official PaddleOCR model configuration will be used. | str / None | None |
| thresh | Pixel score threshold. Pixels in the output probability map with scores greater than this threshold are considered text pixels. Any float greater than 0. If set to `None`, the default value from the official PaddleOCR model configuration will be used. | float / None | None |
| box_thresh | If the average score of all pixels inside the detected box is greater than this threshold, the result is considered a text region. Any float greater than 0. If set to `None`, the default value from the official PaddleOCR model configuration will be used. | float / None | None |
| unclip_ratio | Expansion ratio for the Vatti clipping algorithm, used to expand the text region. Any float greater than 0. If set to `None`, the default value from the official PaddleOCR model configuration will be used. | float / None | None |
| input_shape | Input image size for the model in the format `(C, H, W)`. If set to `None`, the default value from the official PaddleOCR model configuration will be used. | tuple / None | None |
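The `thresh` and `box_thresh` parameters apply at two different levels: `thresh` is a per-pixel cut on the probability map, while `box_thresh` is a per-region cut on the mean score inside a candidate box. A minimal NumPy sketch of that two-stage filter (illustrative only; `detect_thresholds` is not PaddleOCR's actual postprocessing code):

```python
import numpy as np

def detect_thresholds(prob_map, box_mask, thresh=0.3, box_thresh=0.6):
    """Two DB-style cuts on a detector probability map.

    prob_map: HxW array of per-pixel text scores in [0, 1].
    box_mask: HxW boolean mask marking one candidate box.
    """
    text_pixels = prob_map > thresh                 # thresh: per-pixel cut
    mean_inside = float(prob_map[box_mask].mean())  # average score inside the box
    keep = mean_inside > box_thresh                 # box_thresh: per-region cut
    return text_pixels, mean_inside, keep
```

Lowering `thresh` admits more candidate pixels; raising `box_thresh` discards weakly supported regions.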
| Parameter | Description | Type | Default |
|---|---|---|---|
| input | Data to be predicted. Required. Supports multiple input types: `numpy.ndarray` representing image data; a local path of an image or PDF file, e.g. `/root/data/img.jpg`; a URL of an image or PDF file; a local directory containing images to be predicted, e.g. `/root/data/` (directories containing PDF files are not supported; PDFs must be specified by exact file path); or a list of the above, e.g. `[numpy.ndarray, "/root/data/img1.jpg"]`, `["/root/data1", "/root/data2"]`. | Python Var / str / list | |
| batch_size | Batch size, any positive integer. | int | 1 |
| limit_side_len | Limit on the side length of the input image for detection. `int` specifies the value. If set to `None`, the parameter value initialized by the model will be used by default. | int / None | None |
| limit_type | Type of image side length limitation. `"min"` ensures the shortest side of the image is no less than `det_limit_side_len`; `"max"` ensures the longest side is no greater than `limit_side_len`. If set to `None`, the parameter value initialized by the model will be used by default. | str / None | None |
| thresh | Pixel score threshold. Pixels in the output probability map with scores greater than this threshold are considered text pixels. Any float greater than 0. If set to `None`, the parameter value initialized by the model will be used by default. | float / None | None |
| box_thresh | If the average score of all pixels inside the detected box is greater than this threshold, the result is considered a text region. Any float greater than 0. If set to `None`, the parameter value initialized by the model will be used by default. | float / None | None |
| unclip_ratio | Expansion ratio for the Vatti clipping algorithm, used to expand the text region. Any float greater than 0. If set to `None`, the parameter value initialized by the model will be used by default. | float / None | None |
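The expansion distance behind `unclip_ratio` is, in DB-style postprocessing, proportional to the region's area divided by its perimeter. The sketch below illustrates that formula for the axis-aligned rectangle case only; `expand_rect` is a hypothetical helper, and the real postprocess offsets arbitrary polygons:

```python
def unclip_distance(area: float, perimeter: float, unclip_ratio: float = 2.0) -> float:
    # DB-style expansion distance: larger unclip_ratio -> wider expansion
    return area * unclip_ratio / perimeter

def expand_rect(x0, y0, x1, y1, unclip_ratio=2.0):
    # Expand an axis-aligned box outward by the unclip distance on every side
    w, h = x1 - x0, y1 - y0
    d = unclip_distance(w * h, 2 * (w + h), unclip_ratio)
    return (x0 - d, y0 - d, x1 + d, y1 + d)
```

For a 10×10 box with the default ratio of 2.0, the distance is 100 × 2 / 40 = 5, so each side moves outward by 5 pixels.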
| Model | Model Download Link | Top-1 Accuracy (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) | Model Size (M) | Description |
|---|---|---|---|---|---|---|
| PP-LCNet_x0_25_textline_ori | Inference Model / Training Model | 98.85 | - | - | 0.96 | Text line classification model based on PP-LCNet_x0_25, with two classes: 0 degrees and 180 degrees |
| PP-LCNet_x1_0_textline_ori | Inference Model / Training Model | 99.42 | - | - | 6.5 | Text line classification model based on PP-LCNet_x1_0, with two classes: 0 degrees and 180 degrees |

| Mode | GPU Configuration | CPU Configuration | Acceleration Technology Combination |
|---|---|---|---|
| Normal Mode | FP32 precision / no TRT acceleration | FP32 precision / 8 threads | PaddleInference |
| High-Performance Mode | Optimal combination of pre-selected precision types and acceleration strategies | FP32 precision / 8 threads | Pre-selected optimal backend (Paddle/OpenVINO/TRT, etc.) |

| Parameter | Description | Type | Default |
|---|---|---|---|
| model_name | Name of the model. | str | None |
| model_dir | Model storage path. | str | None |
| device | Device(s) to use for inference, e.g. `cpu`, `gpu`, `npu`, `gpu:0`, `gpu:0,1`. If multiple devices are specified, inference is performed in parallel; note that parallel inference is not always supported. By default, GPU 0 is used if available; otherwise, the CPU is used. | str | None |
| enable_hpi | Whether to use high-performance inference. | bool | False |
| use_tensorrt | Whether to use the Paddle Inference TensorRT subgraph engine. | bool | False |
| min_subgraph_size | Minimum subgraph size for TensorRT when using the Paddle Inference TensorRT subgraph engine. | int | 3 |
| precision | Precision for TensorRT when using the Paddle Inference TensorRT subgraph engine. Options: `fp32`, `fp16`, etc. | str | fp32 |
| enable_mkldnn | Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | bool | True |
| cpu_threads | Number of threads to use for inference on CPUs. | int | 10 |
| top_k | Top-k value for prediction results. If not specified, the default value in the official PaddleOCR model configuration is used. For example, a value of 5 returns the top 5 categories and their corresponding classification probabilities. | int | None |

| Parameter | Description | Type | Default |
|---|---|---|---|
| input | Input data for prediction. Required. Supports multiple input types: `numpy.ndarray` representing image data; a local file path, e.g. `/root/data/img.jpg`; or a list of the above. | Python Var / str / list | |
| batch_size | Batch size, any positive integer. | int | 1 |

| Method | Description | Parameter | Type | Parameter Description | Default |
|---|---|---|---|---|---|
| print() | Print the results to the terminal | format_json | bool | Whether to format the output content with JSON indentation | True |
| | | indent | int | Indentation level to beautify the JSON output, making it more readable; effective only when `format_json` is `True` | 4 |
| | | ensure_ascii | bool | Whether to escape non-ASCII characters to Unicode. If `True`, all non-ASCII characters are escaped; `False` retains the original characters. Effective only when `format_json` is `True` | False |
| save_to_json() | Save the results as a JSON file | save_path | str | Path to save the file. If it is a directory, the saved file name matches the input file name | None |
| | | indent | int | Indentation level to beautify the JSON output; effective only when `format_json` is `True` | 4 |
| | | ensure_ascii | bool | Whether to escape non-ASCII characters to Unicode; effective only when `format_json` is `True` | False |
| save_to_img() | Save the results as an image file | save_path | str | Path to save the file. If it is a directory, the saved file name matches the input file name | None |

| Attribute | Description |
|---|---|
| json | Get the prediction result in JSON format |
| img | Get the visualization image as a dict |
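The `top_k` behavior described above amounts to sorting class probabilities and keeping the k best. A small illustrative helper (not part of the PaddleOCR API; the label names below are made up for the two-class orientation case):

```python
def top_k(probs, labels, k=1):
    # Return the k highest-probability (label, prob) pairs, best first
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], probs[i]) for i in order[:k]]
```

For the text line orientation model there are only two classes, so `k=2` returns both orientations ranked by confidence.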
| Parameter | Description | Type | Default |
|---|---|---|---|
| input | Data to be predicted. Required. Supports multiple input types: `numpy.ndarray` representing image data; a local file path, e.g. `/root/data/img.jpg`; or a list of the above. | Python Var / str / list | |
| batch_size | Batch size, any positive integer. | int | 1 |
| Model | Model Download Link | Top-1 Accuracy (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) | Model Size (M) | Description |
|---|---|---|---|---|---|---|
| PP-LCNet_x0_25_textline_ori | Inference Model / Training Model | 98.85 | - | - | 0.96 | Text line classification model based on PP-LCNet_x0_25, with two classes: 0 degrees and 180 degrees |
| PP-LCNet_x1_0_textline_ori | Inference Model / Training Model | 99.42 | - | - | 6.5 | Text line classification model based on PP-LCNet_x1_0, with two classes: 0 degrees and 180 degrees |
| Model | Download Link | Recognition Avg Accuracy (%) | GPU Inference Time (ms) [Standard Mode / High-Performance Mode] | CPU Inference Time (ms) [Standard Mode / High-Performance Mode] | Model Size (M) | Description |
|---|---|---|---|---|---|---|
| RepSVTR | Inference Model / Training Model | 65.07 | 5.93 / 1.62 | 20.73 / 7.32 | 22.1 M | RepSVTR is a mobile text recognition model based on SVTRv2. It won first prize in the PaddleOCR Algorithm Model Challenge - Task 1: OCR End-to-End Recognition, improving end-to-end recognition accuracy by 2.5% compared to PP-OCRv4 on List B while maintaining comparable inference speed. |

| Model | Download Link | Recognition Avg Accuracy (%) | GPU Inference Time (ms) [Standard Mode / High-Performance Mode] | CPU Inference Time (ms) [Standard Mode / High-Performance Mode] | Model Size (M) | Description |
|---|---|---|---|---|---|---|
| Parameter | Description | Type | Default |
|---|---|---|---|
| input | Data to be predicted. Required. Local path of an image or PDF file, e.g. `/root/data/img.jpg`; a URL of an image or PDF file; or a local directory containing images to be predicted, e.g. `/root/data/` (predicting PDFs inside a directory is not supported; PDFs must be given as an exact file path). | str | |
| save_path | Path to save inference result files. If not set, inference results will not be saved locally. | str | |
| doc_orientation_classify_model_name | Name of the document orientation classification model. If not set, the pipeline default model will be used. | str | |
| doc_orientation_classify_model_dir | Directory path of the document orientation classification model. If not set, the official model will be downloaded. | str | |
| doc_unwarping_model_name | Name of the text image unwarping model. If not set, the pipeline default model will be used. | str | |
| doc_unwarping_model_dir | Directory path of the text image unwarping model. If not set, the official model will be downloaded. | str | |
| text_detection_model_name | Name of the text detection model. If not set, the pipeline default model will be used. | str | |
| text_detection_model_dir | Directory path of the text detection model. If not set, the official model will be downloaded. | str | |
| textline_orientation_model_name | Name of the text line orientation model. If not set, the pipeline default model will be used. | str | |
| textline_orientation_model_dir | Directory path of the text line orientation model. If not set, the official model will be downloaded. | str | |
| textline_orientation_batch_size | Batch size for the text line orientation model. If not set, the batch size defaults to `1`. | int | |
| text_recognition_model_name | Name of the text recognition model. If not set, the pipeline default model will be used. | str | |
| text_recognition_model_dir | Directory path of the text recognition model. If not set, the official model will be downloaded. | str | |
| text_recognition_batch_size | Batch size for the text recognition model. If not set, the batch size defaults to `1`. | int | |
| use_doc_orientation_classify | Whether to load and use the document orientation classification module. If not set, the pipeline's initialized value (`True`) will be used. | bool | |
| use_doc_unwarping | Whether to load and use the text image unwarping module. If not set, the pipeline's initialized value (`True`) will be used. | bool | |
| use_textline_orientation | Whether to load and use the text line orientation module. If not set, the pipeline's initialized value (`True`) will be used. | bool | |
| text_det_limit_side_len | Image side length limit for text detection. Any integer greater than `0`. If not set, the pipeline's initialized value (`64`) will be used. | int | |
| text_det_limit_type | Type of side length limit for text detection. Supports `min` and `max`: `min` ensures the shortest side of the image is not smaller than `det_limit_side_len`; `max` ensures the longest side is not larger than `limit_side_len`. If not set, the pipeline's initialized value (`min`) will be used. | str | |
| text_det_thresh | Pixel threshold for text detection. Pixels in the output probability map with scores higher than this threshold are considered text pixels. Any float greater than `0`. If not set, the pipeline's initialized value (`0.3`) will be used. | float | |
| text_det_box_thresh | Box threshold for text detection. A detected result is considered a text region if the average score of all pixels within its boundary is higher than this threshold. Any float greater than `0`. If not set, the pipeline's initialized value (`0.6`) will be used. | float | |
| text_det_unclip_ratio | Expansion coefficient for text detection, used to expand the text region; the larger the value, the larger the expanded area. Any float greater than `0`. If not set, the pipeline's initialized value (`2.0`) will be used. | float | |
| text_det_input_shape | Input shape for text detection; three values representing C, H, and W. | int | |
| text_rec_score_thresh | Text recognition threshold. Text results with scores higher than this threshold are retained. Any float greater than `0`. If not set, the pipeline's initialized value (`0.0`, i.e., no threshold) will be used. | float | |
| text_rec_input_shape | Input shape for text recognition. | tuple | |
| lang | OCR model language to use. Please refer to the detailed language list below. | str | |
| ocr_version | OCR version; note that not every `ocr_version` supports every `lang`. | str | |
| det_model_dir | Deprecated. Please use `text_detection_model_dir` instead; old and new parameters cannot be specified simultaneously. | str | |
| det_limit_side_len | Deprecated. Please use `text_det_limit_side_len` instead; old and new parameters cannot be specified simultaneously. | int | |
| det_limit_type | Deprecated. Please use `text_det_limit_type` instead; old and new parameters cannot be specified simultaneously. | str | |
| det_db_thresh | Deprecated. Please use `text_det_thresh` instead; old and new parameters cannot be specified simultaneously. | float | |
| det_db_box_thresh | Deprecated. Please use `text_det_box_thresh` instead; old and new parameters cannot be specified simultaneously. | float | |
| det_db_unclip_ratio | Deprecated. Please use `text_det_unclip_ratio` instead; old and new parameters cannot be specified simultaneously. | float | |
| rec_model_dir | Deprecated. Please use `text_recognition_model_dir` instead; old and new parameters cannot be specified simultaneously. | str | |
| rec_batch_num | Deprecated. Please use `text_recognition_batch_size` instead; old and new parameters cannot be specified simultaneously. | int | |
| use_angle_cls | Deprecated. Please use `use_textline_orientation` instead; old and new parameters cannot be specified simultaneously. | bool | |
| cls_model_dir | Deprecated. Please use `textline_orientation_model_dir` instead; old and new parameters cannot be specified simultaneously. | str | |
| cls_batch_num | Deprecated. Please use `textline_orientation_batch_size` instead; old and new parameters cannot be specified simultaneously. | int | |
| device | Device for inference. Supports specifying a specific card number, e.g. `cpu`, `gpu:0`, `npu:0`. | str | |
| enable_hpi | Whether to enable high-performance inference. | bool | False |
| use_tensorrt | Whether to use TensorRT for inference acceleration. | bool | False |
| min_subgraph_size | Minimum subgraph size for optimizing model subgraph computation. | int | 3 |
| precision | Computational precision, e.g. `fp32`, `fp16`. | str | fp32 |
| enable_mkldnn | Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | bool | True |
| cpu_threads | Number of threads used for inference on CPU. | int | 8 |
| paddlex_config | Path to the PaddleX pipeline configuration file. | str | |

| OCR version | Languages |
|---|---|
| PP-OCRv5 | PP-OCRv5 supports the following languages: |
| PP-OCRv4 | PP-OCRv4 supports the following languages: |
| PP-OCRv3 | PP-OCRv3 supports the following languages: |
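The `text_rec_score_thresh` filter described above keeps only recognition results whose score exceeds the threshold (with the default `0.0`, effectively everything with a positive score). A hypothetical sketch of that selection, not PaddleOCR's internal code:

```python
def filter_by_score(rec_texts, rec_scores, text_rec_score_thresh=0.0):
    """Keep (text, score) pairs whose score is strictly above the threshold."""
    kept = [(t, s) for t, s in zip(rec_texts, rec_scores) if s > text_rec_score_thresh]
    return [t for t, _ in kept], [s for _, s in kept]
```

Raising the threshold trades recall for precision: low-confidence lines are dropped from the output rather than emitted as noisy text.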
`PaddleOCR()`. Parameter details:
Parameter | +Parameter Description | +Parameter Type | +Default Value | +|
---|---|---|---|---|
doc_orientation_classify_model_name |
+Name of the document orientation classification model. If set to None , the pipeline's default model will be used. |
+str |
+None |
+|
doc_orientation_classify_model_dir |
+ Directory path of the document orientation classification model. If set to None , the official model will be downloaded. |
+ str |
+ None |
+ |
doc_unwarping_model_name |
+ Name of the text image unwarping model. If set to None , the pipeline's default model will be used. |
+ str |
+ None |
+ |
doc_unwarping_model_dir |
+ Directory path of the text image unwarping model. If set to None , the official model will be downloaded. |
+ str |
+ None |
+ |
text_detection_model_name |
+ Name of the text detection model. If set to None , the pipeline's default model will be used. |
+ str |
+ None |
+ |
text_detection_model_dir |
+ Directory path of the text detection model. If set to None , the official model will be downloaded. |
+ str |
+ None |
+ |
textline_orientation_model_name |
+ Name of the text line orientation model. If set to None , the pipeline's default model will be used. |
+ str |
+ None |
+ |
textline_orientation_model_dir |
+ Directory path of the text line orientation model. If set to None , the official model will be downloaded. |
+ str |
+ None |
+ |
textline_orientation_batch_size |
+ Batch size for the text line orientation model. If set to None , the default batch size will be 1 . |
+ int |
+ None |
+ |
text_recognition_model_name |
+ Name of the text recognition model. If set to None , the pipeline's default model will be used. |
+ str |
+ None |
+ |
text_recognition_model_dir |
+ Directory path of the text recognition model. If set to None , the official model will be downloaded. |
+ str |
+ None |
+ |
text_recognition_batch_size |
+ Batch size for the text recognition model. If set to None , the default batch size will be 1 . |
+ int |
+ None |
+ |
use_doc_orientation_classify |
+ Whether to load and use the document orientation classification module. If set to None , the pipeline's initialized value for this parameter (initialized to True ) will be used. |
+ bool |
+ None |
+ |
use_doc_unwarping |
+ Whether to load and use the text image unwarping module. If set to None , the pipeline's initialized value for this parameter (initialized to True ) will be used. |
+ bool |
+ None |
+ |
use_textline_orientation |
+ Whether to load and use the text line orientation module. If set to None , the pipeline's initialized value for this parameter (initialized to True ) will be used. |
+ bool |
+ None |
+ |
text_det_limit_side_len |
+ Image side length limitation for text detection.
+
|
+ int |
+ None |
+ |
text_det_limit_type |
+ Type of side length limit for text detection.
+
|
+ str |
+ None |
+ |
text_det_thresh |
+ Pixel threshold for text detection. Pixels with scores higher than this threshold in the output probability map will be considered text pixels.
+
|
+ float |
+ None |
+ |
text_det_box_thresh |
+ Box threshold for text detection. A detection result will be considered a text region if the average score of all pixels within the bounding box is higher than this threshold.
+
|
+ float |
+ None |
+ |
text_det_unclip_ratio |
+ Dilation coefficient for text detection. Detected text regions are expanded by this ratio; the larger the value, the larger the expanded area.
+
|
+ float |
+ None |
+ |
text_det_input_shape |
+ Input shape for text detection. | +tuple |
+ None |
+ |
text_rec_score_thresh |
+ Recognition score threshold for text. Text results with scores higher than this threshold will be retained.
+
|
+float |
+None |
+|
text_rec_input_shape |
+Input shape for text recognition. | +tuple |
+None |
+|
lang |
+OCR model language to use. Please refer to the detailed list of languages above. + | +str |
+None |
+|
ocr_version |
+OCR model version. Note that not every ocr_version supports every lang .
+
|
+str |
+None |
+|
device |
+Device for inference. Supports specifying a specific card number:
+
|
+str |
+None |
+|
enable_hpi |
+Whether to enable high-performance inference. | +bool |
+False |
+|
use_tensorrt |
+Whether to use TensorRT for inference acceleration. | +bool |
+False |
+|
min_subgraph_size |
+Minimum subgraph size for optimizing subgraph computation. | +int |
+3 |
+|
precision |
+Computational precision, such as fp32, fp16. | +str |
+"fp32" |
+|
enable_mkldnn |
+Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | +bool |
+True |
+|
cpu_threads |
+Number of threads used for CPU inference. | +int |
+8 |
+|
paddlex_config |
-Path to PaddleX pipeline configuration file. | +Path to the PaddleX pipeline configuration file. | str |
None |
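For context, here is a minimal sketch of how the initialization parameters in the table above might be collected and passed. This assumes `paddleocr` 3.x is installed; `build_ocr_kwargs` and `create_pipeline` are hypothetical helper names, not part of the library:

```python
def build_ocr_kwargs(lang=None, ocr_version=None, device=None,
                     use_doc_orientation_classify=None,
                     use_doc_unwarping=None,
                     use_textline_orientation=None):
    """Collect only explicitly-set initialization parameters.

    Parameters left as None are dropped so that the pipeline's own
    defaults (as documented in the table above) take effect.
    """
    candidates = {
        "lang": lang,
        "ocr_version": ocr_version,
        "device": device,
        "use_doc_orientation_classify": use_doc_orientation_classify,
        "use_doc_unwarping": use_doc_unwarping,
        "use_textline_orientation": use_textline_orientation,
    }
    return {k: v for k, v in candidates.items() if v is not None}


def create_pipeline(**kwargs):
    """Instantiate the pipeline; requires `pip install paddleocr` (sketch only)."""
    from paddleocr import PaddleOCR  # imported lazily so the helper above stays importable
    return PaddleOCR(**build_ocr_kwargs(**kwargs))
```

For example, `create_pipeline(lang="en", device="gpu:0", use_doc_unwarping=False)` would forward only those three parameters and leave everything else at the pipeline defaults.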
predict()
method for inference. Alternatively, predict_iter()
returns a generator for memory-efficient batch processing. Parameters:Parameter | -Description | -Type | -Default | -
---|---|---|---|
input |
-Input data (required). Supports:
-
|
-Python Var|str|list |
-- |
device |
-Same as initialization. | -str |
-None |
-
use_doc_orientation_classify |
-Whether to enable document orientation classification during inference. | -bool |
-None |
-
use_doc_unwarping |
-Whether to enable text image correction during inference. | -bool |
-None |
-use_textline_orientation |
-Whether to enable text line orientation classification during inference. | -bool |
-None |
-
-text_det_limit_side_len |
-Same as initialization. | -int |
-None |
-
-text_det_limit_type |
-Same as initialization. | -str |
-None |
-
-text_det_thresh |
-Same as initialization. | -float |
-None |
-
-text_det_box_thresh |
-Same as initialization. | -float |
-None |
-
-text_det_unclip_ratio |
-Same as initialization. | -float |
-None |
-
+
+
predict()
method of the OCR pipeline object for inference prediction, which returns a results list. Additionally, the pipeline provides the predict_iter()
method. The two methods accept identical parameters and return identical results; the only difference is that predict_iter()
returns a generator
that yields prediction results incrementally, which is useful for large datasets or when memory usage matters. Choose whichever method suits your needs. The following are the parameters and descriptions of the predict()
method:Parameter | +Parameter Description | +Parameter Type | +Default Value | +
---|---|---|---|
input |
+Data to be predicted, supporting multiple input types, required.
+
|
+Python Var|str|list |
++ |
use_doc_orientation_classify |
+Whether to use the document orientation classification module during inference. | +bool |
+None |
+
use_doc_unwarping |
+Whether to use the text image unwarping module during inference. | +bool |
+None |
+use_textline_orientation |
+Whether to use the text line orientation classification module during inference. | +bool |
+None |
+
+text_det_limit_side_len |
+The same as the parameter during instantiation. | +int |
+None |
+
+text_det_limit_type |
+The same as the parameter during instantiation. | +str |
+None |
+
+text_det_thresh |
+The same as the parameter during instantiation. | +float |
+None |
+
+text_det_box_thresh |
+The same as the parameter during instantiation. | +float |
+None |
+
+text_det_unclip_ratio |
+The same as the parameter during instantiation. | +float |
+None |
+
text_rec_score_thresh |
-Same as initialization. | +The same as the parameter during instantiation. | float |
None |
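To illustrate the batch-versus-streaming distinction described above, here is a hedged sketch. The two wrapper functions are hypothetical names; `pipeline` stands for an initialized OCR pipeline object exposing the documented `predict()` and `predict_iter()` methods:

```python
def run_batch(pipeline, inputs, text_rec_score_thresh=None):
    """Eager mode: predict() returns the full list of results at once."""
    return pipeline.predict(inputs, text_rec_score_thresh=text_rec_score_thresh)


def run_streaming(pipeline, inputs):
    """Lazy mode: predict_iter() yields one result at a time,
    keeping memory usage flat when processing large input lists."""
    for res in pipeline.predict_iter(inputs):
        yield res
```

With a list of image paths, `run_batch` materializes every result before returning, while iterating over `run_streaming` processes inputs incrementally.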
json
files:json
file:Method | -Description | +Method Description | Parameter | -Type | -Explanation | -Default | +Parameter Type | +Parameter Description | +Default Value |
---|---|---|---|---|---|---|---|---|---|
print() |
-Print results to terminal | +Print the results to the terminal | format_json |
bool |
-Whether to format output with JSON indentation |
+Whether to format the output content with JSON indentation. |
True |
||
indent |
int |
-Indentation level for prettifying JSON output (only when format_json=True ) |
+Indentation level used to pretty-print the output JSON for readability; only valid when format_json is True . |
4 | |||||
ensure_ascii |
bool |
-Whether to escape non-ASCII characters to Unicode (only when format_json=True ) |
+Whether to escape non-ASCII characters as Unicode . When True , all non-ASCII characters are escaped; when False , the original characters are kept. Only valid when format_json is True . |
False |
|||||
save_to_json() |
-Save results as JSON file | +Save the results as a json-formatted file. | save_path |
str |
-Output file path (uses input filename when directory specified) | -None | +Path to save to. If a directory is given, the saved file is named after the input file. | +No default | |
indent |
int |
-Indentation level for prettifying JSON output (only when format_json=True ) |
+Indentation level used to pretty-print the output JSON for readability; only valid when format_json is True . |
4 | |||||
ensure_ascii |
bool |
-Whether to escape non-ASCII characters (only when format_json=True ) |
+Whether to escape non-ASCII characters as Unicode . When True , all non-ASCII characters are escaped; when False , the original characters are kept. Only valid when format_json is True . |
False |
|||||
save_to_img() |
-Save results as image file | +Save the results as an image-formatted file | save_path |
str |
-Output path (supports directory or file path) | -None | +Path to save to; either a directory or a file path. | +No default |
print()
method outputs results to terminal with the following structure:
-
- - input_path
: (str)
Input image path
-
- - page_index
: (Union[int, None])
PDF page number (if input is PDF), otherwise None
-
- - model_settings
: (Dict[str, bool])
Pipeline configuration
- - use_doc_preprocessor
: (bool)
Whether document preprocessing is enabled
- - use_textline_orientation
: (bool)
Whether text line orientation classification is enabled
-
- - doc_preprocessor_res
: (Dict[str, Union[str, Dict[str, bool], int]])
Document preprocessing results (only when use_doc_preprocessor=True
)
- - input_path
: (Union[str, None])
Preprocessor input path (None
for numpy.ndarray
input)
- - model_settings
: (Dict)
Preprocessor configuration
- - use_doc_orientation_classify
: (bool)
Whether document orientation classification is enabled
- - use_doc_unwarping
: (bool)
Whether text image correction is enabled
- - angle
: (int)
Document orientation prediction (0-3 for 0°,90°,180°,270°; -1 if disabled)
-
- - dt_polys
: (List[numpy.ndarray])
Text detection polygons (4 vertices per box, shape=(4,2), dtype=int16)
-
- - dt_scores
: (List[float])
Text detection confidence scores
-
- - text_det_params
: (Dict[str, Dict[str, int, float]])
Text detection parameters
- - limit_side_len
: (int)
Image side length limit
- - limit_type
: (str)
Length limit handling method
- - thresh
: (float)
Text pixel classification threshold
- - box_thresh
: (float)
Detection box confidence threshold
- - unclip_ratio
: (float)
Text region expansion ratio
- - text_type
: (str)
Fixed as "general"
-
- - textline_orientation_angles
: (List[int])
Text line orientation predictions (actual angles when enabled, [-1,-1,-1] when disabled)
-
- - text_rec_score_thresh
: (float)
Text recognition score threshold
-
- - rec_texts
: (List[str])
Recognized texts (filtered by text_rec_score_thresh
)
-
- - rec_scores
: (List[float])
Recognition confidence scores (filtered)
-
- - rec_polys
: (List[numpy.ndarray])
Filtered detection polygons (same format as dt_polys
)
-
- - rec_boxes
: (numpy.ndarray)
Rectangular bounding boxes (shape=(n,4), dtype=int16) with [x_min, y_min, x_max, y_max] coordinates
-
-- save_to_json()
saves results to specified save_path
:
- - Directory: saves as save_path/{your_img_basename}_res.json
- - File: saves directly to specified path
- - Note: Converts numpy.array
to lists since JSON doesn't support numpy arrays
-
-- save_to_img()
saves visualization results:
- - Directory: saves as save_path/{your_img_basename}_ocr_res_img.{your_img_extension}
- - File: saves directly (not recommended for multiple images to avoid overwriting)
+print()
method will print the results to the terminal. The content printed to the terminal is explained as follows:
+ input_path
: (str)
Input path of the image to be predictedpage_index
: (Union[int, None])
If the input is a PDF file, it indicates which page of the PDF it is; otherwise, it is None
model_settings
: (Dict[str, bool])
Model parameters configured for the pipeline
+ use_doc_preprocessor
: (bool)
Control whether to enable the document preprocessing sub-pipelineuse_textline_orientation
: (bool)
Control whether to enable the text line orientation classification functiondoc_preprocessor_res
: (Dict[str, Union[str, Dict[str, bool], int]])
Output results of the document preprocessing sub-pipeline. Only exists when use_doc_preprocessor=True
+ input_path
: (Union[str, None])
Image path accepted by the image preprocessing sub-pipeline. When the input is numpy.ndarray
, it is saved as None
model_settings
: (Dict)
Model configuration parameters of the preprocessing sub-pipeline
+ use_doc_orientation_classify
: (bool)
Control whether to enable document orientation classificationuse_doc_unwarping
: (bool)
Control whether to enable text image unwarpingangle
: (int)
Prediction result of document orientation classification. When enabled, the values are [0,1,2,3], corresponding to [0°,90°,180°,270°]; when disabled, it is -1dt_polys
: (List[numpy.ndarray])
List of text detection polygon boxes. Each detection box is represented by a numpy array of 4 vertex coordinates, with the array shape being (4, 2) and the data type being int16dt_scores
: (List[float])
List of confidence scores for text detection boxestext_det_params
: (Dict[str, Dict[str, int, float]])
Configuration parameters for the text detection module
+ limit_side_len
: (int)
Side length limit value during image preprocessinglimit_type
: (str)
Processing method for side length limitsthresh
: (float)
Confidence threshold for text pixel classificationbox_thresh
: (float)
Confidence threshold for text detection boxesunclip_ratio
: (float)
Dilation coefficient for text detection boxestext_type
: (str)
Type of text detection, currently fixed as "general"textline_orientation_angles
: (List[int])
Prediction results of text line orientation classification. When enabled, actual angle values are returned (e.g., [0,0,1]); when disabled, [-1,-1,-1] is returnedtext_rec_score_thresh
: (float)
Filtering threshold for text recognition resultsrec_texts
: (List[str])
List of text recognition results, containing only texts with confidence scores exceeding text_rec_score_thresh
rec_scores
: (List[float])
List of text recognition confidence scores, filtered by text_rec_score_thresh
rec_polys
: (List[numpy.ndarray])
List of text detection boxes filtered by confidence, in the same format as dt_polys
rec_boxes
: (numpy.ndarray)
Array of rectangular bounding boxes for detection boxes, with shape (n, 4) and dtype int16. Each row represents the [x_min, y_min, x_max, y_max] coordinates of a rectangular box, where (x_min, y_min) is the top-left coordinate and (x_max, y_max) is the bottom-right coordinatesave_to_json()
method will save the above content to the specified save_path
. If a directory is specified, the save path will be save_path/{your_img_basename}_res.json
. If a file is specified, it will be saved directly to that file. Since json files do not support saving numpy arrays, numpy.array
types will be converted to list form.save_to_img()
method will save the visualization results to the specified save_path
. If a directory is specified, the save path will be save_path/{your_img_basename}_ocr_res_img.{your_img_extension}
. If a file is specified, it will be saved directly to that file. (The pipeline usually generates many result images, so specifying a single file path is not recommended: the images would overwrite one another, leaving only the last.) You can also obtain the visualized images and prediction results through attributes, as follows:
Attribute | -Description | +Attribute Description |
---|---|---|
json |
-Retrieves prediction results in json format |
+Get the prediction results in json format |
img |
-Retrieves visualized images in dict format |
+Get the visualized image in dict format |
json
attribute are in dict format, and the content is consistent with that saved by calling the save_to_json()
method.img
attribute returns a dictionary-type result. The keys are ocr_res_img
and preprocessed_img
, with corresponding values being two Image.Image
objects: one for displaying the visualized image of OCR results and the other for displaying the visualized image of image preprocessing. If the image preprocessing submodule is not used, only ocr_res_img
will be included in the dictionary.模型 | +模型下载链接 | +Top-1 Acc(%) | +GPU推理耗时(ms) | +CPU推理耗时 (ms) | +模型存储大小(M) | +介绍 | +
---|---|---|---|---|---|---|
PP-LCNet_x0_25_textline_ori | 推理模型/训练模型 | +98.85 | +- | +- | +0.96 | +基于PP-LCNet_x0_25的文本行分类模型,含有两个类别,即0度,180度 | +
PP-LCNet_x1_0_textline_ori | 推理模型/训练模型 | +99.42 | +- | +- | +6.5 | +基于PP-LCNet_x1_0的文本行分类模型,含有两个类别,即0度,180度 | +
86.58 | +81.53 | 6.65 / 2.38 | 32.92 / 32.92 | -181 M | +74.7 M | PP-OCRv4_server_rec_doc是在PP-OCRv4_server_rec的基础上,在更多中文文档数据和PP-OCR训练数据的混合数据训练而成,增加了部分繁体字、日文、特殊字符的识别能力,可支持识别的字符为1.5万+,除文档相关的文字识别能力提升外,也同时提升了通用文字的识别能力 | ||
PP-OCRv4_mobile_rec | 推理模型/训练模型 | -83.28 | +78.74 | 4.82 / 1.20 | 16.74 / 4.64 | -88 M | +10.6 M | PP-OCRv4的轻量级识别模型,推理效率高,可以部署在包含端侧设备的多种硬件设备中 |
PP-OCRv4_server_rec | 推理模型/训练模型 | -85.19 | +80.61 | 6.58 / 2.43 | 33.17 / 33.17 | -151 M | +71.2 M | PP-OCRv4的服务器端模型,推理精度高,可以部署在多种不同的服务器上 |
86.58 | +81.53 | 6.65 / 2.38 | 32.92 / 32.92 | -91 M | +74.7 M | PP-OCRv4_server_rec_doc是在PP-OCRv4_server_rec的基础上,在更多中文文档数据和PP-OCR训练数据的混合数据训练而成,增加了部分繁体字、日文、特殊字符的识别能力,可支持识别的字符为1.5万+,除文档相关的文字识别能力提升外,也同时提升了通用文字的识别能力 | ||
PP-OCRv4_mobile_rec | 推理模型/训练模型 | -83.28 | +78.74 | 4.82 / 1.20 | 16.74 / 4.64 | -11 M | +10.6 M | PP-OCRv4的轻量级识别模型,推理效率高,可以部署在包含端侧设备的多种硬件设备中 |
PP-OCRv4_server_rec | 推理模型/训练模型 | -85.19 | +80.61 | 6.58 / 2.43 | 33.17 / 33.17 | -87 M | +71.2 M | PP-OCRv4的服务器端模型,推理精度高,可以部署在多种不同的服务器上 |
PP-OCRv3_mobile_rec | 推理模型/训练模型 | -75.43 | +72.96 | 5.87 / 1.19 | 9.07 / 4.28 | -11 M | +9.2 M | PP-OCRv3的轻量级识别模型,推理效率高,可以部署在包含端侧设备的多种硬件设备中 |
/root/data/
(当前不支持目录中包含PDF文件的预测,PDF文件需要指定到具体文件路径)[numpy.ndarray, numpy.ndarray]
,["/root/data/img1.jpg", "/root/data/img2.jpg"]
,["/root/data1", "/root/data2"]
/root/data/img.jpg
;如URL链接,如图像文件或PDF文件的网络URL:示例;如本地目录,该目录下需包含待预测图像,如本地路径:/root/data/
(当前不支持目录中包含PDF文件的预测,PDF文件需要指定到具体文件路径)。
Python Var|str|list
str
save_path
None
, 推理结果将不会保存到本地。str
None
doc_orientation_classify_model_name
None
, 将会使用产线默认模型。str
None
doc_orientation_classify_model_dir
None
, 将会下载官方模型。str
None
doc_unwarping_model_name
None
, 将会使用产线默认模型。str
None
doc_unwarping_model_dir
None
, 将会下载官方模型。str
None
text_detection_model_name
None
, 将会使用产线默认模型。str
None
text_detection_model_dir
None
, 将会下载官方模型。str
None
text_line_orientation_model_name
None
, 将会使用产线默认模型。textline_orientation_model_name
str
None
text_line_orientation_model_dir
None
, 将会下载官方模型。textline_orientation_model_dir
str
None
text_line_orientation_batch_size
None
, 将默认设置批处理大小为1
。textline_orientation_batch_size
1
。int
None
text_recognition_model_name
None
, 将会使用产线默认模型。str
None
text_recognition_model_dir
None
, 将会下载官方模型。str
None
text_recognition_batch_size
None
, 将默认设置批处理大小为1
。1
。int
None
use_doc_orientation_classify
None
, 将默认使用产线初始化的该参数值,初始化为True
。True
。bool
None
use_doc_unwarping
None
, 将默认使用产线初始化的该参数值,初始化为True
。True
。bool
None
use_textline_orientation
None
, 将默认使用产线初始化的该参数值,初始化为True
。True
。bool
None
text_det_limit_side_len
0
的任意整数;None
, 将默认使用产线初始化的该参数值,初始化为 960
;0
的任意整数。如果不设置,将默认使用产线初始化的该参数值,初始化为 64
。
int
None
text_det_limit_type
min
和 max
,min
表示保证图像最短边不小于 det_limit_side_len
,max
表示保证图像最长边不大于 limit_side_len
None
, 将默认使用产线初始化的该参数值,初始化为 max
;min
和 max
,min
表示保证图像最短边不小于 det_limit_side_len
,max
表示保证图像最长边不大于 limit_side_len
。如果不设置,将默认使用产线初始化的该参数值,初始化为 min
。
str
None
text_det_thresh
0
的任意浮点数
- None
, 将默认使用产线初始化的该参数值 0.3
0
的任意浮点数。如果不设置,将默认使用产线初始化的该参数值 0.3
。
float
None
text_det_box_thresh
0
的任意浮点数
-None
, 将默认使用产线初始化的该参数值 0.6
0
的任意浮点数。如果不设置,将默认使用产线初始化的该参数值 0.6
。
float
None
text_det_unclip_ratio
0
的任意浮点数
- None
, 将默认使用产线初始化的该参数值 2.0
0
的任意浮点数。如果不设置,将默认使用产线初始化的该参数值 2.0
。
float
None
text_det_input_shape
tuple
None
int
text_rec_score_thresh
0
的任意浮点数
- None
, 将默认使用产线初始化的该参数值 0.0
。即不设阈值0
的任意浮点数。如果不设置,将默认使用产线初始化的该参数值 0.0
。即不设阈值。
float
None
text_rec_input_shape
tuple
None
lang
None
, 将默认使用ch
;str
None
ocr_version
ocr_version
都支持所有的lang
。
PP-OCRv5
系列模型;
-PP-OCRv4
系列模型;
-PP-OCRv3
系列模型;
-None
, 将默认使用PP-OCRv5
系列模型;str
None
det_model_dir
text_detection_model_dir
代替。文本检测模型的目录路径。如果设置为None, 将会下载官方模型。text_detection_model_dir
,且与新的参数不能同时指定。str
None
det_limit_side_len
text_det_limit_side_len
代替。文本检测的最大边长度限制。text_det_limit_side_len
,且与新的参数不能同时指定。int
None
det_limit_type
text_det_limit_type
代替。文本检测的边长度限制类型。
-min
和 max
,min
表示保证图像最短边不小于 det_limit_side_len
,max
表示保证图像最长边不大于 limit_side_len
None
, 将默认使用产线初始化的该参数值,初始化为 max
;text_det_limit_type
,且与新的参数不能同时指定。
str
None
det_db_thresh
text_det_thresh
代替。文本检测像素阈值,输出的概率图中,得分大于该阈值的像素点才会被认为是文字像素点。
-0
的任意浮点数
- None
, 将默认使用产线初始化的该参数值 0.3
text_det_thresh
,且与新的参数不能同时指定。
float
None
det_db_box_thresh
text_det_box_thresh
代替。文本检测框阈值,检测结果边框内,所有像素点的平均得分大于该阈值时,该结果会被认为是文字区域。
-0
的任意浮点数
-None
, 将默认使用产线初始化的该参数值 0.6
text_det_box_thresh
,且与新的参数不能同时指定。
float
None
det_db_unclip_ratio
text_det_unclip_ratio
代替。文本检测扩张系数,使用该方法对文字区域进行扩张,该值越大,扩张的面积越大。
-0
的任意浮点数
- None
, 将默认使用产线初始化的该参数值 2.0
text_det_unclip_ratio
,且与新的参数不能同时指定。
float
None
rec_model_dir
text_recognition_model_dir
代替。文本识别模型的目录路径。如果设置为None
, 将会下载官方模型。text_recognition_model_dir
,且与新的参数不能同时指定。str
None
rec_batch_num
text_recognition_batch_size
代替。文本识别模型的批处理大小。如果设置为None
, 将默认设置批处理大小为1
。text_recognition_batch_size
,且与新的参数不能同时指定。int
None
use_angle_cls
use_textline_orientation
代替。是否使用文本行方向功能。如果设置为None
, 将默认使用产线初始化的该参数值,初始化为True
。use_textline_orientation
,且与新的参数不能同时指定。bool
None
cls_model_dir
text_line_orientation_model_dir
代替。文本行方向模型的目录路径。如果设置为None
, 将会下载官方模型。textline_orientation_model_dir
,且与新的参数不能同时指定。str
None
cls_batch_num
text_line_orientation_batch_size
代替。文本行方向模型的批处理大小。如果设置为None
, 将默认设置批处理大小为1
。textline_orientation_batch_size
,且与新的参数不能同时指定。int
None
device
cpu
表示使用 CPU 进行推理;gpu:0
表示使用第 1 块 GPU 进行推理;xpu:0
表示使用第 1 块 XPU 进行推理;mlu:0
表示使用第 1 块 MLU 进行推理;dcu:0
表示使用第 1 块 DCU 进行推理;None
, 将默认使用产线初始化的该参数值,初始化时,会优先使用本地的 GPU 0号设备,如果没有,则使用 CPU 设备;str
None
enable_hpi
enable_mkldnn
None
, 将默认启用。
+bool
None
True
cpu_threads
paddlex_config
str
None
ocr_version |
+语种 | +
---|---|
PP-OCRv5 | +PP-OCRv5支持以下语言:
+
|
+
PP-OCRv4 | +PP-OCRv4支持以下语言:
+
|
+
PP-OCRv3 | +PP-OCRv3支持以下语言:
+
+ 语言列表+
|
+
doc_orientation_classify_model_name
None
, 将会使用产线默认模型。None
,将会使用产线默认模型。str
None
doc_orientation_classify_model_dir
None
, 将会下载官方模型。None
,将会下载官方模型。str
None
doc_unwarping_model_name
None
, 将会使用产线默认模型。None
,将会使用产线默认模型。str
None
doc_unwarping_model_dir
None
, 将会下载官方模型。None
,将会下载官方模型。str
None
text_detection_model_name
None
, 将会使用产线默认模型。None
,将会使用产线默认模型。str
None
text_detection_model_dir
None
, 将会下载官方模型。None
,将会下载官方模型。str
None
text_line_orientation_model_name
None
, 将会使用产线默认模型。textline_orientation_model_name
None
,将会使用产线默认模型。str
None
text_line_orientation_model_dir
None
, 将会下载官方模型。textline_orientation_model_dir
None
,将会下载官方模型。str
None
text_line_orientation_batch_size
None
, 将默认设置批处理大小为1
。textline_orientation_batch_size
None
,将默认设置批处理大小为1
。int
None
text_recognition_model_name
None
, 将会使用产线默认模型。None
,将会使用产线默认模型。str
None
text_recognition_model_dir
None
, 将会下载官方模型。None
,将会下载官方模型。str
None
text_recognition_batch_size
None
, 将默认设置批处理大小为1
。None
,将默认设置批处理大小为1
。int
None
use_doc_orientation_classify
None
, 将默认使用产线初始化的该参数值,初始化为True
。None
,将默认使用产线初始化的该参数值,初始化为True
。bool
None
use_doc_unwarping
None
, 将默认使用产线初始化的该参数值,初始化为True
。None
,将默认使用产线初始化的该参数值,初始化为True
。bool
None
use_textline_orientation
None
, 将默认使用产线初始化的该参数值,初始化为True
。None
,将默认使用产线初始化的该参数值,初始化为True
。bool
None
text_det_limit_side_len
0
的任意整数;None
, 将默认使用产线初始化的该参数值,初始化为 960
;None
,将默认使用产线初始化的该参数值,初始化为 64
。int
None
text_det_limit_type
min
和 max
,min
表示保证图像最短边不小于 det_limit_side_len
,max
表示保证图像最长边不大于 limit_side_len
None
, 将默认使用产线初始化的该参数值,初始化为 max
;min
和 max
,min
表示保证图像最短边不小于 det_limit_side_len
,max
表示保证图像最长边不大于 limit_side_len
;None
,将默认使用产线初始化的该参数值,初始化为 min
。str
None
text_det_thresh
0
的任意浮点数
- None
, 将默认使用产线初始化的该参数值 0.3
0
的任意浮点数;
+None
,将默认使用产线初始化的该参数值 0.3
。float
None
text_det_box_thresh
0
的任意浮点数
-None
, 将默认使用产线初始化的该参数值 0.6
0
的任意浮点数;
+None
,将默认使用产线初始化的该参数值 0.6
。
float
None
text_det_unclip_ratio
0
的任意浮点数
- None
, 将默认使用产线初始化的该参数值 2.0
0
的任意浮点数;
+None
,将默认使用产线初始化的该参数值 2.0
。
+
float
None
text_rec_score_thresh
0
的任意浮点数
- None
, 将默认使用产线初始化的该参数值 0.0
。即不设阈值0
的任意浮点数;
+None
,将默认使用产线初始化的该参数值 0.0
,即不设阈值。
+
float
None
lang
None
, 将默认使用ch
;str
None
ocr_version
ocr_version
都支持所有的lang
。
PP-OCRv5
系列模型;
-PP-OCRv4
系列模型;
-PP-OCRv3
系列模型;
-None
, 将默认使用PP-OCRv5
系列模型;str
None
device
cpu
表示使用 CPU 进行推理;gpu:0
表示使用第 1 块 GPU 进行推理;xpu:0
表示使用第 1 块 XPU 进行推理;mlu:0
表示使用第 1 块 MLU 进行推理;dcu:0
表示使用第 1 块 DCU 进行推理;None
, 将默认使用产线初始化的该参数值,初始化时,会优先使用本地的 GPU 0号设备,如果没有,则使用 CPU 设备;None
,将默认使用产线初始化的该参数值,初始化时,会优先使用本地的 GPU 0号设备,如果没有,则使用 CPU 设备。
+
str
None
precision
str
fp32
"fp32"
enable_mkldnn
None
, 将默认启用。
-bool
None
True
cpu_threads
input
numpy.ndarray
表示的图像数据/root/data/img.jpg
;如URL链接,如图像文件或PDF文件的网络URL:示例;如本地目录,该目录下需包含待预测图像,如本地路径:/root/data/
(当前不支持目录中包含PDF文件的预测,PDF文件需要指定到具体文件路径)[numpy.ndarray, numpy.ndarray]
,["/root/data/img1.jpg", "/root/data/img2.jpg"]
,["/root/data1", "/root/data2"]
numpy.ndarray
表示的图像数据;/root/data/img.jpg
;如URL链接,如图像文件或PDF文件的网络URL:示例;如本地目录,该目录下需包含待预测图像,如本地路径:/root/data/
(当前不支持目录中包含PDF文件的预测,PDF文件需要指定到具体文件路径);[numpy.ndarray, numpy.ndarray]
,["/root/data/img1.jpg", "/root/data/img2.jpg"]
,["/root/data1", "/root/data2"]。
Python Var|str|list
device
str
None
use_doc_orientation_classify
bool
format_json
bool
JSON
缩进格式化JSON
缩进格式化。True
indent
int
JSON
数据,使其更具可读性,仅当 format_json
为 True
时有效JSON
数据,使其更具可读性,仅当 format_json
为 True
时有效。ensure_ascii
bool
ASCII
字符转义为 Unicode
。设置为 True
时,所有非 ASCII
字符将被转义;False
则保留原始字符,仅当format_json
为True
时有效ASCII
字符转义为 Unicode
。设置为 True
时,所有非 ASCII
字符将被转义;False
则保留原始字符,仅当format_json
为True
时有效。False
save_path
str
indent
int
JSON
数据,使其更具可读性,仅当 format_json
为 True
时有效JSON
数据,使其更具可读性,仅当 format_json
为 True
时有效。ensure_ascii
bool
ASCII
字符转义为 Unicode
。设置为 True
时,所有非 ASCII
字符将被转义;False
则保留原始字符,仅当format_json
为True
时有效ASCII
字符转义为 Unicode
。设置为 True
时,所有非 ASCII
字符将被转义;False
则保留原始字符,仅当format_json
为True
时有效。False
save_path
str
print()
方法会将结果打印到终端,打印到终端的内容解释如下:
+ input_path
: (str)
待预测图像的输入路径page_index
: (Union[int, None])
如果输入是PDF文件,则表示当前是PDF的第几页,否则为 None
model_settings
: (Dict[str, bool])
配置产线所需的模型参数
+ use_doc_preprocessor
: (bool)
控制是否启用文档预处理子产线use_textline_orientation
: (bool)
控制是否启用文本行方向分类模块doc_preprocessor_res
: (Dict[str, Union[str, Dict[str, bool], int]])
文档预处理子产线的输出结果。仅当use_doc_preprocessor=True
时存在
+ input_path
: (Union[str, None])
图像预处理子产线接受的图像路径,当输入为numpy.ndarray
时,保存为None
model_settings
: (Dict)
预处理子产线的模型配置参数
+ use_doc_orientation_classify
: (bool)
控制是否启用文档方向分类use_doc_unwarping
: (bool)
控制是否启用文本图像矫正angle
: (int)
文档方向分类的预测结果。启用时取值为[0,1,2,3],分别对应[0°,90°,180°,270°];未启用时为-1dt_polys
: (List[numpy.ndarray])
文本检测的多边形框列表。每个检测框由4个顶点坐标构成的numpy数组表示,数组shape为(4, 2),数据类型为int16dt_scores
: (List[float])
文本检测框的置信度列表text_det_params
: (Dict[str, Dict[str, int, float]])
文本检测模块的配置参数
+ limit_side_len
: (int)
图像预处理时的边长限制值limit_type
: (str)
边长限制的处理方式thresh
: (float)
文本像素分类的置信度阈值box_thresh
: (float)
文本检测框的置信度阈值unclip_ratio
: (float)
文本检测框的膨胀系数text_type
: (str)
文本检测的类型,当前固定为"general"textline_orientation_angles
: (List[int])
文本行方向分类的预测结果。启用时返回实际角度值(如[0,0,1]),未启用时返回[-1,-1,-1]text_rec_score_thresh
: (float)
文本识别结果的过滤阈值rec_texts
: (List[str])
文本识别结果列表,仅包含置信度超过text_rec_score_thresh
的文本rec_scores
: (List[float])
文本识别的置信度列表,已按text_rec_score_thresh
过滤rec_polys
: (List[numpy.ndarray])
经过置信度过滤的文本检测框列表,格式同dt_polys
rec_boxes
: (numpy.ndarray)
检测框的矩形边界框数组,shape为(n, 4),dtype为int16。每一行表示一个矩形框的[x_min, y_min, x_max, y_max]坐标,其中(x_min, y_min)为左上角坐标,(x_max, y_max)为右下角坐标save_to_json()
方法会将上述内容保存到指定的save_path
中,如果指定为目录,则保存的路径为save_path/{your_img_basename}_res.json
,如果指定为文件,则直接保存到该文件中。由于json文件不支持保存numpy数组,因此会将其中的numpy.array
类型转换为列表形式。save_to_img()
方法会将可视化结果保存到指定的save_path
中,如果指定为目录,则保存的路径为save_path/{your_img_basename}_ocr_res_img.{your_img_extension}
,如果指定为文件,则直接保存到该文件中。(产线通常包含较多结果图片,不建议直接指定为具体的文件路径,否则多张图会被覆盖,仅保留最后一张图)此外,也支持通过属性获取带结果的可视化图像和预测结果,具体如下:
json
属性获取的预测结果为dict类型的数据,相关内容与调用 save_to_json()
方法保存的内容一致。img
属性返回的预测结果是一个dict类型的数据。其中,键分别为 ocr_res_img
和 preprocessed_img
,对应的值是两个 Image.Image
对象:一个用于显示 OCR 结果的可视化图像,另一个用于展示图像预处理的可视化图像。如果没有使用图像预处理子模块,则dict中只包含 ocr_res_img
。Parameter | -Parameter Description | -Parameter Type | -Options | -Default Value | +Description | +Type | +Default | |
---|---|---|---|---|---|---|---|---|
input |
-The data to be predicted, supporting multiple input types, required. | -Python Var|str|list |
-
-
| Data to be predicted, required. For example: the local path of an image or PDF file, such as /root/data/img.jpg ; the URL of an image or PDF file (see the example); or a local directory containing the images to predict, such as /root/data/ (predicting PDF files inside a directory is not supported; a PDF must be given as a specific file path).
|
-None |
+str |
+||
device |
-The device for pipeline inference. | -str|None |
-
-
|
-None |
+keys |
| Parameter | Description | Type | Default |
|---|---|---|---|
| `keys` | Keys for information extraction. | `str` | |
| `save_path` | Path to save the inference results file. If not set, the inference results will not be saved locally. | `str` | |
| `invoke_mllm` | Whether to load and use a multimodal large model. If not set, defaults to `False`. | `bool` | |
| `layout_detection_model_name` | Name of the layout detection model. If not set, the pipeline's default model will be used. | `str` | |
| `layout_detection_model_dir` | Directory path of the layout detection model. If not set, the official model will be downloaded. | `str` | |
| `doc_orientation_classify_model_name` | Name of the document orientation classification model. If not set, the pipeline's default model will be used. | `str` | |
| `doc_orientation_classify_model_dir` | Directory path of the document orientation classification model. If not set, the official model will be downloaded. | `str` | |
| `doc_unwarping_model_name` | Name of the text image unwarping model. If not set, the pipeline's default model will be used. | `str` | |
| `doc_unwarping_model_dir` | Directory path of the text image unwarping model. If not set, the official model will be downloaded. | `str` | |
| `text_detection_model_name` | Name of the text detection model. If not set, the pipeline's default model will be used. | `str` | |
| `text_detection_model_dir` | Directory path of the text detection model. If not set, the official model will be downloaded. | `str` | |
| `text_recognition_model_name` | Name of the text recognition model. If not set, the pipeline's default model will be used. | `str` | |
| `text_recognition_model_dir` | Directory path of the text recognition model. If not set, the official model will be downloaded. | `str` | |
| `text_recognition_batch_size` | Batch size for the text recognition model. If not set, defaults to `1`. | `int` | |
| `table_structure_recognition_model_name` | Name of the table structure recognition model. If not set, the pipeline's default model will be used. | `str` | |
| `table_structure_recognition_model_dir` | Directory path of the table structure recognition model. If not set, the official model will be downloaded. | `str` | |
| `seal_text_detection_model_name` | Name of the seal text detection model. If not set, the pipeline's default model will be used. | `str` | |
| `seal_text_detection_model_dir` | Directory path of the seal text detection model. If not set, the official model will be downloaded. | `str` | |
| `seal_text_recognition_model_name` | Name of the seal text recognition model. If not set, the pipeline's default model will be used. | `str` | |
| `seal_text_recognition_model_dir` | Directory path of the seal text recognition model. If not set, the official model will be downloaded. | `str` | |
| `seal_text_recognition_batch_size` | Batch size for the seal text recognition model. If not set, defaults to `1`. | `int` | |
| `use_doc_orientation_classify` | Whether to load and use the document orientation classification module. If not set, the value initialized by the pipeline is used (initialized to `True`). | `bool` | |
| `use_doc_unwarping` | Whether to load and use the text image unwarping module. If not set, the value initialized by the pipeline is used (initialized to `True`). | `bool` | |
| `use_textline_orientation` | Whether to load and use the text line orientation classification module. If not set, the value initialized by the pipeline is used (initialized to `True`). | `bool` | |
| `use_seal_recognition` | Whether to load and use the seal recognition sub-pipeline. If not set, the value initialized by the pipeline is used (initialized to `True`). | `bool` | |
| `use_table_recognition` | Whether to load and use the table recognition sub-pipeline. If not set, the value initialized by the pipeline is used (initialized to `True`). | `bool` | |
| `layout_threshold` | Score threshold for the layout model; any value between `0` and `1`. If not set, defaults to `0.5`. | `float` | |
| `layout_nms` | Whether to use Non-Maximum Suppression (NMS) as post-processing for layout detection. If not set, defaults to `True`. | `bool` | |
| `layout_unclip_ratio` | Unclip ratio for detected boxes in the layout detection model; any float greater than `0`. If not set, defaults to `1.0`. | `float` | |
| `layout_merge_bboxes_mode` | Merging mode for the detection boxes output by the model in layout region detection. If not set, defaults to `large`. | `str` | |
| `text_det_limit_side_len` | Image side length limit for text detection; any integer greater than `0`. If not set, defaults to `960`. | `int` | |
| `text_det_limit_type` | Type of the side length limit for text detection. Supports `min` and `max`: `min` ensures the shortest side of the image is no smaller than `det_limit_side_len`; `max` ensures the longest side is no larger than `limit_side_len`. If not set, defaults to `max`. | `str` | |
| `text_det_thresh` | Pixel threshold for text detection; in the output probability map, pixels with scores above this threshold are considered text pixels. Any float greater than `0`. If not set, defaults to `0.3`. | `float` | |
| `text_det_box_thresh` | Box threshold for text detection; a detected region is kept as text when the average score of all pixels inside its boundary exceeds this threshold. Any float greater than `0`. If not set, defaults to `0.6`. | `float` | |
| `text_det_unclip_ratio` | Expansion coefficient for text detection, used to expand the text region; the larger the value, the larger the expanded area. Any float greater than `0`. If not set, defaults to `2.0`. | `float` | |
| `text_rec_score_thresh` | Text recognition score threshold; only results scoring above it are kept. Any float greater than `0`. If not set, defaults to `0.0` (no threshold). | `float` | |
| `seal_det_limit_side_len` | Image side length limit for seal text detection; any integer greater than `0`. If not set, defaults to `736`. | `int` | |
| `seal_det_limit_type` | Type of the side length limit for seal text detection. Supports `min` and `max`: `min` ensures the shortest side of the image is no smaller than `det_limit_side_len`; `max` ensures the longest side is no larger than `limit_side_len`. If not set, defaults to `min`. | `str` | |
| `seal_det_thresh` | Pixel threshold for seal text detection; pixels with scores above this value in the probability map are considered text pixels. Any float greater than `0`. If not set, defaults to `0.2`. | `float` | |
| `seal_det_box_thresh` | Box threshold for seal text detection; boxes whose average pixel score exceeds this value are considered text regions. Any float greater than `0`. If not set, defaults to `0.6`. | `float` | |
| `seal_det_unclip_ratio` | Expansion coefficient for seal text detection; a larger value means a larger expanded area. Any float greater than `0`. If not set, defaults to `0.5`. | `float` | |
| `seal_rec_score_thresh` | Seal text recognition score threshold; only results scoring above it are kept. Any float greater than `0`. If not set, defaults to `0.0` (no threshold). | `float` | |
| `qianfan_api_key` | API key for the Qianfan Platform. | `str` | |
| `pp_docbee_base_url` | Configuration for the multimodal large language model. | `str` | |
| `device` | Device used for inference; a specific card can be specified: `cpu` for CPU inference; `gpu:0` for the 1st GPU; `npu:0` for the 1st NPU; `xpu:0` for the 1st XPU; `mlu:0` for the 1st MLU; `dcu:0` for the 1st DCU. | `str` | |
| `enable_hpi` | Whether to enable the high-performance inference plugin. | `bool` | `False` |
| `use_tensorrt` | Whether to use TensorRT for inference acceleration. | `bool` | `False` |
| `min_subgraph_size` | Minimum subgraph size for optimizing the computation of model subgraphs. | `int` | `3` |
| `precision` | Compute precision, such as FP32 or FP16. | `str` | `fp32` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | `bool` | `True` |
| `cpu_threads` | Number of threads used for inference on CPU. | `int` | `8` |
| `paddlex_config` | Path to the PaddleX pipeline configuration file. | `str` | |
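The interplay of `text_det_limit_side_len` and `text_det_limit_type` (and the seal counterparts) can be illustrated with a small sketch. This is an illustration of the documented semantics under stated assumptions, not PaddleOCR's actual implementation; the function name is hypothetical:

```python
def det_resize_scale(width, height, limit_side_len=960, limit_type="max"):
    """Return the scale factor implied by the side-length limit.

    'max' ensures the longest side is no larger than limit_side_len
    (the image is only ever shrunk); 'min' ensures the shortest side
    is no smaller than limit_side_len (the image is only enlarged).
    """
    if limit_type == "max":
        longest = max(width, height)
        return min(1.0, limit_side_len / longest)   # shrink oversized images
    if limit_type == "min":
        shortest = min(width, height)
        return max(1.0, limit_side_len / shortest)  # enlarge undersized images
    raise ValueError("limit_type must be 'min' or 'max'")

# A 1920x1080 image under the default max/960 limit is scaled to 960x540;
# a 640x480 image already satisfies the limit and is left unchanged.
scale_large = det_resize_scale(1920, 1080)
scale_small = det_resize_scale(640, 480)
```

The seal sub-pipeline uses the same logic with its own defaults (`736`, `min`), which is why seal images tend to be enlarged rather than shrunk.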
| Parameter | Description | Type | Default |
|---|---|---|---|
| `use_doc_orientation_classify` | Whether to use the document orientation classification module. If set to `None`, the value initialized by the pipeline for this parameter will be used (initialized to `True`). | `bool\|None` | `None` |
| `use_doc_unwarping` | Whether to use the document distortion correction module. If set to `None`, the value initialized by the pipeline for this parameter will be used (initialized to `True`). | `bool\|None` | `None` |
| `use_textline_orientation` | Whether to use the text line orientation classification module. If set to `None`, the value initialized by the pipeline for this parameter will be used (initialized to `True`). | `bool\|None` | `None` |
| `use_seal_recognition` | Whether to use the seal recognition sub-pipeline. If set to `None`, the value initialized by the pipeline for this parameter will be used (initialized to `True`). | `bool\|None` | `None` |
| `use_table_recognition` | Whether to use the table recognition sub-pipeline. If set to `None`, the value initialized by the pipeline for this parameter will be used (initialized to `True`). | `bool\|None` | `None` |
| `layout_threshold` | Score threshold for the layout model: any float between `0` and `1`; or a dict such as `{0: 0.1}` where the key is the class ID and the value is the threshold for that class. If set to `None`, the pipeline default of `0.5` is used. | `float\|dict\|None` | `None` |
| `layout_nms` | Whether to use NMS post-processing for layout detection. If set to `None`, the value initialized by the pipeline is used, which is `True` by default. | `bool\|None` | `None` |
| `layout_unclip_ratio` | Expansion coefficient for layout detection boxes: any float greater than `0`; a tuple for separate width/height ratios; or a dict keyed by `cls_id` with tuple values, e.g. `{0: (1.1, 2.0)}`, meaning class-0 boxes keep their center while the width is expanded 1.1x and the height 2.0x. If set to `None`, the pipeline default of `1.0` is used. | `float\|Tuple[float,float]\|dict\|None` | `None` |
| `layout_merge_bboxes_mode` | Method for filtering overlapping bounding boxes: `large`, `small`, or `union`, meaning keep the large box, the small box, or both; or a dict keyed by `cls_id` with str values, e.g. `{0: "large", 2: "small"}`, meaning use "large" mode for class-0 boxes and "small" mode for class-2 boxes. If set to `None`, the pipeline default of `large` is used. | `str\|dict\|None` | `None` |
| `text_det_limit_side_len` | Image side length limit for text detection; any integer greater than `0`. If set to `None`, the value initialized by the pipeline is used (initialized to `960`). | `int\|None` | `None` |
| `text_det_limit_type` | Type of the side length limit for text detection. Supports `min` and `max`: `min` ensures the shortest side of the image is no smaller than `det_limit_side_len`; `max` ensures the longest side is no larger than `limit_side_len`. If set to `None`, the value initialized by the pipeline is used (initialized to `max`). | `str\|None` | `None` |
| `text_det_thresh` | Pixel threshold for text detection; in the output probability map, pixels with scores above this threshold are considered text pixels. Any float greater than `0`. If set to `None`, the value initialized by the pipeline (`0.3`) is used. | `float\|None` | `None` |
| `text_det_box_thresh` | Box threshold for text detection; a detected region is kept as text when the average score of all pixels inside its boundary exceeds this threshold. Any float greater than `0`. If set to `None`, the value initialized by the pipeline (`0.6`) is used. | `float\|None` | `None` |
| `text_det_unclip_ratio` | Expansion coefficient for text detection, used to expand the text region; the larger the value, the larger the expanded area. Any float greater than `0`. If set to `None`, the value initialized by the pipeline (`2.0`) is used. | `float\|None` | `None` |
| `text_rec_score_thresh` | Text recognition score threshold; only results scoring above it are kept. Any float greater than `0`. If set to `None`, the value initialized by the pipeline (`0.0`, i.e. no threshold) is used. | `float\|None` | `None` |
| `seal_det_limit_side_len` | Image side length limit for seal text detection; any integer greater than `0`. If set to `None`, the value initialized by the pipeline is used (initialized to `736`). | `int\|None` | `None` |
| `seal_det_limit_type` | Type of the side length limit for seal text detection. Supports `min` and `max`: `min` ensures the shortest side of the image is no smaller than `det_limit_side_len`; `max` ensures the longest side is no larger than `limit_side_len`. If set to `None`, the value initialized by the pipeline is used (initialized to `min`). | `str\|None` | `None` |
| `seal_det_thresh` | Pixel threshold for seal text detection; pixels with scores above this threshold in the probability map are considered seal pixels. Any float greater than `0`. If set to `None`, the value initialized by the pipeline (`0.2`) is used. | `float\|None` | `None` |
| `seal_det_box_thresh` | Box threshold for seal text detection; a detected region is kept as a seal region when the average score of all pixels inside its boundary exceeds this threshold. Any float greater than `0`. If set to `None`, the value initialized by the pipeline (`0.6`) is used. | `float\|None` | `None` |
| `seal_det_unclip_ratio` | Expansion coefficient for seal text detection, used to expand the seal region; the larger the value, the larger the expanded area. Any float greater than `0`. If set to `None`, the value initialized by the pipeline (`0.5`) is used. | `float\|None` | `None` |
| `seal_rec_score_thresh` | Seal text recognition score threshold; only results scoring above it are kept. Any float greater than `0`. If set to `None`, the value initialized by the pipeline (`0.0`, i.e. no threshold) is used. | `float\|None` | `None` |
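The float-vs-dict forms of `layout_threshold` can be made concrete with a short sketch. This is a hypothetical illustration of the documented semantics (the function and field names are assumptions, not PaddleOCR's internals):

```python
def filter_by_layout_threshold(detections, layout_threshold=0.5):
    """Keep detections whose score passes the threshold.

    layout_threshold may be a single float applied to every class,
    or a dict mapping class ID -> per-class threshold; classes absent
    from the dict fall back to the documented default of 0.5 here.
    """
    kept = []
    for det in detections:  # det: {"cls_id": int, "score": float}
        if isinstance(layout_threshold, dict):
            thr = layout_threshold.get(det["cls_id"], 0.5)
        else:
            thr = layout_threshold
        if det["score"] > thr:
            kept.append(det)
    return kept

dets = [{"cls_id": 0, "score": 0.4}, {"cls_id": 2, "score": 0.7}]
filter_by_layout_threshold(dets, 0.5)       # drops the 0.4 box
filter_by_layout_threshold(dets, {0: 0.1})  # per-class: keeps both boxes
```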
| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Data to be predicted, supporting multiple input types (required): **Python Var**, image data represented by `numpy.ndarray`; **str**, a local path to an image or PDF file, e.g. `/root/data/img.jpg`, a URL of an image or PDF file, or a local directory containing the images to predict, e.g. `/root/data/` (prediction from directories containing PDF files is currently not supported; PDF files must be specified by full path); **list**, elements of the above types, e.g. `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`. | `Python Var\|str\|list` | `None` |
| `save_path` | Path to save inference results. If set to `None`, inference results will not be saved locally. | `str` | `None` |
| `device` | Device used for inference: `cpu` for CPU; `gpu:0` for the 1st GPU; `npu:0` for the 1st NPU; `xpu:0` for the 1st XPU; `mlu:0` for the 1st MLU; `dcu:0` for the 1st DCU. If set to `None`, the value initialized by the pipeline is used; at initialization the local GPU device 0 is preferred, falling back to CPU if unavailable. | `str` | `None` |
| `use_doc_orientation_classify` | Whether to use the document orientation classification module during prediction. | `bool` | `None` |
| `use_doc_unwarping` | Whether to use the document distortion correction module during prediction. | `bool` | `None` |
| `precision` | Compute precision, such as FP32 or FP16. | `str` | `"fp32"` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If set to `None`, it is enabled by default. | `bool` | `True` |
| `cpu_threads` | Number of threads used for inference on CPU. | `int` | |

The `print()` method supports the following parameters:

| Parameter | Description | Type | Default |
|---|---|---|---|
| `format_json` | Whether to format the output content with `JSON` indentation. | `bool` | `True` |
| `indent` | Indentation level to beautify the `JSON` output for better readability; effective only when `format_json` is `True`. | `int` | |
| `ensure_ascii` | Whether to escape non-`ASCII` characters to `Unicode`. When set to `True`, all non-`ASCII` characters are escaped; `False` preserves the original characters. Effective only when `format_json` is `True`. | `bool` | `False` |

`save_to_json()` accepts the same `indent` and `ensure_ascii` parameters. `save_to_img()`, `save_to_html()`, and `save_to_xlsx()` each take a `save_path` parameter of type `str`.

| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Data to be predicted: image data represented by `numpy.ndarray`; or a `str`, i.e. a local path to an image or single-page PDF file, e.g. `/root/data/img.jpg`, or a URL of an image or single-page PDF file. | `Python Var\|str` | |
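The `format_json`, `indent`, and `ensure_ascii` options mirror the behavior of Python's standard `json.dumps`; a quick self-contained illustration:

```python
import json

# A toy result containing non-ASCII text (a Chinese label).
result = {"layout": "表格", "score": 0.98}

# No indent: compact single-line output (format_json=False behaviour).
compact = json.dumps(result, ensure_ascii=False)

# indent=4 pretty-prints across lines; ensure_ascii=False keeps the
# original characters readable.
pretty = json.dumps(result, indent=4, ensure_ascii=False)

# ensure_ascii=True escapes every non-ASCII character to \uXXXX.
escaped = json.dumps(result, indent=4, ensure_ascii=True)
```

Use `ensure_ascii=False` when the saved JSON will be read by humans, and leave it `True` (the `json` module default) when downstream consumers require pure-ASCII files; note the pipeline's own default here is `False`.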
| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Data to be predicted. Required, e.g. a local path to an image or PDF file: `/root/data/img.jpg`; a URL of an image or PDF file; or a local directory containing the images to predict: `/root/data/` (prediction from directories containing PDF files is currently not supported; PDF files must be specified by full path). | `str` | |
| `save_path` | Path to save the inference results file. If not set, the inference results will not be saved locally. | `str` | |
| `invoke_mllm` | Whether to load and use a multimodal large model. If not set, defaults to `False`. | `bool` | |
| `layout_detection_model_name` | Name of the layout detection model. If not set, the pipeline's default model will be used. | `str` | |
| `layout_detection_model_dir` | Directory path of the layout detection model. If not set, the official model will be downloaded. | `str` | |
| `doc_orientation_classify_model_name` | Name of the document orientation classification model. If not set, the pipeline's default model will be used. | `str` | |
| `doc_orientation_classify_model_dir` | Directory path of the document orientation classification model. If not set, the official model will be downloaded. | `str` | |
| `doc_unwarping_model_name` | Name of the text image unwarping model. If not set, the pipeline's default model will be used. | `str` | |
| `doc_unwarping_model_dir` | Directory path of the text image unwarping model. If not set, the official model will be downloaded. | `str` | |
| `text_detection_model_name` | Name of the text detection model. If not set, the pipeline's default model will be used. | `str` | |
| `text_detection_model_dir` | Directory path of the text detection model. If not set, the official model will be downloaded. | `str` | |
| `text_recognition_model_name` | Name of the text recognition model. If not set, the pipeline's default model will be used. | `str` | |
| `text_recognition_model_dir` | Directory path of the text recognition model. If not set, the official model will be downloaded. | `str` | |
| `text_recognition_batch_size` | Batch size for the text recognition model. If not set, defaults to `1`. | `int` | |
| `table_structure_recognition_model_name` | Name of the table structure recognition model. If not set, the pipeline's default model will be used. | `str` | |
| `table_structure_recognition_model_dir` | Directory path of the table structure recognition model. If not set, the official model will be downloaded. | `str` | |
| `seal_text_detection_model_name` | Name of the seal text detection model. If not set, the pipeline's default model will be used. | `str` | |
| `seal_text_detection_model_dir` | Directory path of the seal text detection model. If not set, the official model will be downloaded. | `str` | |
| `seal_text_recognition_model_name` | Name of the seal text recognition model. If not set, the pipeline's default model will be used. | `str` | |
| `seal_text_recognition_model_dir` | Directory path of the seal text recognition model. If not set, the official model will be downloaded. | `str` | |
| `seal_text_recognition_batch_size` | Batch size for the seal text recognition model. If not set, defaults to `1`. | `int` | |
| `use_doc_orientation_classify` | Whether to load and use the document orientation classification module. If not set, the value initialized by the pipeline is used (initialized to `True`). | `bool` | |
| `use_doc_unwarping` | Whether to load and use the text image unwarping module. If not set, the value initialized by the pipeline is used (initialized to `True`). | `bool` | |
| `use_textline_orientation` | Whether to load and use the text line orientation classification module. If not set, the value initialized by the pipeline is used (initialized to `True`). | `bool` | |
| `use_seal_recognition` | Whether to load and use the seal recognition sub-pipeline. If not set, the value initialized by the pipeline is used (initialized to `True`). | `bool` | |
| `use_table_recognition` | Whether to load and use the table recognition sub-pipeline. If not set, the value initialized by the pipeline is used (initialized to `True`). | `bool` | |
| `layout_threshold` | Score threshold for the layout model; any float between `0` and `1`. If not set, the value initialized by the pipeline is used (initialized to `0.5`). | `float` | |
| `layout_nms` | Whether to use NMS post-processing for layout detection. If not set, the value initialized by the pipeline is used (initialized to `True`). | `bool` | |
| `layout_unclip_ratio` | Expansion coefficient for layout detection boxes; any float greater than `0`. If not set, the value initialized by the pipeline is used (initialized to `1.0`). | `float` | |
| `layout_merge_bboxes_mode` | Method for filtering overlapping bounding boxes in layout detection. If not set, the value initialized by the pipeline is used (initialized to `large`). | `str` | |
| `text_det_limit_side_len` | Image side length limit for text detection; any integer greater than `0`. If not set, the value initialized by the pipeline is used (initialized to `960`). | `int` | |
| `text_det_limit_type` | Type of the side length limit for text detection. Supports `min` and `max`: `min` ensures the shortest side of the image is no smaller than `det_limit_side_len`; `max` ensures the longest side is no larger than `limit_side_len`. If not set, the value initialized by the pipeline is used (initialized to `max`). | `str` | |
| `text_det_thresh` | Pixel threshold for text detection; any float greater than `0`. If not set, the value initialized by the pipeline (`0.3`) is used. | `float` | |
| `text_det_box_thresh` | Box threshold for text detection; any float greater than `0`. If not set, the value initialized by the pipeline (`0.6`) is used. | `float` | |
| `text_det_unclip_ratio` | Expansion coefficient for text detection; any float greater than `0`. If not set, the value initialized by the pipeline (`2.0`) is used. | `float` | |
| `text_rec_score_thresh` | Text recognition score threshold; any float greater than `0`. If not set, the value initialized by the pipeline (`0.0`, i.e. no threshold) is used. | `float` | |
| `seal_det_limit_side_len` | Image side length limit for seal text detection; any integer greater than `0`. If not set, the value initialized by the pipeline is used (initialized to `736`). | `int` | |
| `seal_det_limit_type` | Type of the side length limit for seal text detection. Supports `min` and `max`: `min` ensures the shortest side of the image is no smaller than `det_limit_side_len`; `max` ensures the longest side is no larger than `limit_side_len`. If not set, the value initialized by the pipeline is used (initialized to `min`). | `str` | |
| `seal_det_thresh` | Pixel threshold for seal text detection; any float greater than `0`. If not set, the value initialized by the pipeline (`0.2`) is used. | `float` | |
| `seal_det_box_thresh` | Box threshold for seal text detection; any float greater than `0`. If not set, the value initialized by the pipeline (`0.6`) is used. | `float` | |
| `seal_det_unclip_ratio` | Expansion coefficient for seal text detection; any float greater than `0`. If not set, the value initialized by the pipeline (`0.5`) is used. | `float` | |
| `seal_rec_score_thresh` | Seal text recognition score threshold; any float greater than `0`. If not set, the value initialized by the pipeline (`0.0`, i.e. no threshold) is used. | `float` | |
| `qianfan_api_key` | API key for the Qianfan Platform. | `str` | |
| `pp_docbee_base_url` | Configuration for the multimodal large language model. | `str` | |
| `device` | Device used for inference: `cpu` for CPU inference; `gpu:0` for the 1st GPU; `xpu:0` for the 1st XPU; `mlu:0` for the 1st MLU; `dcu:0` for the 1st DCU. If not set, the value initialized by the pipeline is used; at initialization the local GPU device 0 is preferred, falling back to CPU if unavailable. | `str` | |
| `enable_hpi` | Whether to enable the high-performance inference plugin. | `bool` | |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If set to `None`, it is enabled by default. | `bool` | `True` |
| `cpu_threads` | Number of threads used for inference on CPU. | `int` | |
| `paddlex_config` | Path to the PaddleX pipeline configuration file. | `str` | |

| Parameter | Description | Type | Default |
|---|---|---|---|
| `use_doc_orientation_classify` | Whether to use the document orientation classification module. If set to `None`, the value initialized by the pipeline is used (initialized to `True`). | `bool` | `None` |
| `use_doc_unwarping` | Whether to use the document distortion correction module. If set to `None`, the value initialized by the pipeline is used (initialized to `True`). | `bool` | `None` |
| `use_textline_orientation` | Whether to use the text line orientation classification module. If set to `None`, the value initialized by the pipeline is used (initialized to `True`). | `bool` | `None` |
| `use_seal_recognition` | Whether to use the seal recognition sub-pipeline. If set to `None`, the value initialized by the pipeline is used (initialized to `True`). | `bool` | `None` |
| `use_table_recognition` | Whether to use the table recognition sub-pipeline. If set to `None`, the value initialized by the pipeline is used (initialized to `True`). | `bool` | `None` |
| `layout_threshold` | Score threshold for the layout model: any float between `0` and `1`; or a dict such as `{0: 0.1}` where the key is the class ID and the value is the threshold for that class. If set to `None`, the value initialized by the pipeline is used (initialized to `0.5`). | `float\|dict` | `None` |
| `layout_nms` | Whether to use NMS post-processing for layout detection. If set to `None`, the value initialized by the pipeline is used (initialized to `True`). | `bool` | `None` |
| `layout_unclip_ratio` | Expansion coefficient for layout detection boxes: any float greater than `0`; or a dict keyed by `cls_id` with tuple values, e.g. `{0: (1.1, 2.0)}`, meaning class-0 boxes keep their center while the width is expanded 1.1x and the height 2.0x. If set to `None`, the value initialized by the pipeline is used (initialized to `1.0`). | `float\|Tuple[float,float]\|dict` | `None` |
| `layout_merge_bboxes_mode` | Method for filtering overlapping bounding boxes: `large`, `small`, or `union`, meaning keep the large box, the small box, or both; or a dict keyed by `cls_id` with str values, e.g. `{0: "large", 2: "small"}`, meaning use "large" mode for class-0 boxes and "small" mode for class-2 boxes. If set to `None`, the value initialized by the pipeline is used (initialized to `large`). | `str\|dict` | `None` |
| `text_det_limit_side_len` | Image side length limit for text detection; any integer greater than `0`. If set to `None`, the value initialized by the pipeline is used (initialized to `960`). | `int` | `None` |
| `text_det_limit_type` | Type of the side length limit for text detection. Supports `min` and `max`: `min` ensures the shortest side of the image is no smaller than `det_limit_side_len`; `max` ensures the longest side is no larger than `limit_side_len`. If set to `None`, the value initialized by the pipeline is used (initialized to `max`). | `str` | `None` |
| `text_det_thresh` | Pixel threshold for text detection; any float greater than `0`. If set to `None`, the value initialized by the pipeline (`0.3`) is used. | `float` | `None` |
| `text_det_box_thresh` | Box threshold for text detection; any float greater than `0`. If set to `None`, the value initialized by the pipeline (`0.6`) is used. | `float` | `None` |
| `text_det_unclip_ratio` | Expansion coefficient for text detection; any float greater than `0`. If set to `None`, the value initialized by the pipeline (`2.0`) is used. | `float` | `None` |
| `text_rec_score_thresh` | Text recognition score threshold; any float greater than `0`. If set to `None`, the value initialized by the pipeline (`0.0`, i.e. no threshold) is used. | `float` | `None` |
| `seal_det_limit_side_len` | Image side length limit for seal text detection; any integer greater than `0`. If set to `None`, the value initialized by the pipeline is used (initialized to `736`). | `int` | `None` |
| `seal_det_limit_type` | Type of the side length limit for seal text detection. Supports `min` and `max`: `min` ensures the shortest side of the image is no smaller than `det_limit_side_len`; `max` ensures the longest side is no larger than `limit_side_len`. If set to `None`, the value initialized by the pipeline is used (initialized to `min`). | `str` | `None` |
| `seal_det_thresh` | Pixel threshold for seal text detection; any float greater than `0`. If set to `None`, the value initialized by the pipeline (`0.2`) is used. | `float` | `None` |
| `seal_det_box_thresh` | Box threshold for seal text detection; any float greater than `0`. If set to `None`, the value initialized by the pipeline (`0.6`) is used. | `float` | `None` |
| `seal_det_unclip_ratio` | Expansion coefficient for seal text detection; any float greater than `0`. If set to `None`, the value initialized by the pipeline (`0.5`) is used. | `float` | `None` |
| `seal_rec_score_thresh` | Seal text recognition score threshold; any float greater than `0`. If set to `None`, the value initialized by the pipeline (`0.0`, i.e. no threshold) is used. | `float` | `None` |
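The `layout_unclip_ratio` semantics (center fixed, width and height scaled) can be sketched directly. This is an illustrative helper under the documented semantics, not PaddleOCR's internal code:

```python
def unclip_box(box, ratio):
    """Expand an (x1, y1, x2, y2) box about its center.

    ratio may be a single float applied to both dimensions, or an
    (rw, rh) pair scaling width and height independently, matching
    the documented layout_unclip_ratio forms.
    """
    rw, rh = (ratio, ratio) if isinstance(ratio, (int, float)) else ratio
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2          # center stays fixed
    hw, hh = (x2 - x1) / 2 * rw, (y2 - y1) / 2 * rh  # scaled half-extents
    return (cx - hw, cy - hh, cx + hw, cy + hh)

unclip_box((10, 10, 30, 20), 1.0)        # ratio 1.0 leaves the box unchanged
unclip_box((10, 10, 30, 20), (1.5, 2.0)) # width x1.5, height x2.0
```

The per-class dict form `{0: (1.1, 2.0)}` simply selects an `(rw, rh)` pair like the one above based on the box's class ID.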
| Parameter | Description | Type | Default |
|---|---|---|---|
| `retriever_config` | Configuration parameters for the vector retriever. | `dict` | `{"module_name": "retriever", "model_name": "embedding-v1", ...}` |
| `mllm_chat_bot_config` | Configuration parameters for the multimodal large model. | `dict` | `{"module_name": "chat_bot", "model_name": "PP-DocBee", ...}` |
| `chat_bot_config` | Configuration parameters for the large language model. | `dict` | `{"module_name": "chat_bot", "model_name": "ernie-3.5-8k", ...}` |
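A minimal sketch of how these configuration dicts might be assembled; only the keys visible in the defaults above are reproduced, and the remaining fields are deliberately elided:

```python
# Vector retriever configuration (only the documented keys are shown).
retriever_config = {
    "module_name": "retriever",
    "model_name": "embedding-v1",
}

# Multimodal large model configuration.
mllm_chat_bot_config = {
    "module_name": "chat_bot",
    "model_name": "PP-DocBee",
}

# Large language model configuration.
chat_bot_config = {
    "module_name": "chat_bot",
    "model_name": "ernie-3.5-8k",
}
```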
| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Data to be predicted, supporting multiple input types (required): **Python Var**, image data represented by `numpy.ndarray`; **str**, a local path to an image or PDF file, e.g. `/root/data/img.jpg`, a URL of an image or PDF file, or a local directory containing the images to predict, e.g. `/root/data/` (prediction from directories containing PDF files is currently not supported; PDF files must be specified by full path); **list**, elements of the above types, e.g. `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`. | `Python Var\|str\|list` | `None` |
| `save_path` | Path to save inference results. If set to `None`, inference results will not be saved locally. | `str` | `None` |
| `device` | Device used for inference: `cpu` for CPU; `gpu:0` for the 1st GPU; `xpu:0` for the 1st XPU; `mlu:0` for the 1st MLU; `dcu:0` for the 1st DCU. If set to `None`, the local GPU device 0 is preferred at initialization, falling back to CPU if unavailable. | `str` | `None` |
| `use_doc_orientation_classify` | Whether to use the document orientation classification module during prediction. | `bool` | `None` |
| `use_textline_orientation` | Whether to use the text line orientation classification module during prediction. | `bool` | `None` |
| `precision` | Compute precision, such as FP32 or FP16. | `str` | `"fp32"` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If set to `None`, it is enabled by default. | `bool` | `True` |
| `cpu_threads` | Number of threads used for inference on CPU. | `int` | |

The `print()` method supports the following parameters:

| Parameter | Description | Type | Default |
|---|---|---|---|
| `format_json` | Whether to format the output content with `JSON` indentation. | `bool` | `True` |
| `indent` | Indentation level to beautify the `JSON` output for better readability; effective only when `format_json` is `True`. | `int` | |
| `ensure_ascii` | Whether to escape non-`ASCII` characters to `Unicode`. When set to `True`, all non-`ASCII` characters are escaped; `False` preserves the original characters. Effective only when `format_json` is `True`. | `bool` | `False` |

`save_to_json()` accepts a `save_path` parameter of type `str` together with the same `indent` and `ensure_ascii` parameters. `save_to_img()`, `save_to_html()`, and `save_to_xlsx()` each take a `save_path` parameter of type `str`.

Result attributes:

| Attribute | Description |
|---|---|
| `json` | Gets the prediction result in `json` format. |
| `img` | Gets the visualization images as a `dict`. |

The `build_vector()` method builds vectors from the text content. Its parameters:

| Parameter | Description | Type | Default |
|---|---|---|---|
| `visual_info` | Visual prediction information. | `list\|dict` | `None` |
| `min_characters` | Minimum number of characters. | `int` | `3500` |
| `block_size` | Block size used when splitting the text. | `int` | `300` |
| `flag_save_bytes_vector` | Whether to save the vectors as bytes. | `bool` | `False` |

Parameters of the multimodal prediction method:

| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Data to be predicted: image data represented by `numpy.ndarray`, or a `str` path/URL of an image or single-page PDF file. | `Python Var\|str` | |
| `key_list` | Keys for information extraction. | `Union[str, List[str]]` | `None` |

Parameters of the `chat()` method:

| Parameter | Description | Type | Default |
|---|---|---|---|
| `key_list` | Keys for information extraction. | `Union[str, List[str]]` | `None` |
| `visual_info` | Visual prediction results. | `List[dict]` | `None` |
| `use_vector_retrieval` | Whether to use vector retrieval. | `bool` | `True` |
| `vector_info` | Vector information used for retrieval. | `dict` | `None` |
| `text_task_description` | Task description for the text extraction task. | `str` | `None` |
| `text_output_format` | Output format for the text extraction result. | `str` | `None` |
| `text_rules_str` | Rules for the text extraction task. | `str` | `None` |
| `text_few_shot_demo_text_content` | Few-shot demo text content for text extraction. | `str` | `None` |
| `text_few_shot_demo_key_value_list` | Few-shot demo key-value list for text extraction. | `str` | `None` |
| `table_task_description` | Task description for the table extraction task. | `str` | `None` |
| `table_output_format` | Output format for the table extraction result. | `str` | `None` |
| `table_rules_str` | Rules for the table extraction task. | `str` | `None` |
| `table_few_shot_demo_text_content` | Few-shot demo text content for table extraction. | `str` | `None` |
| `table_few_shot_demo_key_value_list` | Few-shot demo key-value list for table extraction. | `str` | `None` |
| `mllm_predict_info` | Prediction results of the multimodal large model. | `dict` | `None` |
| `mllm_integration_strategy` | Strategy for integrating the multimodal large model results. | `str` | `"integration"` |
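Since `key_list` is typed `Union[str, List[str]]`, callers can pass either form; a hypothetical normalization sketch (the comma-splitting rule here is an assumption for illustration, not PaddleOCR's documented parsing):

```python
def normalize_key_list(key_list):
    """Accept a single string or a list of keys and return a list.

    A comma-separated string is split into individual keys; a list is
    passed through unchanged. Purely illustrative of the Union type.
    """
    if isinstance(key_list, str):
        return [k.strip() for k in key_list.split(",") if k.strip()]
    return list(key_list)

normalize_key_list("姓名,日期")        # two keys from one string
normalize_key_list(["name", "date"])  # list form passes through
```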
| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Data to be predicted. Required. Supports: a local path to an image or PDF file, e.g. `/root/data/img.jpg`; a URL to an online image or PDF; a local directory containing images to predict, e.g. `/root/data/` (directories containing PDFs are not supported; PDFs must be specified by file path). | `str` | |
| `save_path` | Path to save inference results. If not set, results will not be saved locally. | `str` | |
| `layout_detection_model_name` | Name of the layout detection model. If not set, the default model will be used. | `str` | |
| `layout_detection_model_dir` | Directory path of the layout detection model. If not set, the official model will be downloaded. | `str` | |
| `layout_threshold` | Score threshold for the layout model. Any value between `0`-`1`. If not set, the default value of `0.5` is used. | `float` | |
| `layout_nms` | Whether to use Non-Maximum Suppression (NMS) as post-processing for layout detection. If not set, the default is `True`. | `bool` | |
| `layout_unclip_ratio` | Unclip ratio for detected boxes in the layout detection model. Any float > `0`. If not set, the default is `1.0`. | `float` | |
| `layout_merge_bboxes_mode` | Merging mode for the detection boxes output by the model in layout detection. Supports `large`, `small`, and `union` to retain the larger box, the smaller box, or both. If not set, the default is `large`. | `str` | |
| `chart_recognition_model_name` | Name of the chart recognition model. If not set, the default model will be used. | `str` | |
| `chart_recognition_model_dir` | Directory path of the chart recognition model. If not set, the official model will be downloaded. | `str` | |
| `chart_recognition_batch_size` | Batch size for the chart recognition model. If not set, the default batch size is `1`. | `int` | |
| `region_detection_model_name` | Name of the region detection model. If not set, the default model will be used. | `str` | |
| `region_detection_model_dir` | Directory path of the region detection model. If not set, the official model will be downloaded. | `str` | |
| `doc_orientation_classify_model_name` | Name of the document orientation classification model. If not set, the default model will be used. | `str` | |
| `doc_orientation_classify_model_dir` | Directory path of the document orientation classification model. If not set, the official model will be downloaded. | `str` | |
| `doc_unwarping_model_name` | Name of the document unwarping model. If not set, the default model will be used. | `str` | |
| `doc_unwarping_model_dir` | Directory path of the document unwarping model. If not set, the official model will be downloaded. | `str` | |
| `text_detection_model_name` | Name of the text detection model. If not set, the default model will be used. | `str` | |
| `text_detection_model_dir` | Directory path of the text detection model. If not set, the official model will be downloaded. | `str` | |
| `text_det_limit_side_len` | Image side length limit for text detection. Any integer > `0`. If not set, the default is `960`. | `int` | |
| `text_det_limit_type` | Type of the image side length limit for text detection. Supports `min` and `max`: `min` ensures the shortest side of the image is no less than `det_limit_side_len`, while `max` ensures the longest side is no greater than `limit_side_len`. If not set, the default is `max`. | `str` | |
| `text_det_thresh` | Pixel threshold for detection. Pixels with scores above this value in the probability map are considered text. Any float > `0`. If not set, the default is `0.3`. | `float` | |
| `text_det_box_thresh` | Box threshold. A bounding box is considered text if the average score of the pixels inside it is greater than this value. Any float > `0`. If not set, the default is `0.6`. | `float` | |
| `text_det_unclip_ratio` | Expansion ratio for text detection. The higher the value, the larger the expansion area. Any float > `0`. If not set, the default is `2.0`. | `float` | |
| `textline_orientation_model_name` | Name of the text line orientation model. If not set, the default model will be used. | `str` | |
| `textline_orientation_model_dir` | Directory of the text line orientation model. If not set, the official model will be downloaded. | `str` | |
| `textline_orientation_batch_size` | Batch size for the text line orientation model. If not set, the default is `1`. | `int` | |
| `text_recognition_model_name` | Name of the text recognition model. If not set, the default model will be used. | `str` | |
| `text_recognition_model_dir` | Directory of the text recognition model. If not set, the official model will be downloaded. | `str` | |
| `text_recognition_batch_size` | Batch size for text recognition. If not set, the default is `1`. | `int` | |
| `text_rec_score_thresh` | Score threshold for text recognition. Only results above this value are kept. Any float > `0`. If not set, the default is `0.0` (no threshold). | `float` | |
| `table_classification_model_name` | Name of the table classification model. If not set, the default model will be used. | `str` | |
| `table_classification_model_dir` | Directory of the table classification model. If not set, the official model will be downloaded. | `str` | |
| `wired_table_structure_recognition_model_name` | Name of the wired table structure recognition model. If not set, the default model will be used. | `str` | |
| `wired_table_structure_recognition_model_dir` | Directory of the wired table structure recognition model. If not set, the official model will be downloaded. | `str` | |
| `wireless_table_structure_recognition_model_name` | Name of the wireless table structure recognition model. If not set, the default model will be used. | `str` | |
| `wireless_table_structure_recognition_model_dir` | Directory of the wireless table structure recognition model. If not set, the official model will be downloaded. | `str` | |
| `wired_table_cells_detection_model_name` | Name of the wired table cell detection model. If not set, the default model will be used. | `str` | |
| `wired_table_cells_detection_model_dir` | Directory of the wired table cell detection model. If not set, the official model will be downloaded. | `str` | |
| `wireless_table_cells_detection_model_name` | Name of the wireless table cell detection model. If not set, the default model will be used. | `str` | |
| `wireless_table_cells_detection_model_dir` | Directory of the wireless table cell detection model. If not set, the official model will be downloaded. | `str` | |
| `seal_text_detection_model_name` | Name of the seal text detection model. If not set, the default model will be used. | `str` | |
| `seal_text_detection_model_dir` | Directory of the seal text detection model. If not set, the official model will be downloaded. | `str` | |
| `seal_det_limit_side_len` | Image side length limit for seal text detection. Any integer > `0`. If not set, the default is `736`. | `int` | |
| `seal_det_limit_type` | Limit type for the image side in seal text detection. Supports `min` and `max`: `min` ensures the shortest side is no less than `det_limit_side_len`, while `max` ensures the longest side is no greater than `limit_side_len`. If not set, the default is `min`. | `str` | |
| `seal_det_thresh` | Pixel threshold. Pixels with scores above this value in the probability map are considered text. Any float > `0`. If not set, the default is `0.2`. | `float` | |
| `seal_det_box_thresh` | Box threshold. Boxes with average pixel scores above this value are considered text regions. Any float > `0`. If not set, the default is `0.6`. | `float` | |
| `seal_det_unclip_ratio` | Expansion ratio for seal text detection. A higher value means a larger expansion area. Any float > `0`. If not set, the default is `0.5`. | `float` | |
| `seal_text_recognition_model_name` | Name of the seal text recognition model. If not set, the default model will be used. | `str` | |
| `seal_text_recognition_model_dir` | Directory of the seal text recognition model. If not set, the official model will be downloaded. | `str` | |
| `seal_text_recognition_batch_size` | Batch size for seal text recognition. If not set, the default is `1`. | `int` | |
| `seal_rec_score_thresh` | Recognition score threshold for seal text. Text results above this value are kept. Any float > `0`. If not set, the default is `0.0` (no threshold). | `float` | |
| `formula_recognition_model_name` | Name of the formula recognition model. If not set, the default model will be used. | `str` | |
| `formula_recognition_model_dir` | Directory of the formula recognition model. If not set, the official model will be downloaded. | `str` | |
| `formula_recognition_batch_size` | Batch size of the formula recognition model. If not set, the default is `1`. | `int` | |
| `use_doc_orientation_classify` | Whether to load and use the document orientation classification module. If not set, the default is `True`. | `bool` | |
| `use_doc_unwarping` | Whether to load and use the document unwarping module. If not set, the default is `True`. | `bool` | |
| `use_textline_orientation` | Whether to load and use the text line orientation classification module. If not set, the default is `True`. | `bool` | |
| `use_seal_recognition` | Whether to load and use the seal recognition subpipeline. If not set, the default is `True`. | `bool` | |
| `use_table_recognition` | Whether to load and use the table recognition subpipeline. If not set, the default is `True`. | `bool` | |
| `use_formula_recognition` | Whether to load and use the formula recognition subpipeline. If not set, the default is `True`. | `bool` | |
| `use_chart_recognition` | Whether to load and use the chart recognition subpipeline. If not set, the default is `True`. | `bool` | |
| `use_region_detection` | Whether to load and use the document region detection subpipeline. If not set, the default is `True`. | `bool` | |
| `device` | Device for inference. You can specify a device ID: `cpu` uses the CPU; `gpu:0` uses GPU 0; `xpu:0` uses XPU 0; `mlu:0` uses MLU 0; `dcu:0` uses DCU 0. If not set, GPU 0 is preferred during initialization; if no GPU is available, the CPU is used. | `str` | |
| `enable_hpi` | Whether to enable high-performance inference. | `bool` | `False` |
| `precision` | Computation precision, such as `fp32` or `fp16`. | `str` | `fp32` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | `bool` | `True` |
| `cpu_threads` | Number of threads to use for inference on the CPU. | `int` | `8` |
| `paddlex_config` | Path to the PaddleX pipeline configuration file. | `str` | |
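The interaction between `text_det_limit_side_len` and `text_det_limit_type` can be sketched as a pure function. This is an illustrative sketch of the documented behavior, not PaddleOCR's internal code:

```python
def det_resize_scale(height, width, limit_side_len=960, limit_type="max"):
    """Scale factor implied by text_det_limit_side_len / text_det_limit_type.

    Illustrative sketch of the documented behavior, not PaddleOCR internals.
    """
    if limit_type == "max":
        # "max": the longest side must not exceed the limit, so shrink if needed.
        return min(1.0, limit_side_len / max(height, width))
    if limit_type == "min":
        # "min": the shortest side must be at least the limit, so enlarge if needed.
        return max(1.0, limit_side_len / min(height, width))
    raise ValueError("limit_type must be 'min' or 'max'")

# A 2000x1000 page is shrunk so its longest side becomes 960.
print(det_resize_scale(2000, 1000))                   # 0.48
# A small 500x300 crop is enlarged under the "min" policy.
print(det_resize_scale(500, 300, limit_type="min"))   # 3.2
# An image already within the limit is left unscaled.
print(det_resize_scale(800, 600))                     # 1.0
```

The same logic applies to `seal_det_limit_side_len` and `seal_det_limit_type`, with defaults `736` and `min`.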
For the Python `predict()` method, `input` additionally accepts `numpy.ndarray` image data and lists of inputs (type `Python Var|str|list`): a local image or PDF path such as `/root/data/img.jpg`; a URL to an image or PDF; a directory containing image files such as `/root/data/` (directories with PDFs are not supported; use the full file path for PDFs); or a list such as `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, or `["/root/data1", "/root/data2"]`.
device
str
None
use_doc_orientation_classify
bool
None
use_chart_recognition
None
, the default value is True
.bool
None
use_region_detection
None
, the default value is True
.bool
None
layout_threshold
float|dict
float
None
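The `layout_unclip_ratio` parameter is described as expanding a detected box while keeping its center fixed (with the dict form `{0: (1.1, 2.0)}` scaling width 1.1x and height 2.0x for class 0). A minimal sketch of that geometry, for illustration only and not the pipeline's implementation:

```python
def unclip_box(x1, y1, x2, y2, ratio_w, ratio_h):
    """Expand an axis-aligned box about its center, as layout_unclip_ratio
    is described: width scaled by ratio_w, height by ratio_h, center fixed.

    Illustrative sketch only, not PaddleOCR's implementation.
    """
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * ratio_w / 2.0
    half_h = (y2 - y1) * ratio_h / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

# Doubling both sides of a 10x10 box keeps its center at (5, 5).
print(unclip_box(0, 0, 10, 10, 2.0, 2.0))  # (-5.0, -5.0, 15.0, 15.0)
```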
The `predict()` method additionally accepts the following table recognition options:

| Parameter | Description | Type | Default |
|---|---|---|---|
| `use_wired_table_cells_trans_to_html` | Whether to convert wired table cell detection results directly to HTML. `True` or `False`; if set to `None`, the value initialized in the pipeline (`False`) is used. | `bool\|None` | `False` |
| `use_wireless_table_cells_trans_to_html` | Whether to convert wireless table cell detection results directly to HTML. `True` or `False`; if set to `None`, the value initialized in the pipeline (`False`) is used. | `bool\|None` | `False` |
| `use_table_orientation_classify` | Whether to use table orientation classification. `True` or `False`; if set to `None`, the value initialized in the pipeline (`True`) is used. | `bool\|None` | `True` |
| `use_ocr_results_with_table_cells` | Whether to use OCR results within table cells. `True` or `False`; if set to `None`, the value initialized in the pipeline (`True`) is used. | `bool\|None` | `True` |
| `use_e2e_wired_table_rec_model` | Whether to use the end-to-end wired table structure recognition model. `True` or `False`; if set to `None`, the value initialized in the pipeline (`False`) is used. | `bool\|None` | `False` |
| `use_e2e_wireless_table_rec_model` | Whether to use the end-to-end wireless table structure recognition model. `True` or `False`; if set to `None`, the value initialized in the pipeline (`True`) is used. | `bool\|None` | `True` |
The result object's `print()`, `save_to_json()`, and other `save_to_*()` methods accept the following parameters:

| Parameter | Description | Type | Default |
|---|---|---|---|
| `format_json` | Whether to format the output content with `JSON` indentation. | `bool` | `True` |
| `indent` | Indentation level to beautify the `JSON` output and make it more readable. Only effective when `format_json` is `True`. | `int` | `4` |
| `ensure_ascii` | Whether to escape non-`ASCII` characters to `Unicode`. When `True`, all non-`ASCII` characters are escaped; when `False`, the original characters are retained. Only effective when `format_json` is `True`. | `bool` | `False` |
| `save_path` | Path to save the file. | `str` | |
| `markdown_list` | List of markdown data from each page, used to concatenate multi-page markdown results. | `list` | |
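The `indent` and `ensure_ascii` options mirror the behavior of Python's standard `json` module, which this sketch demonstrates:

```python
import json

# A toy result dict standing in for a recognition result.
result = {"text": "你好", "score": 0.98}

# ensure_ascii=True escapes non-ASCII characters to \uXXXX sequences;
# indent pretty-prints the output (what format_json / indent control).
escaped = json.dumps(result, indent=4, ensure_ascii=True)
raw = json.dumps(result, ensure_ascii=False)

print("\\u4f60" in escaped)  # True: 你 is escaped
print("你好" in raw)         # True: original characters retained
```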
| Parameter | Description | Type | Default |
| --- | --- | --- | --- |
| input | Data to be predicted, required. A local path of an image or PDF file, e.g. /root/data/img.jpg; a URL link, such as the network URL of an image or PDF file; or a local directory containing the images to be predicted, e.g. /root/data/ (prediction of PDF files in a directory is currently not supported; a PDF file must be specified by its full file path). | str | — |
| save_path | Path to save the inference result file. If not set, the inference result will not be saved locally. | str | — |
| doc_orientation_classify_model_name | Name of the document orientation classification model. If not set, the pipeline's default model will be used. | str | — |
| doc_orientation_classify_model_dir | Directory path of the document orientation classification model. If not set, the official model will be downloaded. | str | — |
| doc_unwarping_model_name | Name of the text image unwarping model. If not set, the pipeline's default model will be used. | str | — |
| doc_unwarping_model_dir | Directory path of the text image unwarping model. If not set, the official model will be downloaded. | str | — |
| use_doc_orientation_classify | Whether to load and use the document orientation classification module. If not set, the value initialized by the pipeline will be used, which defaults to True. | bool | — |
| use_doc_unwarping | Whether to load and use the text image unwarping module. If not set, the value initialized by the pipeline will be used, which defaults to True. | bool | — |
| device | Device used for inference. Supports specifying a card number: cpu indicates using the CPU; gpu:0, npu:0, xpu:0, mlu:0, dcu:0 indicate using the first GPU, NPU, XPU, MLU, or DCU, respectively. If not set, the value initialized by the pipeline will be used; during initialization, the local GPU 0 device is preferred, falling back to the CPU if unavailable. | str | — |
| enable_hpi | Whether to enable high-performance inference. | bool | False |
| enable_mkldnn | Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | bool | True |
| cpu_threads | Number of threads to use when performing inference on the CPU. | int | 8 |
| paddlex_config | Path to the PaddleX pipeline configuration file. | str | — |
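A device string combines a device type with an optional card index. The following is a minimal sketch (not the actual PaddleOCR implementation) of how such strings can be interpreted:

```python
def parse_device(device):
    """Hypothetical sketch of parsing a device string such as 'gpu:0'.

    Supported device types mirror the table above: cpu, gpu, npu, xpu, mlu, dcu.
    """
    allowed = {"cpu", "gpu", "npu", "xpu", "mlu", "dcu"}
    if device is None:
        # None: the pipeline prefers local GPU 0, falling back to CPU;
        # here we assume a GPU is available.
        return ("gpu", 0)
    kind, _, index = device.partition(":")
    if kind not in allowed:
        raise ValueError(f"unsupported device type: {kind}")
    # A missing index (e.g. plain "cpu") defaults to card 0.
    return (kind, int(index) if index else 0)

print(parse_device("gpu:0"))  # ('gpu', 0)
print(parse_device("cpu"))    # ('cpu', 0)
```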
| Parameter | Description | Type | Default |
| --- | --- | --- | --- |
| use_doc_orientation_classify | Whether to load and use the document orientation classification module. If set to None, the value initialized by the pipeline will be used, which defaults to True. | bool | None |
| use_doc_unwarping | Whether to load and use the text image unwarping module. If set to None, the value initialized by the pipeline will be used, which defaults to True. | bool | None |
| device | Device used for inference. Supports specifying a card number: cpu indicates using the CPU; gpu:0, npu:0, xpu:0, mlu:0, dcu:0 indicate using the first GPU, NPU, XPU, MLU, or DCU, respectively. If set to None, the pipeline initialized value for this parameter will be used; during initialization, the local GPU device 0 will be preferred; if unavailable, the CPU device will be used. | str | None |
| precision | Computation precision, such as fp32 or fp16. | str | "fp32" |
| enable_mkldnn | Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | bool | True |
| cpu_threads | Number of threads to use when performing inference on the CPU. | int | 8 |
| Parameter | Description | Type | Default |
| --- | --- | --- | --- |
| input | Data to be predicted, supporting multiple input types, required:<br>- Python Var: image data represented by numpy.ndarray;<br>- str: a local path of an image or PDF file, e.g. /root/data/img.jpg; a URL link, such as the network URL of an image or PDF file; or a local directory containing the images to be predicted, e.g. /root/data/ (prediction of PDF files in a directory is currently not supported; a PDF file must be specified by its full file path);<br>- List: elements must be of the above types, e.g. [numpy.ndarray, numpy.ndarray], ["/root/data/img1.jpg", "/root/data/img2.jpg"], ["/root/data1", "/root/data2"]. | Python Var\|str\|list | — |
| use_doc_orientation_classify | Whether to use the document orientation classification module during inference. | bool | None |
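The accepted input forms above can be thought of as normalizing into a flat list of items before prediction. The helper below is a hypothetical sketch (not part of the PaddleOCR API) illustrating the rules in the table:

```python
from pathlib import Path

def normalize_inputs(data):
    """Hypothetical sketch of normalizing the accepted input types.

    A str may point to an image/PDF file, a URL, or a directory of images;
    a list may mix any of the accepted element types.
    """
    if isinstance(data, list):
        # Flatten list elements recursively (each element is itself a valid input).
        out = []
        for item in data:
            out.extend(normalize_inputs(item))
        return out
    if isinstance(data, str):
        p = Path(data)
        if p.is_dir():
            # Directories expand to the image files they contain
            # (PDFs inside directories are not supported).
            return sorted(str(f) for f in p.iterdir()
                          if f.suffix.lower() in {".jpg", ".jpeg", ".png", ".bmp"})
        return [data]  # single file path or URL
    return [data]      # e.g. a numpy.ndarray image

print(normalize_inputs("/root/data/img.jpg"))
print(normalize_inputs(["/root/data/img1.jpg", "/root/data/img2.jpg"]))
```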
file
string
Serving:
extra:
max_num_input_imgs: null
@@ -674,13 +664,13 @@ Below are the API references for basic service-oriented deployment and examples
useDocOrientationClassify
boolean
| null
-Please refer to the description of the use_doc_orientation_classify
parameter in the predict
method of the production line object.
+Please refer to the description of the use_doc_orientation_classify
parameter in the predict
method of the pipeline object.
No
useDocUnwarping
boolean
| null
-Please refer to the description of the use_doc_unwarping
parameter in the predict
method of the production line object.
+Please refer to the description of the use_doc_unwarping
parameter in the predict
method of the pipeline object.
No
@@ -727,7 +717,7 @@ Below are the API references for basic service-oriented deployment and examples
prunedResult
object
-A simplified version of the res
field in the JSON representation of the result generated by the predict
method of the production line object, with the input_path
and page_index
fields removed.
+A simplified version of the res
field in the JSON representation of the result generated by the predict
method of the pipeline object, with the input_path
and page_index
fields removed.
docPreprocessingImage
diff --git a/docs/version3.x/pipeline_usage/doc_preprocessor.md b/docs/version3.x/pipeline_usage/doc_preprocessor.md
index 5a3c5b03093afdcf1f7e31556a410907d29c1d95..26f802e311b2fddc208231052490592d7a6ad1fa 100644
--- a/docs/version3.x/pipeline_usage/doc_preprocessor.md
+++ b/docs/version3.x/pipeline_usage/doc_preprocessor.md
@@ -152,63 +152,59 @@ paddleocr doc_preprocessor -i ./doc_test_rotated.jpg --device gpu
input
-待预测数据,支持多种输入类型,必填。
-
-- Python Var:如
numpy.ndarray
表示的图像数据
-- str:如图像文件或者PDF文件的本地路径:
/root/data/img.jpg
;如URL链接,如图像文件或PDF文件的网络URL:示例;如本地目录,该目录下需包含待预测图像,如本地路径:/root/data/
(当前不支持目录中包含PDF文件的预测,PDF文件需要指定到具体文件路径)
-- List:列表元素需为上述类型数据,如
[numpy.ndarray, numpy.ndarray]
,["/root/data/img1.jpg", "/root/data/img2.jpg"]
,["/root/data1", "/root/data2"]
-
+ 待预测数据,必填。
+如图像文件或者PDF文件的本地路径:/root/data/img.jpg
;如URL链接,如图像文件或PDF文件的网络URL:示例;如本地目录,该目录下需包含待预测图像,如本地路径:/root/data/
(当前不支持目录中包含PDF文件的预测,PDF文件需要指定到具体文件路径)。
-Python Var|str|list
+str
save_path
-指定推理结果文件保存的路径。如果设置为None
, 推理结果将不会保存到本地。
+指定推理结果文件保存的路径。如果不设置,推理结果将不会保存到本地。
str
-None
+
doc_orientation_classify_model_name
-文档方向分类模型的名称。如果设置为None
, 将会使用产线默认模型。
+文档方向分类模型的名称。如果不设置,将会使用产线默认模型。
str
-None
+
doc_orientation_classify_model_dir
-文档方向分类模型的目录路径。如果设置为None
, 将会下载官方模型。
+文档方向分类模型的目录路径。如果不设置,将会下载官方模型。
str
-None
+
doc_unwarping_model_name
-文本图像矫正模型的名称。如果设置为None
, 将会使用产线默认模型。
+文本图像矫正模型的名称。如果不设置,将会使用产线默认模型。
str
-None
+
doc_unwarping_model_dir
-文本图像矫正模型的目录路径。如果设置为None
, 将会下载官方模型。
+文本图像矫正模型的目录路径。如果不设置,将会下载官方模型。
str
-None
+
use_doc_orientation_classify
-是否加载文档方向分类模块。如果设置为None
, 将默认使用产线初始化的该参数值,初始化为True
。
+ 是否加载并使用文档方向分类模块。如果不设置,将默认使用产线初始化的该参数值,初始化为True
。
bool
-None
+
use_doc_unwarping
-是否加载文本图像矫正模块。如果设置为None
, 将默认使用产线初始化的该参数值,初始化为True
。
+ 是否加载并使用文本图像矫正模块。如果不设置,将默认使用产线初始化的该参数值,初始化为True
。
bool
-None
+
device
-用于推理的设备。支持指定具体卡号。
+ 用于推理的设备。支持指定具体卡号:
- CPU:如
cpu
表示使用 CPU 进行推理;
- GPU:如
gpu:0
表示使用第 1 块 GPU 进行推理;
@@ -216,11 +212,10 @@ paddleocr doc_preprocessor -i ./doc_test_rotated.jpg --device gpu
- XPU:如
xpu:0
表示使用第 1 块 XPU 进行推理;
- MLU:如
mlu:0
表示使用第 1 块 MLU 进行推理;
- DCU:如
dcu:0
表示使用第 1 块 DCU 进行推理;
-- None:如果设置为
None
, 将默认使用产线初始化的该参数值,初始化时,会优先使用本地的 GPU 0号设备,如果没有,则使用 CPU 设备;
-
+如果不设置,将默认使用产线初始化的该参数值,初始化时,会优先使用本地的 GPU 0号设备,如果没有,则使用 CPU 设备。
str
-None
+
enable_hpi
@@ -248,10 +243,10 @@ paddleocr doc_preprocessor -i ./doc_test_rotated.jpg --device gpu
enable_mkldnn
-是否启用 MKL-DNN 加速库。如果设置为None
, 将默认启用。
+ 是否启用 MKL-DNN 加速推理。如果 MKL-DNN 不可用或模型不支持通过 MKL-DNN 加速,即使设置了此标志,也不会使用加速。
bool
-None
+True
cpu_threads
@@ -263,7 +258,7 @@ paddleocr doc_preprocessor -i ./doc_test_rotated.jpg --device gpu
paddlex_config
PaddleX产线配置文件路径。
str
-None
+
@@ -315,45 +310,45 @@ for res in output:
doc_orientation_classify_model_name
-文档方向分类模型的名称。如果设置为None
, 将会使用产线默认模型。
+文档方向分类模型的名称。如果设置为None
,将会使用产线默认模型。
str
None
doc_orientation_classify_model_dir
-文档方向分类模型的目录路径。如果设置为None
, 将会下载官方模型。
+文档方向分类模型的目录路径。如果设置为None
,将会下载官方模型。
str
None
doc_unwarping_model_name
-文本图像矫正模型的名称。如果设置为None
, 将会使用产线默认模型。
+文本图像矫正模型的名称。如果设置为None
,将会使用产线默认模型。
str
None
doc_unwarping_model_dir
-文本图像矫正模型的目录路径。如果设置为None
, 将会下载官方模型。
+文本图像矫正模型的目录路径。如果设置为None
,将会下载官方模型。
str
None
use_doc_orientation_classify
-是否加载文档方向分类模块。如果设置为None
, 将默认使用产线初始化的该参数值,初始化为True
。
+ 是否加载并使用文档方向分类模块。如果设置为None
,将默认使用产线初始化的该参数值,初始化为True
。
bool
None
use_doc_unwarping
-是否加载文本图像矫正模块。如果设置为None
, 将默认使用产线初始化的该参数值,初始化为True
。
+ 是否加载并使用文本图像矫正模块。如果设置为None
,将默认使用产线初始化的该参数值,初始化为True
。
bool
None
device
-用于推理的设备。支持指定具体卡号。
+ 用于推理的设备。支持指定具体卡号:
- CPU:如
cpu
表示使用 CPU 进行推理;
- GPU:如
gpu:0
表示使用第 1 块 GPU 进行推理;
@@ -361,7 +356,7 @@ for res in output:
- XPU:如
xpu:0
表示使用第 1 块 XPU 进行推理;
- MLU:如
mlu:0
表示使用第 1 块 MLU 进行推理;
- DCU:如
dcu:0
表示使用第 1 块 DCU 进行推理;
-- None:如果设置为
None
, 将默认使用产线初始化的该参数值,初始化时,会优先使用本地的 GPU 0号设备,如果没有,则使用 CPU 设备;
+- None:如果设置为
None
,将默认使用产线初始化的该参数值,初始化时,会优先使用本地的 GPU 0号设备,如果没有,则使用 CPU 设备。
str
@@ -389,14 +384,14 @@ for res in output:
precision
计算精度,如 fp32、fp16。
str
-fp32
+"fp32"
enable_mkldnn
-是否启用 MKL-DNN 加速库。如果设置为None
, 将默认启用。
+ 是否启用 MKL-DNN 加速推理。如果 MKL-DNN 不可用或模型不支持通过 MKL-DNN 加速,即使设置了此标志,也不会使用加速。
bool
-None
+True
cpu_threads
@@ -432,21 +427,15 @@ for res in output:
input
待预测数据,支持多种输入类型,必填。
-- Python Var:如
numpy.ndarray
表示的图像数据
-- str:如图像文件或者PDF文件的本地路径:
/root/data/img.jpg
;如URL链接,如图像文件或PDF文件的网络URL:示例;如本地目录,该目录下需包含待预测图像,如本地路径:/root/data/
(当前不支持目录中包含PDF文件的预测,PDF文件需要指定到具体文件路径)
-- List:列表元素需为上述类型数据,如
[numpy.ndarray, numpy.ndarray]
,["/root/data/img1.jpg", "/root/data/img2.jpg"]
,["/root/data1", "/root/data2"]
+- Python Var:如
numpy.ndarray
表示的图像数据;
+- str:如图像文件或者PDF文件的本地路径:
/root/data/img.jpg
;如URL链接,如图像文件或PDF文件的网络URL:示例;如本地目录,该目录下需包含待预测图像,如本地路径:/root/data/
(当前不支持目录中包含PDF文件的预测,PDF文件需要指定到具体文件路径);
+- List:列表元素需为上述类型数据,如
[numpy.ndarray, numpy.ndarray]
,["/root/data/img1.jpg", "/root/data/img2.jpg"]
,["/root/data1", "/root/data2"]
。
Python Var|str|list
-device
-与实例化时的参数相同。
-str
-None
-
-
use_doc_orientation_classify
是否在推理时使用文档方向分类模块。
bool
@@ -560,7 +549,7 @@ for res in output:
- `json` 属性获取的预测结果为dict类型的数据,相关内容与调用 `save_to_json()` 方法保存的内容一致。
-- `img` 属性返回的预测结果是一个字典类型的数据。其中,键为 `preprocessed_img`,对应的值是 `Image.Image` 对象:用于显示 doc_preprocessor 结果的可视化图像。
+- `img` 属性返回的预测结果是一个dict类型的数据。其中,键为 `preprocessed_img`,对应的值是 `Image.Image` 对象:用于显示 doc_preprocessor 结果的可视化图像。
## 3. 开发集成/部署
diff --git a/docs/version3.x/pipeline_usage/doc_understanding.en.md b/docs/version3.x/pipeline_usage/doc_understanding.en.md
index 4441b16e1b08b3024737355dddc617d290146dbf..d42902525dd45e05a4af7c67c5ea77ebbbf95346 100644
--- a/docs/version3.x/pipeline_usage/doc_understanding.en.md
+++ b/docs/version3.x/pipeline_usage/doc_understanding.en.md
@@ -1,7 +1,5 @@
---
-
comments: true
-
---
# Document Understanding Pipeline Usage Tutorial
@@ -62,7 +60,7 @@ Before using the document understanding pipeline locally, ensure that you have c
Experience the doc_understanding pipeline with just one command line:
```bash
-paddleocr doc_understanding -i "{'image': 'https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/medal_table.png', 'query': '识别这份表格的内容, 以markdown格式输出'}"
+paddleocr doc_understanding -i "{'image': 'https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/medal_table.png', 'query': '识别这份表格的内容,以markdown格式输出'}"
```
The command line supports more parameter settings, click to expand for a detailed explanation of the command line parameters
@@ -78,41 +76,39 @@ paddleocr doc_understanding -i "{'image': 'https://paddle-model-ecology.bj.bcebo
input
-Data to be predicted, supports dictionary type input, required.
-
-- Python Dict: The input format for PP-DocBee is:
{"image":/path/to/image, "query": user question}
, representing the input image and corresponding user question.
-
+ Data to be predicted, required.
+"{'image': 'https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/medal_table.png', 'query': 'Recognize the content of this table and output it in markdown format'}".
-Python Var|str|list
+str
save_path
-Specify the path for saving the inference result file. If set to None
, the inference result will not be saved locally.
+Specify the path for saving the inference result file. If not set, the inference result will not be saved locally.
str
-None
+
doc_understanding_model_name
-The name of the document understanding model. If set to None
, the default model of the pipeline will be used.
+The name of the document understanding model. If not set, the default model of the pipeline will be used.
str
-None
+
doc_understanding_model_dir
-The directory path of the document understanding model. If set to None
, the official model will be downloaded.
+The directory path of the document understanding model. If not set, the official model will be downloaded.
str
-None
+
doc_understanding_batch_size
-The batch size of the document understanding model. If set to None
, the default batch size will be set to 1
.
+The batch size of the document understanding model. If not set, the default batch size will be set to 1
.
int
-None
+
device
-The device used for inference. Supports specifying a specific card number.
+ The device used for inference. Supports specifying a specific card number:
- CPU: For example,
cpu
indicates using the CPU for inference;
- GPU: For example,
gpu:0
indicates using the first GPU for inference;
@@ -120,11 +116,10 @@ paddleocr doc_understanding -i "{'image': 'https://paddle-model-ecology.bj.bcebo
- XPU: For example,
xpu:0
indicates using the first XPU for inference;
- MLU: For example,
mlu:0
indicates using the first MLU for inference;
- DCU: For example,
dcu:0
indicates using the first DCU for inference;
-- None: If set to
None
, the initialized value of this parameter will be used by default, which will preferentially use the local GPU device 0, or the CPU device if none is available.
-
+If not set, the pipeline initialized value for this parameter will be used. During initialization, the local GPU device 0 will be preferred; if unavailable, the CPU device will be used.
str
-None
+
enable_hpi
@@ -152,9 +147,9 @@ paddleocr doc_understanding -i "{'image': 'https://paddle-model-ecology.bj.bcebo
enable_mkldnn
-Whether to enable the MKL-DNN acceleration library. If set to None
, it will be enabled by default.
+Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set.
bool
-None
+True
cpu_threads
@@ -166,7 +161,7 @@ paddleocr doc_understanding -i "{'image': 'https://paddle-model-ecology.bj.bcebo
paddlex_config
Path to PaddleX pipeline configuration file.
str
-None
+
@@ -176,7 +171,7 @@ paddleocr doc_understanding -i "{'image': 'https://paddle-model-ecology.bj.bcebo
The results will be printed to the terminal, and the default configuration of the doc_understanding pipeline will produce the following output:
```bash
-{'res': {'image': 'https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/medal_table.png', 'query': '识别这份表格的内容, 以markdown格式输出', 'result': '| 名次 | 国家/地区 | 金牌 | 银牌 | 铜牌 | 奖牌总数 |\n| --- | --- | --- | --- | --- | --- |\n| 1 | 中国(CHN) | 48 | 22 | 30 | 100 |\n| 2 | 美国(USA) | 36 | 39 | 37 | 112 |\n| 3 | 俄罗斯(RUS) | 24 | 13 | 23 | 60 |\n| 4 | 英国(GBR) | 19 | 13 | 19 | 51 |\n| 5 | 德国(GER) | 16 | 11 | 14 | 41 |\n| 6 | 澳大利亚(AUS) | 14 | 15 | 17 | 46 |\n| 7 | 韩国(KOR) | 13 | 11 | 8 | 32 |\n| 8 | 日本(JPN) | 9 | 8 | 8 | 25 |\n| 9 | 意大利(ITA) | 8 | 9 | 10 | 27 |\n| 10 | 法国(FRA) | 7 | 16 | 20 | 43 |\n| 11 | 荷兰(NED) | 7 | 5 | 4 | 16 |\n| 12 | 乌克兰(UKR) | 7 | 4 | 11 | 22 |\n| 13 | 肯尼亚(KEN) | 6 | 4 | 6 | 16 |\n| 14 | 西班牙(ESP) | 5 | 11 | 3 | 19 |\n| 15 | 牙买加(JAM) | 5 | 4 | 2 | 11 |\n'}}
+{'res': {'image': 'https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/medal_table.png', 'query': '识别这份表格的内容,以markdown格式输出', 'result': '| 名次 | 国家/地区 | 金牌 | 银牌 | 铜牌 | 奖牌总数 |\n| --- | --- | --- | --- | --- | --- |\n| 1 | 中国(CHN) | 48 | 22 | 30 | 100 |\n| 2 | 美国(USA) | 36 | 39 | 37 | 112 |\n| 3 | 俄罗斯(RUS) | 24 | 13 | 23 | 60 |\n| 4 | 英国(GBR) | 19 | 13 | 19 | 51 |\n| 5 | 德国(GER) | 16 | 11 | 14 | 41 |\n| 6 | 澳大利亚(AUS) | 14 | 15 | 17 | 46 |\n| 7 | 韩国(KOR) | 13 | 11 | 8 | 32 |\n| 8 | 日本(JPN) | 9 | 8 | 8 | 25 |\n| 9 | 意大利(ITA) | 8 | 9 | 10 | 27 |\n| 10 | 法国(FRA) | 7 | 16 | 20 | 43 |\n| 11 | 荷兰(NED) | 7 | 5 | 4 | 16 |\n| 12 | 乌克兰(UKR) | 7 | 4 | 11 | 22 |\n| 13 | 肯尼亚(KEN) | 6 | 4 | 6 | 16 |\n| 14 | 西班牙(ESP) | 5 | 11 | 3 | 19 |\n| 15 | 牙买加(JAM) | 5 | 4 | 2 | 11 |\n'}}
```
### 2.2 Python Script Integration
@@ -190,7 +185,7 @@ pipeline = DocUnderstanding()
output = pipeline.predict(
{
"image": "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/medal_table.png",
- "query": "识别这份表格的内容, 以markdown格式输出"
+ "query": "识别这份表格的内容,以markdown格式输出"
}
)
for res in output:
@@ -232,7 +227,7 @@ In the above Python script, the following steps are performed:
device
-The device used for inference. Supports specifying a specific card number.
+ The device used for inference. Supports specifying a specific card number:
- CPU: For example,
cpu
indicates using the CPU for inference;
- GPU: For example,
gpu:0
indicates using the first GPU for inference;
@@ -240,7 +235,7 @@ In the above Python script, the following steps are performed:
- XPU: For example,
xpu:0
indicates using the first XPU for inference;
- MLU: For example,
mlu:0
indicates using the first MLU for inference;
- DCU: For example,
dcu:0
indicates using the first DCU for inference;
-- None: If set to
None
, the initialized value of this parameter will be used by default, which will preferentially use the local GPU device 0, or the CPU device if none is available.
+- None: If set to
None
, the pipeline initialized value for this parameter will be used. During initialization, the local GPU device 0 will be preferred; if unavailable, the CPU device will be used.
str
@@ -268,13 +263,13 @@ In the above Python script, the following steps are performed:
precision
Calculation precision, such as fp32, fp16.
str
-fp32
+"fp32"
enable_mkldnn
-Whether to enable the MKL-DNN acceleration library. If set to None
, it will be enabled by default.
+Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set.
bool
-None
+True
cpu_threads
@@ -316,11 +311,7 @@ Below are the parameters and their descriptions for the `predict()` method:
Python Dict
-
-device
-Same as the parameter during instantiation.
-str
-None
+
(3) Process the prediction results. The prediction result for each sample is a corresponding Result object, which supports printing and saving as a `json` file:
@@ -341,19 +332,19 @@ Below are the parameters and their descriptions for the `predict()` method:
Print the result to the terminal
format_json
bool
-Whether to format the output content using JSON
indentation
+Whether to format the output content using JSON
indentation.
True
indent
int
-Specifies the indentation level to beautify the output JSON
data, making it more readable, effective only when format_json
is True
+Specifies the indentation level to beautify the output JSON
data, making it more readable, effective only when format_json
is True
.
4
ensure_ascii
bool
-Controls whether to escape non-ASCII
characters into Unicode
. When set to True
, all non-ASCII
characters will be escaped; False
will retain the original characters, effective only when format_json
is True
+Controls whether to escape non-ASCII
characters into Unicode
. When set to True
, all non-ASCII
characters will be escaped; False
will retain the original characters, effective only when format_json
is True
.
False
@@ -367,13 +358,13 @@ Below are the parameters and their descriptions for the `predict()` method:
indent
int
-Specifies the indentation level to beautify the output JSON
data, making it more readable, effective only when format_json
is True
+Specifies the indentation level to beautify the output JSON
data, making it more readable, effective only when format_json
is True
.
4
ensure_ascii
bool
-Controls whether to escape non-ASCII
characters into Unicode
. When set to True
, all non-ASCII
characters will be escaped; False
will retain the original characters, effective only when format_json
is True
+Controls whether to escape non-ASCII
characters into Unicode
. When set to True
, all non-ASCII
characters will be escaped; False
will retain the original characters, effective only when format_json
is True
.
False
diff --git a/docs/version3.x/pipeline_usage/doc_understanding.md b/docs/version3.x/pipeline_usage/doc_understanding.md
index f58a3998f3a3b00f0010b3a180a6db90fe9cb878..39ae0b9ef841ce89383cbb600e0ddc88df9de8d5 100644
--- a/docs/version3.x/pipeline_usage/doc_understanding.md
+++ b/docs/version3.x/pipeline_usage/doc_understanding.md
@@ -45,7 +45,7 @@ comments: true
-注:以上模型总分为内部评估集模型测试结果,内部评估集所有图像分辨率 (height, width) 为 (1680,1204),共1196条数据,包括了财报、法律法规、理工科论文、说明书、文科论文、合同、研报等场景,暂时未有计划公开。
+注:以上模型总分为内部评估集模型测试结果,内部评估集所有图像分辨率 (height,width) 为 (1680,1204),共1196条数据,包括了财报、法律法规、理工科论文、说明书、文科论文、合同、研报等场景,暂时未有计划公开。
@@ -60,7 +60,7 @@ comments: true
一行命令即可快速体验 doc_understanding 产线效果:
```bash
-paddleocr doc_understanding -i "{'image': 'https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/medal_table.png', 'query': '识别这份表格的内容, 以markdown格式输出'}"
+paddleocr doc_understanding -i "{'image': 'https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/medal_table.png', 'query': '识别这份表格的内容,以markdown格式输出'}"
```
命令行支持更多参数设置,点击展开以查看命令行参数的详细说明
@@ -76,43 +76,38 @@ paddleocr doc_understanding -i "{'image': 'https://paddle-model-ecology.bj.bcebo
input
-待预测数据,支持多种输入类型,必填。
-
-- Python Var:如
numpy.ndarray
表示的图像数据
-- str:如图像文件或者PDF文件的本地路径:
/root/data/img.jpg
;如URL链接,如图像文件或PDF文件的网络URL:示例;如本地目录,该目录下需包含待预测图像,如本地路径:/root/data/
(当前不支持目录中包含PDF文件的预测,PDF文件需要指定到具体文件路径)
-- List:列表元素需为上述类型数据,如
[numpy.ndarray, numpy.ndarray]
,["/root/data/img1.jpg", "/root/data/img2.jpg"]
,["/root/data1", "/root/data2"]
-
+ 待预测数据,必填。如"{'image': 'https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/medal_table.png', 'query': '识别这份表格的内容,以markdown格式输出'}"。
-Python Var|str|list
+str
save_path
-指定推理结果文件保存的路径。如果设置为None
, 推理结果将不会保存到本地。
+指定推理结果文件保存的路径。如果不设置,推理结果将不会保存到本地。
str
-None
+
doc_understanding_model_name
-文档理解模型的名称。如果设置为None
, 将会使用产线默认模型。
+文档理解模型的名称。如果不设置,将会使用产线默认模型。
str
-None
+
doc_understanding_model_dir
-文档理解模型的目录路径。如果设置为None
, 将会下载官方模型。
+文档理解模型的目录路径。如果不设置,将会下载官方模型。
str
-None
+
doc_understanding_batch_size
-文档理解模型的批处理大小。如果设置为 None
, 将默认设置批处理大小为1
。
+文档理解模型的批处理大小。如果设置为None
,将默认设置批处理大小为1
。
int
-None
+
device
-用于推理的设备。支持指定具体卡号。
+ 用于推理的设备。支持指定具体卡号:
- CPU:如
cpu
表示使用 CPU 进行推理;
- GPU:如
gpu:0
表示使用第 1 块 GPU 进行推理;
@@ -120,11 +115,10 @@ paddleocr doc_understanding -i "{'image': 'https://paddle-model-ecology.bj.bcebo
- XPU:如
xpu:0
表示使用第 1 块 XPU 进行推理;
- MLU:如
mlu:0
表示使用第 1 块 MLU 进行推理;
- DCU:如
dcu:0
表示使用第 1 块 DCU 进行推理;
-- None:如果设置为
None
, 将默认使用产线初始化的该参数值,初始化时,会优先使用本地的 GPU 0号设备,如果没有,则使用 CPU 设备;
-
+如果不设置,将默认使用产线初始化的该参数值,初始化时,会优先使用本地的 GPU 0号设备,如果没有,则使用 CPU 设备。
str
-None
+
enable_hpi
@@ -152,10 +146,10 @@ paddleocr doc_understanding -i "{'image': 'https://paddle-model-ecology.bj.bcebo
enable_mkldnn
-是否启用 MKL-DNN 加速库。如果设置为None
, 将默认启用。
+ 是否启用 MKL-DNN 加速推理。如果 MKL-DNN 不可用或模型不支持通过 MKL-DNN 加速,即使设置了此标志,也不会使用加速。
bool
-None
+True
cpu_threads
@@ -167,7 +161,7 @@ paddleocr doc_understanding -i "{'image': 'https://paddle-model-ecology.bj.bcebo
paddlex_config
PaddleX产线配置文件路径。
str
-None
+
@@ -177,7 +171,7 @@ paddleocr doc_understanding -i "{'image': 'https://paddle-model-ecology.bj.bcebo
运行结果会被打印到终端上,默认配置的 doc_understanding 产线的运行结果如下:
```bash
-{'res': {'image': 'https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/medal_table.png', 'query': '识别这份表格的内容, 以markdown格式输出', 'result': '| 名次 | 国家/地区 | 金牌 | 银牌 | 铜牌 | 奖牌总数 |\n| --- | --- | --- | --- | --- | --- |\n| 1 | 中国(CHN) | 48 | 22 | 30 | 100 |\n| 2 | 美国(USA) | 36 | 39 | 37 | 112 |\n| 3 | 俄罗斯(RUS) | 24 | 13 | 23 | 60 |\n| 4 | 英国(GBR) | 19 | 13 | 19 | 51 |\n| 5 | 德国(GER) | 16 | 11 | 14 | 41 |\n| 6 | 澳大利亚(AUS) | 14 | 15 | 17 | 46 |\n| 7 | 韩国(KOR) | 13 | 11 | 8 | 32 |\n| 8 | 日本(JPN) | 9 | 8 | 8 | 25 |\n| 9 | 意大利(ITA) | 8 | 9 | 10 | 27 |\n| 10 | 法国(FRA) | 7 | 16 | 20 | 43 |\n| 11 | 荷兰(NED) | 7 | 5 | 4 | 16 |\n| 12 | 乌克兰(UKR) | 7 | 4 | 11 | 22 |\n| 13 | 肯尼亚(KEN) | 6 | 4 | 6 | 16 |\n| 14 | 西班牙(ESP) | 5 | 11 | 3 | 19 |\n| 15 | 牙买加(JAM) | 5 | 4 | 2 | 11 |\n'}}
+{'res': {'image': 'https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/medal_table.png', 'query': '识别这份表格的内容,以markdown格式输出', 'result': '| 名次 | 国家/地区 | 金牌 | 银牌 | 铜牌 | 奖牌总数 |\n| --- | --- | --- | --- | --- | --- |\n| 1 | 中国(CHN) | 48 | 22 | 30 | 100 |\n| 2 | 美国(USA) | 36 | 39 | 37 | 112 |\n| 3 | 俄罗斯(RUS) | 24 | 13 | 23 | 60 |\n| 4 | 英国(GBR) | 19 | 13 | 19 | 51 |\n| 5 | 德国(GER) | 16 | 11 | 14 | 41 |\n| 6 | 澳大利亚(AUS) | 14 | 15 | 17 | 46 |\n| 7 | 韩国(KOR) | 13 | 11 | 8 | 32 |\n| 8 | 日本(JPN) | 9 | 8 | 8 | 25 |\n| 9 | 意大利(ITA) | 8 | 9 | 10 | 27 |\n| 10 | 法国(FRA) | 7 | 16 | 20 | 43 |\n| 11 | 荷兰(NED) | 7 | 5 | 4 | 16 |\n| 12 | 乌克兰(UKR) | 7 | 4 | 11 | 22 |\n| 13 | 肯尼亚(KEN) | 6 | 4 | 6 | 16 |\n| 14 | 西班牙(ESP) | 5 | 11 | 3 | 19 |\n| 15 | 牙买加(JAM) | 5 | 4 | 2 | 11 |\n'}}
```
### 2.2 Python脚本方式集成
@@ -191,7 +185,7 @@ pipeline = DocUnderstanding()
output = pipeline.predict(
{
"image": "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/medal_table.png",
- "query": "识别这份表格的内容, 以markdown格式输出"
+ "query": "识别这份表格的内容,以markdown格式输出"
}
)
for res in output:
@@ -201,7 +195,7 @@ for res in output:
在上述 Python 脚本中,执行了如下几个步骤:
-(1)通过 `DocUnderstanding()` 实例化 文档理解产线 产线对象,具体参数说明如下:
+(1)通过 `DocUnderstanding()` 实例化文档理解产线对象,具体参数说明如下:
@@ -215,25 +209,25 @@ for res in output:
doc_understanding_model_name
-文档理解模型的名称。如果设置为None
, 将会使用产线默认模型。
+文档理解模型的名称。如果设置为None
,将会使用产线默认模型。
str
None
doc_understanding_model_dir
-文档理解模型的目录路径。如果设置为None
, 将会下载官方模型。
+文档理解模型的目录路径。如果设置为None
,将会下载官方模型。
str
None
doc_understanding_batch_size
-文档理解模型的批处理大小。如果设置为 None
, 将默认设置批处理大小为1
。
+文档理解模型的批处理大小。如果设置为None
,将默认设置批处理大小为1
。
int
None
device
-用于推理的设备。支持指定具体卡号。
+ 用于推理的设备。支持指定具体卡号:
- CPU:如
cpu
表示使用 CPU 进行推理;
- GPU:如
gpu:0
表示使用第 1 块 GPU 进行推理;
@@ -241,7 +235,7 @@ for res in output:
- XPU:如
xpu:0
表示使用第 1 块 XPU 进行推理;
- MLU:如
mlu:0
表示使用第 1 块 MLU 进行推理;
- DCU:如
dcu:0
表示使用第 1 块 DCU 进行推理;
-- None:如果设置为
None
, 将默认使用产线初始化的该参数值,初始化时,会优先使用本地的 GPU 0号设备,如果没有,则使用 CPU 设备;
+- None:如果设置为
None
,将默认使用产线初始化的该参数值,初始化时,会优先使用本地的 GPU 0号设备,如果没有,则使用 CPU 设备。
str
@@ -255,7 +249,7 @@ for res in output:
use_tensorrt
-是否使用 TensorRT 进行推理加速。
+是否使用TensorRT进行推理加速。
bool
False
@@ -269,14 +263,14 @@ for res in output:
precision
计算精度,如 fp32、fp16。
str
-fp32
+"fp32"
enable_mkldnn
-是否启用 MKL-DNN 加速库。如果设置为None
, 将默认启用。
+ 是否启用 MKL-DNN 加速推理。如果 MKL-DNN 不可用或模型不支持通过 MKL-DNN 加速,即使设置了此标志,也不会使用加速。
bool
-None
+True
cpu_threads
@@ -310,19 +304,14 @@ for res in output:
input
-待预测数据,目前仅支持字典类型的输入
+ 待预测数据,目前仅支持dict类型的输入
- - Python Dict:如PP-DocBee的输入形式为:
{"image":/path/to/image, "query": user question}
,分别表示输入的图像和对应的用户问题
+ - Python Dict:如PP-DocBee的输入形式为:
{"image":/path/to/image, "query": user question}
,分别表示输入的图像和对应的用户问题。
Python Dict
-
-device
-与实例化时的参数相同。
-str
-None
(3)对预测结果进行处理,每个样本的预测结果均为对应的Result对象,且支持打印、保存为`json`文件的操作:
@@ -343,19 +332,19 @@ for res in output:
打印结果到终端
format_json
bool
-是否对输出内容进行使用 JSON
缩进格式化
+是否对输出内容进行使用 JSON
缩进格式化。
True
indent
int
-指定缩进级别,以美化输出的 JSON
数据,使其更具可读性,仅当 format_json
为 True
时有效
+指定缩进级别,以美化输出的 JSON
数据,使其更具可读性,仅当 format_json
为 True
时有效。
4
ensure_ascii
bool
-控制是否将非 ASCII
字符转义为 Unicode
。设置为 True
时,所有非 ASCII
字符将被转义;False
则保留原始字符,仅当format_json
为True
时有效
+控制是否将非 ASCII
字符转义为 Unicode
。设置为 True
时,所有非 ASCII
字符将被转义;False
则保留原始字符,仅当format_json
为True
时有效。
False
@@ -363,19 +352,19 @@ for res in output:
将结果保存为json格式的文件
save_path
str
-保存的文件路径,当为目录时,保存文件命名与输入文件类型命名一致
+保存的文件路径,当为目录时,保存文件命名与输入文件类型命名一致。
无
indent
int
-指定缩进级别,以美化输出的 JSON
数据,使其更具可读性,仅当 format_json
为 True
时有效
+指定缩进级别,以美化输出的 JSON
数据,使其更具可读性,仅当 format_json
为 True
时有效。
4
ensure_ascii
bool
-控制是否将非 ASCII
字符转义为 Unicode
。设置为 True
时,所有非 ASCII
字符将被转义;False
则保留原始字符,仅当format_json
为True
时有效
+控制是否将非 ASCII
字符转义为 Unicode
。设置为 True
时,所有非 ASCII
字符将被转义;False
则保留原始字符,仅当format_json
为True
时有效。
False
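上表中 save_path 为目录时的命名规则可以用如下最小示意理解(`resolve_save_path` 为假设的辅助函数,并非 PaddleOCR 实际实现):

```python
from pathlib import Path

def resolve_save_path(save_path, input_path):
    """假设性示意:当 save_path 为目录时,按输入文件名推导保存文件名;
    否则直接使用给定路径。"""
    save_path = Path(save_path)
    if save_path.suffix == "":  # 无扩展名时视为目录
        return save_path / (Path(input_path).stem + ".json")
    return save_path

print(resolve_save_path("./output", "/root/data/img.jpg"))  # output/img.json
print(resolve_save_path("./output/res.json", "img.jpg"))    # output/res.json
```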
@@ -415,7 +404,7 @@ for res in output:
如果产线可以达到您对产线推理速度和精度的要求,您可以直接进行开发集成/部署。
-若您需要将产线直接应用在您的Python项目中,可以参考 [2.2 Python脚本方式](#22-python脚本方式集成)中的示例代码。
+若您需要将产线直接应用在您的Python项目中,可以参考 [2.2 Python脚本方式](#22-python脚本方式集成) 中的示例代码。
此外,PaddleOCR 也提供了其他两种部署方式,详细说明如下:
diff --git a/docs/version3.x/pipeline_usage/formula_recognition.en.md b/docs/version3.x/pipeline_usage/formula_recognition.en.md
index fe5f20ba4681fda9b5bb9f4c537ab5529ae66526..d91f5f3f6e096bd727722f1d39410db0a9c65231 100644
--- a/docs/version3.x/pipeline_usage/formula_recognition.en.md
+++ b/docs/version3.x/pipeline_usage/formula_recognition.en.md
@@ -2,7 +2,7 @@
comments: true
---
-# Formula Recognition Pipeline Tutorial
+# Formula Recognition Pipeline Usage Tutorial
## 1. Introduction to Formula Recognition Pipeline
@@ -255,7 +255,7 @@ In this pipeline, you can choose the model you want to use based on the benchmar
-Formula Recognition Module :
+Formula Recognition Module :
Model Model Download Link
@@ -331,7 +331,7 @@ In this pipeline, you can choose the model you want to use based on the benchmar
- Performance Test Environment
- - Test Dataset:
+
- Test Dataset:
- Document Image Orientation Classification Module: A self-built dataset using PaddleOCR, covering multiple scenarios such as ID cards and documents, containing 1000 images.
@@ -341,7 +341,7 @@ In this pipeline, you can choose the model you want to use based on the benchmar
- Formula Recognition Module: A self-built formula recognition test set using PaddleX.
- - Hardware Configuration:
+
- Hardware Configuration:
- GPU: NVIDIA Tesla T4
- CPU: Intel Xeon Gold 6271C @ 2.60GHz
@@ -388,7 +388,7 @@ Before using the formula recognition pipeline locally, please ensure that you ha
### 2.1 Command Line Experience
-You can quickly experience the effect of the formula recognition pipeline with one command:
+You can quickly experience the effect of the formula recognition pipeline with one command. Before running the code below, please download the [example image](https://paddle-model-ecology.bj.bcebos.com/paddlex/demo_image/pipelines/general_formula_recognition_001.png) locally:
```bash
paddleocr formula_recognition_pipeline -i https://paddle-model-ecology.bj.bcebos.com/paddlex/demo_image/pipelines/general_formula_recognition_001.png
@@ -416,172 +416,156 @@ paddleocr formula_recognition_pipeline -i ./general_formula_recognition_001.png
input
-Data to be predicted, supporting multiple input types, required.
-
-- Python Var: Image data represented by
numpy.ndarray
-- str: Local path of image or PDF file, e.g.,
/root/data/img.jpg
; URL link, e.g., network URL of image or PDF file: Example; Local directory, the directory should contain images to be predicted, e.g., local path: /root/data/
(currently does not support prediction of PDF files in directories; PDF files must be specified with a specific file path)
-- List: Elements of the list must be of the above types, e.g.,
[numpy.ndarray, numpy.ndarray]
, [\"/root/data/img1.jpg\", \"/root/data/img2.jpg\"]
, [\"/root/data1\", \"/root/data2\"]
-
+ Data to be predicted, required.
+Local path of image or PDF file, e.g., /root/data/img.jpg
; URL link, e.g., network URL of image or PDF file: Example; Local directory, the directory should contain images to be predicted, e.g., local path: /root/data/
(currently does not support prediction of PDF files in directories; PDF files must be specified with a specific file path).
-Python Var|str|list
+str
save_path
-Specify the path to save the inference results file. If set to None
, the inference results will not be saved locally.
+Specify the path to save the inference results file. If not set, the inference results will not be saved locally.
str
-None
+
doc_orientation_classify_model_name
-The name of the document orientation classification model. If set to None
, the default model in pipeline will be used.
+The name of the document orientation classification model. If not set, the default model in pipeline will be used.
str
-None
+
doc_orientation_classify_model_dir
-The directory path of the document orientation classification model. If set to None
, the official model will be downloaded.
+The directory path of the document orientation classification model. If not set, the official model will be downloaded.
str
-None
+
doc_orientation_classify_batch_size
-The batch size of the document orientation classification model. If set to None
, the default batch size will be set to 1
.
+ The batch size of the document orientation classification model. If not set, the default batch size will be set to 1
.
int
-None
+
doc_unwarping_model_name
- The name of the text image unwarping model. If set to None
, the default model in pipeline will be used.
+ The name of the text image unwarping model. If not set, the default model in pipeline will be used.
str
-None
+
doc_unwarping_model_dir
- The directory path of the text image unwarping model. If set to None
, the official model will be downloaded.
+ The directory path of the text image unwarping model. If not set, the official model will be downloaded.
str
-None
+
doc_unwarping_batch_size
-The batch size of the text image unwarping model. If set to None
, the default batch size will be set to 1
.
+The batch size of the text image unwarping model. If not set, the default batch size will be set to 1
.
int
-None
+
use_doc_orientation_classify
-Whether to load the document orientation classification module. If set to None
, the parameter will default to the value initialized in the pipeline, which is True
.
+Whether to load and use the document orientation classification module. If not set, the parameter will default to the value initialized in the pipeline, which is True
.
bool
-None
+
use_doc_unwarping
-Whether to load the text image unwarping module. If set to None
, the parameter will default to the value initialized in the pipeline, which is True
.
+Whether to load and use the text image unwarping module. If not set, the parameter will default to the value initialized in the pipeline, which is True
.
bool
-None
+
layout_detection_model_name
-The name of the layout detection model. If set to None
, the default model in pipeline will be used.
+The name of the layout detection model. If not set, the default model in pipeline will be used.
str
-None
+
layout_detection_model_dir
- The directory path of the layout detection model. If set to None
, the official model will be downloaded.
+ The directory path of the layout detection model. If not set, the official model will be downloaded.
str
-None
+
layout_threshold
-Threshold for layout detection, used to filter out predictions with low confidence.
-
-- float, such as 0.2, indicates filtering out all bounding boxes with a confidence score less than 0.2.
-- Dictionary, with int keys representing
cls_id
and float values as thresholds. For example, {0: 0.45, 2: 0.48, 7: 0.4}
indicates applying a threshold of 0.45 for class ID 0, 0.48 for class ID 2, and 0.4 for class ID 7
-- None, If not specified, the default PaddleX official model configuration will be used
-
+ Score threshold for the layout model. Any value between 0-1
. If not set, the default value is used, which is 0.5
.
-float|dict
-None
+float
+
layout_nms
-Whether to use NMS (Non-Maximum Suppression) post-processing for layout region detection to filter out overlapping boxes. If set to None
, the default configuration of the official model will be used.
+Whether to use Non-Maximum Suppression (NMS) as post-processing for layout detection. If not set, the parameter will default to the value initialized in the pipeline, which is set to True
by default.
bool
-None
+
layout_unclip_ratio
-
-The scaling factor for the side length of the detection boxes in layout region detection.
-
-- float: A positive float number, e.g., 1.1, indicating that the center of the bounding box remains unchanged while the width and height are both scaled up by a factor of 1.1
-- List: e.g., [1.2, 1.5], indicating that the center of the bounding box remains unchanged while the width is scaled up by a factor of 1.2 and the height by a factor of 1.5
-- None: If not specified, the default PaddleX official model configuration will be used
-
+ Unclip ratio for detected boxes in layout detection model. Any float > 0
. If not set, the default is 1.0
.
-float|list
-None
+float
+
layout_merge_bboxes_mode
The merging mode for the detection boxes output by the model in layout region detection.
-- large: When set to "large", only the largest outer bounding box will be retained for overlapping bounding boxes, and the inner overlapping boxes will be removed.
-- small: When set to "small", only the smallest inner bounding boxes will be retained for overlapping bounding boxes, and the outer overlapping boxes will be removed.
-- union: No filtering of bounding boxes will be performed, and both inner and outer boxes will be retained.
-- None: If not specified, the default PaddleX official model configuration will be used
-
+- large: When set to "large", only the largest outer bounding box will be retained for overlapping bounding boxes, and the inner overlapping boxes will be removed;
+- small: When set to "small", only the smallest inner bounding boxes will be retained for overlapping bounding boxes, and the outer overlapping boxes will be removed;
+- union: No filtering of bounding boxes will be performed, and both inner and outer boxes will be retained;
+If not set, the default is large
.
str
-None
+
layout_detection_batch_size
-The batch size for the layout region detection model. If set to None
, the default batch size will be set to 1
.
+The batch size for the layout region detection model. If not set, the default batch size will be set to 1
.
int
-None
+
use_layout_detection
-Whether to load the layout detection module. If set to None
, the parameter will default to the value initialized in the pipeline, which is True
.
+Whether to load and use the layout detection module. If not set, the parameter will default to the value initialized in the pipeline, which is True
.
bool
-None
+
formula_recognition_model_name
-The name of the formula recognition model. If set to None
, the default model from the pipeline will be used.
+The name of the formula recognition model. If not set, the default model from the pipeline will be used.
str
-None
+
formula_recognition_model_dir
-The directory path of the formula recognition model. If set to None
, the official model will be downloaded.
+ The directory path of the formula recognition model. If not set, the official model will be downloaded.
str
-None
+
formula_recognition_batch_size
-The batch size for the formula recognition model. If set to None
, the batch size will default to 1
.
+The batch size for the formula recognition model. If not set, the batch size will default to 1
.
int
-None
+
device
-The device used for inference. You can specify a particular card number.
+ The device used for inference. You can specify a particular card number:
- CPU: e.g.,
cpu
indicates using CPU for inference;
- GPU: e.g.,
gpu:0
indicates using the 1st GPU for inference;
@@ -589,11 +573,10 @@ The name of the formula recognition model. If set to None
, the defa
- XPU: e.g.,
xpu:0
indicates using the 1st XPU for inference;
- MLU: e.g.,
mlu:0
indicates using the 1st MLU for inference;
- DCU: e.g.,
dcu:0
indicates using the 1st DCU for inference;
-- None: If set to
None
, the default value initialized by the pipeline will be used. During initialization, the local GPU 0 will be prioritized; if unavailable, the CPU will be used.
-
+If not set, the pipeline initialized value for this parameter will be used. During initialization, the local GPU device 0 will be preferred; if unavailable, the CPU device will be used.
str
-None
+
enable_hpi
@@ -621,10 +604,10 @@ The name of the formula recognition model. If set to None
, the defa
enable_mkldnn
-Whether to enable the MKL-DNN acceleration library. If set to None
, it will be enabled by default.
+ Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set.
bool
-None
+True
cpu_threads
@@ -637,7 +620,7 @@ The number of threads to use when performing inference on the CPU.
paddlex_config
Path to PaddleX pipeline configuration file.
str
-None
+
@@ -736,13 +719,13 @@ In the above Python script, the following steps are executed:
use_doc_orientation_classify
-Whether to load the document orientation classification module. If set to None
, the parameter will default to the value initialized in the pipeline, which is True
.
+Whether to load and use the document orientation classification module. If set to None
, the parameter will default to the value initialized in the pipeline, which is True
.
bool
None
use_doc_unwarping
-Whether to load the text image unwarping module. If set to None
, the parameter will default to the value initialized in the pipeline, which is True
.
+Whether to load and use the text image unwarping module. If set to None
, the parameter will default to the value initialized in the pipeline, which is True
.
bool
None
@@ -762,9 +745,9 @@ In the above Python script, the following steps are executed:
layout_threshold
Threshold for layout detection, used to filter out predictions with low confidence.
-- float, such as 0.2, indicates filtering out all bounding boxes with a confidence score less than 0.2.
-- Dictionary, with int keys representing
cls_id
and float values as thresholds. For example, {0: 0.45, 2: 0.48, 7: 0.4}
indicates applying a threshold of 0.45 for class ID 0, 0.48 for class ID 2, and 0.4 for class ID 7
-- None, If not specified, the default PaddleX official model configuration will be used
+- float: Such as 0.2, indicates filtering out all bounding boxes with a confidence score less than 0.2;
+- Dictionary: With int keys representing
cls_id
and float values as thresholds. For example, {0: 0.45, 2: 0.48, 7: 0.4}
indicates applying a threshold of 0.45 for class ID 0, 0.48 for class ID 2, and 0.4 for class ID 7;
+- None: If set to
None
, the default is 0.5
.
float|dict
@@ -772,33 +755,33 @@ In the above Python script, the following steps are executed:
layout_nms
-Whether to use NMS (Non-Maximum Suppression) post-processing for layout region detection to filter out overlapping boxes. If set to None
, the default configuration of the official model will be used.
+Whether to use Non-Maximum Suppression (NMS) as post-processing for layout detection. If set to None
, the parameter will default to the value initialized in the pipeline, which is set to True
by default.
bool
None
layout_unclip_ratio
-The scaling factor for the side length of the detection boxes in layout region detection.
+ Expansion factor for the detection boxes of the layout region detection model.
-- float: A positive float number, e.g., 1.1, indicating that the center of the bounding box remains unchanged while the width and height are both scaled up by a factor of 1.1
-- List: e.g., [1.2, 1.5], indicating that the center of the bounding box remains unchanged while the width is scaled up by a factor of 1.2 and the height by a factor of 1.5
-- None: If not specified, the default PaddleX official model configuration will be used
+- float: Any float greater than
0
;
+- Tuple[float,float]: Expansion ratios in horizontal and vertical directions;
+- dict: A dictionary with int keys representing
cls_id
, and tuple values, e.g., {0: (1.1, 2.0)}
means width is expanded 1.1× and height 2.0× for class 0 boxes;
+- None: If set to
None
, uses the pipeline default of 1.0
.
-float|list
+float|Tuple[float,float]|dict
None
layout_merge_bboxes_mode
-The merging mode for the detection boxes output by the model in layout region detection.
+ Filtering method for overlapping boxes in layout detection.
-- large: When set to "large", only the largest outer bounding box will be retained for overlapping bounding boxes, and the inner overlapping boxes will be removed.
-- small: When set to "small", only the smallest inner bounding boxes will be retained for overlapping bounding boxes, and the outer overlapping boxes will be removed.
-- union: No filtering of bounding boxes will be performed, and both inner and outer boxes will be retained.
-- None: If not specified, the default PaddleX official model configuration will be used
+- str: Options include
large
, small
, and union
to retain the larger box, smaller box, or both;
+- dict: A dictionary with int keys representing
cls_id
, and str values, e.g., {0: "large", 2: "small"}
means using different modes for different classes;
+- None: If set to
None
, uses the pipeline default value large
.
-str
+str|dict
None
@@ -809,7 +792,7 @@ In the above Python script, the following steps are executed:
use_layout_detection
-Whether to load the layout detection module. If set to None
, the parameter will default to the value initialized in the pipeline, which is True
.
+Whether to load and use the layout detection module. If set to None
, the parameter will default to the value initialized in the pipeline, which is True
.
bool
None
@@ -833,7 +816,7 @@ In the above Python script, the following steps are executed:
device
-The device used for inference. You can specify a particular card number.
+ The device used for inference. You can specify a particular card number:
- CPU: e.g.,
cpu
indicates using CPU for inference;
- GPU: e.g.,
gpu:0
indicates using the 1st GPU for inference;
@@ -841,8 +824,8 @@ In the above Python script, the following steps are executed:
- XPU: e.g.,
xpu:0
indicates using the 1st XPU for inference;
- MLU: e.g.,
mlu:0
indicates using the 1st MLU for inference;
- DCU: e.g.,
dcu:0
indicates using the 1st DCU for inference;
-- None: If set to
None
, the default value initialized by the pipeline will be used. During initialization, the local GPU 0 will be prioritized; if unavailable, the CPU will be used.
-
+- None: If set to
None
, the pipeline initialized value for this parameter will be used. During initialization, the local GPU device 0 will be preferred; if unavailable, the CPU device will be used.
+
str
None
@@ -869,14 +852,14 @@ In the above Python script, the following steps are executed:
precision
Compute precision, such as FP32 or FP16.
str
-fp32
+"fp32"
enable_mkldnn
-Whether to enable the MKL-DNN acceleration library. If set to None
, it will be enabled by default.
+ Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set.
bool
-None
+True
cpu_threads
@@ -893,8 +876,7 @@ In the above Python script, the following steps are executed:
-(2)
-Call the `predict()` method of the formula recognition pipeline object to perform inference prediction. This method will return a list of results.
+(2) Call the `predict()` method of the formula recognition pipeline object to perform inference prediction. This method will return a list of results.
Additionally, the pipeline also provides the `predict_iter()` method. Both methods accept the same parameters and return the same results; the difference is that `predict_iter()` returns a `generator`, which allows prediction results to be processed and retrieved step by step, making it suitable for large datasets or memory-constrained scenarios. Choose whichever method best fits your needs.
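The `predict()`/`predict_iter()` contrast is the standard list-vs-generator pattern in Python. A minimal self-contained sketch, with a hypothetical `fake_predict_iter` standing in for the real pipeline (this is illustrative only, not the PaddleOCR API itself):

```python
from typing import Iterator, List

def fake_predict_iter(inputs: List[str]) -> Iterator[dict]:
    # Yields one result at a time, so only one result is in memory at once.
    for path in inputs:
        yield {"input_path": path, "formulas": []}  # placeholder result

def fake_predict(inputs: List[str]) -> List[dict]:
    # Materializes every result up front, like predict() returning a list.
    return list(fake_predict_iter(inputs))

paths = ["img1.png", "img2.png", "img3.png"]
eager = fake_predict(paths)       # all three results held in memory
lazy = fake_predict_iter(paths)   # nothing computed yet

assert len(eager) == 3
assert next(lazy)["input_path"] == "img1.png"  # results arrive one by one
```

With the real pipeline the same pattern applies: iterating over `pipeline.predict_iter(...)` keeps only the current result in memory instead of the whole list.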
@@ -913,18 +895,13 @@ Here are the parameters of the `predict()` method and their descriptions:
input
Data to be predicted, supporting multiple input types, required.
-- Python Var: Image data represented by
numpy.ndarray
-- str: Local path of image or PDF file, e.g.,
/root/data/img.jpg
; URL link, e.g., network URL of image or PDF file: Example; Local directory, the directory should contain images to be predicted, e.g., local path: /root/data/
(currently does not support prediction of PDF files in directories; PDF files must be specified with a specific file path)
-- List: Elements of the list must be of the above types, e.g.,
[numpy.ndarray, numpy.ndarray]
, [\"/root/data/img1.jpg\", \"/root/data/img2.jpg\"]
, [\"/root/data1\", \"/root/data2\"]
+- Python Var: Image data represented by
numpy.ndarray;
+- str: Local path of image or PDF file, e.g.,
/root/data/img.jpg
; URL link, e.g., network URL of image or PDF file: Example; Local directory, the directory should contain images to be predicted, e.g., local path: /root/data/
(currently does not support prediction of PDF files in directories; PDF files must be specified with a specific file path);
+- List: Elements of the list must be of the above types, e.g.,
[numpy.ndarray, numpy.ndarray]
, [\"/root/data/img1.jpg\", \"/root/data/img2.jpg\"]
, [\"/root/data1\", \"/root/data2\"].
Python Var|str|list
-
-device
-The parameters are the same as those used during instantiation.
-str
-None
use_layout_detection
@@ -961,7 +938,7 @@ Whether to use the document orientation classification module during inference.<
layout_unclip_ratio
The parameters are the same as those used during instantiation.
-float|list
+float|Tuple[float,float]|dict
None
layout_merge_bboxes_mode
@@ -990,19 +967,19 @@ Whether to use the document orientation classification module during inference.<
Print results to terminal
format_json
bool
-Whether to format the output content using JSON
indentation
+Whether to format the output content using JSON
indentation.
True
indent
int
-Specify the indentation level to beautify the output JSON
data, making it more readable. Effective only when format_json
is True
+Specify the indentation level to beautify the output JSON
data, making it more readable. Effective only when format_json
is True
.
4
ensure_ascii
bool
-Control whether to escape non-ASCII
characters to Unicode
. When set to True
, all non-ASCII
characters will be escaped; False
retains the original characters. Effective only when format_json
is True
+Control whether to escape non-ASCII
characters to Unicode
. When set to True
, all non-ASCII
characters will be escaped; False
retains the original characters. Effective only when format_json
is True
.
False
@@ -1010,19 +987,19 @@ Whether to use the document orientation classification module during inference.<
Save results as a JSON file
save_path
str
-Path to save the file. If it is a directory, the saved file will be named the same as the input file type
+Path to save the file. If it is a directory, the saved file will be named the same as the input file type.
None
indent
int
-Specify the indentation level to beautify the output JSON
data, making it more readable. Effective only when format_json
is True
+Specify the indentation level to beautify the output JSON
data, making it more readable. Effective only when format_json
is True
.
4
ensure_ascii
bool
-Control whether to escape non-ASCII
characters to Unicode
. When set to True
, all non-ASCII
characters will be escaped; False
retains the original characters. Effective only when format_json
is True
+Control whether to escape non-ASCII
characters to Unicode
. When set to True
, all non-ASCII
characters will be escaped; False
retains the original characters. Effective only when format_json
is True
.
False
@@ -1030,7 +1007,7 @@ Whether to use the document orientation classification module during inference.<
Save results as an image file
save_path
str
-Path to save the file, supports directory or file path
+Path to save the file, supports directory or file path.
None
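The `indent` and `ensure_ascii` options behave like the corresponding parameters of Python's standard `json.dumps` (an assumption about the underlying serialization, though the semantics match the descriptions above). A stdlib-only illustration of what the two flags control, not PaddleOCR code:

```python
import json

result = {"rec_formula": "E = mc^2", "note": "能量"}

# indent=4 pretty-prints the JSON (what `indent` controls when format_json is True)
pretty = json.dumps(result, indent=4, ensure_ascii=False)
assert '"rec_formula"' in pretty and "\n" in pretty

# ensure_ascii=True escapes non-ASCII characters to \uXXXX sequences
escaped = json.dumps(result, ensure_ascii=True)
assert "\\u80fd" in escaped  # 能 -> \u80fd

# ensure_ascii=False keeps the original characters
raw = json.dumps(result, ensure_ascii=False)
assert "能量" in raw
```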
@@ -1102,8 +1079,8 @@ In addition, PaddleOCR also provides two other deployment methods, which are det
🚀 High-Performance Inference: In real-world production environments, many applications have stringent standards for performance metrics of deployment strategies, particularly regarding response speed, to ensure efficient system operation and a smooth user experience. To address this, PaddleOCR offers high-performance inference capabilities designed to deeply optimize the performance of model inference and pre/post-processing, significantly accelerating the end-to-end process. For detailed information on the high-performance inference process, please refer to the [High-Performance Inference Guide](../deployment/high_performance_inference.en.md).
-☁️ Service-Based Deployment:
-Service-Based Deployment is a common deployment form in real-world production environments. By encapsulating inference capabilities as a service, clients can access these services via network requests to obtain inference results. For detailed instructions on Service-Based Deployment in production lines, please refer to the [Service-Based Deployment Guide](../deployment/serving.md).
+☁️ Service-Based Deployment:
+Service-Based Deployment is a common deployment form in real-world production environments. By encapsulating inference capabilities as a service, clients can access these services via network requests to obtain inference results. For detailed instructions on Service-Based Deployment in pipelines, please refer to the [Service-Based Deployment Guide](../deployment/serving.en.md).
Below are the API references for basic service-based deployment and multi-language service invocation examples:
@@ -1345,6 +1322,8 @@ for i, res in enumerate(result["formulaRecResults"]):
If the default model weights provided by the formula recognition pipeline do not meet your requirements in terms of accuracy or speed, you can try to fine-tune the existing models using your own domain-specific or application-specific data to improve the recognition performance of the formula recognition pipeline in your scenario.
+### 4.1 Model Fine-Tuning
+
Since the formula recognition pipeline consists of several modules, if the pipeline's performance is not satisfactory, the issue may arise from any one of these modules. You can analyze the poorly recognized images to determine which module is problematic and refer to the corresponding fine-tuning tutorial links in the table below for model fine-tuning.
@@ -1359,17 +1338,17 @@ Since the formula recognition pipeline consists of several modules, if the pipel
Formulas are missing
Layout Detection Module
-Link
+Link
Formula content is inaccurate
Formula Recognition Module
-Link
+Link
Whole-image rotation correction is inaccurate
Document Image Orientation Classification Module
-Link
+Link
Image distortion correction is inaccurate
@@ -1378,3 +1357,122 @@ Since the formula recognition pipeline consists of several modules, if the pipel
+
+### 4.2 Model Deployment
+
+After fine-tuning with your private dataset, you obtain local model weight files. You can then use the fine-tuned weights by specifying the local model path through parameters, or by customizing the pipeline configuration file.
+
+#### 4.2.1 Specify the local model path through parameters
+
+When initializing the pipeline object, specify the local model path through parameters. Take the fine-tuned weights of the formula recognition model as an example:
+
+Command line mode:
+
+```bash
+# Specify the local model path via --formula_recognition_model_dir
+paddleocr formula_recognition_pipeline -i ./general_formula_recognition_001.png --formula_recognition_model_dir your_formula_recognition_model_path
+
+# The PP-FormulaNet_plus-M model is used as the default formula recognition model. If the model you fine-tuned is not this default model, also modify the model name via --formula_recognition_model_name
+paddleocr formula_recognition_pipeline -i ./general_formula_recognition_001.png --formula_recognition_model_name PP-FormulaNet_plus-M --formula_recognition_model_dir your_ppformulanet_plus-m_formula_recognition_model_path
+```
+
+Script mode:
+
+```python
+
+from paddleocr import FormulaRecognitionPipeline
+
+# Specify the local model path via formula_recognition_model_dir
+pipeline = FormulaRecognitionPipeline(formula_recognition_model_dir="./your_formula_recognition_model_path")
+output = pipeline.predict("./general_formula_recognition_001.png")
+for res in output:
+ res.print() ## Print the structured output of the prediction
+ res.save_to_img(save_path="output") ## Save the formula visualization result of the current image.
+ res.save_to_json(save_path="output") ## Save the structured JSON result of the current image
+
+# The PP-FormulaNet_plus-M model is used as the default formula recognition model. If the model you fine-tuned is not this default model, also modify the model name via formula_recognition_model_name
+# pipeline = FormulaRecognitionPipeline(formula_recognition_model_name="PP-FormulaNet_plus-M", formula_recognition_model_dir="./your_ppformulanet_plus-m_formula_recognition_model_path")
+
+```
+
+
+#### 4.2.2 Specify the local model path through the configuration file
+
+
+1. Obtain the Pipeline Configuration File
+
+Call the `export_paddlex_config_to_yaml` method of the **Formula Recognition Pipeline** object in PaddleOCR to export the current pipeline configuration as a YAML file:
+
+```Python
+from paddleocr import FormulaRecognitionPipeline
+
+pipeline = FormulaRecognitionPipeline()
+pipeline.export_paddlex_config_to_yaml("FormulaRecognitionPipeline.yaml")
+```
+
+2. Modify the Configuration File
+
+After obtaining the default pipeline configuration file, replace the paths of the default model weights with the local paths of your fine-tuned model weights. For example:
+
+```yaml
+......
+SubModules:
+ FormulaRecognition:
+ batch_size: 5
+ model_dir: null # Replace with the path to your fine-tuned formula recognition model weights
+ model_name: PP-FormulaNet_plus-M # If the name of the fine-tuned model is different from the default model name, please modify it here as well
+ module_name: formula_recognition
+ LayoutDetection:
+ batch_size: 1
+ layout_merge_bboxes_mode: large
+ layout_nms: true
+ layout_unclip_ratio: 1.0
+ model_dir: null # Replace with the path to your fine-tuned layout detection model weights
+ model_name: PP-DocLayout_plus-L # If the name of the fine-tuned model is different from the default model name, please modify it here as well
+ module_name: layout_detection
+ threshold: 0.5
+SubPipelines:
+ DocPreprocessor:
+ SubModules:
+ DocOrientationClassify:
+ batch_size: 1
+ model_dir: null # Replace with the path to your fine-tuned document image orientation classification model weights
+ model_name: PP-LCNet_x1_0_doc_ori # If the name of the fine-tuned model is different from the default model name, please modify it here as well
+ module_name: doc_text_orientation
+ DocUnwarping:
+ batch_size: 1
+ model_dir: null
+ model_name: UVDoc
+ module_name: image_unwarping
+ pipeline_name: doc_preprocessor
+ use_doc_orientation_classify: true
+ use_doc_unwarping: true
+pipeline_name: formula_recognition
+use_doc_preprocessor: true
+use_layout_detection: true
+......
+```
+
+The pipeline configuration file includes not only the parameters supported by the PaddleOCR CLI and Python API but also advanced configurations. For detailed instructions, refer to the [PaddleX Pipeline Usage Overview](https://paddlepaddle.github.io/PaddleX/3.0/en/pipeline_usage/pipeline_develop_guide.html) and adjust the configurations as needed.
+
+3. Load the Configuration File in CLI
+
+After modifying the configuration file, specify its path using the `--paddlex_config` parameter in the command line. PaddleOCR will read the file and apply the configurations. Example:
+
+```bash
+paddleocr formula_recognition_pipeline -i ./general_formula_recognition_001.png --paddlex_config FormulaRecognitionPipeline.yaml
+```
+
+4. Load the Configuration File in Python API
+
+When initializing the pipeline object, pass the path of the PaddleX pipeline configuration file or a configuration dictionary via the `paddlex_config` parameter. PaddleOCR will read and apply the configurations. Example:
+
+```python
+from paddleocr import FormulaRecognitionPipeline
+
+pipeline = FormulaRecognitionPipeline(paddlex_config="FormulaRecognitionPipeline.yaml")
+output = pipeline.predict("./general_formula_recognition_001.png")
+for res in output:
+ res.print() ## Print the structured output of the prediction
+ res.save_to_img(save_path="output") ## Save the formula visualization result of the current image.
+ res.save_to_json(save_path="output") ## Save the structured JSON result of the current image
+```
diff --git a/docs/version3.x/pipeline_usage/formula_recognition.md b/docs/version3.x/pipeline_usage/formula_recognition.md
index 33e110dd4a73f8be7d6d7b75f5e24f2e320360ba..53813cb1c814574c4d30dd3cb93355ee6d907c46 100644
--- a/docs/version3.x/pipeline_usage/formula_recognition.md
+++ b/docs/version3.x/pipeline_usage/formula_recognition.md
@@ -388,7 +388,7 @@ comments: true
### 2.1 命令行方式体验
-一行命令即可快速体验 formula_recognition 产线效果:
+一行命令即可快速体验 formula_recognition 产线效果。运行以下代码前,请您下载[示例图片](https://paddle-model-ecology.bj.bcebos.com/paddlex/demo_image/pipelines/general_formula_recognition_001.png)到本地:
```bash
paddleocr formula_recognition_pipeline -i https://paddle-model-ecology.bj.bcebos.com/paddlex/demo_image/pipelines/general_formula_recognition_001.png
@@ -416,158 +416,145 @@ paddleocr formula_recognition_pipeline -i ./general_formula_recognition_001.png
input
-待预测数据,支持多种输入类型,必填。
-
-- Python Var:如
numpy.ndarray
表示的图像数据
-- str:如图像文件或者PDF文件的本地路径:
/root/data/img.jpg
;如URL链接,如图像文件或PDF文件的网络URL:示例;如本地目录,该目录下需包含待预测图像,如本地路径:/root/data/
(当前不支持目录中包含PDF文件的预测,PDF文件需要指定到具体文件路径)
-- List:列表元素需为上述类型数据,如
[numpy.ndarray, numpy.ndarray]
,["/root/data/img1.jpg", "/root/data/img2.jpg"]
,["/root/data1", "/root/data2"]
-
+ 待预测数据,必填。
+如图像文件或者PDF文件的本地路径:/root/data/img.jpg
;如URL链接,如图像文件或PDF文件的网络URL:示例;如本地目录,该目录下需包含待预测图像,如本地路径:/root/data/
(当前不支持目录中包含PDF文件的预测,PDF文件需要指定到具体文件路径)。
-Python Var|str|list
+str
save_path
-指定推理结果文件保存的路径。如果设置为None
, 推理结果将不会保存到本地。
+指定推理结果文件保存的路径。如果不设置,推理结果将不会保存到本地。
str
-None
+
doc_orientation_classify_model_name
-文档方向分类模型的名称。如果设置为None
, 将会使用产线默认模型。
+文档方向分类模型的名称。如果不设置,将会使用产线默认模型。
str
-None
+
doc_orientation_classify_model_dir
-文档方向分类模型的目录路径。如果设置为None
, 将会下载官方模型。
+文档方向分类模型的目录路径。如果不设置,将会下载官方模型。
str
-None
+
doc_orientation_classify_batch_size
-文档方向分类模型的批处理大小。如果设置为 None
, 将默认设置批处理大小为1
。
+文档方向分类模型的批处理大小。如果不设置,将默认设置批处理大小为1
。
int
-None
+
doc_unwarping_model_name
-文本图像矫正模型的名称。如果设置为None
, 将会使用产线默认模型。
+文本图像矫正模型的名称。如果不设置,将会使用产线默认模型。
str
-None
+
doc_unwarping_model_dir
-文本图像矫正模型的目录路径。如果设置为None
, 将会下载官方模型。
+文本图像矫正模型的目录路径。如果不设置,将会下载官方模型。
str
-None
+
doc_unwarping_batch_size
-文本图像矫正模型的批处理大小。如果设置为 None
, 将默认设置批处理大小为1
。
+文本图像矫正模型的批处理大小。如果不设置,将默认设置批处理大小为1
。
int
-None
+
use_doc_orientation_classify
-是否加载文档方向分类模块。如果设置为None
, 将默认使用产线初始化的该参数值,初始化为True
。
+是否加载并使用文档方向分类模块。如果不设置,将默认使用产线初始化的该参数值,初始化为True
。
bool
-None
+
use_doc_unwarping
-是否加载文本图像矫正模块。如果设置为None
, 将默认使用产线初始化的该参数值,初始化为True
。
+是否加载并使用文本图像矫正模块。如果不设置,将默认使用产线初始化的该参数值,初始化为True
。
bool
-None
+
layout_detection_model_name
-版面区域检测模型的名称。如果设置为None
, 将会使用产线默认模型。
+版面区域检测模型的名称。如果不设置,将会使用产线默认模型。
str
-None
+
layout_detection_model_dir
-版面区域检测模型的目录路径。如果设置为None
, 将会下载官方模型。
+版面区域检测模型的目录路径。如果不设置,将会下载官方模型。
str
-None
+
layout_threshold
版面区域检测的阈值,用于过滤掉低置信度预测结果的阈值。
-
-- float,如 0.2, 表示过滤掉所有阈值小于0.2的目标框
-- 字典,字典的key为int类型,代表
cls_id
,val为float类型阈值。如 {0: 0.45, 2: 0.48, 7: 0.4}
,表示对cls_id为0的类别应用阈值0.45、cls_id为2的类别应用阈值0.48、cls_id为7的类别应用阈值0.4
-- None, 不指定,将默认使用默认值
-
+如 0.2,表示过滤掉所有阈值小于0.2的目标框。如果不设置,将使用默认值 0.5。
-float|dict
-None
+float
+
layout_nms
-版面区域检测是否使用NMS后处理,过滤重叠框。如果设置为None
, 将会使用官方模型配置。
+版面检测是否使用后处理NMS。如果不设置,将默认使用产线初始化的该参数值,初始化为True
。
bool
-None
+
layout_unclip_ratio
版面区域检测中检测框的边长缩放倍数。
-
-- float, 大于0的浮点数,如 1.1 , 表示将模型输出的检测框中心不变,宽和高都扩张1.1倍
-- 列表, 如 [1.2, 1.5] , 表示将模型输出的检测框中心不变,宽度扩张1.2倍,高度扩张1.5倍
-- None, 不指定,将使用默认值:1.0
-
+大于0的浮点数,如 1.1 ,表示将模型输出的检测框中心不变,宽和高都扩张1.1倍。如果不设置,将使用默认值:1.0。
-float|list
-None
+float
+
layout_merge_bboxes_mode
版面区域检测中模型输出的检测框的合并处理模式。
-- large, 设置为large时,表示在模型输出的检测框中,对于互相重叠包含的检测框,只保留外部最大的框,删除重叠的内部框。
-- small, 设置为small,表示在模型输出的检测框中,对于互相重叠包含的检测框,只保留内部被包含的小框,删除重叠的外部框。
-- union, 不进行框的过滤处理,内外框都保留
-- None, 不指定,将使用默认值:“large”
-
+- large,设置为large时,表示在模型输出的检测框中,对于互相重叠包含的检测框,只保留外部最大的框,删除重叠的内部框;
+- small,设置为small,表示在模型输出的检测框中,对于互相重叠包含的检测框,只保留内部被包含的小框,删除重叠的外部框;
+- union,不进行框的过滤处理,内外框都保留;
+如果不设置,将使用默认值:"large"。
str
-None
+
layout_detection_batch_size
-版面区域检测模型的批处理大小。如果设置为 None
, 将默认设置批处理大小为1
。
+版面区域检测模型的批处理大小。如果不设置,将默认设置批处理大小为1
。
int
-None
+
use_layout_detection
-是否加载版面区域检测模块。如果设置为None
, 将默认使用产线初始化的该参数值,初始化为True
。
+是否加载并使用版面区域检测模块。如果不设置,将默认使用产线初始化的该参数值,初始化为True
。
bool
-None
+
formula_recognition_model_name
-公式识别模型的名称。如果设置为None
, 将会使用产线默认模型。
+公式识别模型的名称。如果不设置,将会使用产线默认模型。
str
-None
+
formula_recognition_model_dir
-公式识别模型的目录路径。如果设置为None
, 将会下载官方模型。
+公式识别模型的目录路径。如果不设置,将会下载官方模型。
str
-None
+
formula_recognition_batch_size
-公式识别模型的批处理大小。如果设置为 None
, 将默认设置批处理大小为1
。
+公式识别模型的批处理大小。如果不设置,将默认设置批处理大小为1
。
int
-None
+
device
-用于推理的设备。支持指定具体卡号。
+ 用于推理的设备。支持指定具体卡号:
- CPU:如
cpu
表示使用 CPU 进行推理;
- GPU:如
gpu:0
表示使用第 1 块 GPU 进行推理;
@@ -575,11 +562,11 @@ paddleocr formula_recognition_pipeline -i ./general_formula_recognition_001.png
- XPU:如
xpu:0
表示使用第 1 块 XPU 进行推理;
- MLU:如
mlu:0
表示使用第 1 块 MLU 进行推理;
- DCU:如
dcu:0
表示使用第 1 块 DCU 进行推理;
-- None:如果设置为
None
, 将默认使用产线初始化的该参数值,初始化时,会优先使用本地的 GPU 0号设备,如果没有,则使用 CPU 设备;
-
+如果不设置, 将默认使用产线初始化的该参数值,初始化时,会优先使用本地的 GPU 0号设备,如果没有,则使用 CPU 设备。
+
str
-None
+
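device 参数的“设备类型:卡号”写法可以用一小段独立的 Python 示意(仅说明字符串格式,并非 PaddleOCR 源码,函数名为假设):

```python
# 示意 device 字符串的 "设备类型:卡号" 格式(非 PaddleOCR 源码,仅用于说明)。
def parse_device(device: str):
    """把 "gpu:0" 这类字符串拆成 (设备类型, 卡号);纯 "cpu" 没有卡号。"""
    if ":" in device:
        dev_type, index = device.split(":", 1)
        return dev_type, int(index)
    return device, None

print(parse_device("gpu:0"))  # ('gpu', 0)
print(parse_device("cpu"))    # ('cpu', None)
```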
enable_hpi
@@ -607,10 +594,10 @@ paddleocr formula_recognition_pipeline -i ./general_formula_recognition_001.png
enable_mkldnn
-是否启用 MKL-DNN 加速库。如果设置为None
, 将默认启用。
+ 是否启用 MKL-DNN 加速推理。如果 MKL-DNN 不可用或模型不支持通过 MKL-DNN 加速,即使设置了此标志,也不会使用加速。
bool
-None
+True
cpu_threads
@@ -622,7 +609,7 @@ paddleocr formula_recognition_pipeline -i ./general_formula_recognition_001.png
paddlex_config
PaddleX产线配置文件路径。
str
-None
+
@@ -684,61 +671,61 @@ for res in output:
doc_orientation_classify_model_name
-文档方向分类模型的名称。如果设置为None
, 将会使用产线默认模型。
+文档方向分类模型的名称。如果设置为None
,将会使用产线默认模型。
str
None
doc_orientation_classify_model_dir
-文档方向分类模型的目录路径。如果设置为None
, 将会下载官方模型。
+文档方向分类模型的目录路径。如果设置为None
,将会下载官方模型。
str
None
doc_orientation_classify_batch_size
-文档方向分类模型的批处理大小。如果设置为 None
, 将默认设置批处理大小为1
。
+文档方向分类模型的批处理大小。如果设置为None
,将默认设置批处理大小为1
。
int
None
doc_unwarping_model_name
-文本图像矫正模型的名称。如果设置为None
, 将会使用产线默认模型。
+文本图像矫正模型的名称。如果设置为None
,将会使用产线默认模型。
str
None
doc_unwarping_model_dir
-文本图像矫正模型的目录路径。如果设置为None
, 将会下载官方模型。
+文本图像矫正模型的目录路径。如果设置为None
,将会下载官方模型。
str
None
doc_unwarping_batch_size
-文本图像矫正模型的批处理大小。如果设置为 None
, 将默认设置批处理大小为1
。
+文本图像矫正模型的批处理大小。如果设置为None
,将默认设置批处理大小为1
。
int
None
use_doc_orientation_classify
-是否加载文档方向分类模块。如果设置为None
, 将默认使用产线初始化的该参数值,初始化为True
。
+是否加载并使用文档方向分类模块。如果设置为None
,将默认使用产线初始化的该参数值,初始化为True
。
bool
None
use_doc_unwarping
-是否加载文本图像矫正模块。如果设置为None
, 将默认使用产线初始化的该参数值,初始化为True
。
+是否加载并使用文本图像矫正模块。如果设置为None
,将默认使用产线初始化的该参数值,初始化为True
。
bool
None
layout_detection_model_name
-版面区域检测模型的名称。如果设置为None
, 将会使用产线默认模型。
+版面区域检测模型的名称。如果设置为None
,将会使用产线默认模型。
str
None
layout_detection_model_dir
-版面区域检测模型的目录路径。如果设置为None
, 将会下载官方模型。
+版面区域检测模型的目录路径。如果设置为None
,将会下载官方模型。
str
None
@@ -746,78 +733,77 @@ for res in output:
layout_threshold
版面区域检测的置信度阈值,用于过滤掉低置信度的预测结果。
-- float,如 0.2, 表示过滤掉所有阈值小于0.2的目标框
-- 字典,字典的key为int类型,代表
cls_id
,val为float类型阈值。如 {0: 0.45, 2: 0.48, 7: 0.4}
,表示对cls_id为0的类别应用阈值0.45、cls_id为2的类别应用阈值0.48、cls_id为7的类别应用阈值0.4
-- None, 不指定,将使用默认值:0.5
-
+- float:如 0.2,表示过滤掉所有置信度小于 0.2 的目标框;
+- dict:dict的key为int类型,代表
cls_id
,val为float类型阈值。如 {0: 0.45,2: 0.48,7: 0.4}
,表示对cls_id为0的类别应用阈值0.45、cls_id为2的类别应用阈值0.48、cls_id为7的类别应用阈值0.4;
+- None:不指定,将使用默认值:0.5。
float|dict
None
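layout_threshold 的 float 与 dict 两种形式的过滤语义,可以用下面这段独立的 Python 小例子示意(非 PaddleOCR 源码,box 的字段名为假设):

```python
# 示意 layout_threshold 的过滤语义(非 PaddleOCR 源码,box 字段名为假设)。
def filter_boxes(boxes, layout_threshold=0.5):
    """按全局阈值或逐 cls_id 阈值过滤低置信度检测框。"""
    kept = []
    for box in boxes:
        if isinstance(layout_threshold, dict):
            # dict 形式:key 为 cls_id;未列出的类别退回默认值 0.5
            thresh = layout_threshold.get(box["cls_id"], 0.5)
        else:
            thresh = layout_threshold
        if box["score"] >= thresh:
            kept.append(box)
    return kept

boxes = [{"cls_id": 0, "score": 0.50},
         {"cls_id": 2, "score": 0.40},
         {"cls_id": 7, "score": 0.45}]
print(len(filter_boxes(boxes, 0.2)))  # 3,全部保留
print([b["cls_id"] for b in filter_boxes(boxes, {0: 0.45, 2: 0.48, 7: 0.4})])  # [0, 7]
```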
layout_nms
-版面区域检测是否使用NMS后处理,过滤重叠框。如果设置为None
, 将会使用官方模型配置。
+版面区域检测是否使用 NMS 后处理,过滤重叠框。如果不设置,将默认使用产线初始化的该参数值,初始化为True
。
bool
None
layout_unclip_ratio
-版面区域检测中检测框的边长缩放倍数。
+ 版面区域检测模型检测框的扩张系数。
-- float, 大于0的浮点数,如 1.1 , 表示将模型输出的检测框中心不变,宽和高都扩张1.1倍
-- 列表, 如 [1.2, 1.5] , 表示将模型输出的检测框中心不变,宽度扩张1.2倍,高度扩张1.5倍
-- None, 不指定,将使用默认值:1.0
+- float:任意大于
0
的浮点数;
+- Tuple[float,float]:在横纵两个方向各自的扩张系数;
+- dict,dict的key为int类型,代表
cls_id
,value为tuple类型,如{0: (1.1,2.0)}
,表示将模型输出的第0类别检测框中心不变,宽度扩张1.1倍,高度扩张2.0倍
+- None:如果设置为
None
,将默认使用产线初始化的该参数值,初始化为 1.0
。
-float|list
+float|Tuple[float,float]|dict
None
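layout_unclip_ratio“中心不变、按系数扩张宽高”的语义,可以用下面这段独立的小例子示意(非 PaddleOCR 源码,框的表示方式为假设):

```python
# 示意 layout_unclip_ratio 的扩张语义(非 PaddleOCR 源码,框的表示为假设)。
def unclip_box(box, ratio=1.0):
    """box 为 (x1, y1, x2, y2);保持中心不变,按系数扩张宽和高。"""
    if isinstance(ratio, (int, float)):
        wr = hr = float(ratio)          # 单个 float:宽高同系数
    else:
        wr, hr = ratio                  # (宽方向系数, 高方向系数)
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = (x2 - x1) * wr, (y2 - y1) * hr
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

print(unclip_box((0, 0, 10, 20), 2.0))        # (-5.0, -10.0, 15.0, 30.0)
print(unclip_box((0, 0, 10, 20), (1.2, 1.5)))
```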
layout_merge_bboxes_mode
-版面区域检测中模型输出的检测框的合并处理模式。
+ 版面区域检测的重叠框过滤方式。
-- large, 设置为large时,表示在模型输出的检测框中,对于互相重叠包含的检测框,只保留外部最大的框,删除重叠的内部框。
-- small, 设置为small,表示在模型输出的检测框中,对于互相重叠包含的检测框,只保留内部被包含的小框,删除重叠的外部框。
-- union, 不进行框的过滤处理,内外框都保留
-- None, 不指定,将使用默认值:“large”
+- str:
large
,small
,union
,分别表示重叠框过滤时选择保留大框,小框还是同时保留;
+- dict: dict的key为int类型,代表
cls_id
,value为str类型,如{0: "large", 2: "small"}
,表示对第0类别检测框使用large模式,对第2类别检测框使用small模式;
+- None:如果设置为
None
,将默认使用产线初始化的该参数值,初始化为 large
。
-str
+str|dict
None
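large / small / union 三种模式对互相包含的检测框的取舍,可以用下面这段独立的小例子示意(非 PaddleOCR 源码,仅演示包含关系的过滤逻辑):

```python
# 示意 layout_merge_bboxes_mode 三种模式的过滤语义(非 PaddleOCR 源码)。
def contains(outer, inner):
    """判断 outer 是否完全包含 inner,框为 (x1, y1, x2, y2)。"""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def merge_bboxes(boxes, mode="large"):
    if mode == "union":                     # 内外框都保留
        return list(boxes)
    kept = []
    for i, box in enumerate(boxes):
        others = boxes[:i] + boxes[i + 1:]
        if mode == "large" and any(contains(o, box) for o in others):
            continue                        # 被更大的框包含,丢弃
        if mode == "small" and any(contains(box, o) for o in others):
            continue                        # 包含更小的框,丢弃
        kept.append(box)
    return kept

boxes = [(0, 0, 100, 100), (10, 10, 30, 30), (200, 0, 250, 50)]
print(merge_bboxes(boxes, "large"))  # [(0, 0, 100, 100), (200, 0, 250, 50)]
print(merge_bboxes(boxes, "small"))  # [(10, 10, 30, 30), (200, 0, 250, 50)]
```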
layout_detection_batch_size
-版面区域检测模型的批处理大小。如果设置为 None
, 将默认设置批处理大小为1
。
+版面区域检测模型的批处理大小。如果设置为None
,将默认设置批处理大小为1
。
int
None
use_layout_detection
-是否加载版面区域检测模块。如果设置为None
, 将默认使用产线初始化的该参数值,初始化为True
。
+是否加载并使用版面区域检测模块。如果设置为None
,将默认使用产线初始化的该参数值,初始化为True
。
bool
None
formula_recognition_model_name
-公式识别模型的名称。如果设置为None
, 将会使用产线默认模型。
+公式识别模型的名称。如果设置为None
,将会使用产线默认模型。
str
None
formula_recognition_model_dir
-公式识别模型的目录路径。如果设置为None
, 将会下载官方模型。
+公式识别模型的目录路径。如果设置为None
,将会下载官方模型。
str
None
formula_recognition_batch_size
-公式识别模型的批处理大小。如果设置为 None
, 将默认设置批处理大小为1
。
+公式识别模型的批处理大小。如果设置为None
,将默认设置批处理大小为1
。
int
None
device
-用于推理的设备。支持指定具体卡号。
+ 用于推理的设备。支持指定具体卡号:
- CPU:如
cpu
表示使用 CPU 进行推理;
- GPU:如
gpu:0
表示使用第 1 块 GPU 进行推理;
@@ -825,7 +811,7 @@ for res in output:
- XPU:如
xpu:0
表示使用第 1 块 XPU 进行推理;
- MLU:如
mlu:0
表示使用第 1 块 MLU 进行推理;
- DCU:如
dcu:0
表示使用第 1 块 DCU 进行推理;
-- None:如果设置为
None
, 将默认使用产线初始化的该参数值,初始化时,会优先使用本地的 GPU 0号设备,如果没有,则使用 CPU 设备;
+- None:如果设置为
None
,将默认使用产线初始化的该参数值,初始化时,会优先使用本地的 GPU 0号设备,如果没有,则使用 CPU 设备。
str
@@ -853,14 +839,14 @@ for res in output:
precision
计算精度,如 fp32、fp16。
str
-fp32
+"fp32"
enable_mkldnn
-是否启用 MKL-DNN 加速库。如果设置为None
, 将默认启用。
+ 是否启用 MKL-DNN 加速推理。如果 MKL-DNN 不可用或模型不支持通过 MKL-DNN 加速,即使设置了此标志,也不会使用加速。
bool
-None
+True
cpu_threads
@@ -896,18 +882,13 @@ for res in output:
input
待预测数据,支持多种输入类型,必填
-- Python Var:如
numpy.ndarray
表示的图像数据
-- str:如图像文件或者PDF文件的本地路径:
/root/data/img.jpg
;如URL链接,如图像文件或PDF文件的网络URL:示例;如本地目录,该目录下需包含待预测图像,如本地路径:/root/data/
(当前不支持目录中包含PDF文件的预测,PDF文件需要指定到具体文件路径)
-- List:列表元素需为上述类型数据,如
[numpy.ndarray, numpy.ndarray]
,["/root/data/img1.jpg", "/root/data/img2.jpg"]
,["/root/data1", "/root/data2"]
+- Python Var:如
numpy.ndarray
表示的图像数据;
+- str:如图像文件或者PDF文件的本地路径:
/root/data/img.jpg
;如URL链接,如图像文件或PDF文件的网络URL:示例;如本地目录,该目录下需包含待预测图像,如本地路径:/root/data/
(当前不支持目录中包含PDF文件的预测,PDF文件需要指定到具体文件路径);
+- List:列表元素需为上述类型数据,如
[numpy.ndarray, numpy.ndarray]
,["/root/data/img1.jpg", "/root/data/img2.jpg"]
,["/root/data1", "/root/data2"]。
Python Var|str|list
-
-device
-与实例化时的参数相同。
-str
-None
use_layout_detection
@@ -942,12 +923,12 @@ for res in output:
layout_unclip_ratio
与实例化时的参数相同。
-float|list
+float|Tuple[float,float]|dict
None
layout_merge_bboxes_mode
与实例化时的参数相同。
-string
+str|dict
None
@@ -971,19 +952,19 @@ for res in output:
打印结果到终端
format_json
bool
-是否对输出内容进行使用 JSON
缩进格式化
+是否使用 JSON
缩进对输出内容进行格式化。
True
indent
int
指定缩进级别,以美化输出的 JSON
数据,使其更具可读性,仅当 format_json
为 True
时有效。
ensure_ascii
bool
控制是否将非 ASCII
字符转义为 Unicode
。设置为 True
时,所有非 ASCII
字符将被转义;False
则保留原始字符,仅当 format_json
为 True
时有效。
False
save_path
str
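上面 format_json、indent、ensure_ascii 的效果,可以直接用标准库 json 演示(与产线 print() 参数的语义一致,仅为示意,并非产线源码):

```python
# 示意 format_json / indent / ensure_ascii 对输出的影响(标准库 json 演示,非产线源码)。
import json

res = {"texts": ["公式"], "score": 0.98}
print(json.dumps(res))                                # 单行,非 ASCII 字符被转义
print(json.dumps(res, indent=4, ensure_ascii=False))  # 缩进格式化,保留中文原字符
```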
useLayoutDetection
boolean
| null
predict
方法的 use_layout_detection
参数相关说明。
layoutThreshold
number
| null
predict
方法的 layout_threshold
参数相关说明。
layoutNms
boolean
| null
predict
方法的 layout_nms
参数相关说明。
layoutUnclipRatio
number
| array
| null
predict
方法的 layout_unclip_ratio
参数相关说明。
layoutMergeBboxesMode
string
| null
predict
方法的 layout_merge_bboxes_mode
参数相关说明。
prunedResult
object
predict
方法生成结果的 JSON 表示中 res
字段的简化版本,其中去除了 input_path
和 page_index
字段。
outputImages
object
| null
img
属性说明。图像为JPEG格式,使用Base64编码。
inputImage
| null
Layout Region Detection Module (Optional):
+Model | Model Download Link | mAP(0.5) (%) | -GPU Inference Time (ms) [Normal Mode / High-Performance Mode] |
-CPU Inference Time (ms) [Normal Mode / High-Performance Mode] |
+GPU Inference Time (ms) [Regular Mode / High-Performance Mode] |
+CPU Inference Time (ms) [Regular Mode / High-Performance Mode] |
Model Storage Size (M) | -Introduction | +Description | 34.6244 / 10.3945 | 510.57 / - | 126.01 M | -A higher-precision layout area localization model trained on a self-built dataset containing Chinese and English papers, PPT, multi-layout magazines, contracts, books, exams, ancient books and research reports using RT-DETR-L | +A higher precision layout region localization model based on RT-DETR-L trained on a self-built dataset including Chinese and English papers, multi-column magazines, newspapers, PPTs, contracts, books, exam papers, research reports, ancient books, Japanese documents, and vertical text documents |
---|---|---|---|---|---|---|---|---|---|
Model | Model Download Link | mAP(0.5) (%) | -GPU Inference Time (ms) [Normal Mode / High-Performance Mode] |
-CPU Inference Time (ms) [Normal Mode / High-Performance Mode] |
+GPU Inference Time (ms) [Regular Mode / High-Performance Mode] |
+CPU Inference Time (ms) [Regular Mode / High-Performance Mode] |
Model Storage Size (M) | -Introduction | +Description | 34.6244 / 10.3945 | 510.57 / - | 123.76 M | -A high-precision layout area localization model trained on a self-built dataset containing Chinese and English papers, magazines, contracts, books, exams, and research reports using RT-DETR-L. | +A high precision layout region localization model based on RT-DETR-L trained on a self-built dataset including Chinese and English papers, magazines, contracts, books, exam papers, and research reports |
PP-DocLayout-M | Inference Model/Training Model | @@ -75,7 +76,7 @@ The seal text recognition pipeline is used to recognize the text content of seal13.3259 / 4.8685 | 44.0680 / 44.0680 | 22.578 | -A layout area localization model with balanced precision and efficiency, trained on a self-built dataset containing Chinese and English papers, magazines, contracts, books, exams, and research reports using PicoDet-L. | +A balanced model of accuracy and efficiency based on PicoDet-L trained on a self-built dataset including Chinese and English papers, magazines, contracts, books, exam papers, and research reports | |||
PP-DocLayout-S | Inference Model/Training Model | @@ -83,26 +84,25 @@ The seal text recognition pipeline is used to recognize the text content of seal8.3008 / 2.3794 | 10.0623 / 9.9296 | 4.834 | -A high-efficiency layout area localization model trained on a self-built dataset containing Chinese and English papers, magazines, contracts, books, exams, and research reports using PicoDet-S. | +A highly efficient layout region localization model based on PicoDet-S trained on a self-built dataset including Chinese and English papers, magazines, contracts, books, exam papers, and research reports |
Model | Model Download Link | mAP(0.5) (%) | -GPU Inference Time (ms) [Normal Mode / High-Performance Mode] |
-CPU Inference Time (ms) [Normal Mode / High-Performance Mode] |
+GPU Inference Time (ms) [Regular Mode / High-Performance Mode] |
+CPU Inference Time (ms) [Regular Mode / High-Performance Mode] |
Model Storage Size (M) | -Introduction | +Description | 8.99 / 2.22 | 16.11 / 8.73 | 4.8 | -A high-efficiency layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-S. | +A highly efficient layout region localization model based on the lightweight PicoDet-S model trained on a self-built dataset including Chinese and English papers, magazines, and research reports |
---|---|---|---|---|---|---|---|---|---|
PicoDet-L_layout_3cls | Inference Model/Training Model | @@ -120,7 +120,7 @@ The seal text recognition pipeline is used to recognize the text content of seal13.05 / 4.50 | 41.30 / 41.30 | 22.6 | -A balanced efficiency and precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-L. | +An efficiency-accuracy balanced layout region localization model based on PicoDet-L trained on a self-built dataset including Chinese and English papers, magazines, and research reports | |||
RT-DETR-H_layout_3cls | Inference Model/Training Model | @@ -128,20 +128,20 @@ The seal text recognition pipeline is used to recognize the text content of seal114.93 / 27.71 | 947.56 / 947.56 | 470.1 | -A high-precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using RT-DETR-H. | +A high precision layout region localization model based on RT-DETR-H trained on a self-built dataset including Chinese and English papers, magazines, and research reports |
Model | Model Download Link | mAP(0.5) (%) | -GPU Inference Time (ms) [Normal Mode / High-Performance Mode] |
-CPU Inference Time (ms) [Normal Mode / High-Performance Mode] |
+GPU Inference Time (ms) [Regular Mode / High-Performance Mode] |
+CPU Inference Time (ms) [Regular Mode / High-Performance Mode] |
Model Storage Size (M) | -Introduction | +Description | 9.11 / 2.12 | 15.42 / 9.12 | 4.8 | -A high-efficiency layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-S. | +A highly efficient layout region localization model based on the lightweight PicoDet-S model trained on a self-built dataset including Chinese and English papers, magazines, and research reports |
---|---|---|---|---|---|---|---|---|---|
PicoDet-L_layout_17cls | Inference Model/Training Model | @@ -159,7 +159,7 @@ The seal text recognition pipeline is used to recognize the text content of seal13.50 / 4.69 | 43.32 / 43.32 | 22.6 | -A balanced efficiency and precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using PicoDet-L. | +An efficiency-accuracy balanced layout region localization model based on PicoDet-L trained on a self-built dataset including Chinese and English papers, magazines, and research reports | |||
RT-DETR-H_layout_17cls | Inference Model/Training Model | @@ -167,20 +167,22 @@ The seal text recognition pipeline is used to recognize the text content of seal115.29 / 104.09 | 995.27 / 995.27 | 470.2 | -A high-precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports using RT-DETR-H. | +A high precision layout region localization model based on RT-DETR-H trained on a self-built dataset including Chinese and English papers, magazines, and research reports |
Document Image Orientation Classification Module (Optional):
+Model | Model Download Link | Top-1 Acc (%) | -GPU Inference Time (ms) [Normal Mode / High-Performance Mode] |
-CPU Inference Time (ms) [Normal Mode / High-Performance Mode] |
+GPU Inference Time (ms) [Regular Mode / High-Performance Mode] |
+CPU Inference Time (ms) [Regular Mode / High-Performance Mode] |
Model Storage Size (M) | Description |
---|
Text Image Correction Module (Optional):
+Model | Model Download Link | -CER | +CER | Model Storage Size (M) | Description | UVDoc | Inference Model/Training Model | 0.179 | 30.3 M | -High-precision text image correction model | +A high precision text image correction model |
---|
Text Detection Module:
+Model | Model Download Link | Detection Hmean (%) | -GPU Inference Time (ms) [Normal Mode / High-Performance Mode] |
-CPU Inference Time (ms) [Normal Mode / High-Performance Mode] |
+GPU Inference Time (ms) [Regular Mode / High-Performance Mode] |
+CPU Inference Time (ms) [Regular Mode / High-Performance Mode] |
Model Storage Size (M) | Description |
---|---|---|---|---|---|---|---|---|
PP-OCRv4_server_seal_det | Inference Model/Training Model | -98.21 | +98.40 | 74.75 / 67.72 | 382.55 / 382.55 | 109 | @@ -239,7 +246,7 @@ The seal text recognition pipeline is used to recognize the text content of seal||
PP-OCRv4_mobile_seal_det | Inference Model/Training Model | -96.47 | +96.36 | 7.82 / 3.09 | 48.28 / 23.97 | 4.6 | @@ -247,30 +254,31 @@ The seal text recognition pipeline is used to recognize the text content of seal
Text Recognition Module:
- +Model | Model Download Links | +Model | Model Download Link | Recognition Avg Accuracy(%) | -GPU Inference Time (ms) [Normal Mode / High-Performance Mode] |
-CPU Inference Time (ms) [Normal Mode / High-Performance Mode] |
+GPU Inference Time (ms) [Regular Mode / High-Performance Mode] |
+CPU Inference Time (ms) [Regular Mode / High-Performance Mode] |
Model Storage Size (M) | -Introduction | +Description |
---|---|---|---|---|---|---|---|---|---|---|---|
PP-OCRv5_server_rec | Inference Model/Pretrained Model | +PP-OCRv5_server_rec_infer.tar">Inference Model/Training Model86.38 | 8.45/2.36 | 122.69/122.69 | 81 M | -PP-OCRv5_rec is a next-generation text recognition model. It aims to efficiently and accurately support the recognition of four major languages—Simplified Chinese, Traditional Chinese, English, and Japanese—as well as complex text scenarios such as handwriting, vertical text, pinyin, and rare characters using a single model. While maintaining recognition performance, it balances inference speed and model robustness, providing efficient and accurate technical support for document understanding in various scenarios. | +PP-OCRv5_rec is a new generation text recognition model. This model aims to efficiently and accurately support the recognition of four major languages: Simplified Chinese, Traditional Chinese, English, and Japanese, as well as complex text scenes like handwriting, vertical text, pinyin, and rare characters with a single model. It balances recognition effectiveness, inference speed, and model robustness, providing efficient and accurate technical support for document understanding in various scenarios. | ||||
PP-OCRv5_mobile_rec | Inference Model/Pretrained Model | +PP-OCRv5_mobile_rec_infer.tar">Inference Model/Training Model81.29 | 1.46/5.43 | 5.32/91.79 | @@ -278,73 +286,73 @@ PP-OCRv5_mobile_rec_infer.tar">Inference Model/Inference Model/Pretrained Model +PP-OCRv4_server_rec_doc_infer.tar">Inference Model/Training Model86.58 | 6.65 / 2.38 | 32.92 / 32.92 | -91 M | -PP-OCRv4_server_rec_doc is trained on a mixed dataset of more Chinese document data and PP-OCR training data, building upon PP-OCRv4_server_rec. It enhances the recognition capabilities for some Traditional Chinese characters, Japanese characters, and special symbols, supporting over 15,000 characters. In addition to improving document-related text recognition, it also enhances general text recognition capabilities. | +181 M | +PP-OCRv4_server_rec_doc is trained on a mix of more Chinese document data and PP-OCR training data based on PP-OCRv4_server_rec, enhancing recognition capabilities for some traditional Chinese characters, Japanese, and special characters, supporting over 15,000+ characters. Besides improving document-related text recognition, it also enhances general text recognition capabilities |
PP-OCRv4_mobile_rec | Inference Model/Pretrained Model | +PP-OCRv4_mobile_rec | Inference Model/Training Model | 83.28 | 4.82 / 1.20 | 16.74 / 4.64 | -11 M | -A lightweight recognition model of PP-OCRv4 with high inference efficiency, suitable for deployment on various hardware devices, including edge devices. | +88 M | +PP-OCRv4 lightweight recognition model, with high inference efficiency, can be deployed on multiple hardware devices, including edge devices | |
PP-OCRv4_server_rec | Inference Model/Pretrained Model | +PP-OCRv4_server_rec | Inference Model/Training Model | 85.19 | 6.58 / 2.43 | 33.17 / 33.17 | -87 M | -The server-side model of PP-OCRv4, offering high inference accuracy and deployable on various servers. | +151 M | +PP-OCRv4 server-side model, with high inference accuracy, can be deployed on various servers | |
en_PP-OCRv4_mobile_rec | Inference Model/Pretrained Model | +en_PP-OCRv4_mobile_rec_infer.tar">Inference Model/Training Model70.39 | 4.81 / 0.75 | 16.10 / 5.31 | -7.3 M | -An ultra-lightweight English recognition model trained based on the PP-OCRv4 recognition model, supporting English and numeric character recognition. | +66 M | +An ultra-lightweight English recognition model trained based on the PP-OCRv4 recognition model, supporting English and number recognition |
Model | Model Download Links | -Avg Accuracy for Chinese Recognition (%) | -Avg Accuracy for English Recognition (%) | -Avg Accuracy for Traditional Chinese Recognition (%) | -Avg Accuracy for Japanese Recognition (%) | -GPU Inference Time (ms) [Normal Mode / High-Performance Mode] |
-CPU Inference Time (ms) [Normal Mode / High-Performance Mode] |
+Model | Model Download Link | +Chinese Recognition Avg Accuracy(%) | +English Recognition Avg Accuracy(%) | +Traditional Chinese Recognition Avg Accuracy(%) | +Japanese Recognition Avg Accuracy(%) | +GPU Inference Time (ms) [Regular Mode / High-Performance Mode] |
+CPU Inference Time (ms) [Regular Mode / High-Performance Mode] |
Model Storage Size (M) | -Introduction | +Description |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PP-OCRv5_server_rec | Inference Model/Pretrained Model | +PP-OCRv5_server_rec_infer.tar">Inference Model/Training Model86.38 | 64.70 | 93.29 | 60.35 | -8.45/2.36 | -122.69/122.69 | +1.46/5.43 | +5.32/91.79 | 81 M | -PP-OCRv5_rec is a next-generation text recognition model. It aims to efficiently and accurately support the recognition of four major languages—Simplified Chinese, Traditional Chinese, English, and Japanese—as well as complex text scenarios such as handwriting, vertical text, pinyin, and rare characters using a single model. While maintaining recognition performance, it balances inference speed and model robustness, providing efficient and accurate technical support for document understanding in various scenarios. | +PP-OCRv5_rec is a new generation text recognition model. This model aims to efficiently and accurately support the recognition of four major languages: Simplified Chinese, Traditional Chinese, English, and Japanese, as well as complex text scenes like handwriting, vertical text, pinyin, and rare characters with a single model. It balances recognition effectiveness, inference speed, and model robustness, providing efficient and accurate technical support for document understanding in various scenarios. | ||||||
PP-OCRv5_mobile_rec | Inference Model/Pretrained Model | +PP-OCRv5_mobile_rec_infer.tar">Inference Model/Training Model81.29 | 66.00 | 83.55 | @@ -360,42 +368,44 @@ PP-OCRv5_mobile_rec_infer.tar">Inference Model/Inference Model/Training Model -81.53 | +PP-OCRv4_server_rec_doc | Inference Model/Training Model | +86.58 | 6.65 / 2.38 | 32.92 / 32.92 | -74.7 M | -PP-OCRv4_server_rec_doc is trained on a mixed dataset of more Chinese document data and PP-OCR training data based on PP-OCRv4_server_rec. It has added the recognition capabilities for some traditional Chinese characters, Japanese, and special characters. The number of recognizable characters is over 15,000. In addition to the improvement in document-related text recognition, it also enhances the general text recognition capability. | +91 M | +PP-OCRv4_server_rec_doc is trained on a mix of more Chinese document data and PP-OCR training data based on PP-OCRv4_server_rec, enhancing recognition capabilities for some traditional Chinese characters, Japanese, and special characters, supporting over 15,000+ characters. Besides improving document-related text recognition, it also enhances general text recognition capabilities | ||||
PP-OCRv4_mobile_rec | Inference Model/Training Model | -78.74 | +83.28 | 4.82 / 1.20 | 16.74 / 4.64 | -10.6 M | -The lightweight recognition model of PP-OCRv4 has high inference efficiency and can be deployed on various hardware devices, including edge devices. | +11 M | +PP-OCRv4 lightweight recognition model, with high inference efficiency, can be deployed on multiple hardware devices, including edge devices | |||||||||
PP-OCRv4_server_rec | Inference Model/Training Model | -80.61 | +85.19 | 6.58 / 2.43 | 33.17 / 33.17 | -71.2 M | -The server-side model of PP-OCRv4 offers high inference accuracy and can be deployed on various types of servers. | +87 M | +PP-OCRv4 server-side model, with high inference accuracy, can be deployed on various servers | |||||||||
PP-OCRv3_mobile_rec | Inference Model/Training Model | -72.96 | +PP-OCRv3_mobile_rec | Inference Model/Training Model | +75.43 | 5.87 / 1.19 | 9.07 / 4.28 | -9.2 M | -PP-OCRv3’s lightweight recognition model is designed for high inference efficiency and can be deployed on a variety of hardware devices, including edge devices. | +11 M | +PP-OCRv3 lightweight recognition model, with high inference efficiency, can be deployed on multiple hardware devices, including edge devices |
input
numpy.ndarray
/root/data/img.jpg
; URL link, e.g., network URL of image or PDF file: Example; Local directory, the directory should contain images to be predicted, e.g., local path: /root/data/
(currently does not support prediction of PDF files in directories; PDF files must be specified with a specific file path)[numpy.ndarray, numpy.ndarray]
, [\"/root/data/img1.jpg\", \"/root/data/img2.jpg\"]
, [\"/root/data1\", \"/root/data2\"]
/root/data/img.jpg
; URL link, e.g., network URL of image or PDF file: Example; Local directory, the directory should contain images to be predicted, e.g., local path: /root/data/
(currently does not support prediction of PDF files in directories; PDF files must be specified with a specific file path).
Python Var|str|list
str
save_path
None
, the inference results will not be saved locally.str
None
doc_orientation_classify_model_name
None
, the default model in pipeline will be used.str
None
doc_orientation_classify_model_dir
None
, the official model will be downloaded.str
None
doc_unwarping_model_name
None
, the default model in pipeline will be used.str
None
doc_unwarping_model_dir
None
, the official model will be downloaded.
+str
None
layout_detection_model_name
None
, the default model in pipeline will be used. str
None
layout_detection_model_dir
None
, the official model will be downloaded.
+str
None
seal_text_detection_model_name
None
, the production line's default model will be used.str
None
seal_text_detection_model_dir
None
, the official model will be downloaded.str
None
text_recognition_model_name
None
, the default pipeline model is used.str
None
text_recognition_model_dir
None
, the official model is downloaded.str
None
text_recognition_batch_size
None
, defaults to 1
.
None
use_doc_orientation_classify
None
, defaults to pipeline initialization value (True
).
None
use_doc_unwarping
None
, defaults to pipeline initialization value (True
).
None
use_layout_detection
None
, the parameter will default to the value initialized in the pipeline, which is True
.
bool
None
layout_threshold
cls_id
and float values as thresholds. For example, {0: 0.45, 2: 0.48, 7: 0.4}
indicates applying a threshold of 0.45 for class ID 0, 0.48 for class ID 2, and 0.4 for class ID 7. Any float between 0-1
. If not set, the default value is used, which is 0.5
.
float|dict
None
float
layout_nms
None
, the default configuration of the official model will be used.True
by default.bool
None
layout_unclip_ratio
0
. If not set, the default is 1.0
.
float|list
None
float
layout_merge_bboxes_mode
large
.
str
None
seal_det_limit_side_len
int|None
-Any integer greater than 0; if set to None, it will default to the value initialized by the pipeline, initialized to 960;
+Any integer greater than 0. If not set, the default is 736.
None
int
seal_det_limit_type
str|None
-Supports min and max, where min ensures that the shortest side of the image is not less than det_limit_side_len, and max ensures that the longest side of the image is not greater than limit_side_len; if set to None, it will default to the value initialized by the pipeline, initialized to max;
+Supports min and max; min ensures shortest side ≥ det_limit_side_len, max ensures longest side ≤ limit_side_len. If not set, the default is min.
None
str
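The min/max semantics of seal_det_limit_type can be sketched as a standalone resize rule (illustrative only, not PaddleOCR source; the function name is made up and the default follows the table above):

```python
# Illustrative sketch of the documented limit_type semantics
# (not PaddleOCR source code; resize_scale is a hypothetical helper).
def resize_scale(w, h, limit_side_len=736, limit_type="min"):
    """Scale factor applied to an image before seal text detection.

    "min": ensure the shortest side is at least limit_side_len.
    "max": ensure the longest side is at most limit_side_len.
    """
    if limit_type == "min":
        short = min(w, h)
        return limit_side_len / short if short < limit_side_len else 1.0
    if limit_type == "max":
        long_side = max(w, h)
        return limit_side_len / long_side if long_side > limit_side_len else 1.0
    raise ValueError(f"unknown limit_type: {limit_type}")

print(resize_scale(368, 1000))                     # 2.0  (upscale short side to 736)
print(resize_scale(2000, 1000, limit_type="max"))  # downscale long side to 736
```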
seal_det_thresh
float|None
-Any float greater than 0; if set to None, it will default to the value initialized by the pipeline, initialized to 0.3.
+Any float greater than 0. If not set, the default is 0.2.
None
float
seal_det_box_thresh
float|None
-Any float greater than 0; if set to None, it will default to the value initialized by the pipeline, initialized to 0.6.
+Any float greater than 0. If not set, the default is 0.6.
None
float
seal_det_unclip_ratio
float|None
-Any float greater than 0; if set to None, it will default to the value initialized by the pipeline, initialized to 2.0.
+Any float greater than 0. If not set, the default is 0.5.
None
float
seal_rec_score_thresh
float|None
-Any float greater than 0; if set to None, it will default to the value initialized by the pipeline, initialized to 0.0, i.e., no threshold is set.
+Any float greater than 0. If not set, the default is 0.0 (no threshold).
None
float
device
cpu
indicates using the CPU for inference.gpu:0
indicates using the first GPU for inference.
xpu:0
indicates using the first XPU for inference.mlu:0
indicates using the first MLU for inference.dcu:0
indicates using the first DCU for inference.None
, the parameter value initialized by the pipeline will be used by default. During initialization, the local GPU 0 device will be prioritized; if not available, the CPU device will be used.str
None
enable_hpi
None
, the pa
enable_mkldnn
None
, it will be enabled by default.bool
None
True
cpu_threads
None
, the pa
paddlex_config
str
None
None
, the pa
After running, the results will be printed to the terminal, as follows:
-pipeline |
-The name of the pipeline or the path to the pipeline configuration file. If it is a pipeline name, it must be supported by PaddleX. | +doc_orientation_classify_model_name |
+Name of the document orientation classification model. If set to None , the pipeline default model is used. |
str |
None |
||
config |
| Parameter | Description | Type | Default |
|---|---|---|---|
| `doc_orientation_classify_model_dir` | Directory path of the document orientation classification model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `doc_unwarping_model_name` | Name of the document unwarping model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `doc_unwarping_model_dir` | Directory path of the document unwarping model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `layout_detection_model_name` | Name of the layout detection model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `layout_detection_model_dir` | Directory path of the layout detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `seal_text_detection_model_name` | Name of the seal text detection model. If set to `None`, the default model will be used. | `str` | `None` |
| `seal_text_detection_model_dir` | Directory of the seal text detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `text_recognition_model_name` | Name of the text recognition model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `text_recognition_model_dir` | Directory path of the text recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `text_recognition_batch_size` | Batch size for the text recognition model. If set to `None`, the default batch size is `1`. | `int` | `None` |
| `use_doc_orientation_classify` | Whether to enable the document orientation classification module. If set to `None`, the default value is `True`. | `bool` | `None` |
| `use_doc_unwarping` | Whether to enable the document image unwarping module. If set to `None`, the default value is `True`. | `bool` | `None` |
| `use_layout_detection` | Whether to load and use the layout detection module. If set to `None`, the default value is `True`. | `bool` | `None` |
| `layout_threshold` | Score threshold for the layout model: any float between `0` and `1`, or a dict mapping class IDs to per-class thresholds. If set to `None`, the default value `0.5` is used. | `float \| dict` | `None` |
| `layout_nms` | Whether to use Non-Maximum Suppression (NMS) as post-processing for layout detection. If set to `None`, the default value is `True`. | `bool` | `None` |
| `layout_unclip_ratio` | Expansion ratio for the bounding boxes from the layout detection model. If set to `None`, the default value `1.0` is used. | `float \| Tuple[float,float] \| dict` | `None` |
| `layout_merge_bboxes_mode` | Filtering method for overlapping boxes in layout detection: `large`, `small`, or `union`. If set to `None`, the default value `large` is used. | `str \| dict` | `None` |
| `seal_det_limit_side_len` | Image side length limit for seal text detection: any integer greater than `0`. If set to `None`, the default value `736` is used. | `int` | `None` |
| `seal_det_limit_type` | Limit type for the seal text detection image side length: `min` ensures the shortest side is no smaller than `seal_det_limit_side_len`, `max` ensures the longest side is no larger than it. If set to `None`, the default value `min` is used. | `str` | `None` |
| `seal_det_thresh` | Pixel threshold for detection. Pixels with scores greater than this value in the probability map are considered text pixels. If set to `None`, the default value `0.2` is used. | `float` | `None` |
| `seal_det_box_thresh` | Bounding box threshold. If the average score of all pixels inside a detection box exceeds this value, it is considered a text region. If set to `None`, the default value `0.6` is used. | `float` | `None` |
| `seal_det_unclip_ratio` | Expansion ratio for seal text detection; the larger the value, the larger the expanded area. If set to `None`, the default value `0.5` is used. | `float` | `None` |
| `seal_rec_score_thresh` | Score threshold for seal text recognition; results with scores above this value are retained. If set to `None`, the default value `0.0` (no threshold) is used. | `float` | `None` |
| `device` | Device used for inference, e.g. `cpu`, `gpu:0`, `npu:0`, `xpu:0`, `mlu:0`, `dcu:0`. If set to `None`, the local GPU device 0 is used when available, otherwise the CPU. | `str` | `None` |
| `enable_hpi` | Whether to enable high-performance inference. | `bool` | `False` |
| `use_tensorrt` | Whether to use TensorRT for accelerated inference. | `bool` | `False` |
| `min_subgraph_size` | Minimum subgraph size used to optimize model subgraph computation. | `int` | `3` |
| `precision` | Computation precision, e.g. `fp32`, `fp16`. | `str` | `"fp32"` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If MKL-DNN is unavailable or the model does not support it, acceleration will not be used even if this flag is set. | `bool` | `True` |
| `cpu_threads` | Number of threads used for inference on CPU. | `int` | `8` |
| `paddlex_config` | Path to the PaddleX pipeline configuration file. | `str` | `None` |
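The dict form of `layout_threshold` applies a different score threshold per layout class ID. A minimal sketch of that per-class filtering rule, using hypothetical class IDs and scores (an illustration of the described semantics, not the library's implementation):

```python
def filter_by_layout_threshold(boxes, threshold):
    """Keep layout boxes whose score passes the threshold.

    `threshold` may be a single float applied to every class, or a dict
    mapping class IDs to per-class thresholds (0.5 for missing IDs).
    """
    kept = []
    for cls_id, score in boxes:
        limit = threshold.get(cls_id, 0.5) if isinstance(threshold, dict) else threshold
        if score > limit:
            kept.append((cls_id, score))
    return kept

boxes = [(0, 0.9), (0, 0.4), (2, 0.6)]
print(filter_by_layout_threshold(boxes, 0.5))               # global threshold
print(filter_by_layout_threshold(boxes, {0: 0.3, 2: 0.7}))  # per-class thresholds
```

With the global threshold only the two high-scoring boxes survive; with the dict form, class 0 uses a looser limit and class 2 a stricter one.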
| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Input data to be predicted (required). Supports multiple types: image data as `numpy.ndarray`; a `str` local file path, URL, or image directory; or a `list` of the above. | `Python Var \| str \| list` | (required) |
| `use_doc_orientation_classify` | Whether to use the document orientation classification module during inference. | `bool` | `None` |
| `use_doc_unwarping` | Whether to use the document image unwarping module during inference. | `bool` | `None` |
| `use_layout_detection` | Whether to use the layout detection module during inference. | `bool` | `None` |
| `layout_threshold` | Same as the parameter during instantiation. | `float \| dict` | `None` |
| `layout_nms` | Same as the parameter during instantiation. | `bool` | `None` |
| `layout_unclip_ratio` | Same as the parameter during instantiation. | `float \| Tuple[float,float] \| dict` | `None` |
| `layout_merge_bboxes_mode` | Same as the parameter during instantiation. | `str \| dict` | `None` |
| `seal_det_limit_side_len` | Same as the parameter during instantiation. | `int` | `None` |
| `seal_det_limit_type` | Same as the parameter during instantiation. | `str` | `None` |
| `seal_det_thresh` | Same as the parameter during instantiation. | `float` | `None` |
| `seal_det_box_thresh` | Same as the parameter during instantiation. | `float` | `None` |
| `seal_det_unclip_ratio` | Same as the parameter during instantiation. | `float` | `None` |
| `seal_rec_score_thresh` | Same as the parameter during instantiation. | `float` | `None` |
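The `seal_det_limit_side_len` / `seal_det_limit_type` pair controls how an image is rescaled before detection: `min` guarantees the shortest side is at least the limit, `max` guarantees the longest side is at most the limit. A self-contained sketch of that rescaling rule (real implementations may additionally round sides to multiples of 32; this only illustrates the min/max semantics):

```python
def limited_size(width, height, limit_side_len=736, limit_type="min"):
    """Return (width, height) scaled so the side-length limit holds."""
    if limit_type == "min":
        # Upscale so the short side reaches at least limit_side_len.
        scale = max(1.0, limit_side_len / min(width, height))
    elif limit_type == "max":
        # Downscale so the long side does not exceed limit_side_len.
        scale = min(1.0, limit_side_len / max(width, height))
    else:
        raise ValueError("limit_type must be 'min' or 'max'")
    return round(width * scale), round(height * scale)

print(limited_size(1000, 500, 736, "min"))   # short side raised to 736
print(limited_size(2000, 1000, 960, "max"))  # long side capped at 960
```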
| Parameter | Description | Type | Default |
|---|---|---|---|
| `format_json` | Whether to format the output with `JSON` indentation. | `bool` | `True` |
| `indent` | Indentation level for the `JSON` data, for better readability; effective only when `format_json` is `True`. | `int` | `4` |
| `ensure_ascii` | Whether to escape non-`ASCII` characters to `Unicode`. When set to `True`, all non-`ASCII` characters will be escaped; `False` retains the original characters. Effective only when `format_json` is `True`. | `bool` | `False` |
| `save_path` | Path to which the result file is saved. | `str` | (required) |
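The `indent` and `ensure_ascii` options behave like the same-named arguments of Python's standard `json.dumps`, which the following self-contained snippet illustrates (the sample record is made up):

```python
import json

record = {"rec_text": "印章文字", "score": 0.98}

escaped = json.dumps(record, indent=4, ensure_ascii=True)    # non-ASCII escaped to \uXXXX
readable = json.dumps(record, indent=4, ensure_ascii=False)  # original characters kept

print(escaped)
print(readable)
```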
| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Data to be predicted (required): image data as `numpy.ndarray`; a local image or PDF file path such as `/root/data/img.jpg`; a URL to an image or PDF file; a local directory containing images to predict, such as `/root/data/` (directories containing PDF files are currently not supported; a PDF must be given as a specific file path); or a list of the above, e.g. `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`. | `Python Var \| str \| list` | (required) |
| `save_path` | Path to save the inference results. If set to `None`, the results are not saved locally. | `str` | `None` |
| `doc_orientation_classify_model_name` | Name of the document orientation classification model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `doc_orientation_classify_model_dir` | Directory path of the document orientation classification model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `doc_unwarping_model_name` | Name of the document unwarping model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `doc_unwarping_model_dir` | Directory path of the document unwarping model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `layout_detection_model_name` | Name of the layout detection model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `layout_detection_model_dir` | Directory path of the layout detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `seal_text_detection_model_name` | Name of the seal text detection model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `seal_text_detection_model_dir` | Directory path of the seal text detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `text_recognition_model_name` | Name of the text recognition model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `text_recognition_model_dir` | Directory path of the text recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `text_recognition_batch_size` | Batch size for the text recognition model. If set to `None`, the default batch size is `1`. | `int` | `None` |
| `use_doc_orientation_classify` | Whether to enable the document orientation classification module. If set to `None`, the value initialized by the pipeline is used, which defaults to `True`. | `bool` | `None` |
| `use_doc_unwarping` | Whether to enable the document image unwarping module. If set to `None`, the value initialized by the pipeline is used, which defaults to `True`. | `bool` | `None` |
| `use_layout_detection` | Whether to enable the layout detection module. If set to `None`, the value initialized by the pipeline is used, which defaults to `True`. | `bool` | `None` |
| `layout_threshold` | Score threshold for the layout model: any float between `0` and `1`. If not set, the value initialized by the pipeline is used, which defaults to `0.5`. | `float \| dict` | `None` |
| `layout_nms` | Whether to use Non-Maximum Suppression (NMS) as post-processing for layout detection. If set to `None`, the value initialized by the pipeline is used, which defaults to `True`. | `bool` | `None` |
| `layout_unclip_ratio` | Expansion ratio for the bounding boxes from the layout detection model. If set to `None`, the value initialized by the pipeline is used, which defaults to `1.0`. | `float \| list` | `None` |
| `layout_merge_bboxes_mode` | Filtering method for overlapping boxes in layout detection. If set to `None`, the value initialized by the pipeline is used, which defaults to `large`. | `str` | `None` |
| `seal_det_limit_side_len` | Image side length limit for seal text detection: any integer greater than `0`. If not set, the value initialized by the pipeline is used, which defaults to `736`. | `int` | `None` |
| `seal_det_limit_type` | Limit type for the seal text detection image side length: `min` ensures the shortest side is no smaller than `det_limit_side_len`, `max` ensures the longest side is no larger than `limit_side_len`. If not set, the value initialized by the pipeline is used, which defaults to `min`. | `str` | `None` |
| `seal_det_thresh` | Pixel threshold for detection: any float greater than `0`. If not set, the value initialized by the pipeline is used, which defaults to `0.2`. | `float` | `None` |
| `seal_det_box_thresh` | Bounding box threshold: any float greater than `0`. If not set, the value initialized by the pipeline is used, which defaults to `0.6`. | `float` | `None` |
| `seal_det_unclip_ratio` | Expansion ratio for seal text detection: any float greater than `0`. If not set, the value initialized by the pipeline is used, which defaults to `0.5`. | `float` | `None` |
| `seal_rec_score_thresh` | Score threshold for seal text recognition: any float greater than `0`. If not set, the value initialized by the pipeline is used, which defaults to `0.0` (no threshold). | `float` | `None` |
| `device` | Device used for inference: `cpu` for CPU inference; `gpu:0` for the first GPU; `xpu:0` for the first XPU; `mlu:0` for the first MLU; `dcu:0` for the first DCU. If set to `None`, the local GPU device 0 is used when available, otherwise the CPU. | `str` | `None` |
| `enable_hpi` | Whether to enable high-performance inference. | `bool` | `False` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If set to `None`, it is enabled by default. | `bool` | `True` |
| `cpu_threads` | Number of threads used for inference on CPU. | `int` | `8` |
| `paddlex_config` | Path to the PaddleX pipeline configuration file. | `str` | `None` |
| Parameter | Description | Type | Default |
|---|---|---|---|
| `doc_orientation_classify_model_name` | Name of the document orientation classification model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `doc_orientation_classify_model_dir` | Directory path of the document orientation classification model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `doc_unwarping_model_name` | Name of the document unwarping model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `doc_unwarping_model_dir` | Directory path of the document unwarping model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `layout_detection_model_name` | Name of the layout detection model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `layout_detection_model_dir` | Directory path of the layout detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `seal_text_detection_model_name` | Name of the seal text detection model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `seal_text_detection_model_dir` | Directory path of the seal text detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `text_recognition_model_name` | Name of the text recognition model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `text_recognition_model_dir` | Directory path of the text recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `text_recognition_batch_size` | Batch size for the text recognition model. If set to `None`, the default batch size is `1`. | `int` | `None` |
| `use_doc_orientation_classify` | Whether to enable the document orientation classification module. If set to `None`, the value initialized by the pipeline is used, which defaults to `True`. | `bool` | `None` |
| `use_doc_unwarping` | Whether to enable the document image unwarping module. If set to `None`, the value initialized by the pipeline is used, which defaults to `True`. | `bool` | `None` |
| `use_layout_detection` | Whether to enable the layout detection module. If set to `None`, the value initialized by the pipeline is used, which defaults to `True`. | `bool` | `None` |
| `layout_threshold` | Score threshold for the layout model: any float greater than `0`; or a dict mapping class IDs to per-class thresholds. If set to `None`, the value initialized by the pipeline is used, which defaults to `0.5`. | `float \| dict` | `None` |
| `layout_nms` | Whether to use Non-Maximum Suppression (NMS) as post-processing for layout detection. If set to `None`, the value initialized by the pipeline is used, which defaults to `True`. | `bool` | `None` |
| `layout_unclip_ratio` | Expansion ratio for layout detection bounding boxes: a float greater than `0`; a tuple for width/height; or a dict keyed by `cls_id` with tuple values, e.g. `{0: (1.1, 2.0)}`, which keeps the center of class-0 boxes fixed while expanding width by 1.1x and height by 2.0x. If set to `None`, the value initialized by the pipeline is used, which defaults to `1.0`. | `float \| Tuple[float,float] \| dict` | `None` |
| `layout_merge_bboxes_mode` | Filtering method for overlapping boxes in layout detection: `large`, `small`, or `union`, meaning keep the larger box, the smaller box, or both; or a dict keyed by `cls_id` with `str` values, e.g. `{0: "large", 2: "small"}`, applying a mode per class. If set to `None`, the value initialized by the pipeline is used, which defaults to `large`. | `str \| dict` | `None` |
| `seal_det_limit_side_len` | Image side length limit for seal text detection: any integer greater than `0`. If set to `None`, the value initialized by the pipeline is used, which defaults to `736`. | `int` | `None` |
| `seal_det_limit_type` | Limit type for the seal text detection image side length: `min` ensures the shortest side is no smaller than `det_limit_side_len`, `max` ensures the longest side is no larger than `limit_side_len`. If set to `None`, the value initialized by the pipeline is used, which defaults to `min`. | `str` | `None` |
| `seal_det_thresh` | Pixel threshold for detection: any float greater than `0`. If set to `None`, the value initialized by the pipeline is used, which defaults to `0.2`. | `float` | `None` |
| `seal_det_box_thresh` | Bounding box threshold: any float greater than `0`. If set to `None`, the value initialized by the pipeline is used, which defaults to `0.6`. | `float` | `None` |
| `seal_det_unclip_ratio` | Expansion ratio for seal text detection: any float greater than `0`. If set to `None`, the value initialized by the pipeline is used, which defaults to `0.5`. | `float` | `None` |
| `seal_rec_score_thresh` | Score threshold for seal text recognition: any float greater than `0`. If set to `None`, the value initialized by the pipeline is used, which defaults to `0.0` (no threshold). | `float` | `None` |
| `device` | Device used for inference: `cpu` for CPU inference; `gpu:0` for the first GPU; `xpu:0` for the first XPU; `mlu:0` for the first MLU; `dcu:0` for the first DCU. If set to `None`, the local GPU device 0 is used when available, otherwise the CPU. | `str` | `None` |
| `precision` | Computation precision, e.g. `fp32`, `fp16`. | `str` | `"fp32"` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If set to `None`, it is enabled by default. | `bool` | `True` |
| `cpu_threads` | Number of threads used for inference on CPU. | `int` | `8` |
| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Data to be predicted: image data as `numpy.ndarray`; a local image or PDF file path such as `/root/data/img.jpg`; a URL to an image or PDF file; a local directory containing images to predict, such as `/root/data/` (directories containing PDF files are currently not supported; a PDF must be given as a specific file path); or a list of the above, e.g. `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`. | `Python Var \| str \| list` | (required) |
| `device` | Same as the parameter during instantiation. | `str` | `None` |
| `use_doc_orientation_classify` | Whether to use the document orientation classification module during inference. | `bool` | `None` |
| `layout_unclip_ratio` | Same as the parameter during instantiation. | `float \| Tuple[float,float] \| dict` | `None` |
| `layout_merge_bboxes_mode` | Same as the parameter during instantiation. | `str \| dict` | `None` |
| Parameter | Description | Type | Default |
|---|---|---|---|
| `format_json` | Whether to format the output with `JSON` indentation. | `bool` | `True` |
| `indent` | Indentation level for the `JSON` data, for better readability; effective only when `format_json` is `True`. | `int` | `4` |
| `ensure_ascii` | Whether to escape non-`ASCII` characters to `Unicode`. When set to `True`, all non-`ASCII` characters will be escaped; `False` retains the original characters. Effective only when `format_json` is `True`. | `bool` | `False` |
| `save_path` | Path to which the result file is saved. | `str` | (required) |
| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Data to be predicted: image data as `numpy.ndarray`; a local image or PDF file path such as `/root/data/img.jpg`; a URL to an image or PDF file; a local directory containing images to predict, such as `/root/data/` (currently, predictions do not support directories that contain PDF files; a PDF must be specified as a specific file path); or a list of the above, e.g. `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`. | `Python Var \| str \| list` | (required) |
| `save_path` | Path to save the inference results. If set to `None`, the inference result will not be saved locally. | `str` | `None` |
| `layout_detection_model_name` | Name of the layout detection model. If set to `None`, the default model of the pipeline will be used. | `str` | `None` |
| `layout_detection_model_dir` | Directory path of the layout detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `table_classification_model_name` | Name of the table classification model. If set to `None`, the default model of the pipeline will be used. | `str` | `None` |
| `table_classification_model_dir` | Directory path of the table classification model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `wired_table_structure_recognition_model_name` | Name of the wired table structure recognition model. If set to `None`, the default model of the pipeline will be used. | `str` | `None` |
| `wired_table_structure_recognition_model_dir` | Directory path of the wired table structure recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `wireless_table_structure_recognition_model_name` | Name of the wireless table structure recognition model. If set to `None`, the default model of the pipeline will be used. | `str` | `None` |
| `wireless_table_structure_recognition_model_dir` | Directory path of the wireless table structure recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `wired_table_cells_detection_model_name` | Name of the wired table cells detection model. If set to `None`, the default model of the pipeline will be used. | `str` | `None` |
| `wired_table_cells_detection_model_dir` | Directory path of the wired table cells detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `wireless_table_cells_detection_model_name` | Name of the wireless table cells detection model. If set to `None`, the default model of the pipeline will be used. | `str` | `None` |
| `wireless_table_cells_detection_model_dir` | Directory path of the wireless table cells detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `doc_orientation_classify_model_name` | Name of the document orientation classification model. If set to `None`, the default model of the pipeline will be used. | `str` | `None` |
| `doc_orientation_classify_model_dir` | Directory path of the document orientation classification model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `doc_unwarping_model_name` | Name of the document unwarping model. If set to `None`, the default model of the pipeline will be used. | `str` | `None` |
| `doc_unwarping_model_dir` | Directory path of the document unwarping model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `text_detection_model_name` | Name of the text detection model. If set to `None`, the default model of the pipeline will be used. | `str` | `None` |
| `text_detection_model_dir` | Directory path of the text detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `text_det_limit_side_len` | Image side length limit for text detection: any integer greater than `0`. If not set, the default value initialized by the pipeline will be used, initialized to `960`. | `int` | `None` |
| `text_det_limit_type` | Limit type for the text detection image side length: `min` ensures that the shortest side of the image is not less than `det_limit_side_len`, while `max` ensures that the longest side is not greater than `limit_side_len`. If not set, the default value initialized by the pipeline will be used, initialized to `max`. | `str` | `None` |
| `text_det_thresh` | Pixel threshold for detection: any float greater than `0`. If not set, the default value initialized by the pipeline will be used, which is `0.3`. | `float` | `None` |
| `text_det_box_thresh` | Bounding box threshold: any float greater than `0`. If not set, the default value initialized by the pipeline will be used, which is `0.6`. | `float` | `None` |
| `text_det_unclip_ratio` | Expansion ratio for text detection: any float greater than `0`. If not set, the default value initialized by the pipeline will be used, which is `2.0`. | `float` | `None` |
| `text_recognition_model_name` | Name of the text recognition model. If set to `None`, the default model of the pipeline will be used. | `str` | `None` |
| `text_recognition_model_dir` | Directory path of the text recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `text_recognition_batch_size` | Batch size for the text recognition model. If set to `None`, the default batch size will be set to `1`. | `int` | `None` |
| `text_rec_score_thresh` | Score threshold for text recognition: any float greater than `0`. If not set, the default value initialized by the pipeline will be used, which is `0.0` (no threshold). | `float` | `None` |
| `use_doc_orientation_classify` | Whether to use the document orientation classification module. If set to `None`, the default value initialized by the pipeline will be used, initialized to `True`. | `bool` | `None` |
| `use_doc_unwarping` | Whether to use the document image unwarping module. If set to `None`, the default value initialized by the pipeline will be used, initialized to `True`. | `bool` | `None` |
| `use_layout_detection` | Whether to use the layout detection module. If set to `None`, the default value initialized by the pipeline will be used, initialized to `True`. | `bool` | `None` |
| `use_ocr_model` | Whether to use the OCR sub-pipeline. If set to `None`, the default value initialized by the pipeline will be used, initialized to `True`. | `bool` | `None` |
| `device` | Device used for inference: `cpu` indicates using CPU; `gpu:0` indicates using the first GPU; `xpu:0` the first XPU; `mlu:0` the first MLU; `dcu:0` the first DCU. If set to `None`, the local GPU device 0 is used when available; otherwise, the CPU. | `str` | `None` |
| `enable_hpi` | Whether to enable high-performance inference. | `bool` | `False` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If set to `None`, it will be enabled by default. | `bool` | `True` |
| `cpu_threads` | Number of threads used for inference on CPU. | `int` | `8` |
| `paddlex_config` | Path to the PaddleX pipeline configuration file. | `str` | `None` |
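`text_det_box_thresh` (like `seal_det_box_thresh` above) keeps a candidate box only when the mean probability of the pixels inside it is high enough. A self-contained sketch of that rule with a toy probability map (an illustration of the described semantics, not the library's implementation):

```python
import numpy as np

def keep_box(prob_map, box, box_thresh=0.6):
    """Keep the box if the mean score of the pixels inside it exceeds box_thresh.

    `box` is (x0, y0, x1, y1) in pixel coordinates, end-exclusive.
    """
    x0, y0, x1, y1 = box
    region = prob_map[y0:y1, x0:x1]
    return float(region.mean()) > box_thresh

prob_map = np.zeros((10, 10))
prob_map[2:5, 2:8] = 0.9                   # a bright region resembling a text stroke
print(keep_box(prob_map, (2, 2, 8, 5)))    # tight box: mean 0.9 > 0.6
print(keep_box(prob_map, (0, 0, 10, 10)))  # loose box: background dilutes the mean
```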
| 部门 | 报销人 | 报销事由 | 批准人: | | | | |
|---|---|---|---|---|---|---|---|
| 单据 张 | | | | | | | |
| 合计金额 元 | | | | | | | |
| 其 中 | 车费票 | | | | | | |
| 火车费票 | | | | | | | |
| 飞机票 | | | | | | | |
| 旅住宿费 | | | | | | | |
| 其他 | | | | | | | |
| 补贴 | | | | | | | |
| Parameter | Description | Type | Default |
|---|---|---|---|
| `text_det_limit_side_len` | Image side length limit for text detection: any integer greater than `0`. If set to `None`, the default value initialized by the pipeline will be used, initialized to `960`. | `int` | `None` |
| `text_det_limit_type` | Limit type for the text detection image side length: `min` ensures that the shortest side of the image is not less than `det_limit_side_len`, while `max` ensures that the longest side is not greater than `limit_side_len`. If set to `None`, the default value initialized by the pipeline will be used, initialized to `max`. | `str` | `None` |
| `text_det_thresh` | Pixel threshold for detection: any float greater than `0`. If set to `None`, the default value initialized by the pipeline will be used, which is `0.3`. | `float` | `None` |
| `text_det_box_thresh` | Bounding box threshold: any float greater than `0`. If set to `None`, the default value initialized by the pipeline will be used, which is `0.6`. | `float` | `None` |
| `text_det_unclip_ratio` | Expansion ratio for text detection: any float greater than `0`. If set to `None`, the default value initialized by the pipeline will be used, which is `2.0`. | `float` | `None` |
| `text_rec_score_thresh` | Score threshold for text recognition: any float greater than `0`. If set to `None`, the default value initialized by the pipeline will be used, which is `0.0` (no threshold). | `float` | `None` |
| `use_doc_orientation_classify` | Whether to use the document orientation classification module. If set to `None`, the default value initialized by the pipeline will be used, initialized to `True`. | `bool` | `None` |
| `use_doc_unwarping` | Whether to use the document image unwarping module. If set to `None`, the default value initialized by the pipeline will be used, initialized to `True`. | `bool` | `None` |
| `use_layout_detection` | Whether to use the layout detection module. If set to `None`, the default value initialized by the pipeline will be used, initialized to `True`. | `bool` | `None` |
| `use_ocr_model` | Whether to use the OCR sub-pipeline. If set to `None`, the default value initialized by the pipeline will be used, initialized to `True`. | `bool` | `None` |
| `device` | Device used for inference: `cpu` indicates using CPU; `gpu:0` indicates using the first GPU; `xpu:0` the first XPU; `mlu:0` the first MLU; `dcu:0` the first DCU. If set to `None`, the local GPU device 0 will be preferred; if unavailable, the CPU device will be used. | `str` | `None` |
| `precision` | Computation precision, e.g. `fp32`, `fp16`. | `str` | `"fp32"` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If set to `None`, it will be enabled by default. | `bool` | `True` |
| `cpu_threads` | Number of threads used for inference on CPU. | `int` | `8` |
| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Data to be predicted: image data as `numpy.ndarray`; a local image or PDF file path such as `/root/data/img.jpg`; a URL to an image or PDF file; a local directory containing images to predict, such as `/root/data/` (currently, predictions do not support directories that contain PDF files; a PDF must be specified as a specific file path); or a list of the above, e.g. `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`. | `Python Var \| str \| list` | (required) |
| `device` | Same as the parameter during instantiation. | `str` | `None` |
| `use_doc_orientation_classify` | Whether to use the document orientation classification module during inference. | `bool` | `None` |
| Parameter | Description | Type | Default |
|---|---|---|---|
| `format_json` | Whether to format the output with `JSON` indentation. | `bool` | `True` |
| `indent` | Indentation level for the `JSON` data, making it more readable; effective only when `format_json` is `True`. | `int` | `4` |
| `ensure_ascii` | Whether to escape non-`ASCII` characters to `Unicode`. When set to `True`, all non-`ASCII` characters will be escaped; `False` keeps the original characters. Effective only when `format_json` is `True`. | `bool` | `False` |
| `save_path` | Path to which the result file is saved. | `str` | (required) |
| Name | Type | Description |
|---|---|---|
| `layoutThreshold` | `number` \| `null` | See the `layout_threshold` parameter description in the `predict` method of the model object. |
| `layoutNms` | `boolean` \| `null` | See the `layout_nms` parameter description in the `predict` method of the model object. |
| `layoutUnclipRatio` | `number` \| `array` \| `null` | See the `layout_unclip_ratio` parameter description in the `predict` method of the model object. |
| `layoutMergeBboxesMode` | `string` \| `null` | See the `layout_merge_bboxes_mode` parameter description in the `predict` method of the model object. |
| `textDetLimitSideLen` | `integer` \| `null` | See the `text_det_limit_side_len` parameter description in the `predict` method of the model object. |
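The service request fields above are camelCase spellings of the snake_case `predict` parameters. A small illustrative helper for building request payloads from Python-side parameter names (the conversion function itself is not part of the library):

```python
def snake_to_camel(name: str) -> str:
    """Convert a snake_case parameter name to its camelCase service field."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)

params = {"layout_threshold": 0.5, "text_det_limit_side_len": 960}
payload = {snake_to_camel(k): v for k, v in params.items()}
print(payload)  # {'layoutThreshold': 0.5, 'textDetLimitSideLen': 960}
```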
| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Data to be predicted (required): a local image or PDF file path such as `/root/data/img.jpg`; a URL to an image or PDF file; a local directory containing images to predict, such as `/root/data/` (directories containing PDF files are currently not supported; a PDF must be given as a specific file path); or a list of the above, e.g. `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`. | `Python Var \| str \| list` | (required) |
| `save_path` | Path to save the inference results. If set to `None`, the results are not saved locally. | `str` | `None` |
| `layout_detection_model_name` | Name of the layout detection model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `layout_detection_model_dir` | Directory path of the layout detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `table_classification_model_name` | Name of the table classification model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `table_classification_model_dir` | Directory path of the table classification model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `wired_table_structure_recognition_model_name` | Name of the wired table structure recognition model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `wired_table_structure_recognition_model_dir` | Directory path of the wired table structure recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `wireless_table_structure_recognition_model_name` | Name of the wireless table structure recognition model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `wireless_table_structure_recognition_model_dir` | Directory path of the wireless table structure recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `wired_table_cells_detection_model_name` | Name of the wired table cells detection model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `wired_table_cells_detection_model_dir` | Directory path of the wired table cells detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `wireless_table_cells_detection_model_name` | Name of the wireless table cells detection model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `wireless_table_cells_detection_model_dir` | Directory path of the wireless table cells detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `doc_orientation_classify_model_name` | Name of the document orientation classification model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `doc_orientation_classify_model_dir` | Directory path of the document orientation classification model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `doc_unwarping_model_name` | Name of the document unwarping model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `doc_unwarping_model_dir` | Directory path of the document unwarping model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `text_detection_model_name` | Name of the text detection model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `text_detection_model_dir` | Directory path of the text detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `text_det_limit_side_len` | Image side length limit for text detection: any integer greater than `0`. If not set, the value initialized by the pipeline is used, which defaults to `960`. | `int` | `None` |
| `text_det_limit_type` | Limit type for the text detection image side length: `min` ensures the shortest side is no smaller than `det_limit_side_len`, `max` ensures the longest side is no larger than `limit_side_len`. If not set, the value initialized by the pipeline is used, which defaults to `max`. | `str` | `None` |
| `text_det_thresh` | Pixel threshold for detection: any float greater than `0`. If not set, the value initialized by the pipeline is used, which defaults to `0.3`. | `float` | `None` |
| `text_det_box_thresh` | Bounding box threshold: any float greater than `0`. If not set, the value initialized by the pipeline is used, which defaults to `0.6`. | `float` | `None` |
| `text_det_unclip_ratio` | Expansion ratio for text detection: any float greater than `0`. If not set, the value initialized by the pipeline is used, which defaults to `2.0`. | `float` | `None` |
| `text_recognition_model_name` | Name of the text recognition model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `text_recognition_model_dir` | Directory path of the text recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `text_recognition_batch_size` | Batch size for the text recognition model. If set to `None`, the default batch size is `1`. | `int` | `None` |
| `text_rec_score_thresh` | Score threshold for text recognition: any float greater than `0`. If not set, the value initialized by the pipeline is used, which defaults to `0.0` (no threshold). | `float` | `None` |
| `use_doc_orientation_classify` | Whether to use the document orientation classification module. If set to `None`, the value initialized by the pipeline is used, which defaults to `True`. | `bool` | `None` |
| `use_doc_unwarping` | Whether to use the document image unwarping module. If set to `None`, the value initialized by the pipeline is used, which defaults to `True`. | `bool` | `None` |
| `use_layout_detection` | Whether to use the layout detection module. If set to `None`, the value initialized by the pipeline is used, which defaults to `True`. | `bool` | `None` |
| `use_ocr_model` | Whether to use the OCR sub-pipeline. If set to `None`, the value initialized by the pipeline is used, which defaults to `True`. | `bool` | `None` |
| `device` | Device used for inference: `cpu` for CPU inference; `gpu:0` for the first GPU; `xpu:0` for the first XPU; `mlu:0` for the first MLU; `dcu:0` for the first DCU. If set to `None`, the local GPU device 0 is used when available, otherwise the CPU. | `str` | `None` |
| `enable_hpi` | Whether to enable high-performance inference. | `bool` | `False` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If set to `None`, it is enabled by default. | `bool` | `True` |
| `cpu_threads` | Number of threads used for inference on CPU. | `int` | `8` |
| `paddlex_config` | Path to the PaddleX pipeline configuration file. | `str` | `None` |
| Parameter | Description | Type | Default |
|---|---|---|---|
| `layout_detection_model_name` | Name of the layout detection model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `layout_detection_model_dir` | Directory path of the layout detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `table_classification_model_name` | Name of the table classification model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `table_classification_model_dir` | Directory path of the table classification model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `wired_table_structure_recognition_model_name` | Name of the wired table structure recognition model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `wired_table_structure_recognition_model_dir` | Directory path of the wired table structure recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `wireless_table_structure_recognition_model_name` | Name of the wireless table structure recognition model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `wireless_table_structure_recognition_model_dir` | Directory path of the wireless table structure recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `wired_table_cells_detection_model_name` | Name of the wired table cells detection model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `wired_table_cells_detection_model_dir` | Directory path of the wired table cells detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `wireless_table_cells_detection_model_name` | Name of the wireless table cells detection model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `wireless_table_cells_detection_model_dir` | Directory path of the wireless table cells detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `doc_orientation_classify_model_name` | Name of the document orientation classification model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `doc_orientation_classify_model_dir` | Directory path of the document orientation classification model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `doc_unwarping_model_name` | Name of the document unwarping model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `doc_unwarping_model_dir` | Directory path of the document unwarping model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `text_detection_model_name` | Name of the text detection model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `text_detection_model_dir` | Directory path of the text detection model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `text_det_limit_side_len` | Image side length limit for text detection: any integer greater than `0`. If set to `None`, the value initialized by the pipeline is used, which defaults to `960`. | `int` | `None` |
| `text_det_limit_type` | Limit type for the text detection image side length: `min` ensures the shortest side is no smaller than `det_limit_side_len`, `max` ensures the longest side is no larger than `limit_side_len`. If set to `None`, the value initialized by the pipeline is used, which defaults to `max`. | `str` | `None` |
| `text_det_thresh` | Pixel threshold for detection: any float greater than `0`. If set to `None`, the value initialized by the pipeline is used, which defaults to `0.3`. | `float` | `None` |
| `text_det_box_thresh` | Bounding box threshold: any float greater than `0`. If set to `None`, the value initialized by the pipeline is used, which defaults to `0.6`. | `float` | `None` |
| `text_det_unclip_ratio` | Expansion ratio for text detection: any float greater than `0`. If set to `None`, the value initialized by the pipeline is used, which defaults to `2.0`. | `float` | `None` |
| `text_recognition_model_name` | Name of the text recognition model. If set to `None`, the pipeline default model is used. | `str` | `None` |
| `text_recognition_model_dir` | Directory path of the text recognition model. If set to `None`, the official model will be downloaded. | `str` | `None` |
| `text_recognition_batch_size` | Batch size for the text recognition model. If set to `None`, the default batch size is `1`. | `int` | `None` |
| `text_rec_score_thresh` | Score threshold for text recognition: any float greater than `0`. If set to `None`, the value initialized by the pipeline is used, which defaults to `0.0` (no threshold). | `float` | `None` |
| `use_doc_orientation_classify` | Whether to use the document orientation classification module. If set to `None`, the value initialized by the pipeline is used, which defaults to `True`. | `bool` | `None` |
| `use_doc_unwarping` | Whether to use the document image unwarping module. If set to `None`, the value initialized by the pipeline is used, which defaults to `True`. | `bool` | `None` |
| `use_layout_detection` | Whether to use the layout detection module. If set to `None`, the value initialized by the pipeline is used, which defaults to `True`. | `bool` | `None` |
| `use_ocr_model` | Whether to use the OCR sub-pipeline. If set to `None`, the value initialized by the pipeline is used, which defaults to `True`. | `bool` | `None` |
| `device` | Device used for inference: `cpu` for CPU inference; `gpu:0` for the first GPU; `xpu:0` for the first XPU; `mlu:0` for the first MLU; `dcu:0` for the first DCU. If set to `None`, the local GPU device 0 is used when available, otherwise the CPU. | `str` | `None` |
| `precision` | Computation precision, e.g. `fp32`, `fp16`. | `str` | `"fp32"` |
| `enable_mkldnn` | Whether to enable MKL-DNN acceleration for inference. If set to `None`, it is enabled by default. | `bool` | `True` |
| `cpu_threads` | Number of threads used for inference on CPU. | `int` | `8` |
| Parameter | Description | Type | Default |
|---|---|---|---|
| `input` | Data to be predicted: image data as `numpy.ndarray`; a local image or PDF file path such as `/root/data/img.jpg`; a URL to an image or PDF file; a local directory containing images to predict, such as `/root/data/` (directories containing PDF files are currently not supported; a PDF must be given as a specific file path); or a list of the above, e.g. `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`. | `Python Var \| str \| list` | (required) |
| `device` | Same as the parameter during instantiation. | `str` | `None` |
| `use_doc_orientation_classify` | Whether to use the document orientation classification module during inference. | `bool` | `None` |
| `use_wired_table_cells_trans_to_html` | Whether to convert wired table cell detection results directly to HTML; when enabled, the HTML is built directly from the geometric relationships of the detected cells. | `bool` | `False` |
| Parameter | Description | Type | Default |
|---|---|---|---|
| `format_json` | Whether to format the output with `JSON` indentation. | `bool` | `True` |
| `indent` | Indentation level for the `JSON` data, for better readability; effective only when `format_json` is `True`. | `int` | `4` |
| `ensure_ascii` | Whether to escape non-`ASCII` characters to `Unicode`. When set to `True`, all non-`ASCII` characters will be escaped; `False` retains the original characters. Effective only when `format_json` is `True`. | `bool` | `False` |
| `save_path` | Path to which the result file is saved. | `str` | (required) |
| Name | Type | Description |
|---|---|---|
| `layoutThreshold` | `number` \| `null` | See the `layout_threshold` parameter description in the `predict` method of the model object. |
| `layoutNms` | `boolean` \| `null` | See the `layout_nms` parameter description in the `predict` method of the model object. |
| `layoutUnclipRatio` | `number` \| `array` \| `null` | See the `layout_unclip_ratio` parameter description in the `predict` method of the model object. |
| `layoutMergeBboxesMode` | `string` \| `null` | See the `layout_merge_bboxes_mode` parameter description in the `predict` method of the model object. |
| `textDetLimitSideLen` | `integer` \| `null` | See the `text_det_limit_side_len` parameter description in the `predict` method of the model object. |