README.md · CV_Lab/OCR-Translate

An OCR translation system based on Tesseract

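🚀 About the author

Zeng Yifu (曾逸夫) works in artificial intelligence research and development; main research area: computer vision; code contributor to the official YOLOv5 open-source project; YOLOv5 v6.1 code contributor; code contributor to the official Gradio open-source project

❤️ GitHub: https://github.com/Zengyf-CVer

🔥 PRs to the official YOLOv5 open-source project: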

Save *.npy features on detect.py --visualize: https://github.com/ultralytics/yolov5/pull/5701
Fix detect.py --view-img for non-ASCII paths: https://github.com/ultralytics/yolov5/pull/7093
Fix Flask REST API: https://github.com/ultralytics/yolov5/pull/7210
Add yesqa to precommit checks: https://github.com/ultralytics/yolov5/pull/7511
Add mdformat to precommit checks and update other version: https://github.com/ultralytics/yolov5/pull/7529
Add TensorRT dependencies: https://github.com/ultralytics/yolov5/pull/8553

💡 YOLOv5 v6.1 code contribution link:

https://github.com/ultralytics/yolov5/releases/tag/v6.1

🔥 PRs to the official Gradio open-source project:

Create a color generator demo: https://github.com/gradio-app/gradio/pull/1872

🚀 Changelog

2022-07-19 ⚡ OCR Translate v0.2 released
2022-06-19 ⚡ OCR Translate v0.1 released

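🤗 Online demo

❤️ Quick try

This project provides an online demo. Click the logo below to try it in Hugging Face Spaces:

💡 Demo list

❤️ Click a link in the list to try the corresponding version in Hugging Face Spaces:

Demo name	Input type	Output type	Status
🚀 OCR Translate v0.2	Image	Text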

💎 Project workflow and use cases

📌 Overall workflow

📌 Examples

💡 OCR text extraction (Chinese/English)

💡 Translation (Chinese-English/English-Chinese)
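
The two stages named above (OCR text extraction, then translation) can be sketched as a small pipeline. This is a minimal illustration, not the project's actual code: the function names `ocr_then_translate` and `tesseract_ocr` are hypothetical, and the `pytesseract` and Pillow packages as well as the `chi_sim` language code are assumptions about the environment.

```python
from typing import Callable


def ocr_then_translate(
    image_path: str,
    ocr: Callable[[str], str],
    translate: Callable[[str], str],
) -> dict:
    """Run OCR on an image, then translate the recognized text."""
    text = ocr(image_path).strip()
    # Skip translation when OCR finds nothing, so the translator never sees an empty string.
    translated = translate(text) if text else ""
    return {"text": text, "translation": translated}


def tesseract_ocr(image_path: str, lang: str = "chi_sim") -> str:
    """OCR backend built on pytesseract (assumed dependency, imported lazily)."""
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(image_path), lang=lang)
```

Passing the OCR and translation steps in as callables keeps the pipeline independent of any particular backend, so either stage can be swapped or stubbed out when testing.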

💡 Project structure

.
├── ocr-translate							# project name
│   ├── opus-mt-en-zh						# Opus-MT translation model
│   │   ├── config.json						# configuration file
│   │   ├── flax_model.msgpack				# Flax model
│   │   ├── metadata.bin					# PyTorch model
│   │   ├── rust_model.ot					# Rust model
│   │   ├── tf_model.h5						# TensorFlow model
│   │   ├── tokenizer_config.json			# tokenizer configuration
│   │   ├── ......							# other files
│   ├── data								# example images
│   ├── __init__.py							# package initializer
│   ├── ocr_translate.py					# main script
│   ├── LICENSE								# project license
│   ├── CodeCheck.md						# code checks
│   ├── .gitignore							# git ignore file
│   ├── README.md							# project description
│   ├── setup.cfg							# configuration for pre-commit CI checks
│   ├── .pre-commit-config.yaml				# pre-commit configuration
│   └── requirements.txt					# script dependencies

🔥 Installation

✅ Step 1: Install Tesseract OCR and its language packs (Ubuntu)

📌 Install Tesseract OCR

# Add the Tesseract OCR apt repository
sudo add-apt-repository ppa:alex-p/tesseract-ocr-devel
# Update the apt package index
sudo apt update
# Install Tesseract OCR
sudo apt install tesseract-ocr
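
After installing, it can help to confirm that the `tesseract` binary is actually reachable. The following is a minimal sketch using only the Python standard library; the helper name `tesseract_version` is hypothetical and not part of this project.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(binary: str = "tesseract") -> Optional[str]:
    """Return the first line of `<binary> --version`, or None if it is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some older Tesseract builds print the version banner to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print(tesseract_version() or "tesseract not found on PATH")
```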

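📌 Install the Tesseract OCR language packs

git clone https://github.com/tesseract-ocr/tessdata

# Add the environment variable to ~/.bashrc (replace the path with the location of your own tessdata clone)
vim ~/.bashrc
export TESSDATA_PREFIX=/home/zyf/tessdata

# Apply the environment variable
source ~/.bashrc

# Note: move the files in the script directory to the tessdata root directory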
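
To check that the language packs are visible, one option is to look for the `.traineddata` files under `TESSDATA_PREFIX`. This is a sketch under assumptions: the helper `check_tessdata` is hypothetical, and `chi_sim` and `eng` are assumed to be the language codes needed for the Chinese/English use case described in this README.

```python
import os
from pathlib import Path
from typing import Dict, Iterable, Optional


def check_tessdata(
    langs: Iterable[str] = ("chi_sim", "eng"),
    prefix: Optional[str] = None,
) -> Dict[str, bool]:
    """Map each language code to whether <prefix>/<lang>.traineddata exists."""
    # Fall back to the environment variable set in ~/.bashrc when no prefix is given.
    prefix = prefix or os.environ.get("TESSDATA_PREFIX")
    if not prefix:
        raise RuntimeError("TESSDATA_PREFIX is not set")
    root = Path(prefix)
    return {lang: (root / f"{lang}.traineddata").is_file() for lang in langs}
```

A `False` entry in the result means the corresponding file was not found directly under the prefix directory.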

✅ Step 2: Create a conda environment

conda create -n ocr python=3.8
conda activate ocr # activate the environment

✅ Step 3: Clone the repository

git clone https://gitee.com/CV_Lab/ocr-translate.git

✅ Step 4: Install the OCR Translate dependencies

cd ocr-translate
pip install -r ./requirements.txt -U
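
To see whether the install succeeded, the requirements file can be compared against the installed distributions. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper `missing_requirements` is hypothetical, and its parsing is deliberately simplified: it handles plain `name`, `name==1.0`, and `name[extra]>=1.0` lines, not URLs or other advanced requirement forms.

```python
import re
from importlib import metadata
from pathlib import Path
from typing import List


def missing_requirements(requirements_file: str = "requirements.txt") -> List[str]:
    """Return the distribution names listed in the file that are not installed."""
    missing = []
    for line in Path(requirements_file).read_text(encoding="utf-8").splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blanks, comments, and pip options such as "-r" or "-e"
        # Keep only the name: cut at the first extras, version, or marker character.
        name = re.split(r"[\[<>=!~;\s]", line, maxsplit=1)[0]
        if not name:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

An empty list means every listed distribution was found in the active environment.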

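✅ Step 5: Install the Opus-MT translation models (offline)

# Run from the repository root (skip this cd if you are already there)
cd ocr-translate

# Chinese-English
git lfs clone https://huggingface.co/Helsinki-NLP/opus-mt-zh-en

# English-Chinese
git lfs clone https://huggingface.co/Helsinki-NLP/opus-mt-en-zh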
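
Once cloned, the model directories can be loaded offline through the Hugging Face `transformers` Marian classes. This is a minimal sketch, not necessarily how this project loads them: the helpers `model_dir` and `translate` are hypothetical, and it assumes `transformers`, `torch`, and `sentencepiece` are installed and that the clones sit in the current directory.

```python
from pathlib import Path


def model_dir(src: str, tgt: str, root: str = ".") -> Path:
    """Path of the locally cloned Opus-MT model for a language pair."""
    return Path(root) / f"opus-mt-{src}-{tgt}"


def translate(text: str, src: str = "zh", tgt: str = "en", root: str = ".") -> str:
    """Translate text with a local Opus-MT (Marian) model."""
    # Imported lazily so this module loads even when transformers is absent.
    from transformers import MarianMTModel, MarianTokenizer

    path = str(model_dir(src, tgt, root))
    tokenizer = MarianTokenizer.from_pretrained(path)
    model = MarianMTModel.from_pretrained(path)
    batch = tokenizer([text], return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

This version reloads the model on every call for simplicity; a longer-running application would normally load the tokenizer and model once and reuse them.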

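⚡ Usage

python ocr_translate_v2.py # v0.2 (recommended)
python ocr_translate_v1.py # v0.1

💬 Discussion

If you find any problems with OCR Translate or have suggestions, feel free to open an issue via Gitee Issues.
You are welcome to join the CV Lab technical discussion group.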
