ChatGPT 能执行 OCR 吗?
我一直在探索用于从图像中提取文本的 AI 工具,很好奇 ChatGPT 是否能直接执行 OCR(光学字符识别)。ChatGPT 能否读取并从图像中提取文本,还是仅限于基于文本的输入?如果 ChatGPT 能执行 OCR,其准确率与专用 OCR 软件相比如何?期待您的见解!
Eli Webster
March 9, 2026 at 11:03 PM
我一直在探索用于从图像中提取文本的 AI 工具,很好奇 ChatGPT 是否能直接执行 OCR(光学字符识别)。ChatGPT 能否读取并从图像中提取文本,还是仅限于基于文本的输入?如果 ChatGPT 能执行 OCR,其准确率与专用 OCR 软件相比如何?期待您的见解!
添加评论
评论 (11)
If you just want to extract text from images, I recommend using Tesseract. It's open source and pretty accurate for printed text.
Be aware that GPT models with image understanding are not generally available to the public; usually, only research previews or limited access is given. So you might not have direct access to the OCR feature within ChatGPT.
Is there any open source model that combines OCR and language understanding like ChatGPT?
I tried putting images into ChatGPT, but it doesn't accept image uploads in the regular interface. Maybe in specialized versions?
The newest versions of ChatGPT Plus with GPT-4 have an image input feature, but it's still limited and primarily experimental.
The bottom line: ChatGPT is great for text understanding and generation, but for OCR tasks, you should rely on dedicated OCR technology.
Actually, OpenAI's GPT-4 can accept image inputs in some versions, and it can perform simple text reading from images, but it's not designed to replace dedicated OCR software. Its OCR ability is limited and may not work well with complex or low-quality images.
ChatGPT itself does not perform OCR. It's primarily designed for processing and generating text. However, OpenAI has other models like CLIP and the image recognition model that can analyze images, but for OCR, specialized tools like Tesseract or Google Vision API are more suitable.
Google Cloud Vision API provides powerful OCR capabilities and supports multiple languages. It's a good option if you need scalable OCR services.
I've tried uploading screenshots to GPT-4 with image input, and it can read text from the images quite well. But it's still better to use dedicated OCR tools if you need bulk processing or high accuracy.
For my project, I combined Tesseract OCR with ChatGPT. First, I use Tesseract to extract text, then I feed that text into ChatGPT for summarization or analysis. It works great!