You have a scanned document, a screenshot of a table, or a photo of a whiteboard covered in handwriting. You need the text out of it and into a format you can edit, search, or process. That's exactly what OCR does — and modern tools have made it remarkably accurate.
What Is OCR?
OCR stands for Optical Character Recognition. It's the technology that converts images containing text into machine-readable, editable text. A scanner plus OCR software turns a paper document into a Word file. A smartphone camera app with OCR can translate signs in real time.
OCR works by:
- Preprocessing the image (adjusting contrast, deskewing, removing noise)
- Detecting text regions in the image
- Segmenting detected regions into individual characters or word groups
- Recognizing each character or word using trained models
- Assembling the output into structured text
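Modern OCR uses deep learning models, typically convolutional neural networks that extract visual features, combined with recurrent or transformer layers that read the character sequence. These models are trained on large collections of document images, which is why accuracy has improved dramatically in the past decade.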
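To make the preprocessing and segmentation steps listed above more concrete, here is a minimal sketch in plain Python. It assumes the image is already a small grid of grayscale values, uses a fixed threshold of 128, and finds text lines by looking for rows that contain ink. The function names and the sample grid are made up for illustration; real engines use far more robust methods.

```python
from typing import List, Tuple

Gray = List[List[int]]  # rows of 0-255 grayscale values (0 = black)


def binarize(image: Gray, threshold: int = 128) -> List[List[int]]:
    """Map each pixel to 1 (ink) if darker than the threshold, else 0."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def find_text_lines(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) for each run of rows containing ink."""
    lines = []
    start = None
    for y, row in enumerate(binary):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, y - 1))   # the line ended on the previous row
            start = None
    if start is not None:                  # ink reached the bottom edge
        lines.append((start, len(binary) - 1))
    return lines


# Made-up 6x4 "page": two text lines separated by a blank row
page = [
    [255, 255, 255, 255],
    [255,  20,  30, 255],
    [255,  25, 255, 255],
    [255, 255, 255, 255],
    [ 40,  35, 255, 255],
    [255, 255, 255, 255],
]
text_lines = find_text_lines(binarize(page))
print(text_lines)  # [(1, 2), (4, 4)]
```

Each row range would then be passed on to the recognition step. The same projection idea, applied column by column, is one simple way to split a line into words or characters.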
When OCR Works Well
Printed text on clean backgrounds. High-contrast, printed documents — PDFs, scanned books, typed letters — are OCR's sweet spot. Accuracy on cleanly printed text is routinely 99%+.
Standard fonts and sizes. OCR is trained on common typefaces. A standard 12pt Times New Roman or Arial body text is recognized almost perfectly.
Digital PDFs converted to images. If you screenshotted a native PDF, the text is already pixel-perfect and OCR results are excellent.
Forms and structured documents. Tables, invoices, and forms with clear grid lines or labels are recognized well and can often be extracted into structured data.
When OCR Struggles
Handwriting. Handwriting recognition (ICR — Intelligent Character Recognition) is much harder than printed text. Modern AI has improved it substantially, but accuracy varies with handwriting legibility. Cursive is harder than print. Messy handwriting may produce errors.
Low-resolution images. Scans below roughly 150 DPI rarely give acceptable OCR results, and 300 DPI is the recommended minimum for high accuracy. Mobile photos taken in bad lighting may be too noisy or blurry.
Unusual fonts. Decorative fonts, script fonts, and stylized logos confuse OCR models trained on standard typefaces.
Complex layouts. Multi-column text, text wrapped around images, rotated text, and text in unusual reading directions can disrupt OCR's segmentation step.
Text over photo backgrounds. OCR can read text overlaid on a photo (like a meme or social graphic), but accuracy drops when the background behind the characters varies a lot in brightness or color.
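The DPI figures above are easy to check for any image if you know its pixel dimensions and the physical size of the page. The sketch below is one way to do that arithmetic. The function names are hypothetical, and the 150 and 300 DPI cutoffs simply mirror the guidance in this article.

```python
def effective_dpi(pixels_wide: int, pixels_high: int,
                  width_in: float, height_in: float) -> float:
    """Return the lower of the horizontal and vertical pixels-per-inch."""
    return min(pixels_wide / width_in, pixels_high / height_in)


def dpi_verdict(dpi: float) -> str:
    """Classify a resolution against the 150 / 300 DPI guidance."""
    if dpi >= 300:
        return "good"
    if dpi >= 150:
        return "acceptable"
    return "too low"


# A US Letter page (8.5 x 11 in) captured as a 1275 x 1650 pixel image
dpi = effective_dpi(1275, 1650, 8.5, 11.0)
print(dpi, dpi_verdict(dpi))  # 150.0 acceptable
```

A phone photo has no fixed DPI, so the same calculation applies: divide the pixel width of the page in the photo by the real page width in inches.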
How to Convert an Image to Text
Use DevZone's Image to Text Tool to extract text from any image in your browser:
- Upload your image (JPEG, PNG, HEIC, TIFF, or BMP).
- The tool processes the image with OCR technology.
- Copy the extracted text or download it as a plain text file.
For best results, upload a high-resolution image with good contrast between the text and background.
Getting the Best OCR Results
Scan at 300 DPI or higher. Many consumer scanners default to 150 or 200 DPI, so check the resolution setting before you scan for OCR.
Use grayscale or black-and-white. Color images are processed correctly, but grayscale images process faster and are sometimes more accurate for simple documents.
Crop tightly around text. Remove large margins or borders that don't contain text — they add processing time without improving accuracy.
Correct the orientation. OCR works best on level text. If your image is rotated 90° or skewed, correct it before processing. Most tools do basic deskewing automatically.
Clean up noise. Smudges, stamps, or other marks on a document can confuse character recognition. If possible, reduce noise in an image editor before OCR.
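Two of the tips above, converting to grayscale and cropping tightly around the text, can be sketched in a few lines of plain Python. This is an illustration on a tiny made-up pixel grid, not a production routine: the luminance weights are the commonly used BT.601 values, the threshold of 128 is an assumption, and the function names are hypothetical. In practice an imaging library would do this work.

```python
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def to_grayscale(image: List[List[RGB]]) -> List[List[int]]:
    """Convert RGB pixels to 0-255 luminance using BT.601 weights."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for r, g, b in row]
            for row in image]


def ink_bounding_box(gray: List[List[int]],
                     threshold: int = 128) -> Optional[Box]:
    """Return the box enclosing every pixel darker than the threshold,
    or None if the image contains no dark pixels."""
    xs = [x for row in gray for x, px in enumerate(row) if px < threshold]
    ys = [y for y, row in enumerate(gray) if any(px < threshold for px in row)]
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


def crop(gray: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the image down to the given inclusive box."""
    left, top, right, bottom = box
    return [row[left:right + 1] for row in gray[top:bottom + 1]]


white, black = (255, 255, 255), (0, 0, 0)
scan = [
    [white, white, white, white],
    [white, black, black, white],
    [white, white, black, white],
    [white, white, white, white],
]
gray = to_grayscale(scan)
box = ink_bounding_box(gray)
print(box)              # (1, 1, 2, 2)
print(crop(gray, box))  # [[0, 0], [255, 0]]
```

Note that a single stray speck far from the text would enlarge the box, which is one reason cleaning up noise first helps.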
OCR Output Formats
Depending on the tool, OCR output can be:
- Plain text — all recognized text, in reading order, without formatting. Best for editing and searching.
- Searchable PDF — the original image with a text layer underneath. The image looks the same, but the text is now selectable and searchable. Best for archiving.
- Word document — attempts to preserve formatting (headings, tables, columns). Quality varies by source document complexity.
- Structured data (JSON/CSV) — for forms and tables, some tools extract key-value pairs or table rows.
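As a rough illustration of the structured-data option, the sketch below turns already-recognized text into CSV and JSON using only the Python standard library. The invoice data is invented, the function names are hypothetical, and the "Label: value" convention is an assumption about how the form lines look after OCR.

```python
import csv
import io
import json
from typing import Dict, List


def rows_to_csv(rows: List[List[str]]) -> str:
    """Serialize table rows (first row is the header) as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def lines_to_fields(lines: List[str]) -> Dict[str, str]:
    """Split 'Label: value' lines into a dict, skipping lines without a colon."""
    fields = {}
    for line in lines:
        label, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            fields[label.strip()] = value.strip()
    return fields


# Hypothetical OCR output for an invoice (made-up sample data)
table = [["Item", "Qty", "Price"], ["Widget", "2", "9.50"]]
form_lines = ["Invoice No: 1042", "Date: 2024-03-01", "Thank you"]

print(rows_to_csv(table))
print(json.dumps(lines_to_fields(form_lines), indent=2))
```

The hard part in practice is getting clean rows and labels out of the image in the first place; the serialization itself is simple.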
OCR in Code
Python (using Tesseract):
import pytesseract
from PIL import Image

# Load the image and run Tesseract on it; the result is a plain string
image = Image.open("document.png")
text = pytesseract.image_to_string(image)
print(text)
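Tesseract is a free, widely used open-source OCR engine, originally developed at HP and later sponsored by Google for many years. Install the engine with brew install tesseract (macOS) or sudo apt install tesseract-ocr (Ubuntu), then install the Python wrapper and image library with pip install pytesseract pillow.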
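If you need more than a plain string, pytesseract can also return per-word data with confidence scores. The sketch below is one possible way to keep only the words Tesseract is reasonably sure about. The helper names, the cutoff of 60, and the choice of page segmentation mode 6 (treat the image as a single uniform block of text) are assumptions for illustration, not recommendations from the library.

```python
from typing import Dict, List


def confident_words(data: Dict[str, list], min_conf: float = 60.0) -> List[str]:
    """Keep non-empty words whose confidence is at least min_conf.

    `data` is expected to have the shape of pytesseract's
    image_to_data(..., output_type=Output.DICT) result: parallel
    lists under the "text" and "conf" keys.
    """
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        # Non-word entries have empty text and a confidence of -1
        if text.strip() and float(conf) >= min_conf:
            words.append(text)
    return words


def ocr_confident_words(path: str, lang: str = "eng") -> List[str]:
    """Run Tesseract on an image file and return only the confident words."""
    # Imported here so the helper above works without Tesseract installed
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(path),
        lang=lang,
        config="--psm 6",  # assumption: one uniform block of text
        output_type=pytesseract.Output.DICT,
    )
    return confident_words(data)


# Hand-written sample in the same shape, so this part runs without Tesseract
sample = {"text": ["", "Total", "due", "l0O"], "conf": [-1, 96, 91, 32]}
print(confident_words(sample))  # ['Total', 'due']
```

With Tesseract installed, ocr_confident_words("document.png") would run the same filter on a real image. Low-confidence words are good candidates for manual review rather than silent removal.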
JavaScript (in Node.js with Tesseract.js):
const Tesseract = require("tesseract.js");
// await is not allowed at the top level of a CommonJS module, so wrap it
(async () => {
  const { data: { text } } = await Tesseract.recognize("document.png", "eng");
  console.log(text);
})();
For cloud-based OCR at scale, Google Cloud Vision API, AWS Textract, and Azure Computer Vision all provide OCR-as-a-service with REST APIs.
FAQ
How accurate is OCR?
For cleanly printed text at adequate resolution, modern OCR achieves 98–99%+ character accuracy. For difficult inputs (handwriting, low resolution, unusual fonts), accuracy can drop to 70–90% or lower. For mission-critical use, always review the output.
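Character accuracy figures like these are usually computed by comparing OCR output against a hand-checked reference text. One common approach, sketched below, counts the edits needed to turn one string into the other (Levenshtein distance) and divides by the reference length. The function names and sample strings are made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Return 1 - (edit distance / reference length), floored at 0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


reference = "Invoice total: 100"
ocr_output = "Invoice tota1: l00"   # classic l / 1 confusion
print(edit_distance(reference, ocr_output))                 # 2
print(round(character_accuracy(reference, ocr_output), 3))  # 0.889
```

Two wrong characters out of 18 already pulls accuracy below 90%, which shows how quickly small confusions add up in short fields such as amounts and IDs.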
Can OCR read tables and forms?
Yes, with caveats. Simple tables with clear borders are handled well. Complex tables with merged cells, nested tables, or decorative borders are harder. Dedicated document AI tools (AWS Textract, Google Document AI) are specialized for structured document extraction.
Does OCR work for photos of text (not scans)?
Yes, but quality depends heavily on the photo. A sharp, well-lit photo taken straight-on works almost as well as a scan. A photo taken at an angle, with shadows, or with motion blur will have significantly more errors.
Is OCR the same as copy-pasting from a PDF?
No. Copy-pasting from a native PDF extracts the text that is actually stored in the file. This is fast and involves no recognition step, so it introduces no OCR errors. OCR is needed when the PDF is a scanned image with no embedded text. You can tell the difference by trying to select text in the PDF: if you can select individual words or characters, the file has embedded text, either native or added by an earlier OCR pass.