
Rolling Your Own Serverless OCR in 40 Lines of Code

Learn to deploy your own serverless OCR in under 40 lines of Python using Modal and Deepseek OCR for efficient text extraction.

If you’re tired of paying for expensive OCR SaaS or wrestling with heavyweight open source solutions, you can deploy your own serverless OCR in less than an hour—and in under 40 lines of Python. In this guide, you’ll see exactly how to do it using Modal, a serverless compute platform, and Deepseek OCR. The result: fast, pay-per-request text extraction from images or PDFs, with zero infrastructure maintenance.

Key Takeaways:

  • How to deploy serverless OCR in under 40 lines of Python using Modal and Deepseek OCR
  • Understand the trade-offs and costs of serverless versus self-hosted OCR
  • Practical, production-ready code for fast text extraction from images
  • Real-world pitfalls in serverless OCR deployments—and how to avoid them

Why Serverless OCR?

Optical Character Recognition (OCR) transforms scanned documents and images into machine-readable text. Traditionally, you had to choose between:

  • SaaS OCR platforms (expensive, privacy trade-offs, vendor lock-in)
  • Self-hosted Tesseract or similar tools (complex setup, scaling headaches)

Serverless OCR combines the flexibility of open source with the scalability and simplicity of cloud functions. Modal, for example, lets you:

  • Run arbitrary Python in isolated containers
  • Attach GPUs for heavy workloads (not required for most OCR jobs)
  • Pay only for the seconds your code runs
  • Deploy with a few Python decorators—no Dockerfiles or YAML needed
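
To make this concrete, here is a minimal sketch of the decorator model. It assumes the Stub-era Modal SDK used throughout this post, and square is just a stand-in workload:

import modal

stub = modal.Stub("hello-modal")

@stub.function()
def square(x: int) -> int:
    # Executes in an isolated Modal container, not on your machine
    return x * x

if __name__ == "__main__":
    with stub.run():
        print(square.remote(7))  # prints 49, computed remotely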

This approach is ideal for batch processing scanned archives, automating document ingestion, or building internal tools.

Approach | Setup Time | Scaling | Privacy | Typical Cost
SaaS OCR API | Minutes | Auto | Shared cloud | $$$ per 1k pages
Self-hosted (Tesseract) | Hours/Days | Manual | Your infra | $ (infra only)
Serverless (Modal + Deepseek) | <1 hour | Auto | You control | Pay-per-execution

Prerequisites

  • Python 3.9+ installed (python --version)
  • Basic familiarity with Python functions and virtual environments
  • A free Modal account (sign up required)
  • Modal Python SDK (pip install modal)
  • Deepseek OCR model from Hugging Face (pip install deepseek-ocr)
  • Test image or PDF file with clear text

No prior experience with serverless or containerization is required—Modal abstracts this away.

Building Serverless OCR in 40 Lines

The core of our solution is a Python function, decorated with Modal’s @stub.function, which loads a pre-trained model and extracts text from images. Here’s the complete code:

# serverless_ocr.py
# Requires: pip install modal deepseek-ocr Pillow

import io

import modal
from PIL import Image
from deepseek_ocr import OCR

stub = modal.Stub("serverless-ocr-demo")

# Module-level cache: each container downloads the weights once, then reuses them
_model = None

def download_model():
    global _model
    if _model is None:
        _model = OCR.from_pretrained("deepseek-ai/deepseek-ocr-base")
    return _model

@stub.function(image=modal.Image.debian_slim().pip_install("deepseek-ocr", "Pillow"))
def ocr_image(image_bytes: bytes) -> str:
    # Load model (cached in the container after the first request)
    model = download_model()
    # Load image from bytes
    image = Image.open(io.BytesIO(image_bytes))
    # Run inference
    result = model(image)
    return result["text"]

# For local testing: read an image and call the function
if __name__ == "__main__":
    with open("scanned_invoice.png", "rb") as f:
        image_bytes = f.read()
    # stub.run() provisions the app for this session; .remote() executes on Modal
    with stub.run():
        output = ocr_image.remote(image_bytes)
    print("Extracted Text:", output)
    # Expected output: (text content of the image)

How it works:

  • modal.Stub defines your serverless app
  • deepseek-ocr loads the OCR model from Hugging Face
  • ocr_image is the serverless function, triggered remotely
  • Model is cached at module level, so each container loads it once, not on every request (cold start mitigation)
  • Input is a raw image file as bytes; output is extracted text

This is all you need to process images at scale. Modal will handle provisioning, scaling, and tearing down containers automatically.

Deploying and Testing the Function

Deploying to Modal

  1. Save the code above as serverless_ocr.py
  2. Install dependencies:
    pip install modal deepseek-ocr Pillow
  3. Log in to Modal:
    modal token new

    (opens a browser to authenticate your Modal account and store an API token)

  4. Run your code:
    python serverless_ocr.py

Modal spins up a container, runs your OCR function, and returns the result. Execution time is typically 2-5 seconds per image, including cold start.

Testing with PDFs or Other Formats

To handle PDFs, you can use pdf2image to convert pages to images before running OCR:

import io

from pdf2image import convert_from_path  # requires the poppler system package

pages = convert_from_path("contract.pdf")
with stub.run():
    for i, page in enumerate(pages):
        buf = io.BytesIO()
        page.save(buf, format="PNG")
        text = ocr_image.remote(buf.getvalue())
        print(f"Page {i+1} text:", text)

This lets you batch-process entire PDF archives, the same approach professional OCR tools take. For large archives, you can also fan pages out in parallel instead of looping sequentially, as sketched below.
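
A parallel sketch, assuming the ocr_image function defined earlier (Modal's Function.map distributes inputs across containers and yields results in input order):

import io
from pdf2image import convert_from_path  # requires the poppler system package

pages = convert_from_path("contract.pdf")
page_bytes = []
for page in pages:
    buf = io.BytesIO()
    page.save(buf, format="PNG")
    page_bytes.append(buf.getvalue())

# .map runs ocr_image concurrently across containers, one input per call
with stub.run():
    for i, text in enumerate(ocr_image.map(page_bytes)):
        print(f"Page {i+1} text:", text)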

Performance Tuning and Costs

Serverless OCR costs and speed are determined by several factors:

  • Cold starts: First request to a new container loads the model (~2-10s), subsequent requests are much faster (~1-2s)
  • Concurrency: Modal handles parallel invocations, but parallel model downloads may hit Hugging Face rate limits—consider local model caching for heavy workloads
  • Pricing: Modal charges per second of execution and per GB of memory (see official pricing)

Provider | Cost Model | Typical Latency | Scaling Limits
Modal | Per-second | 2-5s/image | 1000s of requests/min
AWS Lambda | Per-ms | 3-10s/image (with model load) | Soft limits apply
Self-hosted | Infra/VM cost | 1-2s/image | Manual scale

For most internal tools and batch jobs, Modal’s pay-per-request model is significantly more cost-effective than SaaS APIs charging $1-5 per 1000 pages.

Common Pitfalls and Pro Tips

Cold Start Delays

  • Problem: First run can be slow because the model needs to be downloaded
  • Solution: Batch your requests, or use Modal's keep_warm option to keep a container resident between calls, as sketched below
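
A warm-container sketch, assuming the Stub-era keep_warm function option (the parameter name may differ across SDK versions, so verify against the Modal docs):

import io
import modal
from PIL import Image

stub = modal.Stub("serverless-ocr-warm")

_model = None

def _get_model():
    # Module-level cache; in a warm container the process (and model) stays alive
    global _model
    if _model is None:
        from deepseek_ocr import OCR  # available inside the container image
        _model = OCR.from_pretrained("deepseek-ai/deepseek-ocr-base")
    return _model

@stub.function(
    image=modal.Image.debian_slim().pip_install("deepseek-ocr", "Pillow"),
    keep_warm=1,  # assumption: keeps one container resident between requests
)
def ocr_image_warm(image_bytes: bytes) -> str:
    image = Image.open(io.BytesIO(image_bytes))
    return _get_model()(image)["text"]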

Large Model Downloads

  • Problem: Downloading large models from Hugging Face may hit rate limits or slow down parallel jobs
  • Solution: Pre-bake model weights into the container image at build time so containers start with the weights already on disk; see the sketch below
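
One way to do this with Modal's image builder: run_function executes a Python function as an image build step, so the download happens once at build time rather than on every cold start. A sketch, reusing the download_model helper and imports from serverless_ocr.py:

# (in serverless_ocr.py, after download_model is defined)
# Build step downloads the weights once; every container starts with them cached
prebaked_image = (
    modal.Image.debian_slim()
    .pip_install("deepseek-ocr", "Pillow")
    .run_function(download_model)
)

@stub.function(image=prebaked_image)
def ocr_image_prebaked(image_bytes: bytes) -> str:
    model = download_model()  # resolves from the cache baked into the image
    image = Image.open(io.BytesIO(image_bytes))
    return model(image)["text"]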

Image Preprocessing

  • Problem: Low-quality or misaligned images reduce OCR accuracy
  • Solution: Use Pillow or OpenCV to preprocess (deskew, denoise, resize) before OCR, as in the sketch below
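
A minimal Pillow preprocessing sketch (grayscale plus autocontrast; the resize threshold is illustrative, not tuned):

import io

from PIL import Image, ImageOps

def preprocess(image_bytes: bytes) -> bytes:
    img = Image.open(io.BytesIO(image_bytes))
    img = ImageOps.grayscale(img)      # drop color noise
    img = ImageOps.autocontrast(img)   # stretch contrast for faint scans
    if img.width < 1000:               # upscale small scans (illustrative threshold)
        img = img.resize((img.width * 2, img.height * 2), Image.LANCZOS)
    out = io.BytesIO()
    img.save(out, format="PNG")
    return out.getvalue()

# Usage: text = ocr_image.remote(preprocess(image_bytes))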

Cost Surprises

  • Problem: Running thousands of requests in parallel can rack up costs if not monitored
  • Solution: Set concurrency limits (sketched below) and monitor usage in your Modal dashboard
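
A concurrency-cap sketch, assuming the Stub-era concurrency_limit parameter (newer SDKs may name it differently):

# (in serverless_ocr.py) cap parallel containers so a big batch can't scale unbounded
@stub.function(
    image=modal.Image.debian_slim().pip_install("deepseek-ocr", "Pillow"),
    concurrency_limit=10,  # assumption: at most 10 containers run at once
)
def ocr_image_capped(image_bytes: bytes) -> str:
    model = download_model()  # same cached loader as the main function
    image = Image.open(io.BytesIO(image_bytes))
    return model(image)["text"]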

Security and Privacy

  • Problem: Handling sensitive docs in the cloud has privacy implications
  • Solution: Modal containers are isolated, but review your provider’s data retention and compliance policies


Conclusion and Next Steps

Deploying your own serverless OCR stack is both practical and cost-effective. In under 40 lines of Python, you can process images and PDFs with state-of-the-art models, scale to thousands of pages, and avoid SaaS lock-in. For more advanced scenarios, consider:

  • Integrating with document management workflows
  • Chaining with NLP models for entity extraction or summarization
  • Adding async/batch endpoints for large jobs

Check out the official Modal documentation and Deepseek OCR model card for further customization.
