FormaTeX

\begin{article}

LaTeX PDF Generation in Python: From subprocess to REST API

The subprocess approach to LaTeX in Python is painful: version conflicts, temp file management, missing packages. Here is a cleaner REST API alternative with full code examples.


Python is one of the most common languages for generating documents programmatically: reports, invoices, certificates, scientific output. LaTeX produces high-quality typeset PDFs, but the traditional Python approach of calling subprocess.run(["pdflatex", ...]) is fragile and hard to deploy. This post shows both approaches side by side so you can compare them.

The subprocess Approach (And Its Pain)

The naive Python approach calls the system pdflatex binary:

python
import subprocess
import tempfile
import os

def compile_latex_subprocess(latex_source: str) -> bytes:
    with tempfile.TemporaryDirectory() as tmpdir:
        tex_path = os.path.join(tmpdir, "document.tex")
        pdf_path = os.path.join(tmpdir, "document.pdf")

        with open(tex_path, "w", encoding="utf-8") as f:
            f.write(latex_source)

        result = subprocess.run(
            ["pdflatex", "-interaction=nonstopmode", "-output-directory", tmpdir, tex_path],
            capture_output=True,
            text=True,
            timeout=60,
        )

        if result.returncode != 0:
            # pdflatex reports errors on stdout; keep the tail, where the error usually is
            raise RuntimeError(f"pdflatex failed:\n{result.stdout[-2000:]}")

        if not os.path.exists(pdf_path):
            raise RuntimeError("pdflatex produced no output")

        with open(pdf_path, "rb") as f:
            return f.read()

This works locally if pdflatex is installed. It breaks in production because:

  • pdflatex must be installed on the deployment server
  • The installed TeX Live version must include all packages your templates use
  • Temp file cleanup can fail and fill disk
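  • A single pdflatex run does not resolve bibliographies or cross-references; those need repeated passes or a driver such as latexmk
  • Timeouts are tricky to enforce at the subprocess level: subprocess.run(timeout=...) kills pdflatex itself, but any helper processes it spawned can outlive it
  • Docker images can balloon to around 4 GB once a full TeX Live install is included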
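
To make the multi-pass point concrete, here is a minimal sketch of what a subprocess-based build has to do. The helper names (multipass_commands, run_multipass) are hypothetical, and the sketch assumes the .tex file and any .bib file already sit in the working directory so that bibtex can write next to them.

```python
import subprocess
from pathlib import Path
from typing import Callable


def multipass_commands(tex_name: str, use_bibtex: bool = False) -> list[list[str]]:
    """Return the conventional command sequence for a multi-pass build."""
    latex = ["pdflatex", "-interaction=nonstopmode", tex_name]
    if not use_bibtex:
        # Second pass picks up the references the first pass wrote to the .aux file.
        return [latex, latex]
    stem = Path(tex_name).stem
    # pdflatex -> bibtex -> pdflatex -> pdflatex
    return [latex, ["bibtex", stem], latex, latex]


def run_multipass(
    workdir: str,
    tex_name: str = "document.tex",
    use_bibtex: bool = False,
    timeout: float = 60,
    run: Callable = subprocess.run,  # injectable so the loop can be exercised without TeX
) -> None:
    """Run each pass inside workdir and stop at the first failing command."""
    for cmd in multipass_commands(tex_name, use_bibtex):
        result = run(cmd, cwd=workdir, capture_output=True, text=True, timeout=timeout)
        if result.returncode != 0:
            raise RuntimeError(f"{cmd[0]} failed:\n{result.stdout[-2000:]}")
```

Every extra pass multiplies compile time and adds another place where the build can fail, which is part of why the single-call version above tends not to survive real templates.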

The REST API Approach

Replace the subprocess with an HTTP call:

python
import os
import requests

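def compile_latex(latex_source: str, engine: str = "pdflatex") -> bytes:
    response = requests.post(
        "https://api.formatex.io/api/v1/compile",
        headers={"X-API-Key": os.environ["FORMATEX_KEY"]},
        json={"content": latex_source, "engine": engine},
        timeout=130,  # requests has no default timeout; without one a stalled call blocks forever
    )

    if not response.ok:
        # Compilation errors come back as JSON; prefer the TeX log when present
        error = response.json()
        raise RuntimeError(error.get("log") or error.get("error") or "Unknown error")

    # On success the response body is the raw PDF
    return response.content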

That is the entire integration. No subprocess management, no temp files, no system dependencies.

python
# Save the result
pdf_bytes = compile_latex(r"""
\documentclass{article}
\begin{document}
Hello from Python!
\end{document}
""")

with open("output.pdf", "wb") as f:
    f.write(pdf_bytes)
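
In practice the source is usually assembled from data (names, amounts, titles) rather than written by hand, and any of those values may contain characters that LaTeX treats as syntax. The following is a minimal sketch of an escaping helper; escape_latex and render_greeting are hypothetical names, and the mapping covers only the ten standard special characters.

```python
# Replacement text for each character LaTeX treats specially in normal text.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape user-supplied text so it is typeset literally."""
    # One pass over the input, so braces introduced by a replacement
    # (for example in \textbackslash{}) are never escaped a second time.
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


def render_greeting(name: str) -> str:
    """Build a tiny document around an escaped value."""
    return (
        "\\documentclass{article}\n"
        "\\begin{document}\n"
        f"Hello, {escape_latex(name)}!\n"
        "\\end{document}\n"
    )
```

The string returned by render_greeting can be passed straight to compile_latex. Without the escaping step, a value such as "R&D" would stop compilation with a misplaced alignment character error.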

Error Handling

LaTeX errors return HTTP 400 with a JSON body containing the TeX log. Parse it to surface useful errors:

python
import os
import requests

class LatexCompilationError(Exception):
    def __init__(self, message: str, log: str):
        super().__init__(message)
        self.log = log

    def first_error(self) -> str:
        """Extract the first error line from the TeX log."""
        for line in self.log.splitlines():
            if line.startswith("!"):
                return line
        return self.log[:200]


def compile_latex(source: str, engine: str = "pdflatex") -> bytes:
    response = requests.post(
        "https://api.formatex.io/api/v1/compile",
        headers={
            "X-API-Key": os.environ["FORMATEX_KEY"],
            "Content-Type": "application/json",
        },
        json={"content": source, "engine": engine},
        timeout=130,  # slightly above the Pro plan's 120s timeout
    )

    if response.status_code == 400:
        body = response.json()
        raise LatexCompilationError(
            message="LaTeX compilation failed",
            log=body.get("log", body.get("error", "")),
        )

    response.raise_for_status()
    return response.content


# Usage
try:
    pdf = compile_latex(my_template)
except LatexCompilationError as e:
    print(f"LaTeX error: {e.first_error()}")
    print(f"Full log:\n{e.log}")

Set your requests timeout slightly above the API's plan timeout. This prevents the HTTP connection from hanging indefinitely if the API is slow to respond, while still allowing the full compilation window to complete.
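
The first_error helper above returns only the line starting with "!". TeX logs usually follow that line with one beginning "l.<number>", which gives the source line where the error was detected. Here is a sketch that pairs the two, assuming the returned log follows the standard TeX log layout; parse_tex_errors and TexError are hypothetical names, and the ten-line lookahead is a heuristic.

```python
import re
from typing import NamedTuple, Optional


class TexError(NamedTuple):
    message: str
    line: Optional[int]  # source line number, if the log reported one


_LINE_RE = re.compile(r"^l\.(\d+)")


def parse_tex_errors(log: str, lookahead: int = 10) -> list[TexError]:
    """Collect each '!' error together with the 'l.<n>' line that follows it."""
    lines = log.splitlines()
    errors = []
    for i, text in enumerate(lines):
        if not text.startswith("!"):
            continue
        line_no = None
        # Scan a few lines ahead for the line marker, stopping at the next error.
        for follow in lines[i + 1 : i + 1 + lookahead]:
            if follow.startswith("!"):
                break
            match = _LINE_RE.match(follow)
            if match:
                line_no = int(match.group(1))
                break
        errors.append(TexError(text[1:].strip(), line_no))
    return errors
```

Calling parse_tex_errors(e.log) inside the except block lets you report "line 12: Undefined control sequence." instead of printing the whole log.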

Async with httpx

For async Python applications (FastAPI, async Django, async Flask), use httpx:

python
import os
import httpx

async def compile_latex_async(source: str, engine: str = "pdflatex") -> bytes:
    async with httpx.AsyncClient(timeout=130) as client:
        response = await client.post(
            "https://api.formatex.io/api/v1/compile",
            headers={"X-API-Key": os.environ["FORMATEX_KEY"]},
            json={"content": source, "engine": engine},
        )

        if response.status_code == 400:
            body = response.json()
            raise RuntimeError(body.get("log") or body.get("error"))

        response.raise_for_status()
        return response.content


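# FastAPI endpoint
from fastapi import FastAPI
from fastapi.responses import Response
from pydantic import BaseModel

app = FastAPI()


class CompileRequest(BaseModel):
    # Declared as a JSON body. A bare `latex_source: str` parameter would be
    # read from the query string, which is a poor fit for a whole document.
    latex_source: str


@app.post("/generate-pdf")
async def generate_pdf(request: CompileRequest):
    pdf_bytes = await compile_latex_async(request.latex_source)
    return Response(
        content=pdf_bytes,
        media_type="application/pdf",
        headers={"Content-Disposition": "attachment; filename=document.pdf"},
    )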
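
When many documents need compiling at once, the async client makes it easy to run requests concurrently, but it is usually wise to cap how many are in flight. This is a sketch of bounded concurrency with asyncio; compile_many is a hypothetical helper, and the default limit of 4 is an arbitrary assumption rather than a documented FormaTeX limit.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, Union


async def compile_many(
    sources: Sequence[str],
    compile_one: Callable[[str], Awaitable[bytes]],
    max_concurrency: int = 4,
) -> list[Union[bytes, BaseException]]:
    """Compile all sources, keeping at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(source: str) -> bytes:
        async with semaphore:
            return await compile_one(source)

    # return_exceptions=True keeps one failed document from cancelling the rest;
    # results come back in the same order as sources.
    return await asyncio.gather(
        *(guarded(source) for source in sources),
        return_exceptions=True,
    )
```

Pass compile_latex_async as compile_one. Each entry in the result is either the PDF bytes or the exception raised for that document, so callers can retry or report failures individually.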

Storing PDFs

Once you have the PDF bytes, storing them is straightforward:

python
import io
import os

import boto3

def store_pdf_s3(pdf_bytes: bytes, key: str) -> str:
    """Upload PDF to S3 and return the object URL."""
    s3 = boto3.client("s3")
    bucket = os.environ["PDF_BUCKET"]

    s3.upload_fileobj(
        io.BytesIO(pdf_bytes),
        bucket,
        key,
        ExtraArgs={"ContentType": "application/pdf"},
    )

    return f"https://{bucket}.s3.amazonaws.com/{key}"


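# Complete flow: generate → store → return URL
import asyncio


async def generate_and_store(invoice_data: dict) -> str:
    # build_invoice_latex is your own template function (not shown here)
    latex = build_invoice_latex(invoice_data)
    pdf_bytes = await compile_latex_async(latex)
    # boto3 is blocking, so run the upload in a worker thread (Python 3.9+)
    # instead of stalling the event loop
    url = await asyncio.to_thread(
        store_pdf_s3, pdf_bytes, f"invoices/{invoice_data['id']}.pdf"
    )
    return url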

FormaTeX does not store your PDFs — every compilation is ephemeral. The PDF is streamed directly in the HTTP response body and deleted from the worker immediately. You are responsible for storing the bytes wherever you need them.
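
If S3 is more than you need, the same bytes can go to local disk. The sketch below is one way to do that safely: it checks for the "%PDF-" header that PDF files begin with, then writes to a temporary file and renames it so readers never see a half-written document. store_pdf_local is a hypothetical helper, not part of any FormaTeX client.

```python
import os
import tempfile
from pathlib import Path


def store_pdf_local(pdf_bytes: bytes, directory: str, name: str) -> Path:
    """Write pdf_bytes to directory/name atomically and return the final path."""
    # Cheap sanity check: reject payloads that are clearly not a PDF
    # (for example an error page saved by mistake).
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("payload does not start with a PDF header")

    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name

    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_name = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(pdf_bytes)
        os.replace(tmp_name, target)
    except BaseException:
        # Do not leave a stray temp file behind if the write or rename failed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target
```

The temporary file lives in the destination directory on purpose: os.replace is only atomic when source and target are on the same filesystem.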

Choosing the Engine from Python

python
def compile_document(
    source: str,
    *,
    has_bibliography: bool = False,
    needs_custom_fonts: bool = False,
) -> bytes:
    if has_bibliography:
        engine = "latexmk"
    elif needs_custom_fonts:
        engine = "xelatex"
    else:
        engine = "pdflatex"

    return compile_latex(source, engine=engine)
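
Rather than passing the two flags by hand, they can often be guessed from the source itself. This is a heuristic sketch only: detect_engine is a hypothetical helper, it looks for common bibliography commands and the fontspec package, and it will miss anything pulled in through a custom class or an included file. The engine names are the ones used above.

```python
import re

# Commands that normally mean a bibliography pass is required.
_BIB_RE = re.compile(r"\\(bibliography|addbibresource|printbibliography)\b")
# \usepackage{fontspec}, optionally with options or other packages in the same call.
_FONT_RE = re.compile(r"\\usepackage(\[[^\]]*\])?\{[^}]*\bfontspec\b[^}]*\}")
# A % starts a comment unless it is escaped as \%.
_COMMENT_RE = re.compile(r"(?<!\\)%.*")


def detect_engine(source: str) -> str:
    """Guess an engine from the LaTeX source, mirroring compile_document's priorities."""
    body = _COMMENT_RE.sub("", source)  # ignore commented-out commands
    if _BIB_RE.search(body):
        return "latexmk"
    if _FONT_RE.search(body):
        return "xelatex"
    return "pdflatex"
```

A call such as compile_latex(source, engine=detect_engine(source)) then covers the common cases, with the explicit flags kept as an override for documents the heuristic misjudges.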

Beyond Sync Compilation

The examples above use the synchronous POST /compile endpoint. FormaTeX also offers:

  • Smart Compile (POST /compile/smart): AI-powered error detection and auto-fix. If your LaTeX has errors, the AI pipeline attempts to fix them automatically. See the Smart Compile guide.
  • Async Compilation (POST /compile/async): submit a job, get a job ID, then poll for completion or receive a webhook. Ideal for long documents and batch processing. See the Async guide.
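
The polling half of the async flow can be sketched generically. The helper below is transport-agnostic on purpose: this post does not document the job-status endpoint or its response shape, so fetch_status, the "status" field, and the terminal values "completed" and "failed" are all assumptions to be replaced with whatever the Async guide specifies.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],  # hypothetical: returns the job's current state
    *,
    terminal: tuple = ("completed", "failed"),  # assumed status values
    interval: float = 2.0,
    max_interval: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status with exponential backoff until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout
    delay = interval
    while True:
        job = fetch_status()
        if job.get("status") in terminal:
            return job
        # Give up rather than sleep past the overall deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError("job did not finish before the polling deadline")
        sleep(delay)
        delay = min(delay * 2, max_interval)
```

In real use fetch_status would wrap an authenticated GET for the job ID returned by the submit call. Where a webhook is available it avoids polling entirely.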


\end{article}
