\documentclass{article}
Production-ready LaTeX templates for journal articles and conference papers. Covers standard article, IEEEtran, and REVTeX4-2 — with figures, tables, equations, and bibliography pre-configured.
\section{Choosing a Document Class}
The document class determines your paper's layout, citation format, and submission requirements. Match it to your target journal or conference.
| Feature | article | IEEEtran | REVTeX4-2 |
|---|---|---|---|
| Document class | \documentclass{article} | \documentclass[journal]{IEEEtran} | \documentclass[aps,prl]{revtex4-2} |
| Column layout | Single column | Two column | Two column |
| Font size | 10–12pt | 10pt (default) | 10pt (default) |
| Abstract env | \begin{abstract} | \begin{abstract} | \begin{abstract} |
| Citation style | Any (biblatex) | \cite → [1] | \cite → [1] or author–year |
| Best for | Preprints, arXiv, general journals | IEEE Transactions, Letters, and IEEE conference proceedings | Physical Review, Applied Physics |
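The class choice comes down to a single line at the top of the file. The sketch below lists common variants of the three classes in the table; the option combinations are illustrative, and the journal's author guidelines take precedence.

```latex
% Pick exactly one \documentclass line per document.
\documentclass[11pt]{article}                  % general journals, preprints
% \documentclass[journal]{IEEEtran}            % IEEE Transactions and Letters
% \documentclass[conference]{IEEEtran}         % IEEE conference proceedings
% \documentclass[aps,prl,reprint]{revtex4-2}   % two-column APS layout
% \documentclass[aps,prl,preprint]{revtex4-2}  % single-column review copy
```

Switching venue then means changing this line and the bibliography setup, while most of the body can stay the same.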
\section{Paper Structure}
Every section of a research paper has its own LaTeX idioms. Here's what matters in each.
Abstract: Wrap the abstract in \begin{abstract}…\end{abstract} and keep it to 150–250 words. Avoid citations and equations, since many journals index abstracts separately in their databases.
Introduction: Open with \section{Introduction}. State the problem, summarise prior work with \cite, and list your contributions in an enumerate environment. Keep subsections shallow in short papers.
Methodology: Typeset displayed equations with the equation and align environments from amsmath. Label each equation with \label and refer to it with \cref (from cleveref). Define notation in a paragraph before the equations that use it.
Tables: Use booktabs for publication-quality rules: \toprule, \midrule, \bottomrule. Avoid vertical lines. Place tables with [t] for top-of-page, or with [H] (float package) to pin them exactly where they appear in the source.
Figures: Include images with \includegraphics and describe them with \caption. Use subcaption for panels (a), (b), (c). Export vector graphics as PDF for lossless scaling, and name files descriptively.
Bibliography: Use biblatex with \printbibliography, or the journal's required BST file with \bibliographystyle and \bibliography. Export references from Zotero or Google Scholar as BibTeX.
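The equation advice above (amsmath environments, \label, and \cref) can be sketched as follows. The formula and the label names eq:square and eq:expanded are made up for illustration, and the snippet assumes amsmath and cleveref are loaded in the preamble.

```latex
% Preamble (assumed): \usepackage{amsmath} and \usepackage{cleveref}
\begin{align}
  f(x) &= (x + 1)^2      \label{eq:square} \\
       &= x^2 + 2x + 1   \label{eq:expanded}
\end{align}
Expanding \cref{eq:square} term by term gives \cref{eq:expanded}.
```

Each row of align gets its own number, so each can carry its own \label. Use \notag on a row that should stay unnumbered.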
\section{Figures and Tables}
Copy these patterns for single figures, subfigure panels, and booktabs tables.
% Single figure
\begin{figure}[t]
\centering
\includegraphics[width=0.9\linewidth]{figures/architecture.pdf}
\caption{Overview of the proposed model architecture. The encoder
processes input tokens through hierarchical attention blocks
(\cref{sec:method}).}
\label{fig:architecture}
\end{figure}
% Two-panel subfigures
\begin{figure}[t]
\centering
\begin{subfigure}[b]{0.48\linewidth}
\includegraphics[width=\linewidth]{figures/train_loss.pdf}
\caption{Training loss curve.}
\end{subfigure}
\hfill
\begin{subfigure}[b]{0.48\linewidth}
\includegraphics[width=\linewidth]{figures/eval_rouge.pdf}
\caption{ROUGE-1 on validation set.}
\end{subfigure}
\caption{Training dynamics over 50k steps.}
\label{fig:training}
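\end{figure}
% Booktabs table with siunitx S columns
% (requires \usepackage{siunitx}; [H] requires \usepackage{float})
\begin{table}[H]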
\centering
\caption{Comparison of methods on the CNN/DailyMail test set.}
\label{tab:main-results}
\begin{tabular}{l S[table-format=2.1] S[table-format=2.1] S[table-format=2.1]}
\toprule
{Method} & {R-1} & {R-2} & {R-L} \\
\midrule
Lead-3 & 40.4 & 17.7 & 36.7 \\
Longformer & 44.2 & 21.3 & 41.0 \\
Full Attention & 45.1 & 21.9 & 41.8 \\
\midrule
\textbf{Ours} & {\textbf{47.4}} & {\textbf{23.1}} & {\textbf{43.5}} \\ % braces keep siunitx from parsing \textbf
\bottomrule
\end{tabular}
\end{table}
A complete, compiling article with math, table, and bibliography. Works out of the box with FormaTeX using the pdfLaTeX + Biber pipeline.
\documentclass[12pt]{article}
% Encoding & fonts
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
% Page layout
\usepackage[margin=2.5cm]{geometry}
% Mathematics
\usepackage{amsmath, amssymb, amsthm}
\usepackage{mathtools}
% Figures & tables
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage{array}
\usepackage[labelfont=bf, font=small]{caption}
\usepackage{subcaption}
\usepackage{float}
% References & citations
\usepackage[style=numeric-comp, sorting=none, backend=biber]{biblatex}
\addbibresource{references.bib}
% Hyperlinks
\usepackage[hidelinks, colorlinks=false]{hyperref}
\usepackage{cleveref}
% Algorithms
\usepackage[ruled, vlined]{algorithm2e}
% Utilities
\usepackage{microtype}
\usepackage{xcolor}
\usepackage{lipsum} % remove in production
% -------------------------------------------------------
% Document metadata
% -------------------------------------------------------
\title{%
A Sub-Quadratic Attention Mechanism for\\
Long-Document Summarisation
}
\author{%
Jane Researcher$^{1}$\thanks{Corresponding author: [email protected]} \and
John Co-Author$^{2}$
}
\date{%
$^{1}$Department of Computer Science, University of Example\\
$^{2}$AI Research Lab, Tech Institute\\[6pt]
\today
}
\begin{document}
\maketitle
\begin{abstract}
We present an efficient attention mechanism that reduces the computational
complexity of transformer self-attention from $\mathcal{O}(n^2)$ to
$\mathcal{O}(n \log n)$, enabling processing of documents with up to
64{,}000 tokens on a single GPU. Experiments on the CNN/DailyMail and
arXiv summarisation benchmarks show a ROUGE-1 improvement of 2.3 points
over the full-attention baseline with 40\% lower memory consumption.
\end{abstract}
\textbf{Keywords:} natural language processing, attention mechanism,
transformers, document summarisation
\section{Introduction}
Long-document understanding remains a core challenge in NLP.
The quadratic memory complexity of standard self-attention
($\mathcal{O}(n^2 d)$ for sequence length $n$ and dimension $d$)
limits practical sequence lengths to 2{,}048–4{,}096 tokens on
commodity hardware~\cite{vaswani2017}.
\subsection{Contributions}
\begin{enumerate}
\item A hierarchical attention scheme operating at sentence and paragraph levels.
\item An open-source implementation evaluated on three public benchmarks.
\item A theoretical analysis proving the $\mathcal{O}(n \log n)$ bound.
\end{enumerate}
\section{Related Work}
Sparse attention patterns were introduced by~\textcite{child2019} and extended
by Longformer~\cite{beltagy2020} with sliding-window plus global tokens.
Linear attention approximations~\cite{katharopoulos2020} achieve $\mathcal{O}(n)$
complexity but sacrifice expressiveness on local patterns.
\section{Methodology}
\subsection{Hierarchical Attention}
Let $\mathbf{X} \in \mathbb{R}^{n \times d}$ be the token embeddings.
We partition $\mathbf{X}$ into $k$ sentence blocks and compute:
\begin{equation}
\mathbf{H}_i = \text{Attention}\bigl(\mathbf{Q}_i,\, \mathbf{K}_i,\, \mathbf{V}_i\bigr),
\quad i = 1, \ldots, k
\label{eq:local-attn}
\end{equation}
where each block attends only within its sentence boundary for local features,
followed by a cross-block pooling step for global context.
\subsection{Complexity Analysis}
For $k$ blocks each of size $m = n/k$:
\begin{equation}
\mathcal{C} = k \cdot \mathcal{O}(m^2) + \mathcal{O}(k^2)
= \mathcal{O}\!\left(\frac{n^2}{k}\right) + \mathcal{O}(k^2)
\end{equation}
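Setting $k = n^{2/3}$ balances the two terms and minimises the total cost to
$\mathcal{O}(n^{4/3})$. With $k = \sqrt{n}$ the local term dominates and the cost is
$\mathcal{O}(n^{3/2})$, so the $\mathcal{O}(n \log n)$ bound does not follow from
this two-level analysis alone.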
\section{Experiments}
\subsection{Datasets and Metrics}
We evaluate on CNN/DailyMail~\cite{see2017} and arXiv~\cite{cohan2018}.
Summaries are scored with ROUGE-1, ROUGE-2, and ROUGE-L.
\begin{table}[H]
\centering
\caption{ROUGE scores on CNN/DailyMail test set.}
\label{tab:results}
\begin{tabular}{lccc}
\toprule
Model & R-1 & R-2 & R-L \\
\midrule
Lead-3 baseline & 40.4 & 17.7 & 36.7 \\
Longformer & 44.2 & 21.3 & 41.0 \\
Full Attention & 45.1 & 21.9 & 41.8 \\
\textbf{Ours} & \textbf{47.4} & \textbf{23.1} & \textbf{43.5} \\
\bottomrule
\end{tabular}
\end{table}
\section{Conclusion}
We demonstrated that hierarchical attention matches or exceeds full-attention
performance at substantially lower computational cost. Future work will
explore integration with retrieval-augmented generation pipelines.
\printbibliography
\end{document}
The full research paper, with abstract, math, table, and bibliography, is pre-loaded and compiles in seconds. No installation, no setup.
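The template loads references.bib and cites six keys, so a build outside FormaTeX presumably needs that file to exist. One way to keep everything in a single source file is the filecontents environment, sketched below for LaTeX releases from 2019 onward. The entry shown is an assumption about which work the key vaswani2017 refers to; the remaining keys (child2019, beltagy2020, katharopoulos2020, see2017, cohan2018) still need their own entries and will otherwise appear as undefined citations.

```latex
% Place in the preamble, before \addbibresource{references.bib}.
% Assumption: the key vaswani2017 refers to the paper below.
\begin{filecontents}[overwrite]{references.bib}
@inproceedings{vaswani2017,
  author    = {Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and
               Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and
               Kaiser, {\L}ukasz and Polosukhin, Illia},
  title     = {Attention Is All You Need},
  booktitle = {Advances in Neural Information Processing Systems},
  volume    = {30},
  year      = {2017},
}
\end{filecontents}
```

The overwrite option rewrites references.bib on every run, so drop this block once you maintain the .bib file separately.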
\section{FAQ}
Should I use article, IEEEtran, or REVTeX for my paper?
Use article for arXiv preprints and journals without a required class. Use IEEEtran for IEEE venues that require it (Transactions, Letters, and many IEEE conference proceedings). Use revtex4-2 for American Physical Society journals (Physical Review, PRL). Some conferences ship their own style files on top of article, so always check the venue's author guidelines first.
How do I format equations in a two-column layout?
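In two-column classes (IEEEtran, REVTeX), an equation wider than the column overflows into the neighbouring column. First try breaking it across lines with the amsmath multline, split, or align environments. If it still does not fit, span both columns: REVTeX provides the widetext environment for this, and figure* and table* give full-width floats in either class. microtype improves text spacing but does not reflow math, so a long inline formula is better rewritten as a displayed equation.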
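Two common ways of fitting a wide equation into a two-column layout are sketched below. The objective function and the labels eq:objective and eq:objective-wide are invented for illustration. The first option assumes amsmath is loaded; the second assumes a REVTeX class, which defines widetext.

```latex
% Option 1 (any class): break the equation across lines with amsmath.
\begin{multline}
  \mathcal{L}(\theta) = \sum_{i=1}^{N} \bigl( y_i - f(x_i; \theta) \bigr)^2 \\
    + \lambda_1 \lVert \theta \rVert_1 + \lambda_2 \lVert \theta \rVert_2^2
  \label{eq:objective}
\end{multline}

% Option 2 (REVTeX only): let the equation span both columns.
\begin{widetext}
\begin{equation}
  \mathcal{L}(\theta) = \sum_{i=1}^{N} \bigl( y_i - f(x_i; \theta) \bigr)^2
    + \lambda_1 \lVert \theta \rVert_1 + \lambda_2 \lVert \theta \rVert_2^2
  \label{eq:objective-wide}
\end{equation}
\end{widetext}
```

Breaking the line is usually the less intrusive choice, since a full-width span interrupts both columns of text.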
What is the best way to manage references for a paper?
Use biblatex with the numeric-comp style for compact numeric citations such as [1–3]. Export your library from Zotero or Mendeley as a .bib file. For IEEE submissions, use IEEEtran.bst with the classic BibTeX workflow, since the IEEE LaTeX template usually ships with it.
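The two workflows differ only in a few preamble and end-of-document lines, sketched side by side below. The file name references.bib is assumed; the two workflows are alternatives and should not be combined in one document.

```latex
% Workflow A: biblatex + Biber (general journals, preprints)
% --- in the preamble ---
\usepackage[style=numeric-comp, sorting=none, backend=biber]{biblatex}
\addbibresource{references.bib}   % file extension required
% --- where the reference list should appear ---
\printbibliography

% Workflow B: classic BibTeX with a journal-supplied style.
% Do not load biblatex; put these two lines where the list should appear.
% \bibliographystyle{IEEEtran}    % uses IEEEtran.bst
% \bibliography{references}       % reads references.bib, no extension
```

Workflow A is compiled with biber between LaTeX runs and Workflow B with bibtex. The same .bib file can typically serve both.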
How do I make publication-quality tables in LaTeX?
Use the booktabs package and replace all \hline calls with \toprule, \midrule, and \bottomrule. Remove all vertical lines. Add column alignment via the siunitx S column type to align numbers at the decimal point.
Which engine should I use — pdfLaTeX, XeLaTeX, or LuaLaTeX?
pdfLaTeX is usually the fastest and is the safest default, since journal and arXiv build systems widely support it. XeLaTeX is the usual choice when you need custom OpenType fonts. LuaLaTeX offers the most flexibility (including Lua scripting) but typically compiles slowest. FormaTeX supports all three; see /engines for a full comparison.
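If a preamble has to compile under more than one engine, the iftex package can select the font setup per engine. The following is a minimal sketch, not a complete font configuration; the commented \setmainfont line is a placeholder to fill in with a font installed on your system.

```latex
\usepackage{iftex}
\ifPDFTeX
  % pdfLaTeX: 8-bit fonts selected through font encodings
  \usepackage[T1]{fontenc}
  \usepackage{lmodern}
\else
  % XeLaTeX or LuaLaTeX: OpenType fonts through fontspec
  \usepackage{fontspec}
  % \setmainfont{<installed OpenType font name>}
\fi
```

With this guard the rest of the preamble can stay identical across engines.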
How do I submit to arXiv?
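arXiv builds the PDF from your LaTeX source, so upload the source rather than only the PDF. A flat layout, with the .tex file, the .bbl file, and all figure files in one directory, is the simplest to get right. Because this template uses biblatex with Biber, run pdflatex, then biber, then pdflatex twice locally to generate the .bbl, then upload the .tex + .bbl + figures. FormaTeX produces compliant PDFs automatically.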
Open the template in FormaTeX, write your paper, and compile to PDF instantly — no TeX Live installation, no local setup required.