| Feature | PyMuPDF | pikepdf | PyPDF2 | pdfrw | pdfplumber / pdfminer |
|---|---|---|---|---|---|
| Supports Multiple Document Formats | PDF, XPS, EPUB, MOBI, FB2, CBZ, SVG, TXT, Image, DOCX, XLSX, PPTX, HWPX (see note) | PDF | PDF | PDF | PDF |
| Implementation | Python and C | Python and C++ | Python | Python | Python |
| Render Document Pages | All document types | No rendering | No rendering | No rendering | No rendering |
| Write Text to PDF Page | ✓ (see `Page.insert_htmlbox`, `Page.insert_textbox` or `TextWriter`) | | | | |
| Supports CJK Characters | ✓ | | | | |
| Extract Text | All document types | | PDF only | | PDF only |
| Extract Text as Markdown (.md) | All document types | ||||
| Extract Tables | All document types | | | | PDF only |
| Extract Vector Graphics | All document types | | | | Limited |
| Draw Vector Graphics (PDF) | ✓ | | | | |
| Based on Existing, Mature Library | MuPDF | QPDF | |||
| Automatic Repair of Damaged PDFs | ✓ | ✓ | | | |
| Encrypted PDFs | ✓ | ✓ | Limited | | Limited |
| Linearized PDFs | ✓ | ✓ | | | |
| Incremental Updates | ✓ | | | | |
| Integrates with Jupyter and IPython Notebooks | ✓ | | | | |
| Joining / Merging PDF with other Document Types | All document types | PDF only | PDF only | PDF only | PDF only |
| OCR API for Seamless Integration with Tesseract | All document types | ||||
| Integrated Checkpoint / Restart Feature (PDF) | ✓ | | | | |
| PDF Optional Content | ✓ | | | | |
| PDF Embedded Files | ✓ | ✓ | Limited | | Limited |
| PDF Redactions | ✓ | | | | |
| PDF Annotations | Full | Limited | |||
| PDF Form Fields | Create, read, update | Limited, no creation | |||
| PDF Page Labels | ✓ | | Read-only | | |
| Supports Font Subsetting | ✓ | | | | |