org.apache.pdfbox.pdfparser
public class NonSequentialPDFParser extends PDFParser
PDFParser.
This class can be used as a PDFParser replacement. parse() must be called
before the document or its pages can be retrieved, e.g. via getPDDocument().
This class is a much enhanced version of QuickParser presented in
PDFBOX-1104
by Jeremy Villalobos.

| Modifier and Type | Field and Description |
|---|---|
| static java.lang.String | SYSPROP_EOFLOOKUPRANGE |
| static java.lang.String | SYSPROP_PARSEMINIMAL |

Inherited fields: xrefTrailerResolver, DEF, document, ENDOBJ, ENDSTREAM, FORCE_PARSING, forceParsing, pdfSource

| Constructor and Description |
|---|
| NonSequentialPDFParser(java.io.File file, RandomAccess raBuf): Constructs parser for given file using given buffer for temporary storage. |
| NonSequentialPDFParser(java.io.File file, RandomAccess raBuf, java.lang.String decryptionPassword): Constructs parser for given file using given buffer for temporary storage. |
| NonSequentialPDFParser(java.lang.String filename): Constructs parser for given file using memory buffer. |

| Modifier and Type | Method and Description |
|---|---|
| PDPage | getPage(int pageNr): Returns the page requested with all the objects loaded into it. |
| int | getPageNumber(): Returns the number of pages in a document. |
| PDDocument | getPDDocument(): This will get the PD document that was parsed. |
| SecurityHandler | getSecurityHandler(): Returns security handler of the document or null if document is not encrypted or parse() wasn't called before. |
| void | parse(): This will parse the stream and populate the COSDocument object. |
| protected COSStream | parseCOSStream(COSDictionary dic, RandomAccess file): This will read a COSStream from the input stream using length attribute within dictionary. |
| void | setEOFLookupRange(int byteCount): Sets how many trailing bytes of PDF file are searched for EOF marker and 'startxref' marker. |
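
The setEOFLookupRange entry above describes a search that is limited to the trailing bytes of the file. The following is a minimal sketch of that idea using only the JDK; it is not the PDFBox implementation, and the class name `EofLookupSketch` and the method `lastIndexInTail` are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: not part of PDFBox.
class EofLookupSketch {

    /**
     * Returns the offset of the last occurrence of marker that starts within
     * the final lookupRange bytes of data, or -1 if none is found there.
     */
    static int lastIndexInTail(byte[] data, byte[] marker, int lookupRange) {
        int windowStart = Math.max(0, data.length - lookupRange);
        // Scan backwards so the marker closest to the end of the file wins.
        for (int i = data.length - marker.length; i >= windowStart; i--) {
            boolean match = true;
            for (int j = 0; j < marker.length; j++) {
                if (data[i + j] != marker[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4\n...objects...\nstartxref\n1234\n%%EOF\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] eof = "%%EOF".getBytes(StandardCharsets.ISO_8859_1);
        byte[] startxref = "startxref".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(lastIndexInTail(pdf, eof, 32));       // 38
        System.out.println(lastIndexInTail(pdf, startxref, 32)); // 23
        // A window that is too small misses the 'startxref' marker.
        System.out.println(lastIndexInTail(pdf, startxref, 8));  // -1
    }
}
```

The last call shows why the range is configurable: if unrelated data follows the markers, a small window may no longer reach 'startxref', and a larger byteCount would be needed.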

Methods inherited from class PDFParser: getDocument, getFDFDocument, isContinueOnError, parseStartXref, parseTrailer, parseXrefStream, parseXrefTable, setTempDirectory
Methods inherited from class BaseParser: isClosing, isClosing, isEndOfName, isEOL, isEOL, isWhitespace, isWhitespace, parseBoolean, parseCOSArray, parseCOSDictionary, parseCOSName, parseCOSString, parseDirObject, readExpectedString, readInt, readLine, readString, readString, setDocument, skipSpaces

public static final java.lang.String SYSPROP_PARSEMINIMAL

public static final java.lang.String SYSPROP_EOFLOOKUPRANGE
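
According to the setEOFLookupRange description on this page, a value supplied through the system property named by SYSPROP_EOFLOOKUPRANGE is applied at initialization and can be overwritten later. The precedence can be sketched as follows. This is an assumption-laden illustration, not the PDFBox code: the property key `example.eofLookupRange`, the default of 2048, and the class `LookupRangeConfigSketch` are all hypothetical, since the actual constant values are not given here.

```java
// Illustrative only: not part of PDFBox.
class LookupRangeConfigSketch {

    // Hypothetical key and default; the real values are defined by
    // NonSequentialPDFParser and are not stated on this page.
    static final String HYPOTHETICAL_PROP_KEY = "example.eofLookupRange";
    static final int HYPOTHETICAL_DEFAULT = 2048;

    private int lookupRange;

    LookupRangeConfigSketch() {
        // Integer.getInteger falls back to the default when the property
        // is unset or is not a valid integer.
        lookupRange = Integer.getInteger(HYPOTHETICAL_PROP_KEY, HYPOTHETICAL_DEFAULT);
    }

    // An explicit call made after construction overrides the property value.
    void setEOFLookupRange(int byteCount) {
        lookupRange = byteCount;
    }

    int getEOFLookupRange() {
        return lookupRange;
    }

    public static void main(String[] args) {
        System.setProperty(HYPOTHETICAL_PROP_KEY, "4096");
        LookupRangeConfigSketch config = new LookupRangeConfigSketch();
        System.out.println(config.getEOFLookupRange()); // 4096, from the property

        config.setEOFLookupRange(8192);
        System.out.println(config.getEOFLookupRange()); // 8192, from the setter
    }
}
```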

public NonSequentialPDFParser(java.lang.String filename)
throws java.io.IOException
Parameters:
filename - the filename of the pdf to be parsed
Throws:
java.io.IOException - If something went wrong.

public NonSequentialPDFParser(java.io.File file,
RandomAccess raBuf)
throws java.io.IOException
Parameters:
file - the pdf to be parsed
raBuf - the buffer to be used for parsing
Throws:
java.io.IOException - If something went wrong.

public NonSequentialPDFParser(java.io.File file,
RandomAccess raBuf,
java.lang.String decryptionPassword)
throws java.io.IOException
Parameters:
file - the pdf to be parsed
raBuf - the buffer to be used for parsing
decryptionPassword - password to be used for decryption
Throws:
java.io.IOException - If something went wrong.

public void setEOFLookupRange(int byteCount)
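Sets how many trailing bytes of the PDF file are searched for the EOF marker and the 'startxref' marker. If not set, the default value DEFAULT_TRAIL_BYTECOUNT is used.
If the system property SYSPROP_EOFLOOKUPRANGE is defined, its value is applied on initialization but can be overwritten later.
Parameters:
byteCount - number of trailing bytes

public void parse()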
throws java.io.IOException

public SecurityHandler getSecurityHandler()
Returns:
security handler of the document, or null if the document is not encrypted or parse() wasn't called before.

public PDDocument getPDDocument() throws java.io.IOException
Overrides:
getPDDocument in class PDFParser
Throws:
java.io.IOException - If there is an error getting the document.

public int getPageNumber()
throws java.io.IOException
Throws:
java.io.IOException - if PAGES or other needed object is missing

public PDPage getPage(int pageNr) throws java.io.IOException
Parameters:
pageNr - starts from 0 to the number of pages.
Throws:
java.io.IOException - If something went wrong.

protected COSStream parseCOSStream(COSDictionary dic, RandomAccess file) throws java.io.IOException
Overrides:
parseCOSStream in class BaseParser
Parameters:
dic - dictionary that goes with this stream.
file - file to write the stream to when reading.
Throws:
java.io.IOException - if an error occurred reading the stream, like problems
with reading length attribute, stream does not end with 'endstream'
after data read, stream too short etc.
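
The failure cases listed for parseCOSStream (stream too short, missing 'endstream' after the data) follow from reading the body by its declared length. A minimal JDK-only sketch of that approach is shown below. It does not use or reproduce the PDFBox classes; `LengthBasedStreamSketch` and `readStreamBody` are hypothetical names, and the length is passed in directly rather than looked up in a COSDictionary.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: not part of PDFBox.
class LengthBasedStreamSketch {

    /** Reads exactly length bytes, then requires the 'endstream' keyword. */
    static byte[] readStreamBody(InputStream in, int length) throws IOException {
        DataInputStream din = new DataInputStream(in);
        byte[] body = new byte[length];
        try {
            din.readFully(body);
        } catch (EOFException e) {
            throw new IOException("stream too short for declared length " + length, e);
        }

        // Skip the optional end-of-line bytes between the data and the keyword.
        int b = din.read();
        while (b == '\r' || b == '\n') {
            b = din.read();
        }

        byte[] expected = "endstream".getBytes(StandardCharsets.US_ASCII);
        for (int i = 0; i < expected.length; i++) {
            if (b != expected[i]) {
                throw new IOException("stream does not end with 'endstream'");
            }
            if (i < expected.length - 1) {
                b = din.read();
            }
        }
        return body;
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "Hello\nendstream\n".getBytes(StandardCharsets.US_ASCII);

        byte[] body = readStreamBody(new ByteArrayInputStream(raw), 5);
        System.out.println(new String(body, StandardCharsets.US_ASCII)); // Hello

        try {
            // A wrong length leaves the reader in the middle of the data.
            readStreamBody(new ByteArrayInputStream(raw), 3);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A declared length that is wrong in either direction is detected: too large a value runs past the end of the input, and too small a value leaves data where 'endstream' is expected.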