org.pdfbox.util
Class PDFText2HTML
public class PDFText2HTML extends PDFTextStripper
Wrap stripped text in simple HTML, trying to form HTML paragraphs.
Paragraphs broken by pages, columns, or figures are not mended.
- jjb - http://www.johnjbarton.com
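To make the description above concrete, here is a minimal, self-contained sketch of the general idea of wrapping stripped text in HTML paragraphs. It is not the PDFBox implementation: the class name ParagraphWrapSketch, its methods, and the rule that a blank line ends a paragraph are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of PDFBox.
final class ParagraphWrapSketch {

    // Escape the characters that are special in HTML text content.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Assumption: a blank line closes the current paragraph.
    // Consecutive non-blank lines are joined with a single space.
    static String toHtml(List<String> lines) {
        StringBuilder html = new StringBuilder();
        StringBuilder para = new StringBuilder();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                flush(para, html);
            } else {
                if (para.length() > 0) {
                    para.append(' ');
                }
                para.append(escape(trimmed));
            }
        }
        flush(para, html); // emit the last paragraph, if any
        return html.toString();
    }

    // Write the pending paragraph (if non-empty) and reset the buffer.
    private static void flush(StringBuilder para, StringBuilder html) {
        if (para.length() > 0) {
            html.append("<p>").append(para).append("</p>\n");
            para.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.print(toHtml(Arrays.asList("a < b", "c", "", "d")));
    }
}
```

A sketch like this has no way to tell that a paragraph continues on the next page or in the next column, which is the same limitation the class description mentions.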
Methods inherited from class PDFTextStripper:
endDocument, endPage, endParagraph, flushText, getCharactersByArticle, getCurrentPageNo, getEndBookmark, getEndPage, getLineSeparator, getOutput, getPageSeparator, getStartBookmark, getStartPage, getText, getText, getWordSeparator, processPage, processPages, setEndBookmark, setEndPage, setLineSeparator, setPageSeparator, setShouldSeparateByBeads, setSortByPosition, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, shouldSeparateByBeads, shouldSortByPosition, shouldSuppressDuplicateOverlappingText, showCharacter, startDocument, startPage, startParagraph, writeCharacters, writeText, writeText
Methods inherited from class PDFStreamEngine:
getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getXObjects, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, resetEngine, setColorSpaces, setFonts, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix, showCharacter, showString
PDFText2HTML
public PDFText2HTML()
throws IOException
Constructor.
endParagraph
protected void endParagraph()
throws IOException
Write out the paragraph separator.
- Overrides: endParagraph in class PDFTextStripper
getTitleGuess
protected String getTitleGuess()
Returns the guessed title of the document.
- Returns: a string that is the guessed title of this document.
guessTitle
protected TextPosition guessTitle(Iterator textIter)
Attempts to guess the title of the document.
- Parameters: textIter - The characters on the first page.
- Returns: the text position that is guessed to be the title.
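This page does not say which heuristic guessTitle uses. One plausible approach, assumed here purely for illustration, is to pick the first-page text set in the largest font. The sketch below uses a hypothetical Piece class as a stand-in for a positioned piece of text; it is not the PDFBox TextPosition class, and the largest-font rule is not confirmed by this documentation.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not part of PDFBox.
final class TitleGuessSketch {

    // Stand-in for a positioned piece of text with a font size.
    static final class Piece {
        final String text;
        final float fontSize;

        Piece(String text, float fontSize) {
            this.text = text;
            this.fontSize = fontSize;
        }
    }

    // Returns the first piece with the largest font size,
    // or null if the iterator yields nothing.
    static Piece guessTitle(Iterator<Piece> pieces) {
        Piece best = null;
        while (pieces.hasNext()) {
            Piece candidate = pieces.next();
            if (best == null || candidate.fontSize > best.fontSize) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Piece> firstPage = Arrays.asList(
                new Piece("Annual Report", 24f),
                new Piece("A. Author", 10f));
        System.out.println(guessTitle(firstPage.iterator()).text);
    }
}
```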
isSuppressParagraphs
public boolean isSuppressParagraphs()
- Returns: the value of the suppressParagraphs flag.
setSuppressParagraphs
public void setSuppressParagraphs(boolean shouldSuppressParagraphs)
- Parameters: shouldSuppressParagraphs - The new value of the suppressParagraphs flag.
writeHeader
protected void writeHeader()
throws IOException
Write the header to the output document.
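The hooks documented on this page (writeHeader, endParagraph, and the suppressParagraphs flag) could fit together roughly as in the sketch below. It is self-contained and does not use PDFBox. The class HtmlHooksSketch, the exact header markup, the writeFooter and render helpers, and the assumption that the flag simply omits the paragraph tags are all hypothetical; the flag's name suggests that behavior, but this page does not confirm it.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical illustration only; not part of PDFBox.
final class HtmlHooksSketch {
    private final Writer out;
    private boolean suppressParagraphs = false;

    HtmlHooksSketch(Writer out) {
        this.out = out;
    }

    boolean isSuppressParagraphs() {
        return suppressParagraphs;
    }

    void setSuppressParagraphs(boolean shouldSuppressParagraphs) {
        this.suppressParagraphs = shouldSuppressParagraphs;
    }

    // Assumed header layout; HTML escaping is omitted for brevity.
    void writeHeader(String title) throws IOException {
        out.write("<html><head><title>" + title + "</title></head>\n<body>\n");
    }

    // Paragraph tags are written only when the flag is off (assumption).
    void startParagraph() throws IOException {
        if (!suppressParagraphs) {
            out.write("<p>");
        }
    }

    void endParagraph() throws IOException {
        if (!suppressParagraphs) {
            out.write("</p>\n");
        }
    }

    void writeFooter() throws IOException {
        out.write("</body></html>\n");
    }

    // Render one paragraph of body text into a complete HTML string.
    static String render(String title, String body, boolean suppress) {
        StringWriter buffer = new StringWriter();
        HtmlHooksSketch hooks = new HtmlHooksSketch(buffer);
        hooks.setSuppressParagraphs(suppress);
        try {
            hooks.writeHeader(title);
            hooks.startParagraph();
            buffer.write(body);
            hooks.endParagraph();
            hooks.writeFooter();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(render("Sample", "hello", false));
    }
}
```

Keeping the paragraph markup behind small hook methods means a subclass or a flag can change the output without touching the text extraction logic.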