java.lang.Object
org.pdfbox.util.PDFStreamEngine
org.pdfbox.util.PDFTextStripper
org.pdfbox.util.PDFText2HTML

public class PDFText2HTML extends PDFTextStripper

Wrap stripped text in simple HTML, trying to form HTML paragraphs. Paragraphs broken by pages, columns, or figures are not mended.
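
The idea of wrapping stripped text in simple HTML paragraphs can be illustrated with a minimal, standalone sketch. It does not call PDFBox and is not the class's actual implementation: `HtmlParagraphSketch`, `escape`, and `toHtml` are hypothetical names, and treating a blank line as a paragraph break is an assumption made only for this example.

```java
/** Hypothetical helper for illustration only; not part of PDFBox. */
final class HtmlParagraphSketch {

    /** Escape the characters that are special in HTML text content. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Wrap each blank-line-separated block of text in a <p> element
     * inside a minimal HTML page. The blank-line rule is an assumption.
     */
    static String toHtml(String title, String strippedText) {
        StringBuilder html = new StringBuilder();
        html.append("<html><head><title>")
            .append(escape(title))
            .append("</title></head>\n<body>\n");
        for (String block : strippedText.split("\\n\\s*\\n")) {
            String trimmed = block.trim();
            if (!trimmed.isEmpty()) {
                html.append("<p>").append(escape(trimmed)).append("</p>\n");
            }
        }
        html.append("</body></html>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.print(toHtml("Example", "First paragraph.\n\nSecond paragraph."));
    }
}
```

As with the class described here, a paragraph that was split in the input (for example by a page break) would come out as two separate `<p>` elements in this sketch; nothing attempts to rejoin it.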
| Field Summary |
|---|

| Fields inherited from class org.pdfbox.util.PDFTextStripper |
|---|
| charactersByArticle, output |

| Constructor Summary | |
|---|---|
| PDFText2HTML() | Constructor. |

| Method Summary | | |
|---|---|---|
| void | endDocument(PDDocument pdf) | This method is available for subclasses of this class. |
| protected void | endParagraph() | Write out the paragraph separator. |
| protected void | flushText() | This will print the text to the output stream. |
| protected java.lang.String | getTitleGuess() | The guess to the document title. |
| protected TextPosition | guessTitle(java.util.Iterator textIter) | This method will attempt to guess the title of the document. |
| boolean | isSuppressParagraphs() | |
| void | setSuppressParagraphs(boolean shouldSuppressParagraphs) | |
| protected void | startParagraph() | Write out the paragraph separator. |
| protected void | writeCharacters(TextPosition position) | Write the string to the output stream. |
| protected void | writeHeader() | Write the header to the output document. |

| Methods inherited from class org.pdfbox.util.PDFStreamEngine |
|---|
| getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getXObjects, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, resetEngine, setColorSpaces, setFonts, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix, showString |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

| Constructor Detail |
|---|

public PDFText2HTML() throws java.io.IOException

Throws: java.io.IOException - If there is an error during initialization.

| Method Detail |
|---|

protected void writeHeader() throws java.io.IOException

Throws: java.io.IOException - If there is a problem writing out the header to the document.

protected java.lang.String getTitleGuess()

protected void flushText() throws java.io.IOException

Overrides: flushText in class PDFTextStripper

Throws: java.io.IOException - If there is an error writing the text.

public void endDocument(PDDocument pdf) throws java.io.IOException

Overrides: endDocument in class PDFTextStripper

Parameters: pdf - The PDF document that is being processed.

Throws: java.io.IOException - If an IO error occurs.

protected TextPosition guessTitle(java.util.Iterator textIter)

Parameters: textIter - The characters on the first page.

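
This page does not say how the title is guessed. As a rough sketch of one plausible heuristic, the example below picks the piece of first-page text with the largest font size. Everything in it is an assumption made for illustration: `TitleGuessSketch` and `Piece` are hypothetical stand-ins (not PDFBox's `TextPosition`), and the largest-font rule is not confirmed by this documentation.

```java
import java.util.Iterator;

/** Hypothetical illustration of a title heuristic; not PDFBox code. */
final class TitleGuessSketch {

    /** Hypothetical stand-in for a positioned piece of text. */
    static final class Piece {
        final String text;
        final float fontSize;

        Piece(String text, float fontSize) {
            this.text = text;
            this.fontSize = fontSize;
        }
    }

    /**
     * Return the piece with the largest font size, or null when the
     * iterator is empty. On a tie, the earliest piece wins.
     */
    static Piece guessTitle(Iterator<Piece> textIter) {
        Piece best = null;
        while (textIter.hasNext()) {
            Piece candidate = textIter.next();
            if (best == null || candidate.fontSize > best.fontSize) {
                best = candidate;
            }
        }
        return best;
    }
}
```

A caller would pass an iterator over the first page's text pieces and should be prepared for a `null` result when the page contains no text.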
protected void startParagraph() throws java.io.IOException

Overrides: startParagraph in class PDFTextStripper

Throws: java.io.IOException - If there is an error writing to the stream.

protected void endParagraph() throws java.io.IOException

Overrides: endParagraph in class PDFTextStripper

Throws: java.io.IOException - If there is an error writing to the stream.

protected void writeCharacters(TextPosition position) throws java.io.IOException

Overrides: writeCharacters in class PDFTextStripper

Parameters: position - The text to write to the stream.

Throws: java.io.IOException - If there is an error when writing the text.

public boolean isSuppressParagraphs()

public void setSuppressParagraphs(boolean shouldSuppressParagraphs)

Parameters: shouldSuppressParagraphs - The suppressParagraphs to set.
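
The effect of the suppressParagraphs flag is not documented on this page. The sketch below shows one way such a flag could interact with the startParagraph and endParagraph hooks, assuming it simply turns off the `<p>` tags. `ParagraphHooksSketch`, `writeParagraph`, and `render` are hypothetical names, and the behaviour shown is an assumption rather than the library's confirmed logic.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

/** Hypothetical illustration of paragraph hooks with a suppress flag. */
final class ParagraphHooksSketch {
    private final Writer output;
    private boolean suppressParagraphs = false;

    ParagraphHooksSketch(Writer output) {
        this.output = output;
    }

    boolean isSuppressParagraphs() {
        return suppressParagraphs;
    }

    void setSuppressParagraphs(boolean shouldSuppressParagraphs) {
        this.suppressParagraphs = shouldSuppressParagraphs;
    }

    /** Assumed behaviour: open a <p> element unless suppressed. */
    void startParagraph() throws IOException {
        if (!suppressParagraphs) {
            output.write("<p>");
        }
    }

    /** Assumed behaviour: close the <p> element unless suppressed, then end the line. */
    void endParagraph() throws IOException {
        if (!suppressParagraphs) {
            output.write("</p>");
        }
        output.write("\n");
    }

    void writeParagraph(String text) throws IOException {
        startParagraph();
        output.write(text);
        endParagraph();
    }

    /** Render the given paragraphs to a string with the flag set as requested. */
    static String render(boolean suppress, String... paragraphs) {
        StringWriter out = new StringWriter();
        ParagraphHooksSketch sketch = new ParagraphHooksSketch(out);
        sketch.setSuppressParagraphs(suppress);
        try {
            for (String paragraph : paragraphs) {
                sketch.writeParagraph(paragraph);
            }
        } catch (IOException e) {
            // StringWriter does not throw in practice; rethrow unchecked to keep callers simple.
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

Keeping the flag check inside the two hooks means the code that writes the text itself does not need to know whether paragraph markup is enabled.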