java.lang.Object
  org.pdfbox.util.PDFStreamEngine
    org.pdfbox.util.PDFTextStripper
      org.pdfbox.util.PDFTextStripperByArea
public class PDFTextStripperByArea extends PDFTextStripper
This will extract text from a specified region in the PDF.
| Field Summary |
|---|
| Fields inherited from class org.pdfbox.util.PDFTextStripper: charactersByArticle, output |
| Constructor Summary | |
|---|---|
| PDFTextStripperByArea() | Constructor. |
| Method Summary | |
|---|---|
| void | addRegion(java.lang.String regionName, java.awt.geom.Rectangle2D rect): Add a new region to group text by. |
| void | extractRegions(PDPage page): Process the page to extract the region text. |
| protected void | flushText(): This will print the text to the output stream. |
| java.util.List | getRegions(): Get the list of regions that have been set up. |
| java.lang.String | getTextForRegion(java.lang.String regionName): Get the text for the region. This should be called after extractRegions(). |
| protected void | showCharacter(TextPosition text): This will add a character to the list of characters to be printed to the text file. |
| Methods inherited from class org.pdfbox.util.PDFStreamEngine |
|---|
| getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getXObjects, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, resetEngine, setColorSpaces, setFonts, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix, showString |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public PDFTextStripperByArea()
throws java.io.IOException
java.io.IOException - If there is an error loading properties.

| Method Detail |
|---|
public void addRegion(java.lang.String regionName,
java.awt.geom.Rectangle2D rect)
regionName - The name of the region.
rect - The rectangle area to retrieve the text from.
public java.util.List getRegions()
public java.lang.String getTextForRegion(java.lang.String regionName)
regionName - The name of the region to get the text from.
public void extractRegions(PDPage page)
throws java.io.IOException
page - The page to extract the regions from.
java.io.IOException - If there is an error while extracting text.
protected void showCharacter(TextPosition text)
Overrides: showCharacter in class PDFTextStripper
text - The description of the character to display.
protected void flushText()
throws java.io.IOException
Overrides: flushText in class PDFTextStripper
java.io.IOException - If there is an error writing the text.
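
The call order implied by the methods above (register regions with addRegion, process a page with extractRegions, then read results with getTextForRegion) can be illustrated with a small self-contained sketch. This is not the PDFBox implementation: `RegionTextSketch` and `Glyph` are hypothetical stand-ins for PDFTextStripperByArea and TextPosition, and a "page" is assumed here to be a plain list of positioned characters so that the example runs without the library. It only shows one plausible way to group characters by the rectangle that contains them, using `java.awt.geom.Rectangle2D.contains`.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for PDFTextStripperByArea; NOT the PDFBox class. */
final class RegionTextSketch {

    /** Hypothetical stand-in for a positioned character (PDFBox uses TextPosition). */
    static final class Glyph {
        final String text;
        final double x;
        final double y;

        Glyph(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Region name -> rectangle, kept in insertion order.
    private final Map<String, Rectangle2D> regions = new LinkedHashMap<>();
    // Region name -> text collected by the last extractRegions() call.
    private final Map<String, StringBuilder> regionText = new LinkedHashMap<>();

    /** Register a named rectangle to group text by. */
    void addRegion(String regionName, Rectangle2D rect) {
        regions.put(regionName, rect);
    }

    /** Names of the regions that have been set up. */
    List<String> getRegions() {
        return new ArrayList<>(regions.keySet());
    }

    /** Assign every glyph on the (assumed) page to each region that contains it. */
    void extractRegions(List<Glyph> page) {
        regionText.clear();
        for (String name : regions.keySet()) {
            regionText.put(name, new StringBuilder());
        }
        for (Glyph g : page) {
            for (Map.Entry<String, Rectangle2D> e : regions.entrySet()) {
                if (e.getValue().contains(g.x, g.y)) {
                    regionText.get(e.getKey()).append(g.text);
                }
            }
        }
    }

    /** Text collected for a region, or null if extractRegions() has not seen that region. */
    String getTextForRegion(String regionName) {
        StringBuilder sb = regionText.get(regionName);
        return sb == null ? null : sb.toString();
    }

    public static void main(String[] args) {
        RegionTextSketch stripper = new RegionTextSketch();
        stripper.addRegion("header", new Rectangle2D.Double(0, 0, 100, 20));
        stripper.addRegion("body", new Rectangle2D.Double(0, 40, 100, 20));

        List<Glyph> page = new ArrayList<>();
        page.add(new Glyph("H", 10, 5));
        page.add(new Glyph("i", 16, 5));
        page.add(new Glyph("B", 10, 50));
        page.add(new Glyph("y", 16, 50));
        page.add(new Glyph("e", 22, 50));
        page.add(new Glyph("?", 300, 300)); // outside every region, so it is dropped

        stripper.extractRegions(page);
        System.out.println(stripper.getTextForRegion("header"));
        System.out.println(stripper.getTextForRegion("body"));
    }
}
```

In the sketch, calling getTextForRegion before extractRegions returns null, which mirrors the note in the method summary that the text should be read only after the page has been processed. How the real class behaves in that case is not stated on this page.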