org.pdfbox.util
Class PDFTextStripperByArea
public class PDFTextStripperByArea
extends PDFTextStripper
Extracts text from one or more named rectangular regions of a PDF page.
void | addRegion(String regionName, Rectangle2D rect) | Add a new region to group text by.
void | extractRegions(PDPage page) | Process the page to extract the region text.
protected void | flushText() | This will print the text to the output stream.
List | getRegions() | Get the list of regions that have been set up.
String | getTextForRegion(String regionName) | Get the text for the region. This should be called after extractRegions().
protected void | showCharacter(TextPosition text) |
Methods inherited from class PDFTextStripper:
endDocument, endPage, endParagraph, flushText, getCharactersByArticle, getCurrentPageNo, getEndBookmark, getEndPage, getLineSeparator, getOutput, getPageSeparator, getStartBookmark, getStartPage, getText, getText, getWordSeparator, processPage, processPages, setEndBookmark, setEndPage, setLineSeparator, setPageSeparator, setShouldSeparateByBeads, setSortByPosition, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, shouldSeparateByBeads, shouldSortByPosition, shouldSuppressDuplicateOverlappingText, showCharacter, startDocument, startPage, startParagraph, writeCharacters, writeText, writeText
Methods inherited from class PDFStreamEngine:
getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getXObjects, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, resetEngine, setColorSpaces, setFonts, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix, showCharacter, showString
PDFTextStripperByArea
public PDFTextStripperByArea()
    throws IOException
Constructor.
addRegion
public void addRegion(String regionName,
                      Rectangle2D rect)
Add a new region to group text by.
Parameters:
regionName - The name of the region.
rect - The rectangle area to retrieve the text from.
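The `rect` argument is an ordinary `java.awt.geom.Rectangle2D`. The following minimal, JDK-only sketch builds one and shows the point test that `Rectangle2D.contains(double, double)` performs: the edges at `x` and `y` count as inside, while the edges at `x + width` and `y + height` do not. The `RegionGeometry` helper is hypothetical. Two things are assumptions to verify against your PDFBox version: that PDFBox compares character positions using exactly this call, and which units and axis orientation it uses for those positions.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper, not part of PDFBox.
final class RegionGeometry {
    private RegionGeometry() {}

    // Builds a region rectangle from its corner (x, y), width and height.
    static Rectangle2D region(double x, double y, double width, double height) {
        return new Rectangle2D.Double(x, y, width, height);
    }

    // True when the point lies in the half-open box [x, x + w) by [y, y + h),
    // which is the rule Rectangle2D.contains(double, double) applies.
    static boolean inRegion(Rectangle2D rect, double px, double py) {
        return rect.contains(px, py);
    }
}
```

Under these rules, a character sitting exactly on the far edge of a region is not counted as inside it.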
extractRegions
public void extractRegions(PDPage page)
    throws IOException
Process the page to extract the region text.
Parameters:
page - The page to extract the regions from.
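Conceptually, processing a page means testing each character's position against every registered rectangle. In the simplified model below, a character is appended to each region that contains it, so a character inside two overlapping regions lands in both. This is a JDK-only sketch of the grouping idea, not PDFBox's code. The `Glyph` class and the `groupByRegion` method are hypothetical stand-ins, and real extraction also handles fonts, word spacing and line breaks, all of which are ignored here.

```java
import java.awt.geom.Rectangle2D;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one positioned character on a page.
final class Glyph {
    final String text;
    final double x;
    final double y;

    Glyph(String text, double x, double y) {
        this.text = text;
        this.x = x;
        this.y = y;
    }
}

final class RegionGrouper {
    private RegionGrouper() {}

    // Appends each glyph to every region whose rectangle contains its position.
    // Output order follows the iteration order of the supplied region map.
    static Map<String, String> groupByRegion(Map<String, Rectangle2D> regions, List<Glyph> glyphs) {
        Map<String, StringBuilder> buffers = new LinkedHashMap<>();
        for (String name : regions.keySet()) {
            buffers.put(name, new StringBuilder());
        }
        for (Glyph g : glyphs) {
            for (Map.Entry<String, Rectangle2D> e : regions.entrySet()) {
                if (e.getValue().contains(g.x, g.y)) {
                    buffers.get(e.getKey()).append(g.text);
                }
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, StringBuilder> e : buffers.entrySet()) {
            result.put(e.getKey(), e.getValue().toString());
        }
        return result;
    }
}
```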
flushText
protected void flushText()
    throws IOException
This will print the text to the output stream.
Overrides:
flushText in class PDFTextStripper
getRegions
public List getRegions()
Get the list of regions that have been set up.
Returns:
A list of java.lang.String objects that identify the region names.
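`getRegions()` returns a raw `List`, so code written with generics has to cast each element. The minimal sketch below copies such a raw list into a `List<String>` of region names and fails fast if an element is not a `String`. The `RegionNames` class and its `toNames` method are hypothetical helpers and are not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for consuming a pre-generics List of region names.
final class RegionNames {
    private RegionNames() {}

    // Copies a raw list into a typed list, throwing ClassCastException
    // with a descriptive message if any element is not a String.
    static List<String> toNames(List<?> raw) {
        List<String> names = new ArrayList<>(raw.size());
        for (Object o : raw) {
            if (!(o instanceof String)) {
                throw new ClassCastException("region name is not a String: " + o);
            }
            names.add((String) o);
        }
        return names;
    }
}
```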
getTextForRegion
public String getTextForRegion(String regionName)
Get the text for the region. This should be called after extractRegions().
Parameters:
regionName - The name of the region to get the text from.
Returns:
The text that was identified in that region.
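Because `getTextForRegion` is meant to be called after `extractRegions`, a multi-page loop calls the two methods in that order for every page. The sketch below captures that ordering against a hypothetical `RegionTextSource` interface. The interface mirrors the two methods' names but leaves the page type generic, so the sketch runs without PDFBox. With the real class, the pages would be `PDPage` objects and a `PDFTextStripperByArea` would sit behind an adapter of this shape. The sketch assumes that each `extractRegions` call replaces the text captured for the previous page.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical abstraction over the two calls whose order matters.
interface RegionTextSource<P> {
    void extractRegions(P page) throws IOException;
    String getTextForRegion(String regionName);
}

final class PerPageExtraction {
    private PerPageExtraction() {}

    // For each page: process it first, then read every region's text.
    // Returns one (region name -> text) map per page, in page order.
    static <P> List<Map<String, String>> extractAll(
            RegionTextSource<P> source, List<P> pages, List<String> regionNames) throws IOException {
        List<Map<String, String>> perPage = new ArrayList<>();
        for (P page : pages) {
            source.extractRegions(page); // must precede getTextForRegion
            Map<String, String> texts = new LinkedHashMap<>();
            for (String name : regionNames) {
                texts.put(name, source.getTextForRegion(name));
            }
            perPage.add(texts);
        }
        return perPage;
    }
}
```

Reading the region text immediately after each page is processed keeps every page's results separate.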