java.lang.Object
  org.pdfbox.util.PDFStreamEngine
    org.pdfbox.util.PDFTextStripper
      org.pdfbox.util.PDFText2HTML
public class PDFText2HTML extends PDFTextStripper
Wraps stripped text in simple HTML and tries to form HTML paragraphs. Paragraphs split across pages, columns, or figures are not rejoined.
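The following is a minimal, self-contained sketch of the general idea described above: taking already-stripped paragraphs and wrapping them in simple HTML. It is not the PDFBox implementation. The class `HtmlWrapSketch` and its methods `escape` and `toHtml` are hypothetical names introduced only for illustration, and the exact markup that PDFText2HTML emits is not stated in this page.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not part of PDFBox. */
final class HtmlWrapSketch {

    /** Escape the characters that are significant in HTML text content. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Wrap each paragraph in a p element inside a minimal HTML document. */
    static String toHtml(String title, List<String> paragraphs) {
        StringBuilder html = new StringBuilder();
        html.append("<html><head><title>").append(escape(title))
            .append("</title></head>\n<body>\n");
        for (String paragraph : paragraphs) {
            html.append("<p>").append(escape(paragraph)).append("</p>\n");
        }
        html.append("</body></html>");
        return html.toString();
    }

    public static void main(String[] args) {
        // Assumed input: paragraphs that some text stripper has already separated.
        List<String> paragraphs = Arrays.asList("Profit & loss", "x < y");
        System.out.println(toHtml("Report", paragraphs));
    }
}
```

Because each paragraph is wrapped independently, a paragraph that arrives in two pieces (for example, split by a page break) would come out as two separate `p` elements, which matches the limitation noted in the class description.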
Field Summary |
---|
Fields inherited from class org.pdfbox.util.PDFTextStripper |
---|
charactersByArticle, output |
Constructor Summary | |
---|---|
PDFText2HTML()
Constructor. |
Method Summary | |
---|---|
void | endDocument(PDDocument pdf): This method is available for subclasses of this class. |
protected void | endParagraph(): Write out the paragraph separator. |
protected void | flushText(): This will print the text to the output stream. |
protected java.lang.String | getTitleGuess(): The guess to the document title. |
protected TextPosition | guessTitle(java.util.Iterator textIter): This method will attempt to guess the title of the document. |
boolean | isSuppressParagraphs() |
void | setSuppressParagraphs(boolean shouldSuppressParagraphs) |
protected void | startParagraph(): Write out the paragraph separator. |
protected void | writeCharacters(TextPosition position): Write the string to the output stream. |
protected void | writeHeader(): Write the header to the output document. |
Methods inherited from class org.pdfbox.util.PDFStreamEngine |
---|
getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getXObjects, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, resetEngine, setColorSpaces, setFonts, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix, showString |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public PDFText2HTML() throws java.io.IOException
Throws: java.io.IOException - If there is an error during initialization.

Method Detail |
---|
protected void writeHeader() throws java.io.IOException
Throws: java.io.IOException - If there is a problem writing out the header to the document.

protected java.lang.String getTitleGuess()

protected void flushText() throws java.io.IOException
Overrides: flushText in class PDFTextStripper
Throws: java.io.IOException - If there is an error writing the text.

public void endDocument(PDDocument pdf) throws java.io.IOException
Overrides: endDocument in class PDFTextStripper
Parameters: pdf - The PDF document that is being processed.
Throws: java.io.IOException - If an IO error occurs.

protected TextPosition guessTitle(java.util.Iterator textIter)
Parameters: textIter - The characters on the first page.
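This page does not say how the title guess is made. As one plausible heuristic, and purely as an assumption, the sketch below picks the piece of first-page text with the largest font size. The class `TitleGuessSketch` and its nested `Run` type are hypothetical stand-ins; `Run` is not PDFBox's `TextPosition`, and nothing here reflects the library's actual logic.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical illustration only; not part of PDFBox. */
final class TitleGuessSketch {

    /** Stand-in for a positioned piece of text (assumed fields). */
    static final class Run {
        final String text;
        final float fontSize;

        Run(String text, float fontSize) {
            this.text = text;
            this.fontSize = fontSize;
        }
    }

    /**
     * Return the run with the largest font size, or null if the iterator is empty.
     * When several runs share the largest size, the earliest one is kept.
     */
    static Run guessTitle(Iterator<Run> firstPageRuns) {
        Run best = null;
        while (firstPageRuns.hasNext()) {
            Run candidate = firstPageRuns.next();
            if (best == null || candidate.fontSize > best.fontSize) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Run> runs = Arrays.asList(
                new Run("Technical Report", 10f),
                new Run("A Study of Text Extraction", 18f),
                new Run("Abstract", 12f));
        Run guess = guessTitle(runs.iterator());
        System.out.println(guess == null ? "(no title)" : guess.text);
    }
}
```

A largest-font rule is easy to reason about but can be fooled by decorative headings or logos, so a real implementation may well combine it with position or other cues.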
protected void startParagraph() throws java.io.IOException
Overrides: startParagraph in class PDFTextStripper
Throws: java.io.IOException - If there is an error writing to the stream.

protected void endParagraph() throws java.io.IOException
Overrides: endParagraph in class PDFTextStripper
Throws: java.io.IOException - If there is an error writing to the stream.

protected void writeCharacters(TextPosition position) throws java.io.IOException
Overrides: writeCharacters in class PDFTextStripper
Parameters: position - The text to write to the stream.
Throws: java.io.IOException - If there is an error when writing the text.

public boolean isSuppressParagraphs()

public void setSuppressParagraphs(boolean shouldSuppressParagraphs)
Parameters: shouldSuppressParagraphs - The new value of the suppressParagraphs flag.
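The effect of the suppressParagraphs flag is not documented on this page. Judging only from its name, it presumably turns off the paragraph markup written by startParagraph() and endParagraph(). The sketch below illustrates that assumed behavior with a hypothetical class, `ParagraphFlagSketch`, that collects output in memory; the `p` tags and the newline fallback are assumptions, not the library's confirmed output.

```java
/** Hypothetical illustration only; not part of PDFBox. */
final class ParagraphFlagSketch {

    private boolean suppressParagraphs;
    private final StringBuilder output = new StringBuilder();

    boolean isSuppressParagraphs() {
        return suppressParagraphs;
    }

    void setSuppressParagraphs(boolean shouldSuppressParagraphs) {
        this.suppressParagraphs = shouldSuppressParagraphs;
    }

    /** Open a paragraph, unless paragraph markup is suppressed. */
    void startParagraph() {
        if (!suppressParagraphs) {
            output.append("<p>");
        }
    }

    /** Close a paragraph; when suppressed, only a newline separates text. */
    void endParagraph() {
        output.append(suppressParagraphs ? "\n" : "</p>\n");
    }

    void writeText(String text) {
        output.append(text);
    }

    String result() {
        return output.toString();
    }

    public static void main(String[] args) {
        ParagraphFlagSketch sketch = new ParagraphFlagSketch();
        sketch.setSuppressParagraphs(true);
        sketch.startParagraph();
        sketch.writeText("First block of text");
        sketch.endParagraph();
        System.out.print(sketch.result());
    }
}
```

Keeping the check inside the two paragraph hooks means the rest of the writing code does not need to know whether paragraphs are suppressed.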