public class TagParseState extends SingleCharacterReceiver

The parser recognizes the following markup constructs:

- `'<' <token> <attrs> '>' ... '</' <token> '>'` (a tag and its end tag)
- `'<' <token> <attrs> '/>'` (a self-closing tag)
- `'<?' <token> <attrs> '?>'` (a qtag)
- `'<![' [<token>] '[' ... ']]>'` (a cdata-like escape)
- `'<!' <token> ... '>'` (a btag)
- `'<!--' ... '-->'` (a comment)

Each of these except the comment has supporting protected methods that the parsing engine calls, and overriding those methods lets an extending class perform higher-level data extraction and parsing. The messiest construct is `<! ... >`, because it can contain nested btags, cdata-like escapes, and qtags. Ideally the parser should turn these tags into a sequence of preparsed tokens. Because btags can nest, the parser must also track how deeply it is nested, which it does with a btag depth counter (`bTagDepth`). For btags, then, it is the depth counter rather than the current state that determines whether the parser is operating inside a btag.
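
To make the depth-counter idea concrete, here is a minimal, self-contained Java sketch (not ManifoldCF code) that scans a string and tracks how deeply `<!` constructs nest, the way a DTD such as `<!DOCTYPE ... [ <!ENTITY ...> ]>` nests one btag inside another. `BTagDepthDemo` and `scanBTagDepth` are hypothetical names. Simplifying assumptions: comments and cdata-like escapes are skipped as opaque, a `>` inside a quoted string within a btag does not close it, and no tokens are emitted (the real class reports those through `noteBTagToken`).

```java
// Hypothetical sketch (not ManifoldCF code): track how deeply "<!" btags nest.
public final class BTagDepthDemo {

    /** Returns {maximum btag depth reached, btag depth at end of input}. */
    static int[] scanBTagDepth(String doc) {
        int depth = 0;
        int maxDepth = 0;
        char quote = 0; // nonzero while inside a quoted string within a btag
        for (int i = 0; i < doc.length(); i++) {
            char c = doc.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0; // closing quote
                }
            } else if (doc.startsWith("<!--", i)) {
                // Comment: jump to the last char of "-->"; the loop's i++ moves past it.
                int end = doc.indexOf("-->", i + 4);
                if (end < 0) {
                    break;
                }
                i = end + 2;
            } else if (doc.startsWith("<![", i)) {
                // Cdata-like escape: treated as opaque, jump past "]]>".
                int end = doc.indexOf("]]>", i + 3);
                if (end < 0) {
                    break;
                }
                i = end + 2;
            } else if (doc.startsWith("<!", i)) {
                depth++; // a btag opens, possibly inside another btag
                maxDepth = Math.max(maxDepth, depth);
                i++; // skip the '!'
            } else if (depth > 0 && (c == '"' || c == '\'')) {
                quote = c; // quoted text inside a btag may contain '>'
            } else if (c == '>' && depth > 0) {
                depth--; // a naked '>' closes the innermost open btag
            }
        }
        return new int[] {maxDepth, depth};
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE note [<!-- a > b --><!ENTITY sig \"x > y\">]>";
        int[] r = scanBTagDepth(dtd);
        System.out.println("max depth = " + r[0] + ", final depth = " + r[1]);
    }
}
```

Running `main` prints `max depth = 2, final depth = 0`. A single integer is enough here because a `>` always closes the innermost open btag, so the parser only needs to know whether it is inside some btag, not which one.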

| Modifier and Type | Field and Description |
|---|---|
| `protected java.lang.StringBuilder` | `accumBuffer`: The only buffer in which characters are actually accumulated. |
| `protected java.lang.StringBuilder` | `ampBuffer`: Buffer of characters seen after an ampersand. |
| `protected int` | `bTagDepth`: The btag depth; btag behavior applies when it is greater than 0. |
| `protected java.util.List<AttrNameValue>` | `currentAttrList` |
| `protected java.lang.String` | `currentAttrName` |
| `protected java.lang.StringBuilder` | `currentAttrNameBuffer` |
| `protected int` | `currentState` |
| `protected java.lang.String` | `currentTagName` |
| `protected java.lang.StringBuilder` | `currentTagNameBuffer` |
| `protected java.lang.StringBuilder` | `currentValueBuffer` |
| `protected boolean` | `inAmpersand`: Whether an ampersand has been seen. |
| `protected static java.util.Map<java.lang.String,java.lang.String>` | `mapLookup` |
| `protected static int` | `TAGPARSESTATE_IN_ATTR_LOOKING_FOR_VALUE` |
| `protected static int` | `TAGPARSESTATE_IN_ATTR_NAME` |
| `protected static int` | `TAGPARSESTATE_IN_ATTR_VALUE` |
| `protected static int` | `TAGPARSESTATE_IN_BANG_TOKEN` |
| `protected static int` | `TAGPARSESTATE_IN_BRACKET_TOKEN` |
| `protected static int` | `TAGPARSESTATE_IN_CDATA_BODY` |
| `protected static int` | `TAGPARSESTATE_IN_COMMENT` |
| `protected static int` | `TAGPARSESTATE_IN_DOUBLE_QUOTES_ATTR_VALUE` |
| `protected static int` | `TAGPARSESTATE_IN_END_TAG_NAME` |
| `protected static int` | `TAGPARSESTATE_IN_QTAG_ATTR_LOOKING_FOR_VALUE` |
| `protected static int` | `TAGPARSESTATE_IN_QTAG_ATTR_NAME` |
| `protected static int` | `TAGPARSESTATE_IN_QTAG_ATTR_VALUE` |
| `protected static int` | `TAGPARSESTATE_IN_QTAG_DOUBLE_QUOTES_ATTR_VALUE` |
| `protected static int` | `TAGPARSESTATE_IN_QTAG_NAME` |
| `protected static int` | `TAGPARSESTATE_IN_QTAG_SAW_QUESTION` |
| `protected static int` | `TAGPARSESTATE_IN_QTAG_SINGLE_QUOTES_ATTR_VALUE` |
| `protected static int` | `TAGPARSESTATE_IN_QTAG_UNQUOTED_ATTR_VALUE` |
| `protected static int` | `TAGPARSESTATE_IN_SINGLE_QUOTES_ATTR_VALUE` |
| `protected static int` | `TAGPARSESTATE_IN_TAG_NAME` |
| `protected static int` | `TAGPARSESTATE_IN_TAG_SAW_SLASH` |
| `protected static int` | `TAGPARSESTATE_IN_UNQUOTED_ATTR_VALUE` |
| `protected static int` | `TAGPARSESTATE_IN_UNQUOTED_ATTR_VALUE_SAW_SLASH` |
| `protected static int` | `TAGPARSESTATE_NEED_FINAL_BRACKET` |
| `protected static int` | `TAGPARSESTATE_NORMAL` |
| `protected static int` | `TAGPARSESTATE_SAWCOMMENTDASH` |
| `protected static int` | `TAGPARSESTATE_SAWDASH` |
| `protected static int` | `TAGPARSESTATE_SAWEXCLAMATION` |
| `protected static int` | `TAGPARSESTATE_SAWLEFTANGLE` |
| `protected static int` | `TAGPARSESTATE_SAWRIGHTBRACKET` |
| `protected static int` | `TAGPARSESTATE_SAWSECONDCOMMENTDASH` |
| `protected static int` | `TAGPARSESTATE_SAWSECONDRIGHTBRACKET` |
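
The attribute states in the table (looking for a value, single-quoted, double-quoted, and unquoted values) suggest a per-character scan of a tag's attributes into a list of name/value pairs. The sketch below is a hypothetical, standalone illustration of that idea, not `TagParseState`'s actual transition logic: `AttrScanDemo`, `parseAttrs`, and `Attr` (a stand-in for `AttrNameValue`, whose real API is not assumed) are invented names, the state values are assumptions, and entity decoding is skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not ManifoldCF code): scan "name=value" attributes one
// character at a time, with states modeled on the attribute states listed above.
public final class AttrScanDemo {

    /** Stand-in for AttrNameValue; its real fields and methods are not assumed. */
    static final class Attr {
        final String name;
        final String value; // empty when the attribute has no value
        Attr(String name, String value) {
            this.name = name;
            this.value = value;
        }
        @Override
        public String toString() {
            return name + "=" + value;
        }
    }

    private static final int BETWEEN = 0, IN_ATTR_NAME = 1, AFTER_ATTR_NAME = 2,
            LOOKING_FOR_VALUE = 3, IN_SINGLE_QUOTES = 4, IN_DOUBLE_QUOTES = 5,
            IN_UNQUOTED = 6;

    static List<Attr> parseAttrs(String s) {
        List<Attr> attrs = new ArrayList<>();
        StringBuilder name = new StringBuilder();
        StringBuilder value = new StringBuilder();
        int state = BETWEEN;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean ws = Character.isWhitespace(c);
            switch (state) {
                case BETWEEN:
                    if (!ws) { name.append(c); state = IN_ATTR_NAME; }
                    break;
                case IN_ATTR_NAME:
                    if (c == '=') state = LOOKING_FOR_VALUE;
                    else if (ws) state = AFTER_ATTR_NAME;
                    else name.append(c);
                    break;
                case AFTER_ATTR_NAME:
                    if (c == '=') state = LOOKING_FOR_VALUE;
                    else if (!ws) { // the previous attribute had no value
                        emit(attrs, name, value);
                        name.append(c);
                        state = IN_ATTR_NAME;
                    }
                    break;
                case LOOKING_FOR_VALUE:
                    if (c == '\'') state = IN_SINGLE_QUOTES;
                    else if (c == '"') state = IN_DOUBLE_QUOTES;
                    else if (!ws) { value.append(c); state = IN_UNQUOTED; }
                    break;
                case IN_SINGLE_QUOTES:
                    if (c == '\'') { emit(attrs, name, value); state = BETWEEN; }
                    else value.append(c);
                    break;
                case IN_DOUBLE_QUOTES:
                    if (c == '"') { emit(attrs, name, value); state = BETWEEN; }
                    else value.append(c);
                    break;
                case IN_UNQUOTED:
                    if (ws) { emit(attrs, name, value); state = BETWEEN; }
                    else value.append(c);
                    break;
                default:
                    throw new IllegalStateException("unknown state " + state);
            }
        }
        if (name.length() > 0) {
            emit(attrs, name, value); // flush a trailing or unterminated attribute
        }
        return attrs;
    }

    private static void emit(List<Attr> attrs, StringBuilder name, StringBuilder value) {
        attrs.add(new Attr(name.toString(), value.toString()));
        name.setLength(0);
        value.setLength(0);
    }

    public static void main(String[] args) {
        System.out.println(parseAttrs("href=\"a.html\" title='x y' checked size = 3"));
    }
}
```

Running `main` prints `[href=a.html, title=x y, checked=, size=3]`. The extra `AFTER_ATTR_NAME` state is what lets `size = 3` (whitespace around `=`) and the valueless `checked` both parse without lookahead.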

Inherited field: `charBuffer`

| Constructor and Description |
|---|
| `TagParseState()` |

| Modifier and Type | Method and Description |
|---|---|
| `protected boolean` | `acceptNewTag()`: Allows parsing within a tag. |
| `protected static java.lang.String` | `attributeDecode(java.lang.String input)`: Decodes an HTML attribute. |
| `boolean` | `dealWithCharacter(char thisChar)`: Deals with a single character. |
| `protected boolean` | `dumpValues(java.lang.String value)` |
| `protected static boolean` | `isPunctuation(char x)`: Is a character markup-language punctuation? |
| `protected static boolean` | `isWhitespace(char x)`: Is a character markup-language whitespace? |
| `protected static java.lang.String` | `mapChunk(java.lang.String input)`: Maps an entity reference back to a character. |
| `protected java.lang.StringBuilder` | `newBuffer()`: Allocates the buffer. |
| `protected boolean` | `noteBTag(java.lang.String tagName)`: Called for every btag, that is, every `'<!' <token> ...` construct. |
| `protected boolean` | `noteBTagToken(java.lang.String token)`: Called for every token inside a btag. |
| `protected boolean` | `noteEndBTag()`: Called at the end of every btag, or any time a naked `'>'` appears in the document. |
| `protected boolean` | `noteEndEscaped()`: Called at the end of every cdata-like tag. |
| `protected boolean` | `noteEndTag(java.lang.String tagName)`: Called for every end tag. |
| `protected boolean` | `noteEscaped(java.lang.String token)`: Called at the start of every cdata-like tag, that is, every `'<![' [<token>] '['` opening. |
| `protected boolean` | `noteEscapedCharacter(char thisChar)`: Called for every character found within an escape block, such as the body of a cdata-like tag. |
| `protected boolean` | `noteNormalCharacter(char thisChar)`: Called for every character that is not part of a tag or other markup construct. |
| `protected boolean` | `noteQTag(java.lang.String tagName, java.util.List<AttrNameValue> attributes)`: Called for every qtag, that is, every `'<?' <token> <attrs> '?>'` construct. |
| `protected boolean` | `noteTag(java.lang.String tagName, java.util.List<AttrNameValue> attributes)`: Called for every tag. |
| `protected boolean` | `outputAmpBuffer()`: Interprets the ampersand buffer. |
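
The hook-style design summarized in the table (the engine recognizes a construct, then calls a protected `note...` method that a subclass may override) can be sketched as follows. This is a deliberately simplified, hypothetical stand-in: `MiniTagParser`, `TagNameCollector`, and `HookDemo` are invented names; the sketch scans a whole string instead of one character at a time, ignores attributes, does not nest btags, and does not handle `>` inside quoted values. The hook names mirror `TagParseState`'s, but treating a `true` return as "stop parsing" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for TagParseState's hook design (not ManifoldCF code).
abstract class MiniTagParser {
    // Default hooks do nothing; in this sketch, returning true stops parsing.
    protected boolean noteTag(String tagName) { return false; }
    protected boolean noteEndTag(String tagName) { return false; }
    protected boolean noteQTag(String tagName) { return false; }
    protected boolean noteBTag(String tagName) { return false; }
    protected boolean noteEscaped(String token) { return false; }
    protected boolean noteNormalCharacter(char c) { return false; }

    /** Returns true if a hook asked to stop, false once the input is exhausted. */
    public final boolean parse(String doc) {
        int i = 0;
        while (i < doc.length()) {
            char c = doc.charAt(i);
            if (c != '<') {
                if (noteNormalCharacter(c)) return true;
                i++;
                continue;
            }
            // Choose the closing delimiter from the opening one.
            int openLen;
            String close;
            if (doc.startsWith("<!--", i)) { openLen = 4; close = "-->"; }
            else if (doc.startsWith("<![", i)) { openLen = 3; close = "]]>"; }
            else if (doc.startsWith("<?", i)) { openLen = 2; close = "?>"; }
            else { openLen = 1; close = ">"; }
            int end = doc.indexOf(close, i + openLen);
            if (end < 0) return false; // unterminated construct: stop quietly
            String body = doc.substring(i, end); // opening delimiter plus content
            i = end + close.length();
            boolean stop;
            if (body.startsWith("<!--")) stop = false;                         // comments: no hook
            else if (body.startsWith("<![")) stop = noteEscaped(token(body, 3)); // token may be ""
            else if (body.startsWith("<!")) stop = noteBTag(token(body, 2));
            else if (body.startsWith("<?")) stop = noteQTag(token(body, 2));
            else if (body.startsWith("</")) stop = noteEndTag(token(body, 2));
            else stop = noteTag(token(body, 1)); // also covers self-closing "<br/>"
            if (stop) return true;
        }
        return false;
    }

    /** Reads a token starting at 'start', ending at whitespace, '/', '[' or end of body. */
    private static String token(String body, int start) {
        int j = start;
        while (j < body.length()) {
            char ch = body.charAt(j);
            if (Character.isWhitespace(ch) || ch == '/' || ch == '[') break;
            j++;
        }
        return body.substring(start, j);
    }
}

// A subclass that overrides the hooks to record what the parser saw.
class TagNameCollector extends MiniTagParser {
    final List<String> events = new ArrayList<>();
    final StringBuilder text = new StringBuilder();

    @Override protected boolean noteTag(String n) { events.add("tag:" + n); return false; }
    @Override protected boolean noteEndTag(String n) { events.add("end:" + n); return false; }
    @Override protected boolean noteQTag(String n) { events.add("qtag:" + n); return false; }
    @Override protected boolean noteBTag(String n) { events.add("btag:" + n); return false; }
    @Override protected boolean noteEscaped(String t) { events.add("escaped:" + t); return false; }
    @Override protected boolean noteNormalCharacter(char c) { text.append(c); return false; }
}

public final class HookDemo {
    public static void main(String[] args) {
        TagNameCollector p = new TagNameCollector();
        p.parse("<?xml version='1.0'?><!DOCTYPE html><p>Hi<br/></p><!-- note --><![CDATA[x]]>");
        System.out.println(p.events);
        System.out.println(p.text);
    }
}
```

Running `main` prints `[qtag:xml, btag:DOCTYPE, tag:p, tag:br, end:p, escaped:CDATA]` followed by `Hi`. The comment produces no event, matching the class description's note that every construct except the comment has supporting methods.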

Inherited methods: `dealWithCharacters`, `dealWithRemainder`, `finishUp`

protected static final int TAGPARSESTATE_NORMAL
protected static final int TAGPARSESTATE_SAWLEFTANGLE
protected static final int TAGPARSESTATE_SAWEXCLAMATION
protected static final int TAGPARSESTATE_SAWDASH
protected static final int TAGPARSESTATE_IN_COMMENT
protected static final int TAGPARSESTATE_SAWCOMMENTDASH
protected static final int TAGPARSESTATE_SAWSECONDCOMMENTDASH
protected static final int TAGPARSESTATE_IN_TAG_NAME
protected static final int TAGPARSESTATE_IN_ATTR_NAME
protected static final int TAGPARSESTATE_IN_ATTR_VALUE
protected static final int TAGPARSESTATE_IN_TAG_SAW_SLASH
protected static final int TAGPARSESTATE_IN_END_TAG_NAME
protected static final int TAGPARSESTATE_IN_ATTR_LOOKING_FOR_VALUE
protected static final int TAGPARSESTATE_IN_SINGLE_QUOTES_ATTR_VALUE
protected static final int TAGPARSESTATE_IN_DOUBLE_QUOTES_ATTR_VALUE
protected static final int TAGPARSESTATE_IN_UNQUOTED_ATTR_VALUE
protected static final int TAGPARSESTATE_IN_QTAG_NAME
protected static final int TAGPARSESTATE_IN_QTAG_ATTR_NAME
protected static final int TAGPARSESTATE_IN_QTAG_SAW_QUESTION
protected static final int TAGPARSESTATE_IN_QTAG_ATTR_VALUE
protected static final int TAGPARSESTATE_IN_QTAG_ATTR_LOOKING_FOR_VALUE
protected static final int TAGPARSESTATE_IN_QTAG_SINGLE_QUOTES_ATTR_VALUE
protected static final int TAGPARSESTATE_IN_QTAG_DOUBLE_QUOTES_ATTR_VALUE
protected static final int TAGPARSESTATE_IN_QTAG_UNQUOTED_ATTR_VALUE
protected static final int TAGPARSESTATE_IN_BRACKET_TOKEN
protected static final int TAGPARSESTATE_NEED_FINAL_BRACKET
protected static final int TAGPARSESTATE_IN_BANG_TOKEN
protected static final int TAGPARSESTATE_IN_CDATA_BODY
protected static final int TAGPARSESTATE_SAWRIGHTBRACKET
protected static final int TAGPARSESTATE_SAWSECONDRIGHTBRACKET
protected static final int TAGPARSESTATE_IN_UNQUOTED_ATTR_VALUE_SAW_SLASH
protected int currentState
protected int bTagDepth
protected java.lang.StringBuilder accumBuffer
protected java.lang.StringBuilder currentTagNameBuffer
protected java.lang.StringBuilder currentAttrNameBuffer
protected java.lang.StringBuilder currentValueBuffer
protected java.lang.String currentTagName
protected java.lang.String currentAttrName
protected java.util.List<AttrNameValue> currentAttrList
protected boolean inAmpersand
protected java.lang.StringBuilder ampBuffer
protected static final java.util.Map<java.lang.String,java.lang.String> mapLookup
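
The state constants above drive a state machine that advances one character at a time. The hypothetical sketch below (`CommentStripper` is an invented name, not ManifoldCF code) reuses seven of the constant names to recognize `<!-- ... -->` comments through a `dealWithCharacter` method; the numeric values and transitions are assumptions. Unlike the real parser, it passes any other `<` or `<!` through as plain text instead of parsing tags or btags.

```java
// Hypothetical sketch (not ManifoldCF code): a one-character-at-a-time state
// machine that removes <!-- ... --> comments. The state names are borrowed from
// TagParseState; the values and transitions are assumptions.
public final class CommentStripper {
    static final int NORMAL = 0, SAWLEFTANGLE = 1, SAWEXCLAMATION = 2, SAWDASH = 3,
            IN_COMMENT = 4, SAWCOMMENTDASH = 5, SAWSECONDCOMMENTDASH = 6;

    private int currentState = NORMAL;
    private final StringBuilder out = new StringBuilder();

    /** Consumes one character; always returns false ("keep going") in this sketch. */
    boolean dealWithCharacter(char c) {
        switch (currentState) {
            case NORMAL:
                if (c == '<') currentState = SAWLEFTANGLE;
                else out.append(c);
                break;
            case SAWLEFTANGLE:
                if (c == '!') currentState = SAWEXCLAMATION;
                else return replay("<", c);
                break;
            case SAWEXCLAMATION:
                if (c == '-') currentState = SAWDASH;
                else return replay("<!", c);
                break;
            case SAWDASH:
                if (c == '-') currentState = IN_COMMENT;
                else return replay("<!-", c);
                break;
            case IN_COMMENT:
                if (c == '-') currentState = SAWCOMMENTDASH;
                break;
            case SAWCOMMENTDASH:
                currentState = (c == '-') ? SAWSECONDCOMMENTDASH : IN_COMMENT;
                break;
            case SAWSECONDCOMMENTDASH:
                if (c == '>') currentState = NORMAL;          // end of comment
                else if (c != '-') currentState = IN_COMMENT; // "--" not followed by '>'
                break;
            default:
                throw new IllegalStateException("unknown state " + currentState);
        }
        return false;
    }

    /** Not a comment after all: emit the held-back prefix, then reprocess c. */
    private boolean replay(String held, char c) {
        out.append(held);
        currentState = NORMAL;
        return dealWithCharacter(c);
    }

    /** Flushes a partial "<", "<!" or "<!-" at end of input; drops an unterminated comment. */
    String finish() {
        if (currentState == SAWLEFTANGLE) out.append('<');
        else if (currentState == SAWEXCLAMATION) out.append("<!");
        else if (currentState == SAWDASH) out.append("<!-");
        currentState = NORMAL;
        return out.toString();
    }

    static String strip(String s) {
        CommentStripper m = new CommentStripper();
        for (int i = 0; i < s.length(); i++) {
            m.dealWithCharacter(s.charAt(i));
        }
        return m.finish();
    }

    public static void main(String[] args) {
        System.out.println(strip("a<!-- x -- y -->b<p>c"));
    }
}
```

Running `main` prints `ab<p>c`. Giving each partial delimiter (`<`, `<!`, `<!-`) its own state means the machine never needs lookahead, which is what makes a strictly single-character interface workable; `finish()` flushes any partial delimiter left at end of input.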
public boolean dealWithCharacter(char thisChar)
throws ManifoldCFException
Specified by: dealWithCharacter in class SingleCharacterReceiver
protected boolean acceptNewTag()
protected java.lang.StringBuilder newBuffer()
protected boolean outputAmpBuffer()
throws ManifoldCFException
protected boolean dumpValues(java.lang.String value)
throws ManifoldCFException
protected boolean noteTag(java.lang.String tagName,
java.util.List<AttrNameValue> attributes)
throws ManifoldCFException
protected boolean noteEndTag(java.lang.String tagName)
throws ManifoldCFException
protected boolean noteQTag(java.lang.String tagName,
java.util.List<AttrNameValue> attributes)
throws ManifoldCFException
protected boolean noteBTag(java.lang.String tagName)
throws ManifoldCFException
protected boolean noteEndBTag()
throws ManifoldCFException
protected boolean noteEscaped(java.lang.String token)
throws ManifoldCFException
Parameters: token - may be empty.
protected boolean noteEndEscaped()
throws ManifoldCFException
protected boolean noteBTagToken(java.lang.String token)
throws ManifoldCFException
protected boolean noteNormalCharacter(char thisChar)
throws ManifoldCFException
protected boolean noteEscapedCharacter(char thisChar)
throws ManifoldCFException
protected static java.lang.String attributeDecode(java.lang.String input)
protected static java.lang.String mapChunk(java.lang.String input)
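
`attributeDecode`, `mapChunk`, and the `mapLookup` table divide the work of turning entity references back into characters: one scans the text, the other maps a single reference. The following is a hypothetical, standalone sketch of that division of labor, not ManifoldCF's code. `EntityDecodeDemo` is an invented name, and the five-entry entity map, the decimal/hex numeric-reference handling, the input format passed to `mapChunk` (the text between `&` and `;`), and the choice to leave unrecognized references untouched are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of entity decoding (not ManifoldCF's implementation).
public final class EntityDecodeDemo {
    // Assumed, tiny subset of named entities; the real mapLookup contents are not shown here.
    static final Map<String, String> MAP_LOOKUP = new HashMap<>();
    static {
        MAP_LOOKUP.put("amp", "&");
        MAP_LOOKUP.put("lt", "<");
        MAP_LOOKUP.put("gt", ">");
        MAP_LOOKUP.put("quot", "\"");
        MAP_LOOKUP.put("apos", "'");
    }

    /** Maps the text between '&' and ';' to a string, or returns null if unrecognized. */
    static String mapChunk(String chunk) {
        if (chunk.startsWith("#x") || chunk.startsWith("#X")) {
            return codePoint(chunk.substring(2), 16); // hexadecimal reference, e.g. #x42
        }
        if (chunk.startsWith("#")) {
            return codePoint(chunk.substring(1), 10); // decimal reference, e.g. #65
        }
        return MAP_LOOKUP.get(chunk); // named reference, e.g. amp
    }

    private static String codePoint(String digits, int radix) {
        try {
            int cp = Integer.parseInt(digits, radix);
            return Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : null;
        } catch (NumberFormatException e) {
            return null; // empty or malformed digits
        }
    }

    /** Replaces every recognized "&...;" reference; unrecognized ones are kept verbatim. */
    static String attributeDecode(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int semi = (c == '&') ? input.indexOf(';', i + 1) : -1;
            String mapped = (semi > i + 1) ? mapChunk(input.substring(i + 1, semi)) : null;
            if (mapped != null) {
                out.append(mapped);
                i = semi + 1; // skip the whole reference
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(attributeDecode("Tom &amp; Jerry &#65;&#x42; &bogus;"));
    }
}
```

Running `main` prints `Tom & Jerry AB &bogus;`. Decoding is single-pass, so `&amp;lt;` becomes `&lt;` rather than `<`, which avoids double-decoding.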
protected static boolean isWhitespace(char x)
protected static boolean isPunctuation(char x)