public class XmlReader extends XmlTokenizer
ContentHandler
).
Note: While in the SAX 2.0 spirit, this implementation is not fully compliant. Speed and footprint took precedence over what the author judged to be details.
Unlike SAX, methods that report tag names, such as ContentHandler.startElement(int, totalcross.xml.AttributeList), pass an integral tag code rather than the name itself. This is, again, for performance reasons: comparing integers is notably more efficient than comparing strings, and tag name comparison is heavily used in XML applications.
The tag code must uniquely identify the name of the tag. The default implementation (see getTagCode(byte[], int, int) in this class) simply hashes the tag name. It can be overridden to suit specific needs.
Tag names should be translated to tag codes as soon as they are known, when reading the DTD for instance, or computed in advance and saved into a static correspondence table.
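As an illustration of the static correspondence table mentioned above, here is a minimal sketch. The class TagCodes and its methods are hypothetical and not part of totalcross.xml, and codeOf uses the hash of the upper-cased name as a stand-in; the real codes must come from whatever getTagCode computes in your reader.

```java
import java.util.Locale;

/** Hypothetical table of precomputed tag codes; not part of totalcross.xml. */
final class TagCodes {
    // Stand-in hash (assumption): NOT necessarily the value XmlReader.getTagCode returns.
    static int codeOf(String tagName) {
        return tagName.toUpperCase(Locale.ROOT).hashCode();
    }

    // Computed once, when the class is loaded.
    static final int HTML = codeOf("html");
    static final int TITLE = codeOf("title");
    static final int P = codeOf("p");

    // Dispatch uses integer comparisons only; no String is built per tag event.
    static String describe(int tagCode) {
        if (tagCode == HTML) return "document root";
        if (tagCode == TITLE) return "title";
        if (tagCode == P) return "paragraph";
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(describe(codeOf("TITLE"))); // prints "title"
    }
}
```

A handler would compare the tag code it receives against these constants instead of comparing strings.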
Modifier and Type | Field and Description |
---|---|
protected java.lang.String | tagName: String of the current tag name, set by foundStartTagName. |
protected int | tagNameHashId: Hash ID of the current tag name, set by foundStartTagName or foundEndTagName. |
Constructor and Description |
---|
XmlReader() |
Modifier and Type | Method and Description |
---|---|
void | foundAttributeName(byte[] buffer, int offset, int count): Override of XmlTokenizer. |
void | foundAttributeValue(byte[] buffer, int offset, int count, byte dlm): Override of XmlTokenizer. |
void | foundCharacter(char charFound): Override of XmlTokenizer. Impl Note: this assumes the found character is encoded in ISO 8859-1; later, we will need the appropriate encoder. |
void | foundCharacterData(byte[] buffer, int offset, int count): Override of XmlTokenizer. |
void | foundComment(byte[] buffer, int offset, int count): Override of XmlTokenizer. |
protected void | foundDeclaration(byte[] input, int offset, int count): Override of XmlTokenizer. |
void | foundEndEmptyTag(): Override of XmlTokenizer. |
void | foundEndOfInput(int count): Override of XmlTokenizer. |
void | foundEndTagName(byte[] buffer, int offset, int count): Override of XmlTokenizer. |
void | foundStartTagName(byte[] buffer, int offset, int count): Override of XmlTokenizer. |
ContentHandler | getContentHandler(): Return the current content handler. |
protected int | getTagCode(byte[] b, int offset, int count): Method to compute the tag code identifying a tag name. |
void | parse(byte[] input, int offset, int count): Parse XML data from an array of bytes, offset and count. |
void | parse(Stream input): Parse an XML document from a Stream. |
void | parse(Stream input, byte[] buffer, int start, int end, int pos): Parse an XML document from an already buffered stream. |
void | parse(XmlReadable input): Parse an XmlReadable Impl. |
AttributeList.Filter | setAttributeListFilter(AttributeList.Filter filter): Set an AttributeList.Filter to filter the attributes entered in the AttributeList. |
void | setCaseInsensitive(boolean caseInsensitive): Set to true if you want the get/set methods of the AttributeList to be case insensitive. |
void | setContentHandler(ContentHandler cntHandler): Allow an application to register a content event handler. |
void | setNewlineSignificant(boolean val): Enable or disable coalescing white spaces, according to HTML rules. |
disableReferenceResolution, foundInvalidData, foundProcessingInstruction, foundReference, foundStartOfInput, getAbsoluteOffset, hashCode, isDataCDATA, resolveCharacterReference, setCdataContents, setStrictlyXml, tokenize, tokenize, tokenize, tokenize
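protected int tagNameHashId
Hash ID of the current tag name, set by foundStartTagName or foundEndTagName.
protected java.lang.String tagName
String of the current tag name, set by foundStartTagName.
public void setCaseInsensitive(boolean caseInsensitive)
Set to true if you want the get/set methods of the AttributeList to be case insensitive.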
public void setContentHandler(ContentHandler cntHandler)
If the application does not register a content handler, all content events reported by the SAX parser will be silently ignored.
Applications may register a new or different handler in the middle of a parse, and the SAX parser must begin using the new handler immediately.
cntHandler
- The content handler.
java.lang.NullPointerException
- If the cntHandler argument is null.
getContentHandler()
public AttributeList.Filter setAttributeListFilter(AttributeList.Filter filter)
filter
- AttributeList.Filter to set, or null if the current AttributeList filter must be removed
public ContentHandler getContentHandler()
setContentHandler(totalcross.xml.ContentHandler)
public final void parse(Stream input) throws SyntaxException, IOException
The application can use this method to instruct the XML reader to begin parsing an XML document read from a Stream.
Here is the general contract for all parse methods.
Applications may not invoke this method while a parse is in progress (they should instead create a new XmlReader for each nested XML document). Once a parse is complete, an application may reuse the same XmlReader object, possibly with a different input source.
During the parse, the XmlReader will provide information about the XML document through the registered event handlers.
This method is synchronous: it will not return until the parsing has ended. If a client application wants to terminate the parsing early, it should throw an exception.
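The early-termination rule can be sketched as follows. Everything in this example is a hypothetical stand-in (TagHandler, StopParsing and the parse loop are not TotalCross classes); it only shows the pattern of throwing from a callback and catching around the synchronous call. An unchecked exception is used on the assumption that handler callbacks declare no checked exceptions.

```java
final class EarlyStopSketch {
    /** Hypothetical callback, standing in for a content handler. */
    interface TagHandler {
        void startElement(int tagCode);
    }

    /** Unchecked signal used to abandon the parse. */
    static final class StopParsing extends RuntimeException {
        StopParsing() {
            super("stop requested", null, false, false); // no stack trace needed
        }
    }

    /** Stand-in for a synchronous parse: reports every code, then returns. */
    static void parse(int[] tagCodes, TagHandler handler) {
        for (int code : tagCodes) {
            handler.startElement(code);
        }
    }

    /** Returns how many tags were seen before the wanted one, or -1 if absent. */
    static int indexOfTag(int[] tagCodes, int wanted) {
        int[] seen = {0};
        try {
            parse(tagCodes, code -> {
                if (code == wanted) throw new StopParsing(); // ends the parse early
                seen[0]++;
            });
        } catch (StopParsing stop) {
            return seen[0];
        }
        return -1; // parse ran to completion
    }

    public static void main(String[] args) {
        System.out.println(indexOfTag(new int[] {10, 20, 30, 40}, 30)); // prints 2
    }
}
```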
input
- The input source for the top-level XML document.
SyntaxException
IOException
setContentHandler(totalcross.xml.ContentHandler)
public final void parse(Stream input, byte[] buffer, int start, int end, int pos) throws SyntaxException, IOException
Unlike the general method above, this method requires more arguments. It should be used when the HTML document is embedded within an HTTP stream.
See the general contract of parse(Stream).
input
- stream to parse
buffer
- buffer, already filled with bytes read from the input stream
start
- starting position in the buffer
end
- ending position in the buffer
pos
- read position of the byte at offset 0 in the buffer
SyntaxException
IOException
public final void parse(XmlReadable input) throws SyntaxException, IOException
input
- The input source for the top-level XML document.
IOException
SyntaxException
public final void parse(byte[] input, int offset, int count) throws SyntaxException
See the general contract of parse(Stream).
input
- byte array to parse
offset
- position of the first byte in the array
count
- number of bytes to parse
SyntaxException
public void setNewlineSignificant(boolean val)
Whitespace is any character less than or equal to the ASCII space (0x20).
This method makes it possible to process the contents of pre-formatted lines, such as the contents of the <PRE> tag. When the parsing process starts, newlines are not significant. Hence, setNewlineSignificant must be called after the parsing has started. For example, to make all newlines significant:

    class MyXmlReader extends XmlReader {
        // Called once parsing has started, so the call takes effect.
        public void foundStartOfInput(byte input[], int offset, int count) {
            setNewlineSignificant(true);
        }
    }

Note: this is a "stacked" call.

    setNewlineSignificant(true);  // newlines are significant - stack is 1
    setNewlineSignificant(true);  // newlines are significant - stack is 2
    setNewlineSignificant(false); // newlines are still significant - stack is 1
    setNewlineSignificant(false); // newlines are no longer significant - stack is 0
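The "stacked" behaviour can be modelled with a simple counter. The class below is a hypothetical sketch written to match the sequence described above, not the XmlReader source; in particular, ignoring a false call when the counter is already 0 is an assumption.

```java
/** Hypothetical model of the "stacked" flag; not the XmlReader source. */
final class NewlineFlag {
    private int depth; // number of unmatched setNewlineSignificant(true) calls

    void setNewlineSignificant(boolean val) {
        if (val) {
            depth++;
        } else if (depth > 0) {
            depth--; // assumption: surplus false calls are ignored
        }
    }

    boolean isNewlineSignificant() {
        return depth > 0;
    }

    public static void main(String[] args) {
        NewlineFlag f = new NewlineFlag();
        f.setNewlineSignificant(true);   // depth 1
        f.setNewlineSignificant(true);   // depth 2
        f.setNewlineSignificant(false);  // depth 1
        System.out.println(f.isNewlineSignificant()); // prints true
        f.setNewlineSignificant(false);  // depth 0
        System.out.println(f.isNewlineSignificant()); // prints false
    }
}
```

Counting rather than storing a single boolean lets nested pre-formatted elements each pair a true call on entry with a false call on exit without turning the setting off too early.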
val
- true if newline characters must be significant, false if they must be collapsed according to HTML rules.
protected int getTagCode(byte[] b, int offset, int count)
This is the value passed to ContentHandlers when reporting a tag name. Derived classes may override it. Impl Note: Transforming to uppercase takes into account that the bytes are in the range [0-9A-Za-z]: (ch >= 'a') means "ch is a lower case letter". Also, we *do* know that the count is > 0.
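The uppercase shortcut from the implementation note can be sketched like this. TagHashSketch.tagCode is a hypothetical stand-in for an override of getTagCode; the multiplier 31 is an assumption and not necessarily the constant XmlReader uses.

```java
import java.nio.charset.StandardCharsets;

final class TagHashSketch {
    // Hypothetical stand-in for getTagCode; the hash formula is an assumption.
    static int tagCode(byte[] b, int offset, int count) {
        int h = 0;
        for (int i = offset; i < offset + count; i++) {
            int ch = b[i];
            // Valid only for bytes in [0-9A-Za-z]: digits and upper-case letters
            // are all below 'a', so (ch >= 'a') singles out lower-case letters.
            if (ch >= 'a') {
                ch -= 'a' - 'A'; // fold to upper case
            }
            h = 31 * h + ch;
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] doc = "<Body>".getBytes(StandardCharsets.US_ASCII);
        int mixed = tagCode(doc, 1, 4); // hashes "Body" in place, no String built
        int upper = tagCode("BODY".getBytes(StandardCharsets.US_ASCII), 0, 4);
        System.out.println(mixed == upper); // prints true
    }
}
```

Because the name is hashed directly from the input buffer, tags that differ only in case map to the same code without allocating a String.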
b
- byte array containing the bytes to be hashed
offset
- position of the first byte in the array
count
- number of bytes to be hashed
public void foundStartTagName(byte[] buffer, int offset, int count)
foundStartTagName in class XmlTokenizer
buffer
- byte array containing the name of the tag that started
offset
- position of the first character of the tag name in the array
count
- number of bytes the tag name is made of
public void foundEndTagName(byte[] buffer, int offset, int count)
foundEndTagName in class XmlTokenizer
buffer
- byte array containing the name of the tag that ended
offset
- position of the first character of the tag name in the array
count
- number of bytes the tag name is made of
public final void foundEndEmptyTag()
foundEndEmptyTag in class XmlTokenizer
public final void foundCharacterData(byte[] buffer, int offset, int count)
foundCharacterData in class XmlTokenizer
buffer
- byte array containing the character data that was found
offset
- position of the first character of the data in the array
count
- number of bytes the character data content is made of
public final void foundCharacter(char charFound)
foundCharacter in class XmlTokenizer
charFound
- resolved character - if the character is invalid, this value is set to '\uffff', which is not a Unicode character.
XmlTokenizer.foundReference(byte[],int,int)
public final void foundAttributeName(byte[] buffer, int offset, int count)
foundAttributeName in class XmlTokenizer
buffer
- byte array containing the attribute name
offset
- position of the first character of the attribute name in the array
count
- number of bytes the attribute name is made of
public final void foundAttributeValue(byte[] buffer, int offset, int count, byte dlm)
foundAttributeValue in class XmlTokenizer
buffer
- byte array containing the attribute value
offset
- position of the first character of the attribute value in the array
count
- number of bytes the attribute value is made of
dlm
- delimiter that started the attribute value (' or "). '\0' if none
public final void foundComment(byte[] buffer, int offset, int count)
foundComment in class XmlTokenizer
buffer
- byte array containing the comment (without the <!-- and --> delimiters)
offset
- position of the first character of the comment in the array
count
- number of bytes the comment is made of
public final void foundEndOfInput(int count)
foundEndOfInput in class XmlTokenizer
count
- number of bytes parsed
protected void foundDeclaration(byte[] input, int offset, int count)
foundDeclaration in class XmlTokenizer
input
- byte array containing the declaration (without the <! and > delimiters)
offset
- position of the first character of the declaration in the array
count
- number of bytes the declaration is made of