XmlTextReader C# to Read RSS Feed PubDate to Current Time
- Download source files - 32.8 KB
Introduction
The XmlTextReader class is not the most intuitive class to work with, and its methods and properties are very low level. While the XmlTextReader class is rich in properties and methods, I've found that most of what it provides isn't necessary for the average day-to-day task. So, in this article, I'm going to present a fairly thin wrapper class for the XmlTextReader, which should be a helpful guide to using the XmlTextReader for programmers not familiar with this class. This article is also an introduction to a variety of other disciplines that I feel a beginner should be aware of: code commenting, abstraction and architecture, and unit tests. So, hopefully, there's something here for everybody!
Why Use an XmlReader?
To summarize from this site:
- Use XmlTextReader if you need performance without schema validation or XPath/XSLT services.
- Use XmlDocument if you need XPath services or need to update the XML.
Naturally, XmlTextReader is closer to the XML. It is a forward-only reader, meaning that you can't go backwards. Contrast this with the XmlDocument class, which pulls in the entire document and allows you to random-access the XML tree. The XmlTextReader supports streams, and therefore should reduce memory requirements for very large documents.
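To make the forward-only versus in-memory distinction concrete, here is a small C# sketch that counts elements both ways. The ElementCounter class and its method names are hypothetical, not part of the article's download. The streaming side uses XmlReader.Create (Microsoft's recommended factory, discussed later); the in-memory side loads an XmlDocument and selects every element with the XPath expression //*.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper: counts element nodes with a streaming reader and with an in-memory tree.
public static class ElementCounter
{
    // Forward-only: one node at a time, no way to step back.
    public static int CountStreaming(string xml)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    count++;
                }
            }
        }
        return count;
    }

    // Random access: the whole tree is loaded before we query it.
    public static int CountWithDocument(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectNodes("//*").Count; // "//*" matches every element in the tree
    }
}
```

Both return the same number; the difference is that the streaming version never holds more than the current node, which is what makes it suitable for very large documents.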
Another advantage of the XmlTextReader is that it provides line and character position information, which can be very useful diagnostic information when there's a problem with the XML.
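As a sketch of how that position information surfaces in practice: when the document is malformed, the reader throws an XmlException whose LineNumber and LinePosition properties point at the problem. The XmlDiagnostics helper below is a hypothetical name, not part of the article's code; it simply reads to the end and reports where parsing failed.

```csharp
using System.IO;
using System.Xml;

public static class XmlDiagnostics
{
    // Reads the whole document. Returns null if it is well formed,
    // otherwise the line and column reported by the parser.
    public static string FindError(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                while (reader.Read()) { } // just walk every node
            }
            return null;
        }
        catch (XmlException ex)
        {
            return $"line {ex.LineNumber}, position {ex.LinePosition}";
        }
    }
}
```

For example, FindError("<a>\n<b></a>") reports a problem on line 2, where the mismatched end tag appears.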
What Features Do I Want to Support?
There's a core set of features in XML that I want my reader to support:
- XML declarations
- Elements
- Attributes
- Namespaces
- Element prefixes (both global and local)
- Attribute prefixes
- CDATA blocks
- Inner text
- Processing instructions
- Node graphs (element children)
There are other features of XML, but these are the most common ones, and the ones I want to start with. The link cited above demonstrates a somewhat different approach from the one I am taking here, and it's useful to briefly discuss the difference. In the SoftArtisans link, many of the code snippets demonstrate looking for and evaluating specific elements and attributes of the XML, including optional ones. In other words, the application has expectations regarding the XML graph and content. The reader that I am presenting is tailored more to processing ad hoc XML, where there are no expectations regarding the graph and the content. Both approaches have their value, depending on what you need to accomplish.
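To illustrate the ad hoc style, the sketch below walks a document without any expectations about its shape and describes each node purely from its NodeType, covering most of the features listed above. It uses a plain XmlReader rather than the wrapper presented in this article, and the AdHocWalker name is hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical ad hoc walker: no assumptions about element names or structure.
public static class AdHocWalker
{
    public static List<string> Describe(string xml)
    {
        var lines = new List<string>();
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.XmlDeclaration:
                    case XmlNodeType.ProcessingInstruction:
                        lines.Add($"{reader.NodeType} {reader.Name}: {reader.Value}");
                        break;
                    case XmlNodeType.Element:
                        lines.Add($"Element {reader.Name} ({reader.AttributeCount} attributes)");
                        break;
                    case XmlNodeType.CDATA:
                    case XmlNodeType.Comment:
                    case XmlNodeType.Text:
                        lines.Add($"{reader.NodeType}: {reader.Value}");
                        break;
                    // EndElement and other node types are deliberately ignored.
                }
            }
        }
        return lines;
    }
}
```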
The Unit Tests
The unit tests are written for my Advanced Unit Test application, downloadable here. The reason I'm using my unit test application instead of NUnit is that I want to take advantage of AUT's ability to execute tests in sequence as I read through the XML. Yes, I could have instead written an XML fragment for each unit test, but I find this more convenient and more realistic, as I can work with the entire XML document.
The XML Test Document
Here's the XML test document, which illustrates each of the features described above:
<?xml version="1.0" encoding="utf-8"?>
<RootNode AnAttribute="AttributeValue" xmlns:foo="Some namespace">
  <ChildNode Attr1="1" Attr2="2" Attr3="3"/>
  <bar:Detail xmlns:bar="LocalBar"/>
  <AttributeNamespace foo:MyAttribute="10"/>
  <![CDATA[some stuff]]>
  <!-- My comment -->
  <Element>Text</Element>
  <?do homework?>
  <ChildNode>
    <GrandchildNode Depth="3"/>
  </ChildNode>
</RootNode>
The Reader Architecture
Something this simple doesn't need an architecture, does it? In fact, it does. Even with something this simple, it's a good idea to consider what abstraction you might want (planning for the future) and what helper objects will make understanding and working with the code easier. And of course, we need to consider what kind of exceptions the reader will throw. As a side comment, it always surprises me how a good architecture, even for the simplest functionality, practically eliminates monolithic code and helps to create nice, small methods that are easily unit tested.
The IReader Interface
I potentially want to read formats other than XML, while staying within the constraints of an XML-ish structure. For example, a comma-separated value (CSV) file is a good candidate for an alternative reader implementation. By abstracting the reader, I can support alternative formats without having to change the code that uses the reader. This is a design decision that is best made early on.
The reader implements an IReader interface that provides the necessary abstraction layer:
public interface IReader
{
    // When true, ReadNode skips EndElement nodes.
    bool IgnoreEndElements { get; set; }

    // Information about the current node.
    int AttributeCount { get; }
    int Depth { get; }
    string CData { get; }
    string Comment { get; }
    string Text { get; }
    LineInfo CurrentLineInfo { get; }
    ElementNodeInfo Element { get; }
    AttributeNodeInfo Attribute { get; }
    ProcessingInstructionInfo Instruction { get; }
    XmlNodeType NodeType { get; }

    // Navigation: advance to the next node, or step through the current node's attributes.
    XmlNodeType ReadNode();
    AttributeNodeInfo ReadFirstAttribute();
    AttributeNodeInfo ReadNextAttribute();
    AttributeNodeInfo ReadAttribute();
}
Since this is a beginner's article, I want to emphasize something here: there is no excuse for not putting at least basic comments in your code. None. It is a discipline that I myself have worked hard to achieve, but if you're writing a professional application that you or others may one day need to maintain, you simply have to force yourself to become disciplined about writing comments.
The interface:
- Abstracts the reading of nodes and attributes.
- Defines methods and properties that make it clearer what is being read, rather than relying on the XmlTextReader's general-purpose Value property.
Anyone interested in implementing a custom reader now knows what the custom reader needs to implement. An application needing a reader can now reference the reader via the IReader interface, and a factory pattern can be used to instantiate the appropriate reader.
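A minimal C# sketch of that factory idea, under clearly stated assumptions: ISimpleReader, XmlSimpleReader, CsvSimpleReader and ReaderFactory are all hypothetical names invented for illustration, and the interface is a drastically reduced stand-in for IReader (just ReadNode and a Name). The CSV reader treats each line as a node and its first field as the node name, which is only one possible mapping.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical, much-reduced stand-in for IReader.
public interface ISimpleReader
{
    bool ReadNode();     // advance; false when the input is exhausted
    string Name { get; } // element name (XML) or first field (CSV)
}

public class XmlSimpleReader : ISimpleReader
{
    private readonly XmlReader reader;
    public XmlSimpleReader(string text) { reader = XmlReader.Create(new StringReader(text)); }

    // Only elements count as nodes, so both implementations behave alike.
    public bool ReadNode()
    {
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element) return true;
        }
        return false;
    }

    public string Name => reader.Name;
}

public class CsvSimpleReader : ISimpleReader
{
    private readonly string[] lines;
    private int index = -1;
    public CsvSimpleReader(string text)
    {
        lines = text.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
    }
    public bool ReadNode() => ++index < lines.Length;
    public string Name => lines[index].Split(',')[0].Trim();
}

// Callers ask the factory for a reader and only ever see ISimpleReader.
public static class ReaderFactory
{
    public static ISimpleReader Create(string format, string text)
    {
        switch (format.ToLowerInvariant())
        {
            case "xml": return new XmlSimpleReader(text);
            case "csv": return new CsvSimpleReader(text);
            default: throw new ArgumentException("Unknown format: " + format);
        }
    }
}
```

Because the calling code depends only on the interface, supporting another format means adding a class and a case in the factory rather than changing every caller.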
The Container Classes
There are several container classes that help encapsulate information relevant to all nodes and relevant to specific nodes. Creating classes that encapsulate fields improves code readability and provides a layer of separation from the underlying implementation (the Reader class, in this case). And no, none of the container classes are unit tested: you have to draw the line somewhere, and these classes are much too simple to spend the time on unit testing.
LineInfo
All XML nodes have line and character position information, which is encapsulated in the LineInfo
class:
public class LineInfo
{
    protected int lineNumber;
    protected int linePosition;

    public int LinePosition { get { return linePosition; } }
    public int LineNumber { get { return lineNumber; } }

    public LineInfo(int lineNumber, int linePosition)
    {
        this.lineNumber = lineNumber;
        this.linePosition = linePosition;
    }
}
Since this class is instantiated strictly by the reader, the properties are read-only.
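The article doesn't show how CurrentLineInfo is computed, so the following is only a guess at one way a reader could fill in a LineInfo. Readers created by XmlReader.Create normally implement the IXmlLineInfo interface, which exposes LineNumber, LinePosition and HasLineInfo(). LineInfoHelper is a hypothetical name, and the LineInfo class here is a trimmed version of the one above, rewritten with read-only auto-properties.

```csharp
using System.Xml;

// Trimmed version of the article's LineInfo: read-only, set only in the constructor.
public class LineInfo
{
    public int LineNumber { get; }
    public int LinePosition { get; }
    public LineInfo(int lineNumber, int linePosition)
    {
        LineNumber = lineNumber;
        LinePosition = linePosition;
    }
}

public static class LineInfoHelper
{
    // Hypothetical: build a LineInfo for the reader's current node.
    // Falls back to 0/0, which IXmlLineInfo itself uses to mean "not available".
    public static LineInfo Current(XmlReader reader)
    {
        IXmlLineInfo info = reader as IXmlLineInfo;
        if (info != null && info.HasLineInfo())
        {
            return new LineInfo(info.LineNumber, info.LinePosition);
        }
        return new LineInfo(0, 0);
    }
}
```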
NodeInfo
NodeInfo is an abstract class that encapsulates the two common elements of just about every XML node (there are a few exceptions): the node name and the node prefix.
public abstract class NodeInfo
{
    protected string name;
    protected string prefix;
    protected LineInfo lineInfo;

    public LineInfo LineInfo { get { return lineInfo; } }
    public string Prefix { get { return prefix; } }
    public string Name { get { return name; } }

    public NodeInfo(LineInfo lineInfo, string prefix, string name)
    {
        this.lineInfo = lineInfo;
        this.prefix = prefix;
        this.name = name;
    }
}
It's an abstract class because we want to make sure that the implementation uses an appropriate concrete class derived from NodeInfo. The concrete implementation improves readability (since it qualifies the type of node information), and usually provides additional fields specific to the node type.
ElementNodeInfo
This class is a concrete implementation of NodeInfo, and adds a local namespace field, as elements can have local namespaces:
public class ElementNodeInfo : NodeInfo
{
    protected string localNamespace;

    public string LocalNamespace { get { return localNamespace; } }

    public ElementNodeInfo(LineInfo lineInfo, string prefix, string name, string namespaceUri)
        : base(lineInfo, prefix, name)
    {
        localNamespace = namespaceUri;
    }
}
AttributeNodeInfo
This class is a concrete implementation of NodeInfo, and adds a value field, as attributes have values:
public class AttributeNodeInfo : NodeInfo
{
    protected string val;

    public string Value { get { return val; } }

    public AttributeNodeInfo(LineInfo lineInfo, string prefix, string name, string val)
        : base(lineInfo, prefix, name)
    {
        this.val = val;
    }
}
ProcessingInstructionInfo
This class derives from AttributeNodeInfo. A processing instruction has a name and a value, like an attribute, but I've implemented a separate class to represent the concept of a processing instruction, even though it adds nothing to the AttributeNodeInfo class. This is purely a code readability decision.
public class ProcessingInstructionInfo : AttributeNodeInfo
{
    // Processing instructions have no prefix.
    public ProcessingInstructionInfo(LineInfo lineInfo, string name, string value)
        : base(lineInfo, String.Empty, name, value)
    {
    }
}
The XmlTextReader
Instead of talking about the XmlTextReader as a class and its methods, which you can easily read about yourself, I'm going to show you the XmlTextReader within the context of my Reader wrapper. This way, instead of just looking at documentation, you'll see the XmlTextReader in actual code, and I'll explain what I'm doing in the code and why.
Creating an XmlTextReader
Quite literally, the first stumbling block is in creating an XmlTextReader. It sounds simple, but according to Microsoft:
In the Microsoft .NET Framework version 2.0 release, the recommended practice is to create XmlReader instances using the Create method. This allows you to take full advantage of the new features introduced in this release.
Second, I want to control some aspects of the reading process; specifically, I almost always want to ignore whitespace. The default XmlTextReader returns all whitespace. So, to properly construct an XmlTextReader using Microsoft's recommended method and to have the ability to set some options, we have to do something like this:
public Reader(string xml)
{
    StringReader textStream = new StringReader(xml);
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreComments = false;
    settings.IgnoreWhitespace = true;
    xtr = new XmlTextReader(textStream);
    reader = XmlReader.Create(xtr, settings);
    firstAttribute = true;
}
Before I go further: this constructor is the one I use for the unit tests, and it takes an XML string. You might instead want a constructor that takes a stream; as you can see in the first line, I wrap the string in a StringReader.
The second line creates an XmlReaderSettings instance, and I explicitly (just to show you another useful property) choose not to ignore comments, but I do want to ignore whitespace. Next, I create the XmlTextReader from the StringReader. But that's not enough. I now have to create an XmlReader, passing in the XmlTextReader and the desired settings. Now we have a properly constructed reader that complies with Microsoft's guidelines and is configured to ignore whitespace.
If you're wondering about the last line, we'll get to that later.
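For the stream-based constructor mentioned above, a sketch might look like this. It assumes you are happy to call XmlReader.Create directly with a Stream, which skips the intermediate XmlTextReader entirely; ReaderCreation.FromStream is a hypothetical name, and the settings mirror the ones used by the string constructor.

```csharp
using System.IO;
using System.Xml;

public static class ReaderCreation
{
    // Hypothetical: build a reader over any Stream (file, network response, MemoryStream).
    public static XmlReader FromStream(Stream stream)
    {
        XmlReaderSettings settings = new XmlReaderSettings
        {
            IgnoreComments = false, // keep comment nodes
            IgnoreWhitespace = true // drop insignificant whitespace nodes
        };
        // XmlReader.Create has a Stream overload, so no XmlTextReader is needed here.
        // The encoding is detected from the byte order mark or the XML declaration.
        return XmlReader.Create(stream, settings);
    }
}
```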
The Constructor Unit Test
[Test, Sequence(0)]
public void ConstructorTest()
{
    reader = new Reader(UnitTestResources.ReaderTest);
    reader.IgnoreEndElements = true;
    Assertion.Assert(reader.NodeType == XmlNodeType.None, "Expected 'None' for the node type.");
}
The constructor test reveals that the XmlTextReader does not position itself on a valid node immediately after construction, as the NodeType is "None".
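The same behavior is easy to observe directly on an XmlReader. This small sketch (InitialState.Probe is a hypothetical helper) captures NodeType before and after the first call to Read():

```csharp
using System.IO;
using System.Xml;

public static class InitialState
{
    // Returns the node type before and after the first Read().
    public static (XmlNodeType before, XmlNodeType after) Probe(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            XmlNodeType before = reader.NodeType; // None: not positioned on any node yet
            reader.Read();                         // move onto the first node
            return (before, reader.NodeType);
        }
    }
}
```

For a document that starts with an XML declaration, the result is None followed by XmlDeclaration.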
Reading the XML Declaration
Reading the XML declaration, as with all other nodes, requires calling ReadNode:
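public XmlNodeType ReadNode()
{
    // Read the next node, optionally skipping over end elements.
    do
    {
        reader.Read();
    } while (ignoreEndElements && (NodeType == XmlNodeType.EndElement));

    // New node: the next ReadAttribute call should start with the first attribute.
    firstAttribute = true;
    return reader.NodeType;
}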
My wrapper for the reader optionally skips end elements. If you don't do this, the reader will return EndElement node types, which, depending on what you are doing with the XML, may be superfluous. In the unit test constructor, this flag is set to true.
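Here is the same skip-end-elements loop pulled out as a standalone sketch over a plain XmlReader (the EndElementSkipping class is hypothetical), so you can compare the node stream with and without the flag:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class EndElementSkipping
{
    // Same loop as ReadNode: keep reading while the reader lands on an EndElement.
    // At end of input Read() returns false and NodeType becomes None, which ends the loop.
    public static XmlNodeType ReadNode(XmlReader reader, bool ignoreEndElements)
    {
        do
        {
            reader.Read();
        } while (ignoreEndElements && reader.NodeType == XmlNodeType.EndElement);
        return reader.NodeType;
    }

    // Collects every node type until the input is exhausted.
    public static List<XmlNodeType> AllTypes(string xml, bool ignoreEndElements)
    {
        var types = new List<XmlNodeType>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            XmlNodeType type;
            while ((type = ReadNode(reader, ignoreEndElements)) != XmlNodeType.None)
            {
                types.Add(type);
            }
        }
        return types;
    }
}
```

For <a><b/></a>, the self-closing <b/> produces no EndElement at all, so only </a> disappears when the flag is set.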
The XML Declaration Unit Test
[Test, Sequence(1)]
public void XmlDeclarationTest()
{
    reader.ReadNode();
    Assertion.Assert(reader.NodeType == XmlNodeType.XmlDeclaration, "Expected xml declaration node type.");
    Assertion.Assert(reader.AttributeCount == 2, "Expected 2 attributes.");
    AttributeNodeInfo ati1 = reader.ReadFirstAttribute();
    AttributeNodeInfo ati2 = reader.ReadNextAttribute();
    Assertion.Assert(ati1.Name == "version", "Expected version attribute.");
    Assertion.Assert(ati1.Value == "1.0", "Expected version number.");
    Assertion.Assert(ati2.Name == "encoding", "Expected encoding attribute.");
    Assertion.Assert(ati2.Value == "utf-8", "Expected encoding value.");
}
An XML declaration contains attributes just like an element node. I'll demonstrate ReadFirstAttribute and ReadNextAttribute shortly.
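A quick standalone check of that claim, as a sketch with a hypothetical DeclarationReader class: positioned on the XmlDeclaration node, the standard MoveToFirstAttribute and MoveToNextAttribute calls enumerate version and encoding as name/value pairs.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class DeclarationReader
{
    // Returns the declaration's pseudo-attributes in document order.
    public static List<KeyValuePair<string, string>> Read(string xml)
    {
        var result = new List<KeyValuePair<string, string>>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.Read(); // first node: the XML declaration, if present
            if (reader.NodeType == XmlNodeType.XmlDeclaration && reader.MoveToFirstAttribute())
            {
                do
                {
                    result.Add(new KeyValuePair<string, string>(reader.Name, reader.Value));
                } while (reader.MoveToNextAttribute());
            }
        }
        return result;
    }
}
```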
Reading the Root Node and Other Elements
Immediately following the XML declaration should be the root node. My reader provides an Element property which returns an ElementNodeInfo instance that encapsulates the element name, prefix, and optional namespace. Looking at the implementation:
public ElementNodeInfo Element
{
    get
    {
        if (NodeType != XmlNodeType.Element)
        {
            throw new ReaderException("Not on an element node.");
        }
        ElementNodeInfo el = new ElementNodeInfo(CurrentLineInfo, reader.Prefix, NameWithoutPrefix, reader.NamespaceURI);
        return el;
    }
}
You'll see that the ElementNodeInfo also contains the reader's line and character position, and the element name is stripped of the prefix.
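NameWithoutPrefix is not shown in the article. Assuming it is equivalent to XmlReader's LocalName property (an assumption, not confirmed by the source), this sketch with a hypothetical ElementNameParts class shows how the underlying reader already splits a qualified element name into its parts:

```csharp
using System.IO;
using System.Xml;

public static class ElementNameParts
{
    // For <bar:Detail xmlns:bar="LocalBar"/> this yields ("bar", "Detail", "LocalBar").
    public static (string prefix, string localName, string ns) FirstElement(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent(); // skip any declaration and land on the root element
            return (reader.Prefix, reader.LocalName, reader.NamespaceURI);
        }
    }
}
```

An unprefixed element such as <RootNode/> reports an empty prefix and an empty namespace URI, which matches the blank-prefix check in the root node test.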
Reading the Root Node Unit Test
Reading an element node is straightforward, as the unit test demonstrates:
[Test, Sequence(2)]
public void RootNodeTest()
{
    reader.ReadNode();
    Assertion.Assert(reader.NodeType == XmlNodeType.Element, "Expected element node type.");
    ElementNodeInfo eti = reader.Element;
    Assertion.Assert(eti.Name == "RootNode", "Expected root node element.");
    Assertion.Assert(eti.Prefix == "", "Expected a blank prefix.");
    Assertion.Assert(reader.AttributeCount == 2, "Expected 2 attributes.");
}
The ReadNode method is called to move past the XML declaration node and onto the root node. The unit test verifies that this happened correctly.
There's another element test later, which checks that a local namespace has been read correctly:
[Test, Sequence(8)]
public void LocalNamespaceTest()
{
    reader.ReadNode();
    Assertion.Assert(reader.NodeType == XmlNodeType.Element, "Expected element node type.");
    ElementNodeInfo eti = reader.Element;
    Assertion.Assert(eti.Prefix == "bar", "Unexpected prefix.");
    Assertion.Assert(eti.LocalNamespace == "LocalBar", "Unexpected namespace.");
}
Reading Attributes
Many XML elements contain attributes, and the root node includes two attributes, one of which is an XML namespace declaration. The XmlTextReader provides two methods for reading an attribute, MoveToFirstAttribute and MoveToNextAttribute, which return true if successful and false otherwise. I've modified this implementation slightly:
public AttributeNodeInfo ReadFirstAttribute()
{
    bool val = xtr.MoveToFirstAttribute();
    AttributeNodeInfo ret = null;
    if (val)
    {
        ret = Attribute;
        firstAttribute = false;
    }
    return ret;
}
and:
public AttributeNodeInfo ReadNextAttribute()
{
    bool val = xtr.MoveToNextAttribute();
    AttributeNodeInfo ret = null;
    if (val)
    {
        ret = Attribute;
    }
    return ret;
}
Both of these methods return an AttributeNodeInfo instance, encapsulating the reader's line and character position and the attribute name, prefix, and value. A null is returned if there are no further attributes to read. You can use these methods, or you can use another method that avoids having to figure out whether to call ReadFirstAttribute or ReadNextAttribute. My reader figures this out automatically for you, and here's where the firstAttribute boolean comes into play:
public AttributeNodeInfo ReadAttribute()
{
    AttributeNodeInfo ret = null;
    if (firstAttribute)
    {
        ret = ReadFirstAttribute();
    }
    else
    {
        ret = ReadNextAttribute();
    }
    return ret;
}
The firstAttribute flag is set whenever ReadNode is called. It's cleared when the first attribute is read, either by calling ReadFirstAttribute or ReadAttribute.
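Worth knowing if you write your own reader: according to the XmlReader documentation, when the reader is positioned on an element, MoveToNextAttribute behaves like MoveToFirstAttribute. That suggests a simpler, flag-free loop, sketched below with a hypothetical AttributeLister class. Namespace declarations such as xmlns:foo are reported as attributes too, which is why the root node's attribute count is 2.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class AttributeLister
{
    // Lists the qualified names of the root element's attributes.
    public static List<string> Names(string xml)
    {
        var names = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            // On an element node, the first MoveToNextAttribute call lands on the first attribute.
            while (reader.MoveToNextAttribute())
            {
                names.Add(reader.Name);
            }
            reader.MoveToElement(); // return to the element before reading further
        }
        return names;
    }
}
```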
The Attribute Unit Tests
The following sequence of unit tests exercises the first, next, and "smart" attribute readers:
[Test, Sequence(3)]
public void ReadFirstAttributeTest()
{
    AttributeNodeInfo ati = reader.ReadFirstAttribute();
    Assertion.Assert(ati.Name == "AnAttribute", "Unexpected first attribute name.");
    Assertion.Assert(ati.Value == "AttributeValue", "Unexpected first attribute value.");
}

[Test, Sequence(4)]
public void ReadNextAttributeTest()
{
    AttributeNodeInfo ati = reader.ReadNextAttribute();
    Assertion.Assert(ati.Name == "foo", "Unexpected second attribute name.");
    Assertion.Assert(ati.Prefix == "xmlns", "Unexpected second attribute prefix.");
    Assertion.Assert(ati.Value == "Some namespace", "Unexpected second attribute value.");
}

[Test, Sequence(5)]
public void NoFurtherAttributeTest()
{
    AttributeNodeInfo ati = reader.ReadNextAttribute();
    Assertion.Assert(ati == null, "Expected a null after the last attribute.");
}

[Test, Sequence(6)]
public void ReadNextElement()
{
    reader.ReadNode();
    ElementNodeInfo eti = reader.Element;
    Assertion.Assert(eti.Name == "ChildNode", "Unexpected element.");
    Assertion.Assert(reader.Depth == 1, "Unexpected depth.");
}

[Test, Sequence(7)]
public void SmartAttributeReaderTest()
{
    int i = 0;
    while (reader.ReadAttribute() != null)
    {
        ++i;
    }
    Assertion.Assert(i == 3, "Expected 3 attributes.");
}

[Test, Sequence(9)]
public void AttributePrefixTest()
{
    reader.ReadNode();
    AttributeNodeInfo ati = reader.ReadAttribute();
    Assertion.Assert(ati.Prefix == "foo", "Unexpected prefix.");
    Assertion.Assert(ati.Name == "MyAttribute", "Unexpected attribute.");
    Assertion.Assert(ati.Value == "10", "Unexpected value.");
}
Reading CDATA
A CDATA block lets you include freeform text in the XML, such as code. My reader provides a CData property which returns the CDATA text as a string:
public string CData
{
    get
    {
        if (NodeType != XmlNodeType.CDATA)
        {
            throw new ReaderException("Not on a CDATA node.");
        }
        return reader.Value;
    }
}
As you can see, the CData property validates the node type and then wraps the Value property.
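The guard-then-Value pattern used by CData (and, below, by the Comment and Text properties) can be sketched as a single generic helper. TypedValue.ValueOf is hypothetical, and InvalidOperationException stands in for the article's ReaderException, whose definition isn't shown.

```csharp
using System;
using System.Xml;

public static class TypedValue
{
    // Return the current node's Value only if the reader is on the expected node type.
    public static string ValueOf(XmlReader reader, XmlNodeType expected)
    {
        if (reader.NodeType != expected)
        {
            throw new InvalidOperationException("Not on a " + expected + " node.");
        }
        return reader.Value;
    }
}
```

Calling it while the reader is still on the enclosing element throws; calling it on the CDATA node returns the raw text.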
The CDATA Unit Test
[Test, Sequence(10)]
public void CDATATest()
{
    reader.ReadNode();
    Assertion.Assert(reader.NodeType == XmlNodeType.CDATA, "Expected CDATA node.");
    Assertion.Assert(reader.CData == "some stuff", "Unexpected CDATA text.");
}
Reading Comments
Reading XML comments is just like getting the CDATA text. Once we know that the node is a comment node, we return the Value property, which contains the comment text. The reader also trims any leading and trailing whitespace, which is often used to make XML comments more readable.
public string Comment
{
    get
    {
        if (NodeType != XmlNodeType.Comment)
        {
            throw new ReaderException("Not on a comment node.");
        }
        return reader.Value.Trim();
    }
}
The Comment Unit Test
[Test, Sequence(11)]
public void CommentTest()
{
    reader.ReadNode();
    Assertion.Assert(reader.NodeType == XmlNodeType.Comment, "Expected comment node.");
    Assertion.Assert(reader.Comment == "My comment", "Unexpected comment text.");
}
Reading Inner Element Text
As the XmlTextReader moves through the XML, any inner text is its own Text node type. The reader's Text property is a thin wrapper for the XmlTextReader's Value property:
public string Text
{
    get
    {
        if (NodeType != XmlNodeType.Text)
        {
            throw new ReaderException("Not on a text node.");
        }
        return reader.Value;
    }
}
The Text Unit Test
[Test, Sequence(12)]
public void TextTest()
{
    reader.ReadNode();
    Assertion.Assert(reader.NodeType == XmlNodeType.Element, "Expected element.");
    reader.ReadNode();
    Assertion.Assert(reader.NodeType == XmlNodeType.Text, "Expected text node.");
    Assertion.Assert(reader.Text == "Text", "Unexpected text value.");
}
Reading Processing Instructions
Processing instructions are another kind of XML node. They may contain useful meta-instructions for the engine that is processing the XML. The reader provides a thin wrapper for getting the processing instruction:
public ProcessingInstructionInfo Instruction
{
    get
    {
        if (NodeType != XmlNodeType.ProcessingInstruction)
        {
            throw new ReaderException("Not on a processing instruction node.");
        }
        ProcessingInstructionInfo proc = new ProcessingInstructionInfo(CurrentLineInfo, reader.Name, reader.Value);
        return proc;
    }
}
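As a standalone sketch (ProcessingInstructionDemo is a hypothetical name), this is what the underlying reader reports for a processing instruction like the one in the test document: the target becomes Name and the remaining text becomes Value.

```csharp
using System.IO;
using System.Xml;

public static class ProcessingInstructionDemo
{
    // Returns the name and value of the first processing instruction, or (null, null) if none.
    public static (string name, string value) First(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.ProcessingInstruction)
                {
                    return (reader.Name, reader.Value);
                }
            }
        }
        return (null, null);
    }
}
```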
Processing Instruction Unit Test
[Test, Sequence(13)]
public void ReadProcessingInstructionTest()
{
    reader.ReadNode();
    Assertion.Assert(reader.NodeType == XmlNodeType.ProcessingInstruction, "Expected processing instruction.");
    ProcessingInstructionInfo pii = reader.Instruction;
    Assertion.Assert(pii.Name == "do", "Unexpected name.");
    Assertion.Assert(pii.Value == "homework", "Unexpected value.");
}
Working with the XML Graph
Lastly, one of the important things about XML is that it is hierarchical. The reader provides a thin wrapper (a very thin wrapper) around the XmlTextReader's Depth property:
public int Depth { get { return reader.Depth; } }
The point, though, is that any class that realizes IReader needs to implement this property.
The Depth Unit Test
[Test, Sequence(14)]
public void DepthTests()
{
    reader.ReadNode();
    Assertion.Assert(reader.Depth == 1, "Depth should be 1.");
    reader.ReadNode();
    Assertion.Assert(reader.Depth == 2, "Depth should be 2.");
    reader.ReadAttribute();
    Assertion.Assert(reader.Depth == 3, "Depth should be 3.");
    reader.ReadNode();
    Assertion.Assert(reader.Depth == 0, "Depth should be 0.");
}
This unit test reveals one of the side effects of ignoring the XML end element node type: the depth can pop several levels at once. This should be taken into consideration when writing an application that actually does something with the XML.
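To see the effect, this sketch (the DepthTrace class is hypothetical) records Depth after every node, once keeping end elements and once skipping them. In the example used in the test below, skipping end elements makes the depth drop from 2 straight to 0.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class DepthTrace
{
    // Records Depth for each node read, optionally leaving out EndElement nodes.
    public static List<int> Depths(string xml, bool ignoreEndElements)
    {
        var depths = new List<int>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (ignoreEndElements && reader.NodeType == XmlNodeType.EndElement)
                {
                    continue; // the intermediate depths of </c> and </r> are never seen
                }
                depths.Add(reader.Depth);
            }
        }
        return depths;
    }
}
```

For <r><c><g/></c></r><!--x-->, keeping end elements gives 0, 1, 2, 1, 0, 0; skipping them gives 0, 1, 2, 0.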
About the Download
I've created a solution that contains the following projects:
- Reader, consisting only of the Reader.cs file. This is the concrete implementation of the IReader interface.
- ReaderCommon, consisting of the IReader interface and the container classes.
- ReaderUnitTests, consisting of the unit tests.
- UnitTestLib, the files necessary for compiling the above unit test project.
These contain all the pieces necessary to compile the project without error. To actually run the unit tests, you'll need to download the Advanced Unit Test application mentioned earlier.
Wrapping It Up
The Reader is a fairly thin wrapper around the XmlTextReader class. The class is intended to be used by an application that reads nodes, inspects the node type, and determines what to do given the node type. This is an ad hoc approach to reading XML, whereas the SoftArtisans link provided at the beginning of the article shows a more directed approach in which the application expects a certain format for the XML. Hopefully, though, this article and the SoftArtisans link give you a better understanding of how to work with the XmlTextReader. The reader also provides an abstraction that decouples the application from the specific document type, which is one of the goals that I had in mind.
carterdaithis1970.blogspot.com
Source: https://www.codeproject.com/Articles/15452/The-XmlTextReader-A-Beginner-s-Guide