XmlTextReader C# to Read RSS Feed PubDate to Current Time

Download source files - 32.8 KB

Introduction

The XmlTextReader class is not the most intuitive class to work with, as the methods and properties are very low level. While the XmlTextReader class is rich in properties and methods, I've found that most of what it provides isn't necessary for the average day-to-day job. So, in this article, I'm going to present a moderately thin wrapper class for the XmlTextReader, which should be a helpful guide to using the XmlTextReader for programmers not familiar with this class. This article is also an introduction to a variety of other disciplines that I feel a beginner should be aware of: code commenting, abstraction and architecture, and unit tests. So, hopefully, there's something here for everybody!

Why Use an XmlReader?

To summarize from this site:

Use XmlTextReader if you need performance without schema validation or XPath/XSLT services.
Use XmlDocument if you need XPath services or need to update the XML.

Naturally, XmlTextReader is closer to the XML. It is a forward-only reader, meaning that you can't go backwards. Contrast this with the XmlDocument class, which pulls in the entire document and allows you to random-access the XML tree. The XmlTextReader supports streams, and therefore should reduce memory requirements for very large documents.

Another advantage of the XmlTextReader is that it provides line and character position information, which can be very useful diagnostic information when there's a problem with the XML.
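As a minimal sketch of both points (forward-only reading and position information), the following walks a small XML string once and records where each node was found. The class name ForwardOnlyDemo and the sample XML are illustrative assumptions, not part of the download.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ForwardOnlyDemo
{
    // Reads the XML once, front to back, recording each node's type,
    // name, and position as reported by the XmlTextReader.
    public static List<string> Describe(string xml)
    {
        List<string> lines = new List<string>();
        using (XmlTextReader xtr = new XmlTextReader(new StringReader(xml)))
        {
            // Skip whitespace nodes so only "real" nodes are listed.
            xtr.WhitespaceHandling = WhitespaceHandling.None;
            while (xtr.Read())   // Read only moves forward; there is no "back"
            {
                lines.Add(string.Format("{0} '{1}' line {2}, pos {3}",
                    xtr.NodeType, xtr.Name, xtr.LineNumber, xtr.LinePosition));
            }
        }
        return lines;
    }

    public static void Main()
    {
        string xml = "<Root>\n  <Child Name=\"a\"/>\n</Root>";
        foreach (string line in Describe(xml))
        {
            Console.WriteLine(line);
        }
    }
}
```

The same line and position values are what you would typically put into an error message when the XML turns out to be malformed or unexpected.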

What Features Do I Want to Support?

There's a core set of features in XML that I want my reader to support:

XML declarations
Elements
Attributes
Namespaces
Element prefixes (both global and local)
Attribute prefixes
CDATA blocks
Inner text
Processing instructions
Node graphs (element children)

There are other features of XML, but these are the most common ones, and the ones I want to start with. The link cited above demonstrates a somewhat different approach than I am taking here, and it's useful to briefly discuss the difference. In the SoftArtisans link, many of the code snippets demonstrate looking for and evaluating specific elements and attributes of the XML, including optional ones. In other words, the application has expectations regarding the XML graph and content. The reader that I am presenting is tailored more to processing ad hoc XML, where there are no expectations regarding the graph and the content. Both approaches have their value depending on what you need to get accomplished.

The Unit Tests

The unit tests are written for my Advanced Unit Test application, downloadable here. The reason I'm using my unit test application instead of NUnit is because I want to take advantage of AUT's ability to execute tests in sequence, as I read through the XML. Yes, I could have instead written an XML fragment for each unit test, but I find this more convenient and more realistic, as I can work with the entire XML document.

The XML Test Document

Here's the XML test document, which illustrates each of the features described above:

    <?xml version="1.0" encoding="utf-8"?>
    <RootNode AnAttribute="AttributeValue" xmlns:foo="Some namespace">
      <ChildNode Attr1="1" Attr2="2" Attr3="3"/>
      <bar:Detail xmlns:bar="LocalBar"/>
      <AttributeNamespace foo:MyAttribute="10"/>
      <![CDATA[some stuff]]>
      <!-- My comment -->
      <Element>Text</Element>
      <?do homework?>
      <ChildNode>
        <GrandchildNode Depth="3"/>
      </ChildNode>
    </RootNode>

The Reader Architecture

Something this simple doesn't need an architecture, does it? In fact, it does. Even with something this simple, it's a good idea to consider what abstraction you might want (planning for the future) and helper objects that will make understanding and working with the code easier. And of course, we need to consider what kind of exceptions the reader will throw. As a side comment, it always surprises me how a good architecture, even for the simplest of functionality, practically eliminates monolithic code and helps to create nice small methods that are easily unit tested.

The IReader Interface

I potentially want to read formats other than XML, while staying inside the constraints of an XML-ish structure. For example, a comma-separated value (CSV) file is a good candidate for an alternative reader implementation. By abstracting the reader, I can support alternative formats without having to change the code that uses the reader. This is a design decision that is best made early on.

The reader implements an IReader interface that provides the necessary abstraction layer:

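    public interface IReader
    {
        // When true, ReadNode skips EndElement nodes.
        bool IgnoreEndElements { get; set; }

        // Number of attributes on the current node.
        int AttributeCount { get; }

        // Depth of the current node in the XML graph.
        int Depth { get; }

        // Content of the current CDATA, comment, or text node.
        string CData { get; }
        string Comment { get; }
        string Text { get; }

        // Line and character position of the current node.
        LineInfo CurrentLineInfo { get; }

        // Typed views of the current node.
        ElementNodeInfo Element { get; }
        AttributeNodeInfo Attribute { get; }
        ProcessingInstructionInfo Instruction { get; }

        // Type of the current node.
        XmlNodeType NodeType { get; }

        // Advances to the next node and returns its type.
        XmlNodeType ReadNode();

        // Attribute readers; each returns null when there is nothing further to read.
        AttributeNodeInfo ReadFirstAttribute();
        AttributeNodeInfo ReadNextAttribute();
        AttributeNodeInfo ReadAttribute();
    }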

Since this is a beginner article, I want to emphasize something here: there is no excuse for not putting in at least basic comments in your code. None. It is a discipline that I myself have worked hard to achieve, but if you're writing a professional application that you or others may one day need to maintain, you simply have to force yourself to become disciplined about writing comments.

The interface:

Abstracts the reading of nodes and attributes.
Defines the methods and properties that make it clearer as to what is being read, rather than using the XmlTextReader's Text and Value properties.

Anyone interested in implementing a custom reader now knows what the custom reader needs to implement. An application needing a reader can now reference the reader via the IReader interface, and a factory pattern can be used to instantiate the appropriate reader.
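The article does not include a factory, so the following is only a sketch of what one might look like. To keep it self-contained, it uses a cut-down, hypothetical IMiniReader interface in place of IReader, and the names XmlMiniReader, CsvMiniReader, and ReaderFactory are all assumptions made for illustration.

```csharp
using System;
using System.IO;

// Hypothetical, cut-down stand-in for the article's IReader interface.
public interface IMiniReader
{
    string Format { get; }
}

// Hypothetical XML-backed implementation.
public class XmlMiniReader : IMiniReader
{
    public string Format { get { return "xml"; } }
}

// Hypothetical CSV-backed implementation.
public class CsvMiniReader : IMiniReader
{
    public string Format { get { return "csv"; } }
}

public static class ReaderFactory
{
    // Picks a concrete reader from the file extension, so calling code
    // only ever depends on the interface.
    public static IMiniReader Create(string fileName)
    {
        string ext = Path.GetExtension(fileName).ToLowerInvariant();
        switch (ext)
        {
            case ".xml": return new XmlMiniReader();
            case ".csv": return new CsvMiniReader();
            default:
                throw new NotSupportedException("No reader for '" + ext + "' files.");
        }
    }
}

public static class FactoryDemo
{
    public static void Main()
    {
        IMiniReader reader = ReaderFactory.Create("orders.csv");
        Console.WriteLine(reader.Format);   // prints "csv"
    }
}
```

Selecting by file extension is just one possible policy; a configuration setting or a sniff of the first few bytes would work equally well behind the same Create method.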

The Container Classes

There are several container classes that help encapsulate information relevant to all nodes and relevant to specific nodes. Creating classes that encapsulate fields improves code readability and provides a layer of separation from the underlying implementation (the Reader class, in this case). And no, none of the container classes are unit tested; you have to draw the line somewhere, and these classes are much too simple to spend the time on unit testing.

LineInfo

All XML nodes have line and character position information, which is encapsulated in the LineInfo class:

    public class LineInfo
    {
        protected int lineNumber;
        protected int linePosition;

        // Character position within the line.
        public int LinePosition
        {
            get { return linePosition; }
        }

        // Line number within the document.
        public int LineNumber
        {
            get { return lineNumber; }
        }

        public LineInfo(int lineNumber, int linePosition)
        {
            this.lineNumber = lineNumber;
            this.linePosition = linePosition;
        }
    }

Since this class is instantiated strictly by the reader, the properties are read-only.

NodeInfo

NodeInfo is an abstract class that encapsulates the two common elements of just about every XML node (there are a few exceptions): the node name and the node prefix.

    public abstract class NodeInfo
    {
        protected string name;
        protected string prefix;
        protected LineInfo lineInfo;

        // Line and character position of the node.
        public LineInfo LineInfo
        {
            get { return lineInfo; }
        }

        // Node prefix (empty if none).
        public string Prefix
        {
            get { return prefix; }
        }

        // Node name, without the prefix.
        public string Name
        {
            get { return name; }
        }

        public NodeInfo(LineInfo lineInfo, string prefix, string name)
        {
            this.lineInfo = lineInfo;
            this.prefix = prefix;
            this.name = name;
        }
    }

It's an abstract class because we want to make sure that the implementation utilizes an appropriate concrete class derived from NodeInfo. The concrete implementation improves readability (since it qualifies the type of node information), and usually provides additional fields specific to the node type.

ElementNodeInfo

This class is a concrete implementation of NodeInfo, and adds a local namespace field, as elements can have local namespaces:

    public class ElementNodeInfo : NodeInfo
    {
        protected string localNamespace;

        // Namespace URI that applies to this element.
        public string LocalNamespace
        {
            get { return localNamespace; }
        }

        public ElementNodeInfo(LineInfo lineInfo, string prefix, string name, string namespaceUri)
            : base(lineInfo, prefix, name)
        {
            localNamespace = namespaceUri;
        }
    }

AttributeNodeInfo

This class is a concrete implementation of NodeInfo, and adds a value field, as attributes have values:

    public class AttributeNodeInfo : NodeInfo
    {
        protected string val;

        // Attribute value.
        public string Value
        {
            get { return val; }
        }

        public AttributeNodeInfo(LineInfo lineInfo, string prefix, string name, string val)
            : base(lineInfo, prefix, name)
        {
            this.val = val;
        }
    }

ProcessingInstructionInfo

This class derives from AttributeNodeInfo. A processing instruction has a name and a value, like an attribute, but I've implemented a separate class to represent the concept of a processing instruction, even though it adds nothing to the AttributeNodeInfo class. This is just a code readability decision.

    public class ProcessingInstructionInfo : AttributeNodeInfo
    {
        // A processing instruction has no prefix, so an empty string is passed.
        public ProcessingInstructionInfo(LineInfo lineInfo, string name, string value)
            : base(lineInfo, String.Empty, name, value)
        {
        }
    }

The XmlTextReader

Instead of talking about the XmlTextReader as a class and its methods, which you can easily read about yourself, I'm going to show you the XmlTextReader within the context of my Reader wrapper. This way, instead of simply looking at documentation, you'll see the XmlTextReader in actual code, and I'll explain what I'm doing in the code and why.

Creating an XmlTextReader

Quite literally, the first stumbling block is in creating an XmlTextReader. It sounds simple, but according to Microsoft:

In the Microsoft .NET Framework version 2.0 release, the recommended practice is to create XmlReader instances using the Create method. This allows you to take full advantage of the new features introduced in this release.

Second, I want to control some aspects of the reading process; specifically, I almost always want to ignore whitespace. The default XmlTextReader returns all whitespace. So, to properly construct an XmlTextReader using Microsoft's recommended method and to have the ability to set some options, we have to do something like this:

    public Reader(string xml)
    {
        StringReader textStream = new StringReader(xml);
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreComments = false;
        settings.IgnoreWhitespace = true;
        xtr = new XmlTextReader(textStream);
        reader = XmlReader.Create(xtr, settings);
        firstAttribute = true;
    }

Before I go further, this constructor is the one I use for the unit tests, and it takes an XML string. You might instead want a constructor that takes a stream, and as you can see in the first line, I create a StringReader stream.

The second line creates an XmlReaderSettings instance, and I explicitly (just to show you another useful property) choose not to ignore comments, but I do want to ignore whitespace. Next, I create the XmlTextReader from the stream. But that's not enough. I now have to create an XmlReader, passing in the XmlTextReader and the desired settings. Now, we have properly constructed a reader, complying with Microsoft's guidelines, and having the ability to configure the reader to ignore whitespace.

If you're wondering about the last line, we'll get to that later.
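For the stream-based variant mentioned above, here is a minimal sketch that applies the same settings to a Stream. It is not taken from the download: the StreamReaderDemo class and its method names are assumptions, and it hands the stream straight to XmlReader.Create rather than wrapping an XmlTextReader first.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

public static class StreamReaderDemo
{
    // Builds an XmlReader over a stream with the same options the
    // string-based constructor uses: keep comments, drop whitespace.
    public static XmlReader CreateReader(Stream stream)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreComments = false;
        settings.IgnoreWhitespace = true;
        return XmlReader.Create(stream, settings);
    }

    // Collects the node types in document order.
    public static List<XmlNodeType> NodeTypes(Stream stream)
    {
        List<XmlNodeType> types = new List<XmlNodeType>();
        using (XmlReader reader = CreateReader(stream))
        {
            while (reader.Read())
            {
                types.Add(reader.NodeType);
            }
        }
        return types;
    }

    public static void Main()
    {
        string xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" +
                     "<Root>\n  <!-- note -->\n  <Child/>\n</Root>";
        using (MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        {
            // Prints XmlDeclaration, Element, Comment, Element, EndElement;
            // no Whitespace nodes appear because IgnoreWhitespace is true.
            foreach (XmlNodeType type in NodeTypes(ms))
            {
                Console.WriteLine(type);
            }
        }
    }
}
```

In a real application the MemoryStream would typically be a FileStream or a network stream, which is where the reduced memory footprint of a streaming reader pays off.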

The Constructor Unit Test

    [Test, Sequence(0)]
    public void ConstructorTest()
    {
        reader = new Reader(UnitTestResources.ReaderTest);
        reader.IgnoreEndElements = true;
        Assertion.Assert(reader.NodeType == XmlNodeType.None,
            "Expected 'None' for the node type.");
    }

The constructor test reveals the fact that the XmlTextReader does not position itself on a valid node immediately after construction, as the NodeType is "None".

Reading the XML Declaration

Reading the XML declaration, as with all other nodes, requires calling ReadNode:

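    public XmlNodeType ReadNode()
    {
        // Keep reading while end elements are being skipped.
        do
        {
            reader.Read();
        } while (ignoreEndElements && (NodeType == XmlNodeType.EndElement));

        // A new node means attribute reading starts over.
        firstAttribute = true;
        return reader.NodeType;
    }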

My wrapper for the reader optionally skips end elements. If you don't do this, the reader will return EndElement node types, which, depending on what you are doing with the XML, may be superfluous. In the constructor unit test, this flag is set to true.

The XML Declaration Unit Test

    [Test, Sequence(1)]
    public void XmlDeclarationTest()
    {
        reader.ReadNode();

        Assertion.Assert(reader.NodeType == XmlNodeType.XmlDeclaration,
            "Expected xml declaration node type.");
        Assertion.Assert(reader.AttributeCount == 2, "Expected 2 attributes.");
        AttributeNodeInfo ati1 = reader.ReadFirstAttribute();
        AttributeNodeInfo ati2 = reader.ReadNextAttribute();
        Assertion.Assert(ati1.Name == "version", "Expected version attribute.");
        Assertion.Assert(ati1.Value == "1.0", "Expected version number.");
        Assertion.Assert(ati2.Name == "encoding", "Expected encoding attribute.");
        Assertion.Assert(ati2.Value == "utf-8", "Expected encoding value.");
    }

An XML declaration contains attributes just like an element node. I'll demonstrate ReadFirstAttribute and ReadNextAttribute shortly.

Reading the Root Node and Other Elements

Immediately following the XML declaration should be the root node. My reader provides an Element property which returns an ElementNodeInfo instance that encapsulates the element name, prefix, and optional namespace. Looking at the implementation:

    public ElementNodeInfo Element
    {
        get
        {
            if (NodeType != XmlNodeType.Element)
            {
                throw new ReaderException("Not on an element node.");
            }

            ElementNodeInfo el = new ElementNodeInfo(CurrentLineInfo,
                reader.Prefix, NameWithoutPrefix, reader.NamespaceURI);
            return el;
        }
    }

You'll see that the ElementNodeInfo also includes the reader's line and character position, and that the element name is stripped of the prefix.
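The NameWithoutPrefix helper is not shown in the article. As a rough illustration of the three pieces of information involved, the sketch below reads them directly from an XmlReader, using the built-in LocalName property as a stand-in for the prefix-stripped name. PrefixDemo and DescribeFirstElement are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;

public static class PrefixDemo
{
    // Returns "prefix|localName|namespaceUri" for the first element in the XML.
    public static string DescribeFirstElement(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();   // skip ahead to the first content node
            return reader.Prefix + "|" + reader.LocalName + "|" + reader.NamespaceURI;
        }
    }

    public static void Main()
    {
        // prints "bar|Detail|LocalBar"
        Console.WriteLine(DescribeFirstElement("<bar:Detail xmlns:bar=\"LocalBar\"/>"));

        // An element without a prefix reports empty strings: prints "|Plain|"
        Console.WriteLine(DescribeFirstElement("<Plain/>"));
    }
}
```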

Reading the Root Node Unit Test

Reading an element node is straightforward, as the unit test demonstrates:

    [Test, Sequence(2)]
    public void RootNodeTest()
    {
        reader.ReadNode();
        Assertion.Assert(reader.NodeType == XmlNodeType.Element,
            "Expected element node type.");
        ElementNodeInfo eti = reader.Element;
        Assertion.Assert(eti.Name == "RootNode", "Expected root node element.");
        Assertion.Assert(eti.Prefix == "", "Expected a blank prefix.");
        Assertion.Assert(reader.AttributeCount == 2, "Expected 2 attributes.");
    }

The ReadNode method is called to move past the XML declaration node and onto the root node. The unit test verifies that this happened correctly.

There's another element test later, which tests that a local namespace has been correctly read:

    [Test, Sequence(8)]
    public void LocalNamespaceTest()
    {
        reader.ReadNode();
        Assertion.Assert(reader.NodeType == XmlNodeType.Element,
            "Expected element node type.");
        ElementNodeInfo eti = reader.Element;
        Assertion.Assert(eti.Prefix == "bar", "Unexpected prefix.");
        Assertion.Assert(eti.LocalNamespace == "LocalBar", "Unexpected namespace.");
    }

Reading Attributes

Most XML elements contain attributes, and the root node includes two attributes, one of which is an XML namespace declaration. The XmlTextReader provides two methods for reading an attribute, MoveToFirstAttribute and MoveToNextAttribute, which return a boolean true if successful, false otherwise. I've modified this implementation slightly:

    public AttributeNodeInfo ReadFirstAttribute()
    {
        bool val = xtr.MoveToFirstAttribute();
        AttributeNodeInfo ret = null;

        if (val)
        {
            ret = Attribute;
            // The next "smart" read should move to the next attribute.
            firstAttribute = false;
        }

        return ret;
    }

and:

    public AttributeNodeInfo ReadNextAttribute()
    {
        bool val = xtr.MoveToNextAttribute();
        AttributeNodeInfo ret = null;

        if (val)
        {
            ret = Attribute;
        }

        return ret;
    }

Both of these methods return an AttributeNodeInfo instance, encapsulating the reader's line and character position and the attribute name, prefix, and value. A null is returned if there are no further attributes to read. You can use these methods, or you can use another method that avoids having to figure out whether to call ReadFirstAttribute or ReadNextAttribute. My reader figures this out automatically for you, and here's where the firstAttribute boolean comes into play:

    public AttributeNodeInfo ReadAttribute()
    {
        AttributeNodeInfo ret = null;

        if (firstAttribute)
        {
            ret = ReadFirstAttribute();
        }
        else
        {
            ret = ReadNextAttribute();
        }

        return ret;
    }

The firstAttribute flag is set whenever ReadNode is called. It's cleared when the first attribute is read, either by calling ReadFirstAttribute or ReadAttribute.
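To see the same first/next pattern without the wrapper, here is a small sketch against a plain XmlReader. A local boolean plays the role of the firstAttribute flag; the AttributeDemo class and ReadAttributes method are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class AttributeDemo
{
    // Returns "name=value" for every attribute on the first element.
    public static List<string> ReadAttributes(string xml)
    {
        List<string> result = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();   // position on the first element
            bool first = true;        // same idea as the firstAttribute flag

            // Use MoveToFirstAttribute once, then MoveToNextAttribute;
            // both return false when there is nothing to move to.
            while (first ? reader.MoveToFirstAttribute() : reader.MoveToNextAttribute())
            {
                first = false;
                result.Add(reader.Name + "=" + reader.Value);
            }
        }
        return result;
    }

    public static void Main()
    {
        string xml = "<ChildNode Attr1=\"1\" Attr2=\"2\" Attr3=\"3\"/>";
        foreach (string attribute in ReadAttributes(xml))
        {
            Console.WriteLine(attribute);   // Attr1=1, then Attr2=2, then Attr3=3
        }
    }
}
```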

The Attribute Unit Tests

The following sequence of unit tests exercises the first, next, and "smart" attribute readers:

    [Test, Sequence(3)]
    public void ReadFirstAttributeTest()
    {
        AttributeNodeInfo ati = reader.ReadFirstAttribute();
        Assertion.Assert(ati.Name == "AnAttribute", "Unexpected first attribute name.");
        Assertion.Assert(ati.Value == "AttributeValue", "Unexpected first attribute value.");
    }

    [Test, Sequence(4)]
    public void ReadNextAttributeTest()
    {
        AttributeNodeInfo ati = reader.ReadNextAttribute();
        Assertion.Assert(ati.Name == "foo", "Unexpected second attribute name.");
        Assertion.Assert(ati.Prefix == "xmlns", "Unexpected second attribute prefix.");
        Assertion.Assert(ati.Value == "Some namespace", "Unexpected second attribute value.");
    }

    [Test, Sequence(5)]
    public void NoFurtherAttributeTest()
    {
        AttributeNodeInfo ati = reader.ReadNextAttribute();
        Assertion.Assert(ati == null, "Expected a null after the last attribute.");
    }

    [Test, Sequence(6)]
    public void ReadNextElement()
    {
        reader.ReadNode();
        ElementNodeInfo eti = reader.Element;
        Assertion.Assert(eti.Name == "ChildNode", "Unexpected element.");
        Assertion.Assert(reader.Depth == 1, "Unexpected depth.");
    }

    [Test, Sequence(7)]
    public void SmartAttributeReaderTest()
    {
        int i = 0;

        while (reader.ReadAttribute() != null)
        {
            ++i;
        }

        Assertion.Assert(i == 3, "Expected 3 attributes.");
    }

    [Test, Sequence(9)]
    public void AttributePrefixTest()
    {
        reader.ReadNode();
        AttributeNodeInfo ati = reader.ReadAttribute();
        Assertion.Assert(ati.Prefix == "foo", "Unexpected prefix.");
        Assertion.Assert(ati.Name == "MyAttribute", "Unexpected attribute.");
        Assertion.Assert(ati.Value == "10", "Unexpected value.");
    }

Reading CDATA

A CDATA block lets you include freeform text in the XML, such as code. My reader provides a CData property which returns the CDATA text as a string:

    public string CData
    {
        get
        {
            if (NodeType != XmlNodeType.CDATA)
            {
                throw new ReaderException("Not on a CDATA node.");
            }

            return reader.Value;
        }
    }

As you can see, the CData property validates the node type and then returns the reader's Value property.
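The ReaderException class is used throughout but its definition is not shown in the article. The sketch below assumes the simplest possible definition (a plain Exception subclass) and shows the validate-then-return pattern against a bare XmlReader. CDataDemo and ReadCData are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;

// Assumed definition: the article uses ReaderException but does not show it.
public class ReaderException : Exception
{
    public ReaderException(string message) : base(message) { }
}

public static class CDataDemo
{
    // Same guard as the CData property: refuse to answer unless the
    // reader is actually positioned on a CDATA node.
    public static string ReadCData(XmlReader reader)
    {
        if (reader.NodeType != XmlNodeType.CDATA)
        {
            throw new ReaderException("Not on a CDATA node.");
        }
        return reader.Value;
    }

    public static void Main()
    {
        string xml = "<Root><![CDATA[some <stuff>]]></Root>";
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.Read();   // now on the Root element, not on CDATA
            try
            {
                ReadCData(reader);
            }
            catch (ReaderException ex)
            {
                Console.WriteLine(ex.Message);   // prints "Not on a CDATA node."
            }

            reader.Read();   // now on the CDATA node
            Console.WriteLine(ReadCData(reader));   // prints "some <stuff>"
        }
    }
}
```

Throwing early like this turns a silent logic error (reading the wrong node's value) into an immediate, descriptive failure.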

The CDATA Unit Test

    [Test, Sequence(10)]
    public void CDATATest()
    {
        reader.ReadNode();
        Assertion.Assert(reader.NodeType == XmlNodeType.CDATA, "Expected CDATA node.");
        Assertion.Assert(reader.CData == "some stuff", "Unexpected CDATA text.");
    }

Reading Comments

Reading XML comments is just like getting the CDATA text. Once we know that the node is a comment node, we return the Value property, which contains the comment text. The reader also trims any leading and trailing whitespace, which is often used to make the XML comments more readable.

    public string Comment
    {
        get
        {
            if (NodeType != XmlNodeType.Comment)
            {
                throw new ReaderException("Not on a comment node.");
            }

            // Trim the padding that usually surrounds comment text.
            return reader.Value.Trim();
        }
    }

The Comment Unit Test

    [Test, Sequence(11)]
    public void CommentTest()
    {
        reader.ReadNode();
        Assertion.Assert(reader.NodeType == XmlNodeType.Comment, "Expected comment node.");
        Assertion.Assert(reader.Comment == "My comment", "Unexpected comment text.");
    }

Reading Inner Element Text

As the XmlTextReader moves through the XML, any inner text is its own Text node type. The reader's Text property is a thin wrapper for the XmlTextReader's Value property:

    public string Text
    {
        get
        {
            if (NodeType != XmlNodeType.Text)
            {
                throw new ReaderException("Not on a text node.");
            }

            return reader.Value;
        }
    }

The Text Unit Test

    [Test, Sequence(12)]
    public void TextTest()
    {
        reader.ReadNode();
        Assertion.Assert(reader.NodeType == XmlNodeType.Element, "Expected element.");
        reader.ReadNode();
        Assertion.Assert(reader.NodeType == XmlNodeType.Text, "Expected text node.");
        Assertion.Assert(reader.Text == "Text", "Unexpected text value.");
    }

Reading Processing Instructions

Processing instructions are another kind of XML node. These may contain useful meta-instructions for the engine that is processing the XML. The reader provides a thin wrapper for getting the processing instruction:

    public ProcessingInstructionInfo Instruction
    {
        get
        {
            if (NodeType != XmlNodeType.ProcessingInstruction)
            {
                throw new ReaderException("Not on a processing instruction node.");
            }

            ProcessingInstructionInfo proc = new ProcessingInstructionInfo(
                CurrentLineInfo, reader.Name, reader.Value);
            return proc;
        }
    }
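For comparison, this is roughly what the Name and Value of a processing instruction look like when read from a bare XmlReader. The sketch is standalone; ProcessingInstructionDemo and FirstInstruction are names made up for this example.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ProcessingInstructionDemo
{
    // Returns "target=data" for the first processing instruction found,
    // or null if the document contains none.
    public static string FirstInstruction(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.ProcessingInstruction)
                {
                    // Name is the instruction target, Value is the rest.
                    return reader.Name + "=" + reader.Value;
                }
            }
        }
        return null;
    }

    public static void Main()
    {
        // prints "do=homework"
        Console.WriteLine(FirstInstruction("<Root><?do homework?></Root>"));
    }
}
```

Note that the XML declaration is reported as its own XmlDeclaration node type, so it is never mistaken for a processing instruction by this loop.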

Processing Instruction Unit Test

    [Test, Sequence(13)]
    public void ReadProcessingInstructionTest()
    {
        reader.ReadNode();
        Assertion.Assert(reader.NodeType == XmlNodeType.ProcessingInstruction,
            "Expected processing instruction.");
        ProcessingInstructionInfo pii = reader.Instruction;
        Assertion.Assert(pii.Name == "do", "Unexpected name.");
        Assertion.Assert(pii.Value == "homework", "Unexpected value.");
    }

Working with the XML Graph

Lastly, one of the important things about XML is that it is hierarchical. The reader provides a thin wrapper to the XmlTextReader's Depth property (a very thin wrapper):

    public int Depth
    {
        get { return reader.Depth; }
    }

The point, though, is that we need this property implemented by any class that realizes IReader.

The Depth Unit Test

    [Test, Sequence(14)]
    public void DepthTests()
    {
        reader.ReadNode();
        Assertion.Assert(reader.Depth == 1, "Depth should be 1.");
        reader.ReadNode();
        Assertion.Assert(reader.Depth == 2, "Depth should be 2.");
        reader.ReadAttribute();
        Assertion.Assert(reader.Depth == 3, "Depth should be 3.");
        reader.ReadNode();
        Assertion.Assert(reader.Depth == 0, "Depth should be 0.");
    }

This unit test reveals one of the side effects of ignoring the XML end element node type, which is that the depth can pop several levels at once. This should be taken into consideration when writing an application that actually does something with the XML.
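The depth jump is easy to reproduce with a bare XmlReader. In this sketch (DepthDemo and its sample XML are assumptions for illustration), end elements are skipped in the loop, so the reported depth falls from 3 straight to 1 between two consecutive nodes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class DepthDemo
{
    // Lists "name:depth" for every node except end elements.
    public static List<string> Depths(string xml)
    {
        List<string> result = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.EndElement)
                {
                    continue;   // same effect as IgnoreEndElements = true
                }
                result.Add(reader.Name + ":" + reader.Depth);
            }
        }
        return result;
    }

    public static void Main()
    {
        // Prints Root:0, A:1, B:2, C:3, D:1. The step from C to D drops
        // two levels because the closing tags of B and A were skipped.
        foreach (string entry in Depths("<Root><A><B><C/></B></A><D/></Root>"))
        {
            Console.WriteLine(entry);
        }
    }
}
```

An application that rebuilds a tree from these nodes therefore has to compare each node's depth with the previous one and unwind as many levels as the difference, rather than assuming a single step up.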

About the Download

I've created a solution that contains the following projects:

Reader, consisting just of the Reader.cs file. This is the concrete implementation of the IReader interface.
ReaderCommon, consisting of the IReader interface and the container classes.
ReaderUnitTests, consisting of the unit tests.
UnitTestLib, the files necessary for compiling the above unit test project.

These comprise all the pieces necessary to compile the project without error. To actually run the unit tests, you'll need to download the Advanced Unit Test application mentioned earlier.

Wrapping it Up

The Reader is a fairly thin wrapper around the XmlTextReader class. The class is intended to be used by an application that reads nodes, inspects the node type, and determines what to do given the node type. This is an ad hoc approach to reading XML, whereas the SoftArtisans link provided at the beginning of the article shows a more directed approach in which the application is expecting a certain format to the XML. Hopefully though, this article and the SoftArtisans link provide you with a better understanding of how to work with the XmlTextReader. The reader also provides an abstraction that decouples the application from the specific document type, which is one of the goals that I had in mind.


Source: https://www.codeproject.com/Articles/15452/The-XmlTextReader-A-Beginner-s-Guide