Parsing RSS 2 and Dublin Core
November 22nd, 2005
A few days ago I started building a Control for ASP.NET to load and display an aggregate of RSS 2.0 blog entries. I have it working for the perfect scenario, but I have come across some annoying differences in the "standard" RSS 2.0 format. If you look at the sample on Wikipedia you will see that a few XPath references will get you most of what you need. You can get all entries with "/rss/channel/item", and from there further relative XPaths will get you the title, link, description, etc.
To make the blog aggregator sort the items, I planned to use the value in pubDate, with the most recent blog entries showing up first. It worked great until I started trying various RSS 2.0 feeds, such as my own, which uses the default format provided by Movable Type. Instead of pubDate, it uses "dc:date", which is apparently an element defined by the Dublin Core Metadata Initiative and lives in its own XML namespace.
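The two elements also tend to carry different date formats: pubDate is normally an RFC 822 style string, while dc:date is normally an ISO 8601 timestamp. A rough sketch of sorting a mix of both, newest first, might look like the following. The module name, the TryParseFeedDate helper, and the sample values are made up for illustration and are not part of the control; the general parser may also reject some RFC 822 time zone abbreviations, so treat this as a starting point only.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Xml

Module FeedDateSortSketch

    ' Hypothetical helper: tries the ISO 8601 form first, then general parsing.
    Function TryParseFeedDate(ByVal value As String, ByRef result As DateTime) As Boolean
        Try
            ' dc:date style, e.g. 2005-11-21T08:30:00Z
            result = XmlConvert.ToDateTime(value, XmlDateTimeSerializationMode.Utc)
            Return True
        Catch ex As FormatException
            ' Not an XML Schema date; try the general parser below.
        End Try
        ' pubDate style, e.g. Tue, 22 Nov 2005 10:00:00 GMT
        Return DateTime.TryParse(value, CultureInfo.InvariantCulture, _
                                 DateTimeStyles.AdjustToUniversal, result)
    End Function

    Sub Main()
        ' Made-up sample values mixing both formats.
        Dim rawDates As String() = New String() { _
            "2005-11-21T08:30:00Z", _
            "Tue, 22 Nov 2005 10:00:00 GMT", _
            "2005-11-20T23:15:00-05:00"}

        Dim parsed As New List(Of DateTime)()
        For Each raw As String In rawDates
            Dim parsedDate As DateTime
            If TryParseFeedDate(raw, parsedDate) Then
                parsed.Add(parsedDate)
            End If
        Next

        ' Sort ascending, then reverse so the most recent entry comes first.
        parsed.Sort()
        parsed.Reverse()

        For Each entryDate As DateTime In parsed
            Console.WriteLine(entryDate.ToString("u"))
        Next
    End Sub

End Module
```

Normalizing everything to UTC before sorting keeps entries from feeds in different time zones in a sensible order.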
Instead of cleanly looking for a standard date element with a universally functional XPath, I now have to introduce a namespace manager, because an XPath that uses the "dc" prefix cannot be resolved without one. In .NET this is done with the XmlNamespaceManager. I have done this before, but it was a very confusing process. I did it in the early .NET 2.0 Beta 1 days, when the XML classes were a bit different from the .NET 1.1 versions and the documentation for them was not quite all there. Ultimately I found you would just use the following code:
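' Requires Imports System.Xml; document is the loaded XmlDocument and item is one <item> node.
Dim xnsm As New XmlNamespaceManager(document.NameTable)
xnsm.AddNamespace("dc", "http://purl.org/dc/elements/1.1/")
' nodeName is a relative XPath such as "dc:date".
Dim node As XmlNode = item.SelectSingleNode(nodeName, xnsm)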
Now you can use an XPath reference with the namespace prefix to get at your values. Relative to the "item" node you just use "dc:date", while the full path would be "/rss/channel/item/dc:date". I first get an XmlNodeList of the items and iterate through them with relative XPath references to get at the child nodes.
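To put the pieces together, here is a minimal sketch of that loop. The sample feed, the module name, and the ReadDateText helper are made up for illustration and are not the actual code from my control. It assumes an item may carry pubDate, dc:date, or neither, and it prefers pubDate when both are present.

```vbnet
Imports System
Imports System.Xml

Module FeedDateSketch

    ' Hypothetical helper: returns the raw date text for one <item>,
    ' preferring pubDate and falling back to dc:date.
    Function ReadDateText(ByVal item As XmlNode, ByVal xnsm As XmlNamespaceManager) As String
        Dim node As XmlNode = item.SelectSingleNode("pubDate")
        If node Is Nothing Then
            ' The "dc" prefix only resolves because xnsm maps it to the Dublin Core URI.
            node = item.SelectSingleNode("dc:date", xnsm)
        End If
        If node Is Nothing Then
            Return Nothing
        End If
        Return node.InnerText
    End Function

    Sub Main()
        ' Made-up two-item feed: one item uses pubDate, the other dc:date.
        Dim feedXml As String = _
            "<rss version=""2.0"" xmlns:dc=""http://purl.org/dc/elements/1.1/"">" & _
            "<channel><title>Sample</title>" & _
            "<item><title>First</title><pubDate>Tue, 22 Nov 2005 10:00:00 GMT</pubDate></item>" & _
            "<item><title>Second</title><dc:date>2005-11-21T08:30:00Z</dc:date></item>" & _
            "</channel></rss>"

        Dim document As New XmlDocument()
        document.LoadXml(feedXml)

        Dim xnsm As New XmlNamespaceManager(document.NameTable)
        xnsm.AddNamespace("dc", "http://purl.org/dc/elements/1.1/")

        ' Absolute XPath for the items, relative XPaths for their children.
        Dim items As XmlNodeList = document.SelectNodes("/rss/channel/item")
        For Each item As XmlNode In items
            Dim title As String = item.SelectSingleNode("title").InnerText
            Console.WriteLine(title & ": " & ReadDateText(item, xnsm))
        Next
    End Sub

End Module
```

The prefix registered with AddNamespace does not have to match the prefix used in the feed; what matters is that the namespace URI is the same.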
Later I hope to release the full code base, which I have ported from a User Control (.ascx) to a Server Control that is much easier to add to an existing ASP.NET 2.0 website. It even adds itself automatically to the toolbox in design mode.
