XPath RDF nonsense

growse · 19 Feb 2008 at 16:55

I'm trying to parse a simple RDF feed in C# .NET using XPath. Here's what I've got:

Code:

                XPathDocument feedxml = GetXml();
                XPathNavigator nav;
                XPathNavigator subnav;
                XPathNodeIterator nodeiterator;
                nav = feedxml.CreateNavigator();

                XmlNamespaceManager xnm = new XmlNamespaceManager(nav.NameTable);
                xnm.AddNamespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
                XPathExpression xpe = nav.Compile("/rdf:RDF/item");
                xpe.SetContext(xnm);
                nodeiterator = nav.Select(xpe);

My nodeiterator is coming back with a count of 0, even though I know there are items in the feed. Similarly, if I try a SelectSingleNode on, say, "/rdf:RDF/link", I get null even though I know it exists.

What am I doing wrong?

chesterstu · 19 Feb 2008 at 16:57

Have you checked the capitalisation? XPath is case-sensitive, which is highly annoying.

growse · 19 Feb 2008 at 17:10

Take the slashdot RDF:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">

<channel rdf:about="http://slashdot.org/">
<title>Slashdot</title>
<link>http://slashdot.org/</link>
*snip*
</channel>

<title>Slashdot</title>

<item rdf:about="http://interviews.slashdot.org/article.pl?sid=08/02/19/1419207&amp;from=rss">
<title>Hi, I Want to Meet (17.6% of) You!</title>
<link>http://rss.slashdot.org/~r/Slashdot/slashdot/~3/237645650/article.pl</link>
<description>Frequent Slashdot contributor Bennett Haselton wants to make online dating better. Here's how he wants to do it. "Suppose you're an entrepreneur who wants to break into the online personals business, but you face impossible odds because everybody wants to go where everybody else already is (basically, either Match.com or Yahoo Personals). Here is a suggestion that would give you an edge. In a nutshell: Each member lists the criteria for people that they are looking for. Then when people contact them, they choose whether or not to respond. After the system has been keeping track of who contacts you and who you respond to, the site lists your profile in other people's search results along with your criteria-specific response rate: "Lisa has responded to 56% of people who contacted her who meet her criteria." Read on for the rest of his thoughts.&lt;p&gt;&lt;a href="http://interviews.slashdot.org/article.pl?sid=08/02/19/1419207&amp;from=rss"&gt;Read more of this story&lt;/a&gt; at Slashdot.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://rss.slashdot.org/~a/Slashdot/slashdot?a=5bcfGW"&gt;&lt;img src="http://rss.slashdot.org/~a/Slashdot/slashdot?i=5bcfGW" border="0"&gt;&lt;/img&gt;&lt;/a&gt;&lt;/p&gt;&lt;img src="http://rss.slashdot.org/~r/Slashdot/slashdot/~4/237645650" height="1" width="1"/&gt;</description>
<dc:creator>samzenpus</dc:creator>
<dc:date>2008-02-19T16:08:00+00:00</dc:date>
<dc:subject>internet</dc:subject>
<slash:department>there-is-someone-for-everyone-except-you</slash:department>
<slash:section>interviews</slash:section>

<slash:hit_parade>0,0,0,0,0,0,0</slash:hit_parade>
<feedburner:origLink>http://interviews.slashdot.org/article.pl?sid=08/02/19/1419207&amp;from=rss</feedburner:origLink>
</item>

So in the above case, I should be able to select /rdf:RDF/channel/title and get "Slashdot". Equally, because the <item> tag repeats a few more times, I should be able to iterate over it. I don't think capitalisation is the issue, but I'm not overly experienced in this sort of thing. Nothing obvious is jumping out at me.

growse · 19 Feb 2008 at 17:23

This is very strange. The expression "/rdf:RDF//rss:item" works, but I have no idea why item requires the rss bit in front of it, or why a double slash is needed.

Sic · 19 Feb 2008 at 20:28

I had trouble with this too. I believe it's because item is in the RSS namespace (the feed's default xmlns, http://purl.org/rss/1.0/), so it has to be referred to with a prefix bound to that URI. Not sure why it needs the double slash, it should be fine without it!
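
A minimal sketch of that explanation, assuming the System.Xml.XPath overloads that take a namespace resolver directly (instead of the Compile/SetContext route in the first post). The inline feed is a trimmed stand-in, not the real Slashdot data, the example.org URLs are made up, and "rss" is just a prefix chosen here; only the URI has to match the feed's default xmlns.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

class RdfXPathSketch
{
    static void Main()
    {
        // Hypothetical, trimmed-down stand-in for the feed quoted above.
        string xml =
            "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' " +
            "xmlns='http://purl.org/rss/1.0/'>" +
            "<channel rdf:about='http://slashdot.org/'><title>Slashdot</title></channel>" +
            "<item rdf:about='http://example.org/1'><title>First</title></item>" +
            "<item rdf:about='http://example.org/2'><title>Second</title></item>" +
            "</rdf:RDF>";

        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();

        XmlNamespaceManager xnm = new XmlNamespaceManager(nav.NameTable);
        xnm.AddNamespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        // Bind a prefix of our own choosing to the feed's default namespace URI.
        xnm.AddNamespace("rss", "http://purl.org/rss/1.0/");

        // An unprefixed name in XPath 1.0 means "no namespace", so this matches nothing.
        Console.WriteLine(nav.Select("/rdf:RDF/item", xnm).Count);

        // With the prefix, a single slash reaches the items that are direct children.
        Console.WriteLine(nav.Select("/rdf:RDF/rss:item", xnm).Count);

        // The same applies at every step of the path, not just the last one.
        XPathNavigator title = nav.SelectSingleNode("/rdf:RDF/rss:channel/rss:title", xnm);
        Console.WriteLine(title.Value);
    }
}
```

In this stand-in document the single-slash form is enough because the items sit directly under rdf:RDF; whether that holds for a given real feed depends on its actual structure.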

robmiller · 20 Feb 2008 at 02:38

Sic said:
not sure why it needs the double slash, it should be fine without it!

A double slash in XPath finds all descendants, not just children.

Edit: although if you mean why it should be needed in this case, then I have no idea and agree with you.
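
To illustrate the child versus descendant difference, here is a small sketch on a made-up stand-in document (not the real feed; the example.org URLs, titles and the "rss" prefix are assumptions for the example). `//` is shorthand for `/descendant-or-self::node()/`, so it looks at every level below the context node.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

class SlashVersusDoubleSlash
{
    static void Main()
    {
        // Hypothetical sample: titles exist only inside channel and item, never directly under rdf:RDF.
        string xml =
            "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' " +
            "xmlns='http://purl.org/rss/1.0/'>" +
            "<channel rdf:about='http://slashdot.org/'><title>Slashdot</title></channel>" +
            "<item rdf:about='http://example.org/1'><title>First</title></item>" +
            "<item rdf:about='http://example.org/2'><title>Second</title></item>" +
            "</rdf:RDF>";

        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();
        XmlNamespaceManager xnm = new XmlNamespaceManager(nav.NameTable);
        xnm.AddNamespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        xnm.AddNamespace("rss", "http://purl.org/rss/1.0/");

        // Single slash: only direct children of rdf:RDF, and no title is a direct child here.
        int childTitles = nav.Select("/rdf:RDF/rss:title", xnm).Count;

        // Double slash: every title at any depth (the channel's plus one per item).
        int descendantTitles = nav.Select("/rdf:RDF//rss:title", xnm).Count;

        // For elements that are direct children anyway, both forms select the same nodes.
        int childItems = nav.Select("/rdf:RDF/rss:item", xnm).Count;
        int descendantItems = nav.Select("/rdf:RDF//rss:item", xnm).Count;

        Console.WriteLine("titles: " + childTitles + " vs " + descendantTitles);
        Console.WriteLine("items:  " + childItems + " vs " + descendantItems);
    }
}
```

On this sample the two item counts come out equal, which is consistent with the point that the double slash should not be required when item is a direct child of rdf:RDF.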

Sic · 20 Feb 2008 at 06:57

robmiller said:
A double slash in XPath finds all descendants, not just children.

Edit: although if you mean why it should be needed in this case, then I have no idea and agree with you.

Yeah, I meant in this case, because rss:item is a child and shouldn't need the //.