Peter Vandenabeele
Workaround for an REXML bug in Ruby 1.8.6 (non-namespace-qualified attribute names are not found)
For some time I struggled with a certain type of XPath query that did not work correctly with REXML in Ruby 1.8.6. Specifically, this type of query:
top_pattern = "//*[@id=\"content\"]"
REXML::XPath.each(xml_body,top_pattern) do |xml_entry|
...
end
did not yield any results whenever top_pattern contained an attribute test such as @id. It did work when top_pattern contained only plain element steps, such as:
top_pattern = "/html/body/table/tbody/tr[2]/td"
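To see both cases side by side, here is a minimal reproduction sketch. It assumes a small hand-written document with a default namespace on the root element (the names XHTML_SOURCE, by_attribute and by_path are hypothetical). Per the behaviour described above, the attribute query returned nothing on Ruby 1.8.6; with a current REXML both queries are expected to find exactly one element.

```ruby
require 'rexml/document'

# Hypothetical sample document: a default namespace is declared on <html>.
XHTML_SOURCE = <<~XML
  <html xmlns="http://www.w3.org/1999/xhtml">
    <body>
      <div id="content"><p>Hello</p></div>
    </body>
  </html>
XML

xml_body = REXML::Document.new(XHTML_SOURCE)

# Query 1: select by attribute value (reportedly no results on Ruby 1.8.6).
by_attribute = []
REXML::XPath.each(xml_body, '//*[@id="content"]') { |e| by_attribute << e }

# Query 2: plain element steps only (worked on Ruby 1.8.6).
by_path = []
REXML::XPath.each(xml_body, '/html/body/div/p') { |e| by_path << e }

puts "attribute query: #{by_attribute.size} match(es)"
puts "path query:      #{by_path.size} match(es)"
```

Running the same script on an old and a new interpreter is a quick way to check whether a given REXML version is affected.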
It turns out that tidy, when run with the option
tidy.options.output_xhtml = true
always adds the default XHTML namespace (xmlns="http://www.w3.org/1999/xhtml") to its output, and that REXML in Ruby 1.8.6 has a known bug, described here:
http://www.intertwingly.net/blog/2007/11/02/MonkeyPatch-for-Ruby-1-8-6
The full solution, which also resolves the problem of phantom namespaces, is given in the comments on that post by Alexander Pogrebnyak. Since I am staying with Ruby 1.8.5 for now (a production system on Debian stable), I applied this patch in my own code rather than moving to Ruby 1.9 (I have not checked whether the issue is resolved in Ruby 1.9).
Update: On further reading, it may be that not matching non-namespace-qualified attribute names is actually not a bug but a correct implementation of XPath. In that case, the proper fix might be to explicitly map the XHTML namespace to a prefix and use that prefix in my XPath expressions. This article seems insightful.
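The namespace-mapping approach from the update might look like the sketch below. It assumes a current REXML, where REXML::XPath.match accepts a hash of prefix => namespace URI as its third argument; the prefix "x" is an arbitrary, hypothetical choice, and whether this also sidesteps the 1.8.6 behaviour has not been verified.

```ruby
require 'rexml/document'

XHTML_NS = 'http://www.w3.org/1999/xhtml'

doc = REXML::Document.new(<<~XML)
  <html xmlns="http://www.w3.org/1999/xhtml">
    <body>
      <div id="content"><p>Hello</p></div>
    </body>
  </html>
XML

# Bind the hypothetical prefix "x" to the XHTML namespace for this query only.
namespaces = { 'x' => XHTML_NS }

# Element steps carry the prefix; the @id test stays unprefixed.
matches = REXML::XPath.match(doc, '//x:div[@id="content"]', namespaces)
paragraphs = REXML::XPath.match(doc, '/x:html/x:body/x:div/x:p', namespaces)

puts matches.map(&:name).inspect
puts paragraphs.first.text
```

The prefix used in the XPath expression does not need to appear in the document itself; only the namespace URI it maps to has to match.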