(Just wanted to check here since I didn't get much response on RailsForum lately.)
Has anyone encountered issues with Mechanize not picking up anchor tags using CSS selectors?
Here's a snippet of the HTML code:
<td class='calendarCell' align='left'>
<a href="http://www.mysite.org/index.php/site/ActivitiesCalendar/2010/02/10/">10</a>
<p style="margin-bottom:15px; line-height:14px; text-align:left;">
<span class="sidenavHeadType">
Current Events</span><br />
<b><a href="http://www.mysite.org/index.php/site/Clubs/banks_and_the_fed" class="a2">Banks and the Fed</a></b>
<br />
10:30am- 11:45am
</p>
I'm trying to extract data from these event elements. Everything else works fine except for grabbing the anchor within the <p>. There's an <a> tag inside the <b>, which I need to follow for more details about the event.
In my rake task, I have:
agent.page.search(".calendarCell,.calendarToday").each do |item|
  day = item.at("a").text
  item.search("p").each do |e|
    anchor = e.at("a")
    puts anchor
    puts e.inner_html
  end
end
Oddly enough, while item.at("a") returns the anchor, e.at("a") returns nil. And when I print the inner_html of the <p> element, it doesn't include the anchor at all. Here's a sample output:
nil
<span class="sidenavHeadType">
Photo Club</span><br><b>Indexing Slide Collections</b>
<br>
2:00pm- 3:00pm
However, when running the same scraping directly with Nokogiri:
doc.css(".calendarCell,.calendarToday").each do |item|
  day = item.at_css("a").text
  item.css("p").each do |e|
    link = e.at_css("a")[:href]
    puts e.inner_html
  end
end
Nokogiri recognizes the anchor inside the <p> and retrieves the href correctly.
<span class="sidenavHeadType">
Bridge Party</span><br><b><a href="http://www.mysite.org/index.php/site/Clubs/party_bridge_51209" class="a2">Party Bridge</a></b>
<br>
7:00pm- 9:00pm
Since Mechanize is supposed to utilize Nokogiri, I'm curious if others are experiencing the same issue or if it might be related to the version.
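One way to narrow this down is to check whether the anchors are missing from the response Mechanize received or only disappear after parsing. The following is a minimal sketch of that comparison, not a confirmed diagnosis; the helper names anchor_count and diagnose_anchors are hypothetical, and the sample strings are taken from the two outputs above.

```ruby
# Count opening <a ...> tags in an HTML string. A plain scan is enough for a
# quick sanity check; it is not a substitute for a real parser.
def anchor_count(html)
  html.scan(/<a\b/i).size
end

# Compare the markup as received with the markup after parsing.
# Returns :missing_in_response when the raw markup has no anchors at all,
# :lost_in_parsing when the parsed markup has fewer anchors than the raw one,
# and :ok otherwise.
def diagnose_anchors(raw_html, parsed_html)
  raw = anchor_count(raw_html)
  parsed = anchor_count(parsed_html)
  return :missing_in_response if raw.zero?
  return :lost_in_parsing if parsed < raw
  :ok
end

# Stand-in inputs (hypothetical): the <b> fragments from the two outputs above.
with_link = '<b><a href="http://www.mysite.org/index.php/site/Clubs/party_bridge_51209" class="a2">Party Bridge</a></b>'
without_link = '<b>Indexing Slide Collections</b>'

puts diagnose_anchors(with_link, with_link)
puts diagnose_anchors(with_link, without_link)
puts diagnose_anchors(without_link, without_link)
```

In the rake task, the inputs would presumably be agent.page.body (the raw response) and agent.page.parser.to_html (the parsed document). A :missing_in_response result would suggest the server sent Mechanize different markup than it sent the standalone Nokogiri run, rather than a parser or version problem.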
Any help would be appreciated. Thank you!