Retrieve the titles and URLs of Yahoo search results using C#

Is there a way to extract titles and URLs from a Yahoo search results page using HtmlAgilityPack?

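using System;
using HtmlAgilityPack;

HtmlWeb web = new HtmlWeb();
string queryText = "your_search_query_here";
// Escape the query so spaces and special characters survive in the URL.
string searchResults = "https://en-maktoob.search.yahoo.com/search?p=" + Uri.EscapeDataString(queryText);
var document = web.Load(searchResults);
// Select every anchor that carries both a cite and an href attribute.
var nodes = document.DocumentNode.SelectNodes("//a[@cite and @href]");
if (nodes != null)
{
    foreach (var node in nodes)
    {
        string title = node.Attributes["title"]?.Value;
        string url = node.Attributes["href"]?.Value;
        Console.WriteLine("Title: {0}, URL: {1}", title, url);
    }
}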

This code retrieves titles and URLs from the Yahoo search results, but it also picks up ad links and other unwanted URLs. How can I filter out the irrelevant links and keep only the actual results?

Answer №1

How about this:

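using System;
using System.Linq;
using HtmlAgilityPack;
using ScrapySharp.Extensions; // provides the CssSelect extension method

HtmlWeb w = new HtmlWeb();

string search = "https://en-maktoob.search.yahoo.com/search?q=squirrels";
//ac-algo ac-21th lh-15
var hd = w.Load(search);

// Materialize both sequences once so counting and indexing do not re-run the selectors.
var titles = hd.DocumentNode.CssSelect(".title a").Select(n => n.InnerText).ToList();
var links = hd.DocumentNode.CssSelect(".fz-15px.fw-m.fc-12th.wr-bw.lh-15").Select(n => n.InnerText).ToList();

// Stops one short of the end, which skips the final listing.
for (int i = 0; i < titles.Count - 1; i++)
{
    var title = titles[i];
    string link = string.Empty;
    if (links.Count > i)
        link = links[i];

    Console.WriteLine("Title: {0}, Link: {1}", title, link);
}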

Note that CssSelect is an extension method from the NuGet package ScrapySharp. Install it the same way you installed HtmlAgilityPack, then add using ScrapySharp.Extensions; at the top of your file and you're all set. (I prefer it because CSS selectors are more convenient than XPath expressions.)

As for skipping ads, from what I've seen, the ad in these Yahoo search results typically appears only as the last listing. If that observation holds, simply exclude the final result.
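
The "exclude the final result" idea can also be pulled out into a small helper, sketched below. ResultFilter and PairAndDropLast are made-up names for illustration, not part of HtmlAgilityPack or ScrapySharp, and the helper relies on the same unverified assumption that the ad is always the last listing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ResultFilter
{
    // Hypothetical helper (not a library API).
    // Pairs each title with the link at the same index and drops the final
    // pair, on the assumption that the ad is the last listing.
    public static List<KeyValuePair<string, string>> PairAndDropLast(
        IEnumerable<string> titles, IEnumerable<string> links)
    {
        var titleList = titles.ToList();
        var linkList = links.ToList();

        var pairs = new List<KeyValuePair<string, string>>();
        for (int i = 0; i < titleList.Count - 1; i++)
        {
            // A title without a matching link gets an empty string.
            string link = i < linkList.Count ? linkList[i] : string.Empty;
            pairs.Add(new KeyValuePair<string, string>(titleList[i].Trim(), link.Trim()));
        }
        return pairs;
    }
}
```

You would pass it the two sequences produced by the CssSelect calls and print pair.Key and pair.Value for each entry. If Yahoo ever places ads elsewhere in the list, this breaks, so it is worth checking against the live page.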

Here's what the output looks like when running the aforementioned code:

https://i.sstatic.net/RGSje.jpg
