Although I have experience with R, I am new to HTML and CSS. I have been researching web scraping methods online and on Stack Overflow so that I can implement them in R. However, I am having trouble extracting company ratings from job listing pages. Instead of the expected rating of 4.0 from the example URL, I keep getting character(0).
Below is my approach:
library(rvest)
library(tidyverse)
library(xml2)
#example URL
url<- "https://www.indeed.com/viewjob?jk=a25a91736b1f7042&tk=1e3q54n49heai800&from=serp&vjs=3&advn=8876452989351355&adid=95236293&sjdu=TDSJNe66qIM3gcXFOG94m--bPylNW2vvO3WAHEKN7JhCAD1FQ-2FXD1gQyElsLNkg6gfXO2CD3rQYOYjO9iXITyFdYOp8tCECkHuDmf3Og8qdMmciGFIv2ahigETjLmuY8uXdLjnQTg4__yOXqHJkA"
page <- read_html(url)
page %>%
  rvest::html_nodes("span") %>%
  rvest::html_nodes(xpath = '//*[contains(concat( " ", @class, " " ), concat( " ", "ratingsContent", " " ))]') %>%
  rvest::html_text()
#Output is
#character(0)
#It should return 4.0 instead!
Can anyone provide guidance on how to achieve this, and also suggest a method for returning NA if the company rating is missing? Thank you!
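
To make the NA part of the question concrete, here is a minimal sketch run on made-up HTML fragments rather than the live page. It assumes the rating sits in a span with class ratingsContent (as in the XPath above) and that this markup is present in the HTML that read_html() receives, which may not hold if the page fills it in with JavaScript. The helper name get_rating and both fragments are hypothetical. The sketch relies on html_node() (singular), which returns a missing node when nothing matches, so html_text() gives NA instead of character(0).

```r
library(rvest)

# Hypothetical helper: returns the rating as a number,
# or NA when no matching node is found.
get_rating <- function(page, css = "span.ratingsContent") {
  # html_node() (singular) returns a "missing" node if nothing matches,
  # unlike html_nodes(), which returns an empty node set.
  node <- html_node(page, css)
  # html_text() on a missing node yields NA_character_,
  # and as.numeric() keeps it as NA.
  as.numeric(html_text(node, trim = TRUE))
}

# Made-up fragments for illustration only (not Indeed's real markup)
with_rating    <- read_html('<div><span class="ratingsContent">4.0</span></div>')
without_rating <- read_html('<div><span class="other">no rating</span></div>')

get_rating(with_rating)     # numeric rating
get_rating(without_rating)  # NA
```

With these fragments the first call should give 4 and the second NA. Whether the same selector finds anything on the real listing page depends on what the server actually sends back.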