While crawling a website, I noticed that the h2 and h3 elements appear to have the same structure. Why does h2:first-child return data while h3:first-child does not?
The results h2_1 and h2_2 are the same, which is what I expected.
h3_1 is fine too, but h3_2 is empty. Why is that?
The code is as follows:
const jsdom = require('jsdom');
const jquery = require('jquery');

const url = 'https://www.osram.com/os/news-and-events/spotlights/index.jsp';

// jsdom.env is the callback-style API of older jsdom versions.
jsdom.env(url, [], {
  defaultEncoding: 'utf-8'
}, function(err, window) {
  if (err) {
    // Pass the URL so the %s placeholder is actually filled in.
    console.error('error get news url from page [%s]', url);
    return;
  }
  let $ = jquery(window);
  // First teaser block on the page.
  let el = $('p.col-xs-6.col-sm-7.colalign:first');

  // h2 with and without :first-child
  let h2_1 = $(el).find('h2.font-headline-teaser').text();
  console.log('h2_1=' + h2_1);
  let h2_2 = $(el).find('h2.font-headline-teaser:first-child').text();
  console.log('h2_2=' + h2_2);

  // h3 with and without :first-child
  let h3_1 = $(el).find('h3.font-sub-headline').text();
  console.log('h3_1=' + h3_1);
  let h3_2 = $(el).find('h3.font-sub-headline:first-child').text();
  console.log('h3_2=' + h3_2);

  window.close();
});
The selector xxx:first-child selects an element only when two conditions hold at the same time: the element matches xxx, and it is the first child element of its parent.
It does not mean "whatever element happens to be the first child of xxx's parent", nor does it mean "the first xxx among the children of xxx's parent".
The first child element of the parent of h2.font-headline-teaser is h2.font-headline-teaser itself, so it is selected.
The first child element of the parent of h3.font-sub-headline is not h3.font-sub-headline, so the result is empty.
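To make the two conditions concrete, here is a minimal sketch that models the matching rule with plain arrays instead of a real DOM. The sibling list (an h2 followed by an h3 under the same parent) is an assumption about the page's markup, and selectFirstChild and selectFirstOfType are hypothetical helpers written for this illustration, not jQuery or jsdom functions.

```javascript
// Assumed markup: the parent has two children, an h2 followed by an h3.
const children = [
  { tag: 'h2', className: 'font-headline-teaser' },
  { tag: 'h3', className: 'font-sub-headline' },
];

// Models "tag.className:first-child": the element must match tag.className
// AND be the very first child of its parent (index 0).
function selectFirstChild(siblings, tag, className) {
  return siblings.filter((el, index) =>
    el.tag === tag && el.className === className && index === 0);
}

// Models "tag.className:first-of-type": the element must match tag.className
// AND be the first sibling with that tag name.
function selectFirstOfType(siblings, tag, className) {
  const firstIndexOfTag = siblings.findIndex((s) => s.tag === tag);
  return siblings.filter((el, index) =>
    el.tag === tag && el.className === className && index === firstIndexOfTag);
}

const h2FirstChild = selectFirstChild(children, 'h2', 'font-headline-teaser');
const h3FirstChild = selectFirstChild(children, 'h3', 'font-sub-headline');
const h3FirstOfType = selectFirstOfType(children, 'h3', 'font-sub-headline');

console.log('h2:first-child matches:', h2FirstChild.length);   // 1
console.log('h3:first-child matches:', h3FirstChild.length);   // 0
console.log('h3:first-of-type matches:', h3FirstOfType.length); // 1
```

If the page really is structured this way, then in the original script h3.font-sub-headline:first-of-type, or calling .first() on the result of find('h3.font-sub-headline'), would likely return the text that h3_2 was expected to hold.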