06-13-2016, 12:10 PM
SHORT QUESTION
Is it possible to run the link extractor addon but limit it to extracting only the first 10 links on each page?
LONG QUESTION - more detail + background
I've got a list of sites I want to scrape to find out whether they have a blog. To do this I'm using the pages scanner addon to search for URLs like:
/blog
/news
/posts
etc. In total I have about 30 paths in this list.
This works fine (but if anyone can think of a better way to do it, I'd be happy to hear it).
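If scripting outside the addon is an option, the path check can be sketched in Python. This is a minimal sketch, not a feature of the pages scanner addon: `BLOG_PATHS`, `candidate_urls` and `looks_like_blog` are hypothetical names, and only three of the roughly 30 paths are listed. `candidate_urls` builds the URLs to probe on each site, while `looks_like_blog` is an alternative that tests links already extracted from a homepage instead of requesting every path.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical, illustrative subset; the real list has about 30 paths.
BLOG_PATHS = ["/blog", "/news", "/posts"]


def candidate_urls(site, paths=BLOG_PATHS):
    """Build the full URL to probe for each blog-like path on a site."""
    if "://" not in site:
        site = "http://" + site  # assumption: plain http when no scheme is given
    # A leading "/" makes urljoin resolve each path against the site root.
    return [urljoin(site, p) for p in paths]


def looks_like_blog(url, paths=BLOG_PATHS):
    """True if the URL's path is, or sits under, one of the blog-like paths."""
    path = urlparse(url).path.rstrip("/").lower()
    # Match "/blog" and "/blog/..." but not "/blogroll".
    return any(path == p or path.startswith(p + "/") for p in paths)


print(candidate_urls("example.com"))
print(looks_like_blog("https://example.com/blog/first-post"))
```

Checking the links a homepage already exposes avoids sending around 30 requests per site, at the cost of missing blogs that are not linked from that page.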
The issue I'm having is that some sites, especially in the area I'm searching, have a splash/enter/intro page: the type where you just get a company logo and a link that says "Enter". Because of this I'm currently running a link extractor crawl (1 level deep, with some excluded terms to try to narrow down the crawl), but this still returns many more URLs than I need.
Is there a way to set the link extractor to only crawl the first 10 URLs it sees, so that if it does land on a full homepage it will only take the first 10 URLs rather than hundreds?
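Outside the addon, the "first 10 links" cap is easy to express. The sketch below assumes you already have the page's HTML as a string and uses only Python's standard `html.parser`: it collects `<a href>` targets in document order, skips fragment, `mailto:` and `javascript:` links, and ignores everything after the tenth unique link. `FirstLinksParser`, `first_links` and the `limit` parameter are hypothetical names, not settings of the link extractor addon.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FirstLinksParser(HTMLParser):
    """Collect <a href> targets in source order, stopping at `limit` unique URLs."""

    def __init__(self, base_url, limit=10):
        super().__init__()
        self.base_url = base_url
        self.limit = limit
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "a" or len(self.links) >= self.limit:
            return
        href = dict(attrs).get("href")
        # Skip empty hrefs and links that do not point to another page.
        if not href or href.startswith(("#", "mailto:", "javascript:")):
            return
        url = urljoin(self.base_url, href)  # resolve relative links
        if url not in self.links:
            self.links.append(url)


def first_links(html, base_url, limit=10):
    """Return at most `limit` unique absolute links from `html`."""
    parser = FirstLinksParser(base_url, limit)
    parser.feed(html)
    parser.close()
    return parser.links
```

On a splash page this returns only the one or two links present (typically the "Enter" link), which could then be fetched and passed through the same function; on a full homepage it stops at 10. Note that "first" here means first in the HTML source, which may not match the visual order on the page.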