ScrapeBox Forum
to scrape archive.org


to scrape archive.org - femmedebretagne - 07-18-2018

Hello,
How can I use ScrapeBox to scrape a site on archive.org?

Thank you.


RE: to scrape archive.org - femmedebretagne - 07-20-2018

Hello friends,
I need your help. I need a footprint to do that.


RE: to scrape archive.org - loopline - 07-21-2018

I mean, what are you wanting to scrape? The Expired Domain Finder has an archive.org downloader, but not a scraper.

Can you be more specific about what you want to scrape? Examples would be helpful.


RE: to scrape archive.org - femmedebretagne - 07-23-2018

Hello, I want to scrape a content site.

Thank you.


RE: to scrape archive.org - femmedebretagne - 07-23-2018

Thank you, I will buy the Expired Domain Finder.


RE: to scrape archive.org - femmedebretagne - 07-24-2018

Hello,
With this plugin, can I download all the articles from a site on archive.org?


RE: to scrape archive.org - loopline - 07-25-2018

No, it does not download articles. Nothing in ScrapeBox will download articles from archive.org.

If you buy an expired domain, the archive.org downloader in the Expired Domain Finder will download the entire site. You can then upload it again and use that site, but it will not download just the articles.


RE: to scrape archive.org - wolfatadfilm - 08-28-2018

And while we're at it: I used the archive.org grabber to download a site yesterday, but it didn't download everything that's available on web.archive.org for that domain.
I'm trying to get the rest, too. So I thought I'd rename the download folder and run it again with a different date in the snapshot date fields.
But now it only downloads the home page and no other pages. What can I do to steer it to a certain missing page?


RE: to scrape archive.org - loopline - 08-28-2018

There isn't really anything you can do to steer it to a specific page. You could drop a line to ScrapeBox support with the specific link to the page that's missing, along with any other helpful specific data and perhaps screenshots, and see if it's something they can compensate for or not.

I've run into a couple of cases where it just doesn't come out perfect. There are just too many possibilities out there, and too many things people can do wrong to a site, that cause the downloader, on occasion, to not be able to download some part of the site.


RE: to scrape archive.org - wolfatadfilm - 08-28-2018

Thanks for your answer, @loopline. It's not such a tragedy; I will download those few pages manually and reintegrate them. It's helpful to know it's not always me ;-)
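
For anyone else fetching a few missing pages by hand, here is a minimal Python sketch of one way to do it outside ScrapeBox. It is not how the Expired Domain Finder works internally. It assumes the public Wayback Machine availability endpoint (https://archive.org/wayback/available) and its documented JSON shape (archived_snapshots -> closest -> url). The function names, the example page, and the use of the "id_" URL modifier to request the page without the Wayback toolbar are illustrative assumptions, so check them against the archive.org documentation before relying on them.

```python
import json
import re
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint: the public Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url, timestamp=None):
    """Build the lookup URL for one page; timestamp is YYYYMMDD[hhmmss]."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot_url(response):
    """Return the closest snapshot URL from the parsed JSON, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def raw_snapshot_url(snapshot_url):
    """Append 'id_' to the timestamp (assumed modifier for unmodified HTML)."""
    return re.sub(r"(/web/\d+)/", r"\1id_/", snapshot_url, count=1)


def fetch_missing_page(page_url, timestamp=None):
    """Look up the closest snapshot of one page and return its bytes, or None."""
    with urllib.request.urlopen(availability_query(page_url, timestamp)) as resp:
        data = json.load(resp)
    snapshot = closest_snapshot_url(data)
    if snapshot is None:
        return None
    with urllib.request.urlopen(raw_snapshot_url(snapshot)) as resp:
        return resp.read()


if __name__ == "__main__":
    # Hypothetical page and date; replace with the page that is missing.
    html = fetch_missing_page("example.com/about.html", "20180101")
    print("no snapshot found" if html is None else f"{len(html)} bytes fetched")
```

Saving the returned bytes into the matching path of the downloaded site folder is then a manual step, the same as reintegrating pages copied from the browser.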