08-02-2019, 03:28 PM
So I'm getting increasingly comfy with ScrapeBox (thanks to this site and your YouTube vids). I've set myself up on a pretty robust VPS, I have a growing number of good proxies, and I'm finding all kinds of awesome uses for the software... but
I've run into a situation where it keeps crashing on me.
I have a list of about 1 million keywords that I want to scrape for URLs. Clearly I can't just let it run until it finishes or I'll end up with a URL list that's a zillion gigs. At first I was scraping manually and stopping it at about 4 million URLs, then de-duping and doing other URL cleaning. All was well, but I wanted to automate the process. So I tried doing it with the Automator and realized it wouldn't work because the software won't chunk the keyword list; it won't move to the next step until all 1 million keywords have been run through. That's cool...
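For what it's worth, the workaround I've been considering is pre-chunking the keyword file myself with a quick script before feeding it in, something like this rough sketch (file names are hypothetical; assumes one keyword per line):

```python
# Rough sketch (nothing to do with ScrapeBox itself): split a big keyword
# file into smaller files you can feed to the harvester one at a time.
# "keywords.txt" and the output names are placeholders.

def split_keywords(path, chunk_size=100_000):
    """Write path out as keywords_part_001.txt, keywords_part_002.txt, ..."""
    part, lines = 1, []
    with open(path, encoding="utf-8") as src:
        for line in src:
            lines.append(line)
            if len(lines) >= chunk_size:
                with open(f"keywords_part_{part:03d}.txt", "w", encoding="utf-8") as out:
                    out.writelines(lines)
                part, lines = part + 1, []
    if lines:  # leftover keywords that didn't fill a full chunk
        with open(f"keywords_part_{part:03d}.txt", "w", encoding="utf-8") as out:
            out.writelines(lines)

if __name__ == "__main__":
    split_keywords("keywords.txt")  # 1M keywords -> 10 files of 100k
```

Then I'd just point the Automator at one chunk at a time instead of the whole million.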
So I figured I'd use the option to auto-save about every 2 million URLs, creating files of 2 million URLs each. Easy enough.
However, once I hit just over 4 million total (two lists of 2 million URLs), ScrapeBox starts going nutty on me, grinds to a halt, and eventually crashes. I suspect it's keeping all of the harvested URLs in memory even after they've been saved to disk, and eventually runs out of room. I can't find an option to adjust this, though. It'd be nice if it could reset itself every time it saves off a list of 2 million.
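In the meantime, here's roughly how I'd merge and de-dupe those 2-million-URL auto-save files outside the GUI, so nothing has to sit in ScrapeBox's memory (again a sketch; file names are hypothetical and it assumes plain text with one URL per line):

```python
# Rough sketch: merge the auto-saved URL lists and de-dupe them in a
# standalone script. Keeps a set of 16-byte MD5 digests rather than the
# full URL strings, so a few million lines stay cheap on RAM.
import glob
import hashlib

def merge_dedupe(pattern="harvested_urls_*.txt", out_path="urls_deduped.txt"):
    seen = set()  # digests of every URL written so far
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(glob.glob(pattern)):
            with open(path, encoding="utf-8", errors="ignore") as src:
                for line in src:
                    url = line.strip()
                    if not url:
                        continue
                    digest = hashlib.md5(url.encode("utf-8")).digest()
                    if digest not in seen:
                        seen.add(digest)
                        out.write(url + "\n")

if __name__ == "__main__":
    merge_dedupe()
```

Storing hashes instead of the URLs themselves is the only trick; everything else is just streaming the files line by line.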
Any idea what's happening and how to fix it? Maybe there's another way to harvest URLs from such a large list of keywords?
I'm running about 800 threads and using the custom harvester, scraping from only 5 or 6 search engines. Am I hitting my machine's limits, maybe? I'm paid up for this month, but I'm moving to a bigger, better dedicated server next month.
Thanks!!