08-29-2019, 05:09 PM
(08-28-2019, 11:48 PM)loopline Wrote:
(08-25-2019, 09:39 AM)DigitalMu Wrote:
(08-24-2019, 09:56 PM)loopline Wrote: With 10 proxies I would set a 60 delay to start. I'm not talking about 60 seconds between when an IP is used; I'm talking more like 10 minutes between when each IP is used.
Are these EXTREMELY conservative settings? As you know, I'm learning... I started off too hardcore and had to dial it back considerably. I'm now shopping again for good proxies and figuring out how best to use them. At 10 min (or even 60 seconds), I'd be 130 years old before my first job was done, right? That seems crazy slow.
I have one set of 100 proxies which are good for everything except Google searches, so that sucks... I have another set of 15 proxies which I'm trying to figure out the limits on. I keep dialing them back more and more... but I'm doing kinda OK with a 1 second delay and 1 thread on the Detailed Harvester. Still, this is just deathly slow to me lol. I think I lack patience.
At one point, 2-3 weeks ago, I was doing something like 700 urls/sec using the custom harvester. It was working great... but I can't remember which set of proxies I was using, and I can't seem to replicate it.
Obviously money isn't unlimited here, so I don't see myself buying $200+ of proxies. I accidentally renewed the Extraproxies.com plan where I got my 100 Google-banned proxies lol... they were cheap, but they're mostly useless if I can't use them on Google.
So yeah, still shopping around.
But surely there's a significantly faster way to scrape?
It depends on the query. I mean, like 2 years ago I had someone who needed a 300 delay between each proxy used to get it to work. So the more aggressive the query, the higher the delay needed.
It is slow, but it's still faster than using no proxies. It used to be fast, but Google just keeps tightening the belt on how low the delay can be. So private proxies can take big delays, but you start high, dial the delay down till they start getting blocked, and then you know your sweet spot.
That's why these kinds of proxies are popular: you can scrape faster with them. The drawback is they are less good for posting, so I use private proxies for posting and these for scraping.
I also work on developing footprints that get me what I want from Bing, because Bing is much more lenient.
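To make the "60 delay with 10 proxies" idea concrete, here is a minimal sketch (not ScrapeBox's actual code; the proxy addresses and function names are hypothetical) of rotating sequentially through a proxy list with a fixed delay between requests. Each individual IP then rests for delay times number-of-proxies seconds, so 10 proxies at a 60-second delay works out to roughly 10 minutes per IP:

```python
import itertools
import time

# Hypothetical sketch of the rotation described above: one global delay
# between requests, cycling sequentially over the proxy list, so each
# individual IP is only reused every (delay * number_of_proxies) seconds.
PROXIES = ["proxy1:8080", "proxy2:8080"]  # placeholder addresses
DELAY_SECONDS = 60  # start high, lower it until IPs start getting blocked

def rotate(queries, proxies, delay):
    """Yield (query, proxy) pairs, sleeping `delay` seconds between requests."""
    pool = itertools.cycle(proxies)
    for i, query in enumerate(queries):
        if i:  # no wait needed before the very first request
            time.sleep(delay)
        yield query, next(pool)

# Per-IP rest time for the numbers in the thread:
rest = DELAY_SECONDS * 10  # 10 proxies * 60 s = 600 s = 10 minutes
```

The "dial it back" tuning then just means re-running with a smaller `DELAY_SECONDS` until blocks appear, and settling one step above that.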
(08-26-2019, 04:21 PM)googlealchemist Wrote:
(08-24-2019, 09:56 PM)loopline Wrote: With 10 proxies I would set a 60 delay to start. I'm not talking about 60 seconds between when an IP is used; I'm talking more like 10 minutes between when each IP is used.
Oh wow ok thanks for clarifying....so that would be 6min at those settings?
In the video I think you said it uses a random selection of proxies vs sequential; why is that? It seems random selection would eventually hit the same one(s) more often and get them banned quicker than a set delay between them all?
I tried a 60 second delay, and by default it only does single threads with a delay, right? I have it set to a single thread in the harvester regardless, but I am missing something... here's a screenshot of various settings: https://imgur.com/a/tvIs3iS ... it seems to be harvesting way too much, way too fast, right off the bat?
And any time I try to increase the delay beyond 60 seconds and update it, it just defaults back to 60?
So to do any heavy scraping in a decent amount of time, I'm going to need a ton of good proxies, like hundreds of them, to plow through a few hundred thousand keywords/footprints?
The custom harvester does not support a delay, only the detailed harvester.
There are 2 delays. The one in your screenshot is between each page: get page 1 results, wait the delay, get page 2 results.
There is also a delay in the detailed harvester; it is a delay between each keyword. So it gets all results for a given keyword/query, delays, then moves to the next. I use this one. I've never really used the between-pages one, as it seemed a bit more complex to do the math and figure out, and I like to keep it simple when possible.
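The difference between the two delay styles can be sketched in a few lines. This is an illustrative stand-in, not ScrapeBox's implementation; `fetch_page` is a hypothetical stub for one search-results request:

```python
import time

def fetch_page(keyword, page):
    # Placeholder: a real version would request one page of search results.
    return [f"{keyword}-page{page}"]

def harvest(keywords, pages_per_keyword, page_delay=0, keyword_delay=0):
    """Collect results, honoring both delay styles described above."""
    results = []
    for k, keyword in enumerate(keywords):
        if k and keyword_delay:
            # Detailed Harvester style: wait between whole keywords/queries.
            time.sleep(keyword_delay)
        for p in range(1, pages_per_keyword + 1):
            if p > 1 and page_delay:
                # Between-pages style: wait between result pages of one query.
                time.sleep(page_delay)
            results.extend(fetch_page(keyword, p))
    return results
```

With `keyword_delay` set, the wait happens once per query no matter how many pages it pulls, which is why the math stays simpler than with a per-page delay.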
Different functions in ScrapeBox work differently as far as pulling proxies randomly vs working in sequential order. I don't know all the reasons why everything is done the way it is, but they do massively extensive testing to determine what is best in a given function and then implement it. So it's kind of an "it is what it is, and how do we work with what it is" kind of thing.
Yeah, I mean developing your Bing strategy is good, and then the proxies in the above video, etc.
Oh OK, thanks, I see that option now and I think I have it going properly with a 70 second delay. Just seeing if I can go on my own IP vs any proxies at this point, at this rate, without getting banned. I will lower it till I find the minimum.
And yeah, I notice Bing just lets you rip for a long time before burning out, even without proxies.