ScrapeBox Forum
Understanding Connection To Proxy Ratio - Printable Version




Understanding Connection To Proxy Ratio - jim - 02-06-2021

I am watching loopline's video on safely scraping Google in 2020, and I have a fundamental misunderstanding of the terminology.

It says that there should be one connection for every 5 proxies, or that the connection ratio can vary.

How can more than 1 IP address make a single connection? When a page loads does it not load from a single IP?

How would it be possible for 50 IP addresses to load one connection?

What does "connection" mean in this case? What is a thread?


RE: Understanding Connection To Proxy Ratio - loopline - 02-09-2021

Connection and thread are the same thing.

What it means is this: with 1 connection for 5 proxies, it's going to use proxy 1 to scrape Google for keyword 1.

Then on keyword 2 it will use proxy 2 to scrape, on keyword 3 proxy 3, and so on until we get to keyword 6, where it goes back to proxy 1 and keeps going.
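To make the rotation concrete, here is a rough Python sketch of that keyword-to-proxy cycling. It is only an illustration of the idea, not ScrapeBox's actual code; the function name `assign_proxies` and the sample keyword and proxy labels are made up for the example.

```python
from itertools import cycle

def assign_proxies(keywords, proxies):
    """Pair each keyword with the next proxy in the list, wrapping around.

    Hypothetical helper for illustration only.
    """
    rotation = cycle(proxies)  # endless proxy 1, 2, ..., N, 1, 2, ...
    return [(kw, next(rotation)) for kw in keywords]

keywords = [f"keyword {i}" for i in range(1, 8)]   # 7 sample keywords
proxies = [f"proxy {i}" for i in range(1, 6)]      # 5 sample proxies

for kw, proxy in assign_proxies(keywords, proxies):
    print(kw, "->", proxy)
```

With 5 proxies, keyword 6 lands back on proxy 1, which is the wrap-around described above.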

The concept is just time. Going too fast on Google is the problem: too many connections in too short a time period. Slowing it down by using 1 connection on 50 proxies means it's going to work through 50 keywords and 50 different IPs before it gets to keyword 51 and uses proxy 1 again. Hence you get a sort of delay between uses of the same proxy.
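The delay can be put into rough numbers. The sketch below assumes every query takes about the same time and that proxies are used strictly in turn; the 2 seconds per query is a made-up figure, and `seconds_between_reuse` is a hypothetical helper, not a ScrapeBox setting.

```python
def seconds_between_reuse(num_proxies, connections, seconds_per_query):
    """Approximate gap before the same proxy hits Google again.

    Assumes round-robin proxy use and a constant time per query
    (both are simplifying assumptions for this example).
    """
    return num_proxies * seconds_per_query / connections

# 1 connection over 5 proxies vs. 1 connection over 50 proxies,
# with an assumed 2 seconds per query
print(seconds_between_reuse(5, 1, 2.0))    # 10.0
print(seconds_between_reuse(50, 1, 2.0))   # 100.0
```

More proxies per connection stretches the gap per IP, and more connections on the same proxy list shrinks it.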

1 connection to 5 proxies is no longer good. It's more like 1 connection to 50 or 1 connection to 100 proxies, or more. If you don't have that many proxies, just use the detailed harvester and add a delay. Start with a very high delay, then dial it back until you find the point where you start to get blocked; then you know how fast you can go. Blocks go away in 12 to 48 hours.
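As a starting point for picking a delay with a small proxy list, one option is to aim for the same per-proxy gap that a larger list would give. This is a back-of-the-envelope sketch under my own assumptions (1 connection, proxies used in turn, constant query time); `required_delay` and the numbers are hypothetical, and the real safe value still has to be found by testing as described above.

```python
def required_delay(num_proxies, target_gap_seconds, seconds_per_query):
    """Delay to add after each query so that each proxy is reused
    roughly every target_gap_seconds, assuming 1 connection and
    round-robin proxy use (illustrative assumptions only).
    """
    per_query_budget = target_gap_seconds / num_proxies
    return max(0.0, per_query_budget - seconds_per_query)

# Assumed target: about 100 seconds between uses of the same proxy,
# with an assumed 2 seconds per query
print(required_delay(10, 100.0, 2.0))   # 8.0 seconds of added delay
print(required_delay(50, 100.0, 2.0))   # 0.0, enough proxies already
```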