Problem with Link extractor and Automator

I built an automation process. It works, but I have one random bug: sometimes the link extractor gets blocked during the run.
I run link extraction on thousands of URLs. After a few hours it should be finished, but the process is still running with 1, 2 or 3 "Connections" still showing, and it never finishes. So I guess the program is not closing the connections and is waiting for them to close before moving on to the next step of the automator process. But the connections never close.
I have to use Ctrl+Alt+Delete to kill the link extractor process.

At first I thought I had too many URLs to extract from (millions), but it also happens when there are far fewer URLs. I have the same problem in another ScrapeBox instance on another VPS (Windows Server 2012 R2).

All my ScrapeBox installs are up to date.

Does anyone else have the same problem?

Please help me!


Found a topic with the same problem:
These are locked threads. Inside ScrapeBox you can try to work around it: load the link extractor from the addons menu (not from the automator), go to its settings, and set it to abort when fewer than X connections are active. Set X to about half of whatever connections you are using, or even 20%, as long as it is more than the number of locked threads you normally see. So if you see 3 locked threads, set it to at least 5, but not equal to or more than your total connections.
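The rule of thumb above is just a bit of arithmetic, so here is a rough sketch of it in Python. The function name `suggest_abort_threshold` is made up for illustration and is not part of ScrapeBox. The "+2" margin is an assumption taken from the "3 locked, set it to at least 5" example, and the 20% starting point is the low end of the range mentioned above.

```python
def suggest_abort_threshold(total_connections: int, observed_locked: int) -> int:
    """Pick a value for "abort when fewer than X connections are active".

    Hypothetical helper, not a ScrapeBox API: it only encodes the rule of
    thumb from the post (above the locked count, below the total).
    """
    if total_connections < 2 or observed_locked < 0:
        raise ValueError("need at least 2 connections and a non-negative locked count")

    # Start from roughly 20% of the total connections...
    candidate = total_connections // 5
    # ...but stay a couple above the locked threads usually seen (assumed margin of 2).
    candidate = max(candidate, observed_locked + 2)
    # Never equal to or more than the total connections.
    candidate = min(candidate, total_connections - 1)

    if candidate <= observed_locked:
        raise ValueError("no threshold fits between the locked count and the total")
    return candidate


print(suggest_abort_threshold(50, 3))  # 20% of the total is the larger value here
print(suggest_abort_threshold(10, 3))  # locked count plus the margin is larger here
```

If the function raises, the total connection count is too close to the number of locked threads for this workaround to be usable as described.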

Something outside of ScrapeBox is locking the threads, so fixing that is your best bet. The above hack works sometimes and sometimes not, depending on what is affecting ScrapeBox from the outside.
That means that something has locked 1 or more of the threads. This can be security software such as anti-virus, malware checkers and firewalls. So you should whitelist ScrapeBox in all security software, and you can whitelist the entire ScrapeBox folder as well.

Further, any program that accesses the internet can lock threads, things like Skype, uTorrent, etc. So you can try closing down any unneeded programs. If it then works, you can turn programs back on one by one to find the culprit.

Further, computer optimization software can lock threads, so you can shut any such software down.

Take note that disabling security software (such as anti-virus, malware checkers and firewalls) often only stops new rules from forming, but allows existing rules to still fire. So you have to fully whitelist in the security software or uninstall the security software (as a test).

Further, some security software requires you to whitelist in more than one place before it takes effect.

Also note that disabling a router firewall does actually fully disable it.

Basically you have to sort out what is locking the threads, because ScrapeBox is forced to wait until all threads are released. On occasion it can be your operating system that does it, so you can try restarting your machine and/or lowering total connections.

One other thing to note is that this can happen with proxies that keep returning small amounts of data. That won't trigger the timeout because the connection is still active. So try a test using no proxies, or make sure you are using some quality private proxies.
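To show why a trickling proxy can slip past a timeout, here is a small Python simulation. It is not how ScrapeBox is implemented (its internals are not described here); it only illustrates the general idea that a timeout measured between reads never fires while data keeps arriving, whereas a budget for the whole transfer does. `FakeClock`, `trickle`, `read_with_deadline` and the 10 second / 60 second figures are all made up for the example.

```python
import time
from typing import Callable, Iterable, Iterator


class DeadlineExceeded(Exception):
    """Raised when the whole transfer takes longer than the overall budget."""


class FakeClock:
    """Simulated clock so the example runs instantly instead of sleeping."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


def trickle(clock: FakeClock, n_chunks: int, gap_s: float) -> Iterator[bytes]:
    """Simulate a proxy that sends one tiny chunk every gap_s seconds."""
    for _ in range(n_chunks):
        clock.now += gap_s  # each chunk arrives gap_s after the previous one
        yield b"x"


def read_with_deadline(
    chunks: Iterable[bytes],
    overall_deadline_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> bytes:
    """Collect chunks, but give up once the total elapsed time passes the budget."""
    start = clock()
    received = bytearray()
    for chunk in chunks:
        received.extend(chunk)
        # A per-read timeout would be satisfied by every chunk; this check
        # looks at the total time since the transfer started instead.
        if clock() - start > overall_deadline_s:
            raise DeadlineExceeded(f"gave up after {clock() - start:.0f}s")
    return bytes(received)


clock = FakeClock()
try:
    # Chunks arrive every 10s, so a 30s idle timeout alone would never trigger.
    read_with_deadline(trickle(clock, 100, 10.0), overall_deadline_s=60.0, clock=clock)
except DeadlineExceeded as exc:
    print("aborted:", exc)
```

In the simulation each chunk arrives well inside a typical idle timeout, so only the overall budget ends the transfer. That is the same reason a test with no proxies, or with better proxies, is a quick way to rule this cause in or out.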

Lastly, if you're running Mac, you can try lowering the connections. Mac has terrible error handling when lots of errors stack up quickly. If too many errors stack up too quickly, Mac can choke, so lowering the threads fixes this. This is a non-issue on Windows.
Thanks for your detailed answer.
You're welcome mate, have a great day!
