05-16-2018, 02:04 PM
Hi
I recently purchased the Expired Domain Finder plugin and ran it on a couple of sites. I set it to crawl 6 levels deep, added 100 private proxies, and ran it with about 7 threads. I also disabled all the metrics so it only shows the expired domains.
It ran all night on my VPS, but the seed list was no good and it only returned a few results, so I decided to stop it and test it on a larger site.
I then got a better seed list (2 URLs) and loaded them into the Expired Domain Finder. I set the crawl depth to 25 (the max) and the threads to 25 (using the same 100 proxies), with all metrics disabled (Google, Alexa, Plus 1 etc.), and let it run on my VPS. When I logged back into the VPS about 3 hours later, the Expired Domain Finder tool was gone (it had closed), although ScrapeBox itself was still open. (I also had one more instance of ScrapeBox running that was scraping with the harvester, but the harvester uses different reverse proxies to the ones I was using in the Expired Domain Finder.)
So I started the tool again and opened the save location, where I saw the files from the previous scrapes. I didn't know what was wrong. I added https://github.com/ and https://www.theguardian.com as the seed URLs with a depth of 25. I thought it might be the number of threads,
so I started the tool again, this time with 12 threads, added the same seed URLs and let it run. Today I logged into my VPS and the Expired Domain Finder was gone again; it had closed once more. So I looked for the logs and found some error .txt files inside ScrapeBox Application Folder > Plugins > Expired Domain Finder.
In that folder I looked at the bugreport.txt file and noticed it had NO next to "use proxies for crawler". I quickly realized that I had to enable the proxies for the crawler after loading them, so I have now done this.
I then looked at the errors crawler.txt file and saw a number of errors that referred to making too many requests (I guess this is due to not using proxies while running multiple threads).
I've now started the Expired Domain Finder up again, and this time I have enabled proxies for the crawler. But when I open the bugreport.txt file (it lets me open this one, but not the crawler error file, because that is in use), it still shows "use proxies for crawler: NO", and the user interface doesn't really tell me whether proxies are actually being used (it just shows the number of proxies I've imported at the top).
My main question is this: after the Expired Domain Finder has completed, does it close automatically, or should it remain open? If it should remain open, then I'm guessing it's not completing the scrapes. Are there any optimum settings someone could recommend (number of threads, depth, proxies etc.)? I'm happy to keep this running for days on the VPS (just like I do with the harvester).
If I need to send anything to support (such as files etc.), please let me know. I have, however, started a new scrape, so I'm guessing the previous scrape errors etc. will no longer be there.
thanks