ScrapeBox Forum
Sorting footprints by engine? - Printable Version

+- ScrapeBox Forum (https://www.scrapeboxforum.com)
+-- Forum: ScrapeBox Main Discussion (https://www.scrapeboxforum.com/Forum-scrapebox-main-discussion)
+--- Forum: Scrapebox Footprints (https://www.scrapeboxforum.com/Forum-scrapebox-footprints)
+--- Thread: Sorting footprints by engine? (/Thread-sorting-footprints-by-engine)



Sorting footprints by engine? - Neavon - 07-29-2014

Hello!

I would like to know: once we have successfully scraped our list, removed duplicates, sorted by PR etc., how can we also sort it so that we can be sure GSA SER will actually be able to submit an article/write a comment/leave a trackback etc. on our scraped URLs?

For example, say we have this footprint:
"Powered by ArticleMS" "Submit Article" "Main Menu" "Latest Articles" health

Then we get results like this:
http://suwesa.de/groups/healthy-foods-to-lose-weight-powered-by-articlems-submit-article-main-menu-latest-articles-i-are-blessed-with-an-anxiety-disorder/

As you can see, we can't actually submit an article to this site. Someone posted part of our footprint, "submit article", in a comment, making this scraped URL completely useless for us.

Any ideas how to be 100 percent sure that GSA will be able to submit all comments/trackbacks/forum posts etc. to our scraped list? It would be really annoying to find that only 5K of our 30K targets were not fake engines.


RE: Sorting footprints by engine? - loopline - 07-30-2014

There is no way to be 100% sure except to scrape, post, and see what sticks. You could perhaps hone your footprint a bit, but otherwise that's what mass is about. Besides that, if the site has changed since Google's last crawl, or SER can't submit to it for some reason, or it goes to moderation etc., there is just no way to know 100%. The best approach is to look at the successful sites and build the footprint from there. Then you can look at the failed sites and their URL structure, see if you can find a URL structure element to use with ScrapeBox's "remove urls containing" feature, enter that, and strip some off.

Otherwise you will always have to slog through some failures.
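The "remove urls containing" step described above can be sketched outside ScrapeBox as well. Here is a minimal Python sketch; the bad-pattern list is a hypothetical example of substrings you might collect from failed targets, not an actual ScrapeBox setting:

```python
# Minimal sketch of ScrapeBox's "remove urls containing" idea:
# drop any scraped URL that contains a substring seen on failed targets.
# BAD_PATTERNS below is a hypothetical example list, built by eyeballing
# the URL structure of sites where posting failed.
BAD_PATTERNS = ["/groups/", "powered-by-articlems"]

def filter_targets(urls, bad_patterns=BAD_PATTERNS):
    """Keep only URLs that contain none of the bad substrings."""
    return [u for u in urls
            if not any(p in u.lower() for p in bad_patterns)]

urls = [
    "http://example.com/submitarticle.php",
    "http://suwesa.de/groups/healthy-foods-powered-by-articlems-submit-article/",
]
print(filter_targets(urls))  # only the first URL survives the filter
```

This only trims obvious false positives by URL structure; as noted above, it still can't guarantee that the surviving URLs will accept a submission.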


RE: Sorting footprints by engine? - Neavon - 07-30-2014

(07-30-2014, 01:25 AM)loopline Wrote: There is no way to be 100% sure except to scrape, post, and see what sticks. You could perhaps hone your footprint a bit, but otherwise that's what mass is about. Besides that, if the site has changed since Google's last crawl, or SER can't submit to it for some reason, or it goes to moderation etc., there is just no way to know 100%. The best approach is to look at the successful sites and build the footprint from there. Then you can look at the failed sites and their URL structure, see if you can find a URL structure element to use with ScrapeBox's "remove urls containing" feature, enter that, and strip some off.

Otherwise you will always have to slog through some failures.

I've got the point. I just wanted to be clear about that. Thank you very much :).

Maybe you'll also know the answer to this question: is it better to verify our URLs before the blast, using GSA's identify platform tool? We could have a list of 30K URLs and end up with 10K, because GSA doesn't recognize some and won't post to them. Am I right?


RE: Sorting footprints by engine? - loopline - 07-30-2014

I'm not sure it's "better" to verify them first, but it would make for "cleaner" posting in SER.

I mean, SER will have to load each of the 30K URLs, match it against a known good footprint, and give you back the 10K good ones. Then when you go to post to the 10K, it has to load them again, match against the same set of footprints, and then post. So you loaded the 30K URLs once and the 10K URLs again, resulting in 40K URL loads.

Versus just loading them in as targets: the 30K all get loaded, the good ones get posted to, the bad ones get discarded, so 30K total URL loads. This would be quicker and simpler.

Unless you plan on loading the resulting list into, say, 10 projects and you are not using any of the global lists to accomplish this. Then you are loading 30K URLs into 10 projects, so 300K URL loads. In this case it's better to filter down to 10K URLs first, as you would then only need 130K total URL loads (30K from the filter process plus 10K x 10 projects).


RE: Sorting footprints by engine? - Neavon - 07-31-2014

(07-30-2014, 11:58 PM)loopline Wrote: I'm not sure it's "better" to verify them first, but it would make for "cleaner" posting in SER.

I mean, SER will have to load each of the 30K URLs, match it against a known good footprint, and give you back the 10K good ones. Then when you go to post to the 10K, it has to load them again, match against the same set of footprints, and then post. So you loaded the 30K URLs once and the 10K URLs again, resulting in 40K URL loads.

Versus just loading them in as targets: the 30K all get loaded, the good ones get posted to, the bad ones get discarded, so 30K total URL loads. This would be quicker and simpler.

Unless you plan on loading the resulting list into, say, 10 projects and you are not using any of the global lists to accomplish this. Then you are loading 30K URLs into 10 projects, so 300K URL loads. In this case it's better to filter down to 10K URLs first, as you would then only need 130K total URL loads (30K from the filter process plus 10K x 10 projects).

I read it a few times for better understanding :). Alright. Once again, thank you for your reply.


RE: Sorting footprints by engine? - loopline - 08-01-2014

You're welcome. :)