11-02-2016, 04:02 AM
Thanks Loopline! Here's what I'm attempting to do:
I have 5,000 URLs in an Excel document. These are potential prospects I received from a supplier based on predefined criteria we selected together. The criteria were chosen by modeling the characteristics of my existing customer database to see if we could find similar businesses that might qualify as leads/prospects. Most of these companies (URLs) won't fit my business, but some will. I work for a catalog printer, and I'm looking for companies who use catalogs in their marketing mix and need our printing/mailing services. My goal is to go through the list of URLs and find the ones that contain the keywords "catalog", "catalog request", or "request a catalog", in the hope that they will be potential prospects. Just using the page scanner to search for these terms falls short of what I really need, which is to search an entire domain, not just an individual page.
I started off manually entering each URL into Google like this:
site:http://www.abc.com catalog
This approach won't return any results if the keyword "catalog" isn't found anywhere on the site, which is great; it tells me they aren't a good fit for our business. However, if Google does return results, I want to add that company to my sales database as a prospect to call on and learn more about their print quantities, sizes, etc., to see if they are a good fit for our business. The problem with this process is that I'd have to repeat it 4,999 more times, which is obviously too slow, especially since I'll likely be doing this exercise monthly with about 5,000 URLs each month, sometimes more. If I can find a way to automate/streamline the process, that would be awesome. Also, right now in the manual process I'm only searching for one keyword because of time constraints, but there are several other keywords/phrases I should be searching, such as: catalogue, brochure, look-book, "request catalog", etc.
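Just to show what I mean by automating it, something like the little Python sketch below could at least build all of the Google queries for me instead of typing them one at a time. It's only a rough idea, not a finished script: the "prospects.csv" filename, the URL-in-the-first-column layout, and the keyword list are assumptions of mine, and it only prints the queries rather than submitting them to Google.

import csv
from urllib.parse import urlparse

# Keyword variations to check; quoted phrases stay quoted in the query.
KEYWORDS = ['catalog', 'catalogue', 'brochure', '"look-book"',
            '"catalog request"', '"request a catalog"']

with open("prospects.csv", newline="") as f:  # list exported from Excel
    for row in csv.reader(f):
        url = row[0].strip() if row else ""
        if not url:
            continue
        domain = urlparse(url).netloc or url  # drop the http:// prefix if present
        for kw in KEYWORDS:
            print(f"site:{domain} {kw}")

That would turn the 5,000-row spreadsheet into the full set of site: queries in one go, one line per URL/keyword combination.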
Using the link extractor seems like it would make sense based on your description. That would give me all the internal links from each URL, which means I'm not just searching that one page but the entire domain/site. Once that processing is complete, I'd use the page scanner on all of those URLs to find the ones that contain my keywords. Basically a two-step process, correct? Since I suspect the page scanner may find the keywords on many URLs from the same domain/site, I'm guessing I'll be able to merge duplicates somehow, correct? Any guidance you can provide would be helpful.
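Outside of ScrapeBox, the same two-step idea (pull the internal links, then scan each page for the keywords, and flag each domain at most once) might look roughly like the Python sketch below. Again, this is only a sketch built on my own assumptions: it needs the third-party requests package, it caps the crawl at 20 pages per site, and the URL at the bottom is a placeholder.

import re
import requests
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Keyword variations, matched case-insensitively anywhere in the page HTML.
KEYWORDS = re.compile(r"catalog(?:ue)?|brochure|look[\s-]?book", re.I)
MAX_PAGES = 20  # cap per domain so the check stays quick

class LinkParser(HTMLParser):
    """Step 1: collect the href values from <a> tags (link extraction)."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def domain_has_keyword(start_url):
    """Step 2: scan up to MAX_PAGES internal pages for the keywords."""
    domain = urlparse(start_url).netloc
    seen, queue = set(), [start_url]
    while queue and len(seen) < MAX_PAGES:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue  # dead page/site, move on
        if KEYWORDS.search(html):
            return True  # one hit is enough to flag the whole domain
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            full = urljoin(url, link)
            if urlparse(full).netloc == domain and full not in seen:
                queue.append(full)
    return False

# Placeholder URL; in practice this would loop over the whole Excel list.
print(domain_has_keyword("http://www.abc.com"))

The per-domain page cap is a speed/coverage trade-off: a "request a catalog" link tends to sit in the navigation or footer, so checking the first handful of internal pages is normally enough, but the cap can be raised if sites are being missed.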
My end goal is to narrow the 5,000 URLs down to a smaller group (probably 500 or fewer) that I know print catalogs. During my manual processing of URLs, described earlier, I found that about 10 of the first 100 URLs actually had the keyword "catalog". Those 10 will go into my prospecting database for phone-call follow-up by my lead generation team.
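On the merge-duplicates question above: once I have a flat list of matching URLs (for example, exported from the page scanner), collapsing it down to one entry per company looks straightforward. Here's a small sketch with made-up example URLs:

import csv
from urllib.parse import urlparse

# Made-up example of scan output: several hits on the same site.
hits = [
    "http://www.abc.com/request-a-catalog",
    "http://www.abc.com/products/catalog.pdf",
    "http://www.xyz.com/brochure",
]

# Keep one entry per domain, preserving the order domains first appear in.
unique_domains = list(dict.fromkeys(urlparse(u).netloc for u in hits))
print(unique_domains)  # ['www.abc.com', 'www.xyz.com']

# Write the deduped prospect list out for the sales database.
with open("prospects_deduped.csv", "w", newline="") as f:
    csv.writer(f).writerows([d] for d in unique_domains)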
Hopefully this outline gives you a better idea of what I'm trying to accomplish. Any advice is certainly welcome. I'm looking forward to learning the power of this tool. Newbie here, but eager to learn. Thanks!
(11-02-2016, 02:58 AM)loopline Wrote: You're thinking on the wrong level.
You don't use Google-style operators with the page scanner; you just use text or regex. The page scanner looks at the content of the entire page. If you want to do the whole domain, then you would first need to use the link extractor to extract all the internal links on the domain.
But you're giving mixed info. The page scanner does not use Google.
So you can do site:domain.com "keyword" and Google will give you what you're after.
So can you clarify what it is exactly you are asking and want to do?
Or if you give some laser-targeted examples, I can give better info.