ScrapeBox Forum
scraping content on specific sites - Printable Version




scraping content on specific sites - karim0028 - 11-21-2016

Hi guys,

I'm trying to scrape recipes off major sites and save each recipe to its own file.

What is the best way to do this?

I don't really understand the masks aspect of ScrapeBox, so if someone could give me a quick example it would be much appreciated.

For instance, I found this site:


{"@context":"http:\/\/schema.org\/","@type":"Recipe","name":"Chicken Teriyaki","author":"Namiko Chen","image":"http:\/\/cdn-jpg.thedailymeal.net\/sites\/default\/files\/2014\/09\/25\/chicken_teriyaki_namiko_chen_450x360.jpg","description":"Just so you know, the "chicken teriyaki sauce" in a bottle does not taste like real teriyaki sauce in Japan. Teriyaki is a cooking technique. "Teri" means the "luster" given by the sweet soy sauce marinade and "yaki" means "cooking or grilling," and it’s not really the name of the sauce.\r\nClick here to see 5 Essential Japanese Dishes to Know.","aggregateRating":{"@type":"AggregateRating","ratingValue":4.8,"reviewCount":"12","bestRating":5,"worstRating":4},"cookTime":"P0Y0M0DT0H25M0S","recipeYield":"2","recipeIngredient":["1 pound boneless, skin-on chicken breasts or thighs, chopped into large chunks","2 tablespoon soy sauce","2 tablespoon water","1 tablespoon mirin","1 tablespoon sugar","1\/4 onion, grated into a bowl with juices reserved","one 1-inch piece ginger, grated into a bowl with juices reserved","3 tablespoon sake","3 tablespoon vegetable oil"],"recipeInstructions":["Prick the chicken on the flesh side with a fork.\r\n\r\n\tCombine the soy sauce, water, mirin, sugar, onion, ginger, and 1 tablespoon of the sake in a bowl or zip-lock bag. Add the chicken, place in the refrigerator, and marinate for 2-3 hours.\r\n\r\n\tWhen ready to cook, heat 2 tablespoons of the vegetable oil in a large skillet over medium-high heat. When hot, shake off as much marinade as possible from the chicken and place the chicken pieces skin side down. (The marinade will burn easily so try not to add any of the liquid to the pan. Do not throw away the marinade.)\r\n\r\n\tWhen it's nicely browned, flip and cook the other side. Then, add the sake and cook, covered, for 8-10 minutes. Remove the chicken to a plate and clean the skillet. 
Heat the remaining oil over medium-high heat and put the chicken back in the skillet, skin side down first to make the skin crispy. Then, flip again and pour in the marinade. Cook until the sauce is reduced a bit. Baste the chicken a couple of times while cooking. Serve the chicken and pour the sauce on top. "],"nutrition":{"@type":"NutritionInformation","servingSize":"1 serving","calories":"543 calories","fatContent":"23 g","carbohydrateContent":"9 g","fiberContent":"1 g","saturatedFatContent":"5 g","sodiumContent":"1113 mg"}}

NOTE: this text was in the source of the page I viewed.

I want to be able to pull in each page like this recipe, with each file named after the recipe. Specifically, I want to pull in the:

1) name of the recipe
2) recipe description
3) recipe ingredients
4) recipe instructions

Each file would include these 4 things if found on the page (with the name of the recipe being the name of the file).

How would I do that?

thanks
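
For reference, the four fields listed above can also be pulled out of the schema.org data with a short script outside ScrapeBox. This is only a minimal sketch, not a ScrapeBox feature. It assumes the JSON shown above sits inside a <script type="application/ld+json"> tag in the page source and that it parses as valid JSON; blocks that fail to parse are skipped. The names JsonLdCollector, find_recipe, format_recipe, safe_filename and save_recipe are made up for this example, and downloading the page is left out.

```python
import json
import re
from html.parser import HTMLParser
from pathlib import Path

# (label in the output file, key in the schema.org Recipe object)
FIELDS = [
    ("Name", "name"),
    ("Description", "description"),
    ("Ingredients", "recipeIngredient"),
    ("Instructions", "recipeInstructions"),
]


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buf))
            self._in_jsonld = False


def find_recipe(html):
    """Return the first JSON-LD object whose @type is Recipe, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # assumption: unparseable blocks are simply ignored
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Recipe":
                return item
    return None


def as_lines(value):
    """Normalise a missing value, a string, or a list into a list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return [str(v) for v in value]


def format_recipe(recipe):
    """Build the file text, including only the fields found on the page."""
    sections = []
    for label, key in FIELDS:
        lines = as_lines(recipe.get(key))
        if lines:
            sections.append(label + ":\n" + "\n".join(lines))
    return "\n\n".join(sections) + "\n"


def safe_filename(name):
    """Drop characters that are unsafe in file names and add .txt."""
    cleaned = re.sub(r"[^A-Za-z0-9 _-]", "", name).strip()
    return (cleaned or "recipe") + ".txt"


def save_recipe(html, out_dir):
    """Write one file per recipe, named after the recipe. Returns the path or None."""
    recipe = find_recipe(html)
    if recipe is None:
        return None
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / safe_filename(str(recipe.get("name", "")))
    target.write_text(format_recipe(recipe), encoding="utf-8")
    return target
```

Calling save_recipe(page_source, "recipes") on a page like the one above would write a file such as recipes/Chicken Teriyaki.txt. Sites that mark up their recipes differently would need a different approach.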


RE: scraping content on specific sites - Try A Million - 11-25-2016

ScrapeBox is a harvester of sites to post to. I don't know if SB has an addon for scraping sites, so excuse me if it does. I haven't got SB open to check right now.

My suggestion for such scrapes would be software designed for scraping websites. Bear in mind, though, that taking information like that would likely break copyright law.

I would also check whether they offer an RSS feed, although that doesn't resolve the question of whether you can use the content.

An even better solution would be to get a PLR pack of recipes from somewhere, which you can simply edit and use. Then there are no worries about scraping or programming a scraper to download the content, and you are free to build your sites, etc.


RE: scraping content on specific sites - loopline - 11-25-2016

The short answer is that you can't do this.

I mean each piece of data is likely to be its own mask, which means it's going to be on its own line.

You will get 1 output file with all the data you scrape. So if you have 4 lines per URL and 50 URLs to scrape, you will get 1 file with 200 lines in it.

So you could split that data out into files or Excel rows, etc. from that point, but that would be the only way.
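
Splitting the combined output afterwards could look roughly like the sketch below. It is only an illustration and rests on assumptions the thread does not confirm: that every URL produced exactly 4 lines, always in the same order (name, description, ingredients, instructions), with no blank lines in between. If a mask finds nothing on some page, the groups drift out of alignment and this approach breaks. The function names chunk and split_output and the file paths are made up for the example.

```python
import re
from pathlib import Path


def chunk(lines, size):
    """Group a flat list of lines into consecutive groups of `size`."""
    if len(lines) % size != 0:
        raise ValueError(
            f"{len(lines)} lines cannot be split evenly into groups of {size}"
        )
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def split_output(combined_path, out_dir, lines_per_url=4):
    """Write one file per group, named after the first line of the group."""
    lines = Path(combined_path).read_text(encoding="utf-8").splitlines()
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for group in chunk(lines, lines_per_url):
        # Assumption: the first line of each group is the recipe name.
        name = re.sub(r"[^A-Za-z0-9 _-]", "", group[0]).strip() or "recipe"
        target = out_dir / f"{name}.txt"
        target.write_text("\n".join(group) + "\n", encoding="utf-8")
        written.append(target)
    return written
```

With 4 lines per URL and 50 URLs, chunk turns the 200 lines into 50 groups, and split_output("output.txt", "recipes") would write one file for each group. Two recipes with the same name would overwrite each other here.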