The Blueprint Training

Problem With Email Scraper Custom Crawler
#1
So my issue is trying to get wildcards in there, if possible. Basically, I have text like this:
Code:
<div class="AAA" data-name="BBB">
    <div class="CCC">

        <h4 class="DDD-title">United States, NY</h4>
        <strong>New York<br>Brooklyn</strong>
        
        <p>
            EEE<br>

My issue is that I need to get EEE scraped, but I can't seem to figure out how. Is there any way to use multiple markers? I would like to do it something like this:

- Start with class="AAA"
- Then go to <p>
- Then end with <br>
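The three-step sequence above can be sketched outside the crawler, assuming you can post-process the raw page HTML yourself in Python. `extract_with_markers` is a hypothetical helper, not part of the scraper: it finds each marker in order, starting where the previous one ended, and then returns the text up to the end marker.

```python
from __future__ import annotations


def extract_with_markers(html: str, markers: list[str], end: str) -> str | None:
    """Hypothetical helper: walk through `markers` in order, then capture up to `end`."""
    pos = 0
    for marker in markers:
        idx = html.find(marker, pos)
        if idx == -1:
            return None  # a marker is missing, so there is nothing to extract
        pos = idx + len(marker)  # the next search starts after this marker
    stop = html.find(end, pos)
    if stop == -1:
        return None
    return html[pos:stop].strip()  # drop the surrounding whitespace/newlines


# The snippet from the post (cut off after the first <br>, as in the original).
sample = '''<div class="AAA" data-name="BBB">
    <div class="CCC">

        <h4 class="DDD-title">United States, NY</h4>
        <strong>New York<br>Brooklyn</strong>

        <p>
            EEE<br>
'''

print(extract_with_markers(sample, ['class="AAA"', "<p>"], "<br>"))
```

Because the search for `<br>` only starts after `<p>`, the earlier `<br>` inside `<strong>New York<br>Brooklyn</strong>` is skipped.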

Is there a way to just add a wildcard that covers all the in-between text from AAA to the <p>?
#2
Just put AAA in as the before marker and </p> as the after marker, and let it scrape everything in between. You would then have to post-process the results to get rid of anything you don't want.
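A minimal Python sketch of that before/after-marker approach followed by a post-processing step. `between` and `post_process` are hypothetical helpers, and the sample assumes the paragraph is closed with `</p>`, since the snippet in the first post is cut off.

```python
import re


def between(html, before, after):
    """Hypothetical helper: return the text between the first `before` and the next `after`."""
    start = html.find(before)
    if start == -1:
        return None
    start += len(before)
    stop = html.find(after, start)
    if stop == -1:
        return None
    return html[start:stop]


def post_process(chunk):
    """Keep only what follows the last <p> and precedes the next <br>."""
    tail = chunk.rsplit("<p>", 1)[-1]
    first_line = tail.split("<br>", 1)[0]
    # Strip any leftover inline tags, then surrounding whitespace.
    return re.sub(r"<[^>]+>", "", first_line).strip()


# Assumed completion of the post's snippet: the <p> is closed with </p>.
sample = '''<div class="AAA" data-name="BBB">
    <div class="CCC">
        <h4 class="DDD-title">United States, NY</h4>
        <strong>New York<br>Brooklyn</strong>
        <p>
            EEE<br>
        </p>'''

chunk = between(sample, 'class="AAA"', "</p>")
print(post_process(chunk))
```

The raw scrape (`chunk`) still contains the title and city text; the cleanup happens entirely in `post_process`.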

However, there is no way to use multiple markers.

Otherwise, you could try to find different markers that don't pick up the extra text.

The only other way is regex, which may work better, although I'm not a regex expert.
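If regex is an option, a lazy wildcard `.*?` does what the first post asks for: it skips everything between class="AAA" and <p>. Here is a sketch using Python's `re` module. The pattern and the `scrape_eee` name are illustrative assumptions, and whether the crawler itself accepts regex is not confirmed here.

```python
import re

# class="AAA", then anything (lazily) up to the next <p>,
# then capture the text up to the following <br>.
# re.DOTALL lets "." match newlines, since the HTML spans several lines.
PATTERN = re.compile(r'class="AAA".*?<p>\s*(.*?)\s*<br>', re.DOTALL)


def scrape_eee(html):
    """Return the captured text for every class="AAA" block, in page order."""
    return PATTERN.findall(html)


sample = '''<div class="AAA" data-name="BBB">
    <div class="CCC">

        <h4 class="DDD-title">United States, NY</h4>
        <strong>New York<br>Brooklyn</strong>

        <p>
            EEE<br>
'''

print(scrape_eee(sample))
```

The lazy `.*?` stops at the first `<p>` after each class="AAA", so repeated blocks on one page are matched separately. If a block has no `<p>`, though, the match will run on into the next block.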