ScrapeBox Forum
I'm missing something basic relating to broken link checker


I'm missing something basic relating to broken link checker - colink - 06-26-2021

The Broken Link Checker finds these links on my page without a problem:

<a href="https://google.com">Google</a>
<a href="about-us">About</a>

But it does not list the following in the results, either as Online or as 404 (even if, for example, the image does not exist):

1. <a href="page04#topic"> 
2. src="images/img-city02.jpg"

Both of the above show as blue hyperlinks in my browser when viewing the page source. When I click these links I get the valid full URL, and the page or image opens:
1. https://mydomain.com/page04#topic
2. https://mydomain.com/images/img-city02.jpg
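
To illustrate the resolution I mean, here is a minimal Python sketch using the standard urllib.parse module. The base URL https://mydomain.com/ is a placeholder for the page address, and the helper name resolve is my own; this is not how ScrapeBox itself is implemented.

```python
from urllib.parse import urljoin, urldefrag

BASE = "https://mydomain.com/"  # assumed address of the page holding the links


def resolve(base, ref):
    """Resolve a relative reference the way a browser would.

    Returns (absolute_url, fetch_url). The fragment (#topic) is never
    sent to the server, so fetch_url is the absolute URL without it.
    """
    absolute = urljoin(base, ref)
    fetch_url, _fragment = urldefrag(absolute)
    return absolute, fetch_url


for ref in ("page04#topic", "images/img-city02.jpg", "about-us"):
    absolute, fetch_url = resolve(BASE, ref)
    print(ref, "->", absolute, "(requested as", fetch_url + ")")
```

So page04#topic resolves to https://mydomain.com/page04#topic, and the address a checker would actually request is https://mydomain.com/page04.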


I have tried an online link checker, and it does report a broken link if, for example, the image is missing.

What have I misunderstood or what am I doing wrong?


RE: I'm missing something basic relating to broken link checker - loopline - 06-26-2021

If you want to post the exact URL of a real site I can test it, but the broken link checker is looking at status codes. So if the image is missing but the page still returns a 200, it's not going to show as broken.
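
To make the status code point concrete, here is a rough Python sketch of a status-based check. It is only an illustration and not ScrapeBox's actual code; the names status_of and label and the opener parameter are made up for the example. Each URL is judged purely by the status code of its own response.

```python
import urllib.error
import urllib.request


def status_of(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived.

    opener is injectable (hypothetical parameter) so the function can be
    exercised without touching the network.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx, but the status code is still available.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def label(code):
    """Map a status code to a simple result label (assumed labels)."""
    if code is None:
        return "No response"
    if 200 <= code < 400:
        return "Online"
    return str(code)
```

With this kind of check, a page that returns 200 is reported as Online regardless of whether an image referenced inside it exists; the image only shows as broken if its own URL is requested and returns a 404.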


RE: I'm missing something basic relating to broken link checker - colink - 06-27-2021

Thanks for your reply.

I do not want to post a real URL here. If necessary I will make a test URL, but I suspect part of the problem is my understanding of what the link checker checks and does not check, particularly "src" images.

If this is what is on the page, and both are active internal links to a live page and image (taken from the source view of the page in Chrome), should they show up in the link checker as a 200?
<a href="page04#topic">
src="images/img-city02.jpg"

That is, in the same way as this link works (though it does return a 404, as the page is not present):

<a href="about-us">About</a>

(When I create the page for <a href="about-us">About</a> and re-run the checker, it returns it as a 200 Online.)

The two examples above, page04 and img-city02.jpg (2 of many on each page), do not show up in a check of 500 pages (which all have multiple internal links and working images), whereas the about-us page and various external links show up on all 500 pages.

When I use https://validator.w3.org/checklink I get all of the internal links, which are coded exactly the same as About us except for the #topic anchor, e.g. <a href="page05#topic">

The W3C validator finds src="images/vid1.mp4", but it does not find any .jpg src references.
It does find src="images/icon.ico".

To clarify, ScrapeBox does find <a href="index">Home</a> and identifies <a href="about-us">About</a> as a 200 or 404 (depending on whether it exists or not).


Thanks ColinK


RE: I'm missing something basic relating to broken link checker - loopline - 06-30-2021

I'm a little lost, but are these links produced by JavaScript?

Press Ctrl+U in your browser and look at the source code; do NOT use Inspect Element.

ScrapeBox does not support JavaScript, so if they are produced by JavaScript then it's not going to work.
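
As a rough illustration of what a tool without JavaScript support can see, here is a small Python sketch that pulls href and src values out of raw HTML with the standard html.parser module. The LinkCollector class and the SAMPLE markup are made up for the example, and this is not ScrapeBox's own parser.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (tag, attribute, value) for every href/src in static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append((tag, name, value))


# Hypothetical page source: three static references plus one link that
# only exists after the script has run in a browser.
SAMPLE = """
<a href="about-us">About</a>
<a href="page04#topic">Topic</a>
<img src="images/img-city02.jpg">
<script>
  var a = document.createElement("a");
  a.href = "made-by-js";
  document.body.appendChild(a);
</script>
"""

collector = LinkCollector()
collector.feed(SAMPLE)
for tag, attr, value in collector.links:
    print(tag, attr, value)
```

The three references written directly in the markup are collected, while made-by-js never appears because nothing executes the script. That is the difference between the raw source and what Inspect Element shows.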

If that does not sort it out, and you can either PM me a link or create an example page, then I'm happy to help, but I'm going to have to see the surrounding code, test it in ScrapeBox, and watch it in a debugger to possibly solve the issue.


RE: I'm missing something basic relating to broken link checker - colink - 07-03-2021

Thanks for your willingness to look at this. PM sent.

The site is coded as AMP HTML, so it contains some JavaScript loaded from the Google CDN, but all other code on the page is simple AMP HTML + PHP.

Thanks ColinK