damn404 broken link checker

         

keyplyr

7:03 pm on Aug 30, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



UA: Mozilla/5.0 (compatible; damn404 broken link checker - http://damn404.com/crawlinfo/ )
Protocol: HTTP/1.1
Robots.txt: Yes
Host: hetzner.de
78.46.18.64 - 78.46.18.127
78.46.0.0/15
144.76.0.0 - 144.76.255.255
144.76.0.0/16

Caught this agent scraping files that are not needed to check link validity, so it is either being abused or spoofed.

Previous discussion: [webmasterworld.com...]
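
For anyone who wants to test a logged address against the ranges reported above, here is a minimal sketch using Python's standard `ipaddress` module. The function name `in_reported_ranges` and the sample addresses are illustrative assumptions and do not come from any log. A match only shows that the address sits in Hetzner space. It does not prove who is running the agent.

```python
import ipaddress

# CIDR blocks copied from the report above (Hetzner allocations).
REPORTED_NETWORKS = [
    ipaddress.ip_network("78.46.0.0/15"),
    ipaddress.ip_network("144.76.0.0/16"),
]


def in_reported_ranges(addr: str) -> bool:
    """Return True if addr falls inside any of the reported blocks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in REPORTED_NETWORKS)


# The narrower span 78.46.18.64 - 78.46.18.127 from the report,
# converted from start/end form into CIDR notation.
narrow = list(
    ipaddress.summarize_address_range(
        ipaddress.ip_address("78.46.18.64"),
        ipaddress.ip_address("78.46.18.127"),
    )
)

if __name__ == "__main__":
    # Sample addresses chosen for illustration only.
    for sample in ("78.46.18.100", "144.76.1.1", "8.8.8.8"):
        print(sample, in_reported_ranges(sample))
    print([str(net) for net in narrow])
```

Working with the CIDR form keeps the check to one membership test per block, which is also the form most firewall and server rules expect.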

lucy24

8:23 pm on Aug 30, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So the real thing (previous thread) doesn't ask for robots.txt but the spoofer does? Ha.

keyplyr

8:29 pm on Aug 30, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Not saying either is the "real thing", really. Without a deeper investigation, any of these UA reports is just speculation that the agents are who they say they are.

The first sighting could have had the robots.txt cached from an earlier visit, or the setting could have been changed. Lots of possibilities, including me missing it.

damn404admin

5:15 pm on Nov 11, 2017 (gmt 0)

5+ Year Member



Hi keyplyr,

damn404 crawls and parses HTML documents to find broken links.
Other media, such as PDF files, are checked with a "HEAD" request if they are linked.

The robots.txt file is generally obeyed. If you can show me a specific file that you think was crawled maliciously, give me a shout and I will look into it.
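
The behavior described above (fetch and parse HTML, send only a "HEAD" request for other linked media, respect robots.txt) could be sketched roughly as follows. This is not damn404's actual code. The names `choose_method`, `allowed` and `HTML_EXTENSIONS` are hypothetical, and guessing the content type from the file extension is an assumption made only to keep the example offline and short.

```python
import posixpath
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Assumption: URLs with these extensions (or none) are treated as HTML.
HTML_EXTENSIONS = {"", ".html", ".htm", ".php"}


def choose_method(url: str) -> str:
    """Pick GET for pages to be parsed, HEAD for other linked media."""
    path = urlparse(url).path
    ext = posixpath.splitext(path)[1].lower()
    return "GET" if ext in HTML_EXTENSIONS else "HEAD"


def allowed(robots_lines, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules supplied as a list of lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Example rules and URLs are made up for illustration.
    rules = ["User-agent: *", "Disallow: /private/"]
    for link in (
        "http://example.com/index.html",
        "http://example.com/docs/manual.pdf",
        "http://example.com/private/report.pdf",
    ):
        if allowed(rules, "damn404", link):
            print(choose_method(link), link)
        else:
            print("SKIP", link)
```

A crawler that follows this pattern would never download the body of a PDF or image, which is one way to compare log entries against the stated behavior.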

keyplyr

7:27 pm on Nov 11, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi damn404admin and welcome to WebmasterWorld [webmasterworld.com]

phranque

12:49 am on Nov 12, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Welcome to WebmasterWorld, damn404admin!