Why run a 404 test?
If they're searching for stolen content?
Msg#: 735 posted 7:44 pm on Jun 1, 2001 (gmt 0)
OK, I know digital-integrity's little copyright infringement bot has been mentioned before, but I can't find anyone who's figured out why one of their standard file requests is /test404response...
I can understand that people buy their software, or service or whatever, to make sure their content isn't being stolen (at least that's what their website says), but what does that have to do with testing the "404 response" of my website?
Any idea? Someone afraid I stole their 404 page?
Msg#: 735 posted 7:53 pm on Jun 1, 2001 (gmt 0)
Are you blocking them via robots.txt? I get the same weird thing happening too.
Msg#: 735 posted 7:57 pm on Jun 1, 2001 (gmt 0)
Not blocking them... they don't come around often, they don't hog my server when they stop by, and they request (and seem to follow) my robots.txt. The only folks I generally block are the ones storming through requesting hundreds of pages at a time, or totally ignoring my robots.txt.
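For anyone who does want to block a spider this way, here is a minimal sketch of how a well-behaved bot would read such a rule, using Python's standard urllib.robotparser. The user-agent name "ExampleInfringementBot", the path, and the helper is_allowed are made up for illustration; the real bot's user-agent string isn't given in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named bot, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleInfringementBot
Disallow: /

User-agent: *
Disallow:
"""

def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, is_allowed(ROBOTS_TXT, "ExampleInfringementBot", "/some-page.html") comes back False, while any other user agent is allowed. This only helps against bots that actually honor robots.txt.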
Msg#: 735 posted 11:44 pm on Jun 4, 2001 (gmt 0)
I've also seen some "404" testers coming around.
(They either request a web page called "x" or something that looks like a one-time name, guaranteed not to exist.)
Could they be testing whether you have taken the time to customize your error page, perhaps hoping to find a link there to your regular pages (which will most likely be called index.html or index.htm)?
After all, who can afford today to return a 404 Not Found when someone simply calls [awebsite.com(...]
Could they, in turn, try to figure out what webserver you are running, to find a point of attack?
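A rough sketch of what such a "404 tester" might be doing, assuming the goal is simply to see how a site answers a request for a page that cannot exist. Nothing in this thread confirms that this is what any particular spider does; the function names probe_missing_page and classify and the random path scheme are invented for the example.

```python
import http.client
import uuid

def probe_missing_page(host, port=80, timeout=5.0):
    """Request a random one-time path; return (path, HTTP status code)."""
    # A random 32-character hex name is assumed not to exist on the site.
    path = "/" + uuid.uuid4().hex
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()  # drain the body before closing
        return path, response.status
    finally:
        conn.close()

def classify(status):
    """Label how the server answered a request for a missing page."""
    if status == 404:
        return "real 404"
    if 300 <= status < 400:
        return "redirect"   # http.client does not follow redirects itself
    if status == 200:
        return "soft 404"   # custom error page served with a 200 status
    return "other"
```

A spider that knows a site answers missing pages with a 200 or a redirect can avoid mistaking the custom error page for real content, which is one plausible reason to run the probe first.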
Msg#: 735 posted 11:50 pm on Jun 4, 2001 (gmt 0)
try to figure out what webserver you are running
I'm sure some people do that, but digital-integrity is a legit site, offering a legit service (hunting down online copyright infringement), which doesn't seem to have anything to do with testing 404 pages, so I was just wondering what the heck this specific spider was up to....
Msg#: 735 posted 12:33 am on Jun 5, 2001 (gmt 0)
No need to do that to find out what webserver you're running - the HTTP headers give that away for every file the server serves.
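To illustrate the point, a minimal sketch that reads the Server response header from an ordinary HEAD request, using Python's standard http.client. The function name server_banner and its defaults are made up for the example, and a server may be configured to omit or alter this header, in which case the result is None or misleading.

```python
import http.client

def server_banner(host, port=80, path="/", timeout=5.0):
    """Return the Server response header for path, or None if absent."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # HEAD returns the same headers as GET, without the body.
        conn.request("HEAD", path)
        response = conn.getresponse()
        response.read()
        return response.getheader("Server")
    finally:
        conn.close()
```

Calling server_banner("example.com") against any existing page is enough; no nonexistent URL is needed to learn what the server says it is running.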