
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Why run a 404 test?
If they're searching for stolen content?
mivox




msg:395217
 7:44 pm on Jun 1, 2001 (gmt 0)

OK, I know digital-integrity's little copyright infringement bot has been mentioned before, but I can't find anyone who's figured out why one of their standard file requests is /test404response...

I can understand that people buy their software, or service or whatever, to make sure their content isn't being stolen (at least that's what their website says), but what does that have to do with testing the "404 response" of my website?

Any idea? Someone afraid I stole their 404 page?

 

toolman




msg:395218
 7:53 pm on Jun 1, 2001 (gmt 0)

Are you blocking them via robots.txt? I get the same weird thing happening too.

mivox




msg:395219
 7:57 pm on Jun 1, 2001 (gmt 0)

Not blocking them... they don't come around often, they don't hog my server when they stop by, and they request (and seem to follow) my robots.txt. The only folks I generally block are the ones storming through requesting hundreds of pages at a time, or totally ignoring my robots.txt.
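
For readers wondering what "following" robots.txt looks like from the crawler's side, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot name `ExampleBot`, the rules, and the URLs are all made up for illustration; nothing here is taken from any real spider's configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keeps one (made-up) bot out of /private/
# and allows everything for everyone else.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow:
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    for agent, url in [
        ("ExampleBot", "http://example.com/private/page.html"),
        ("ExampleBot", "http://example.com/index.html"),
        ("OtherBot", "http://example.com/private/page.html"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

A polite spider would run a check like this before every request. A spider that ignores robots.txt simply skips it, which is why blocking those usually has to happen at the server level instead.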

skirril




msg:395220
 11:44 pm on Jun 4, 2001 (gmt 0)

I have also seen some "404" testers coming around. They request either a web page called "x" or something that looks like a one-time name, guaranteed not to exist.
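
As a rough illustration of that kind of probe, the sketch below requests a randomly named page and looks at the status code that comes back. This is only one guess at why a spider might do it (to learn whether a site answers missing pages with a real 404 or with an ordinary-looking page); it is not based on anything digital-integrity has published. The function names are invented for this example.

```python
import urllib.error
import urllib.request
import uuid

def make_probe_path() -> str:
    """Build a one-time path that is practically guaranteed not to exist."""
    return "/" + uuid.uuid4().hex

def probe_status(base_url: str, path: str) -> int:
    """Request base_url + path and return the final HTTP status code.

    urlopen follows redirects, so this is the status of the last response.
    """
    try:
        with urllib.request.urlopen(base_url + path, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is what we want.
        return err.code

def classify(status: int) -> str:
    """Describe how the site answered a request for a missing page."""
    if status == 404:
        return "real 404"
    if 200 <= status < 300:
        return "soft 404 (missing pages look like normal pages)"
    return f"other ({status})"

if __name__ == "__main__":
    # example.com is a placeholder; substitute a site you are allowed to test.
    status = probe_status("http://example.com", make_probe_path())
    print(classify(status))
```

Knowing how a site answers a page that cannot exist lets a crawler tell genuine content apart from a custom error page served with a success code.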

Could they be testing whether you have taken the time to customize your error page, perhaps hoping to find a link there to your regular pages (which will most likely be called index.html or index.htm)?

Then again, who can afford these days to return a 404 Not Found when someone simply requests [awebsite.com(...]

Could they in turn try to figure out what webserver you are running, to get a point of attack?

Just ideas...

mivox




msg:395221
 11:50 pm on Jun 4, 2001 (gmt 0)

"try to figure out what webserver you are running"

I'm sure some people do that, but digital-integrity is a legit site, offering a legit service (hunting down online copyright infringement), which doesn't seem to have anything to do with testing 404 pages, so I was just wondering what the heck this specific spider was up to...

theperlyking




msg:395222
 12:33 am on Jun 5, 2001 (gmt 0)

No need to do that to find out what webserver you're running - the HTTP headers give that away for every file the server serves.
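
To make that concrete, here is a small sketch that reads the `Server` response header with Python's standard library. The helper name `server_header` is invented for this example, and note that a server is free to omit or falsify the header, so treat the value as a hint only.

```python
import urllib.error
import urllib.request
from typing import Optional

def server_header(url: str) -> Optional[str]:
    """Return the Server response header for url, or None if it is absent."""
    # A HEAD request is enough: only the headers are needed, not the body.
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.headers.get("Server")
    except urllib.error.HTTPError as err:
        # Error responses (404 included) carry the same header.
        return err.headers.get("Server")

if __name__ == "__main__":
    # example.com is a placeholder URL.
    print(server_header("http://example.com/"))
```

Any URL works, existing or not, which is why a dedicated request for a nonexistent page adds nothing for this purpose.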


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved