Just been looking through our access log and was surprised to see an entry:
/a-file-without-the-extension/qwer/erty/dfgh/zxcv/hjk/cvbn/rty/yuoi.htm
I figured this was just msnbot checking that our server is configured to return 404s properly, which I thought it was. So the request itself didn't surprise me, until I saw it had a 200 status.
I checked other pages that don't exist and they return 404 properly.
It looks like this is causing problems because the first folder in the request actually exists as an .htm file, so it's as if this is confusing the server. Is there some way to tell Apache not to try to be clever and assume something is a folder if it doesn't have an extension?
Cheers
[edited by: Robber at 10:34 am (utc) on Oct. 12, 2004]
This probably isn't msnbot checking whether your server returns good 404s. I doubt msnbot really gives a whee. SE spiders get confused easily and try to hit wrong URLs all the time. The problem is that they can then index whole sub-sites of your site that are identical to the main site, with zillions of duplicate content flags.
Thanks for the input.
We don't use much mod_rewrite on this site. This is really all of it:
RewriteCond %{HTTP_HOST} !^www\.abc\.co\.uk
RewriteRule ^(.*)$ h**p://www.abc.co.uk$1 [R=301,L]
I am pretty sure the SEs have admitted to running 404 checkers in the past.
I think content negotiation is the culprit, but I don't know how to turn it off yet!
Thanks
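For anyone landing here with the same symptom, here is a minimal sketch of how content negotiation might be switched off. It assumes the 200 comes from MultiViews matching the extensionless name to the .htm file and the rest of the URL being passed along as path info; that is a guess from the description above, not something confirmed in this thread. It also assumes the directives go in .htaccess or a <Directory> block, and that overrides for Options and FileInfo are allowed there.

```apache
# Assumption: MultiViews is what maps /a-file-without-the-extension
# to a-file-without-the-extension.htm. Turning it off stops that lookup.
Options -MultiViews

# Apache 2.0.30 and later only (not available in 1.3):
# reject trailing path info after a real file, so a request such as
# /page.htm/extra/segments returns 404 instead of being served.
AcceptPathInfo Off
```

If the bogus URL still returns 200 after this, the cause is probably something other than content negotiation, so it would be worth re-testing the exact request from the access log before and after the change.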