400 Bad Request Indexed in Yahoo

400 Bad Request indexed in Yahoo: any way to prevent this?

         

spellham

9:21 pm on Feb 26, 2007 (gmt 0)

10+ Year Member



I recently ran across a past post in this forum, which I believe went unsolved, about a 400 Bad Request error indexed in Yahoo: [webmasterworld.com...]. We are experiencing the same problem with Yahoo that the original post describes.

Essentially, Slurp and Inktomi are making direct requests to our homepage over plain HTTP on port 443. Apache behaves properly and throws the 400 Bad Request error, but since this request is not valid and won't resolve to a page, I'm curious whether a redirect can be done for this specific request to our home page over plain HTTP.

Is this something that could be done via .htaccess or something in httpd.conf or ssl.conf?

I've searched the past posts where others have overcome similar problems, but not the exact scenario described in the post linked above - we seem to have the exact same problem, but I don't believe the post received any responses.

Any ideas on how to handle such requests?

encyclo

1:42 am on Feb 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



First check that your server is actually sending a "400 Bad Request" response header, rather than serving the error page itself with a 302 or 200 OK response code. On an (admittedly brief) check of 400 Bad Request pages indexed in Yahoo, none of the sites actually served the 400 response code correctly.

You can check by using the Live HTTP Headers extension in Firefox, or by using:

curl -v http://www.example.com:443
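For anyone without curl or the Firefox extension to hand, the same check can be sketched in Python. This is only a minimal illustration: the function names parse_status_line and fetch_status are made up for this example, and www.example.com stands in for the real host. It speaks plain HTTP to port 443 on purpose and reports whatever status line comes back.

```python
import socket
from typing import Tuple


def parse_status_line(raw: bytes) -> Tuple[int, str]:
    """Return (status_code, reason) from the first line of a raw HTTP response."""
    first_line = raw.split(b"\r\n", 1)[0].decode("iso-8859-1")
    parts = first_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError(f"not an HTTP status line: {first_line!r}")
    reason = parts[2] if len(parts) == 3 else ""
    return int(parts[1]), reason


def fetch_status(host: str, port: int = 443, timeout: float = 10.0) -> Tuple[int, str]:
    """Send a plain-HTTP 'GET /' to host:port and return the status the server sends."""
    request = (
        f"GET / HTTP/1.1\r\nHost: {host}:{port}\r\nConnection: close\r\n\r\n"
    ).encode("ascii")
    raw = b""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        # Read until the first line of the response is complete.
        while b"\r\n" not in raw:
            chunk = sock.recv(4096)
            if not chunk:
                break
            raw += chunk
    return parse_status_line(raw)


# Hypothetical usage (requires network access):
# print(fetch_status("www.example.com", 443))
```

A status of 400 here means the header is correct; a 200 or 3xx alongside an error page body points to a configuration problem on the server side.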

Only once you've discounted any technical reason why your error page is being indexed can you investigate what exactly Yahoo is doing when handling the page.

spellham

9:37 pm on Feb 28, 2007 (gmt 0)

10+ Year Member



Thanks for the response encyclo.

I installed the Live HTTP headers extension for Firefox and found that this request was indeed serving up a 200 OK response.

I found that the default Apache ErrorDocument 400 specified in httpd.conf was not being used, so I activated it and repeated the request. The server now shows the default Apache 400 error page, but the extension still appears to report a 200 OK response.
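The mismatch described here (an error page body delivered with a non-400 status) can be expressed as a small check. The sketch below is illustrative only: is_soft_error is a hypothetical helper, and the marker strings are an assumption based on the wording of Apache's stock 400 page, so a custom error page would need different markers.

```python
def is_soft_error(status_code: int, body: str) -> bool:
    """True when the body reads like a 400 error page but the status code is not 400."""
    text = body.lower()
    # Assumed markers: the title and heading used by Apache's default 400 page.
    body_says_400 = "400 bad request" in text or "<h1>bad request</h1>" in text
    return body_says_400 and status_code != 400
```

A response flagged by this check is the kind a search engine could index as an ordinary page, since nothing in the headers marks it as an error.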

I guess the question I have now is whether or not I need to create a custom 400 error page and send the header response of 400 using PHP - I don't understand why Apache wouldn't serve up a 400 error with its default ErrorDocument configuration.

Are all Apache users required to force the 400 response by manually sending a custom header(), or is the HTTP headers plugin reporting the wrong code?

If you'd like to see the header responses or want me to provide a link for you to look at, let me know and I can email or PM that to you.

Thanks

encyclo

2:44 am on Mar 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Setting up a custom 400 error page and forcing the header with PHP sounds like a good idea for quickly fixing the immediate problem. You could also consider a rewrite that 301-redirects the page request from the secure site URL to the standard http version, since that would recover the traffic to that page until the two pages consolidate in Yahoo.

However, it is surprising that Apache isn't responding correctly to a bad request. I assume the revised httpd.conf was tested with the browser cache cleared and Apache restarted? If you add the ErrorDocument directive in a root-level .htaccess, does it help?

spellham

7:30 pm on Mar 1, 2007 (gmt 0)

10+ Year Member



Did more testing and research last night. I had apache up and down a multitude of times trying to tweak this and get the right header response from Apache - to no avail...

In short, the ErrorDocument 400 /pages/400.php directive is specified in httpd.conf now, but I'm getting a 301 response and a default error page that 'links' to 400.php instead of actually going there.

I've also discovered that we have some virtual host IP overlap and that the UseCanonicalName directive is set to Off. As I understand it, with this directive turned Off in httpd.conf, Apache defers to the gethostname() C function, which returns the default machine name (not our domain name). That being the case, I'm thinking that Apache can't determine the proper relative path to the /pages/400.php script and displays the default error page instead.

I've tried implementing the ErrorDocument directive at the .htaccess level in the docroot of the domain and I get the same response.

I'm curious as to whether the IP address overlap could be causing issues or if I need to turn UseCanonicalName 'On' - but I don't know what other adverse effects this would have on our servers?

encyclo

3:09 am on Mar 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



From your syntax and description, I can't see why Apache isn't sending the appropriate response header. I've moved the thread over to our Apache forum to attract the attention of our resident Apache specialists and see if anyone else can shed some light on your problem. :)

spellham

3:59 am on Mar 2, 2007 (gmt 0)

10+ Year Member



I have made some progress by testing on one of our other servers:

Specifically for the virtual host in question I did the following:

UseCanonicalName On
ErrorDocument 400 /pages/400.php

The 400.php script issues a header of:

header("HTTP/1.1 400 Bad Request");

and the 400 error is reported in the ssl_access log as a 400 Bad Request error. After the initial 400 header, I attempted to issue a 301 header redirect to the home page so the client's 'bad request' would be redirected to the home page instead of just an error page, but that changes the header response to a 301 in the logs. I'm not sure that a 301 response will solve the problem at hand. I hate to just show them an error page with a link on it, but since this is specifically for Slurp/Inktomi (they are the only ones silly enough to keep making this url request) that may have to suffice.
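An HTTP response carries exactly one status line, which is why the later 301 replaced the 400 in the logs: a single response cannot be both. The trade-off can be sketched as a tiny WSGI application in Python. This is not the 400.php script from this thread; bad_request_app and HOME_URL are hypothetical names, and the URL is a placeholder. It keeps the 400 status and puts the link to the home page in the body.

```python
HOME_URL = "http://www.example.com/"  # placeholder home page URL (assumption)


def bad_request_app(environ, start_response):
    """Answer every request with a 400 status and a page linking to the home page."""
    body = (
        "<html><head><title>400 Bad Request</title></head><body>"
        "<h1>Bad Request</h1>"
        f'<p>Please continue to <a href="{HOME_URL}">{HOME_URL}</a>.</p>'
        "</body></html>"
    ).encode("utf-8")
    # One status per response: choosing 400 here rules out a 301 for the same reply.
    start_response(
        "400 Bad Request",
        [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

Whether a crawler follows a link on a 400 page is a separate question that this sketch does not answer.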

I guess an alternative would be to implement a rewrite rule in .htaccess from 'http://www.domain.com:443' to 'www.domain.com' but I'm a bit rusty on RewriteRules and I'm not sure how to implement that or if it will solve the problem with Slurp/Inktomi continuously making requests to this url?
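The mapping such a redirect would need to perform can be written out in Python to make the intent concrete. This is not an Apache RewriteRule, and it says nothing about whether Apache's rewrite engine ever sees a plain-HTTP request on the SSL port; plain_http_target is a hypothetical helper that only shows the URL transformation.

```python
from urllib.parse import urlsplit, urlunsplit


def plain_http_target(url: str) -> str:
    """Map a plain-HTTP URL aimed at port 443 to the same URL on the default HTTP port."""
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.port == 443:
        # Rebuild the URL from the hostname alone, which drops ":443".
        # Note: urlsplit lowercases the hostname.
        return urlunsplit(
            ("http", parts.hostname, parts.path or "/", parts.query, parts.fragment)
        )
    # Anything else (including real https:// URLs) is left untouched.
    return url
```

For example, 'http://www.domain.com:443' maps to 'http://www.domain.com/', while a genuine 'https://www.domain.com/' request is returned unchanged.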

jdMorgan

2:14 pm on Mar 5, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Protocol-based rewrites/redirects are discussed in this recent thread [webmasterworld.com].

Jim