Send MSNbot a '410 Gone' response document

... If it won't follow a 301 redirect properly

         

tigertom

3:26 pm on Dec 10, 2006 (gmt 0)

10+ Year Member



The idea is to send the MSNBot a '410 Gone' response document _containing a fat hyperlink to the new site_, if it's not going to follow 301 redirects properly.

This isn't working:

####
ErrorDocument 410 /410.htm

Options +FollowSymlinks
RewriteEngine on

#### Serve MSNBot a 'Gone' response for any page on this site

RewriteCond %{HTTP_USER_AGENT} ^msnbot [NC]
## Exclude the 410 document
RewriteCond %{REQUEST_URI} !^/410.htm
## All other documents are gone
RewriteRule .* - [G,L]

## Redirect everyone else to the new site
Redirect 301 / http://www.mynewsite.com/doodads/

[edited by: jdMorgan at 3:53 pm (utc) on Dec. 10, 2006]
[edit reason] De-linked [/edit]

jdMorgan

3:53 pm on Dec 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can't mix directives from different modules and expect them to execute in line-order.

Each Apache module processes your .htaccess file in turn and acts only on the directives it recognizes. Therefore, the sequential dependency implied by your directive order is not what will actually happen.

The order in which Apache modules process your file is determined in Apache 1.x by the order in which the modules appear in the LoadModule list: execution order is the reverse of that list order. In Apache 2.x, execution order is determined by an internal priority scheme.

Therefore, one cannot depend on line order to control logical execution order. In this case, your Redirect directive will be executed first on almost all normal Apache installations, and your mod_rewrite code will never apply.
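To make the point concrete, here is a toy model of that module-by-module pass. It is only a sketch of the idea, not how Apache is implemented: the function name `effective_order` is made up for illustration, and the module run order passed in is an assumption taken from the description above rather than something read from a real server.

```python
def effective_order(directives, module_order):
    """Return directives in the order a module-by-module pass would act on them.

    directives:   list of (module, text) pairs, in .htaccess file order.
    module_order: module names in the order they run (assumed, not detected).
    """
    ordered = []
    for module in module_order:
        # Each module looks at the whole file but keeps only its own
        # directives, so file order survives only *within* one module.
        ordered.extend(text for owner, text in directives if owner == module)
    return ordered


# The relevant lines of the original .htaccess, in file order.
htaccess = [
    ("mod_rewrite", "RewriteCond %{HTTP_USER_AGENT} ^msnbot [NC]"),
    ("mod_rewrite", "RewriteRule .* - [G,L]"),
    ("mod_alias", "Redirect 301 / http://www.mynewsite.com/doodads/"),
]

# Assumed run order: mod_alias before mod_rewrite, as described above.
for line in effective_order(htaccess, ["mod_alias", "mod_rewrite"]):
    print(line)
```

Under that assumed order, the Redirect line comes out first even though it is last in the file, which is why the msnbot rules never get a chance to act.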

The solution is to use only directives from the same Apache module when execution order is critical:


ErrorDocument 410 /410.htm
#
Options +FollowSymlinks
RewriteEngine on
#
# Serve MSNBot a 'Gone' response for any page on this site except the 410-Gone error document
RewriteCond %{HTTP_USER_AGENT} ^msnbot [NC]
RewriteRule !410\.htm$ - [G]
# Redirect everyone else to the new site
RewriteRule (.*) http://www.mynewsite.com/doodads/$1 [R=301,L]

Note the minor tweaks to remove unnecessary stuff and improve efficiency.

Jim

tigertom

6:19 pm on Dec 10, 2006 (gmt 0)

10+ Year Member



Thank you, Jim. That was informative, and interesting.

By chance, I came across a post on WebmasterWorld that said that a '410 Gone' response could only be understood by Agents which used the HTTP/1.1 protocol.

It looked like most search engine bots still use 1.0, and thus serving up a '404' response would be smarter.

I can't get this code to work on my server, but I'm trying to do too many things at once at the moment, so I'll leave it for now.

jdMorgan

8:31 pm on Dec 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Most robots support "extended" HTTP/1.0, which includes support for the Host request header and the 410 response. A good hint is that a site on a shared name-based server is inaccessible to a true HTTP/1.0 client, which sends no Host header. Because such clients cannot reach most sites, there are few of them left on the Web.
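As a rough illustration of the difference, the sketch below builds the raw request text for both kinds of client. The helper name `build_request` and the host `www.example.com` are placeholders invented for this example; nothing is sent over the network.

```python
def build_request(path, host=None, user_agent="msnbot/0.1"):
    """Build a raw HTTP/1.0 GET request, optionally with a Host header."""
    lines = ["GET %s HTTP/1.0" % path]
    if host is not None:
        # The "extended" part: a name-based virtual host needs this header
        # to decide which site on the shared IP address is being requested.
        lines.append("Host: %s" % host)
    lines.append("User-Agent: %s" % user_agent)
    # A blank line terminates the request headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# A strict HTTP/1.0 client: no Host header at all.
print(build_request("/page.htm").decode("ascii"))

# An "extended" HTTP/1.0 client, as most robots behave.
print(build_request("/page.htm", host="www.example.com").decode("ascii"))
```

Without the Host line, a shared server has no way to tell which of its sites the request is for, so it typically answers with its default site.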

Jim

tigertom

10:08 pm on Dec 10, 2006 (gmt 0)

10+ Year Member



OK, I tried the code. I masqueraded as the msnbot 'MSNBOT/0.1 (http://search.msn.com/msnbot.htm)' using the Firefox User Agent Switcher.

It redirected to the new site regardless.

Any clues appreciated.

jdMorgan

10:30 pm on Dec 10, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I presume this code is in your top-level .htaccess file. If so, the only thing I can think of is to recommend that you flush your browser cache before testing any change to your .htaccess file, and also whenever you switch user-agents.

The browser won't re-fetch a locally-cached file from the server unless the cache entry is expired or the cache-control headers returned with the originally-fetched file marked it as uncacheable or requiring revalidation.
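One way to take the browser cache out of the picture is to send the test request from a small script instead. The following is a minimal sketch, assuming a plain-http site; the function name `probe` and the URL in the usage comment are placeholders. Python's `http.client` keeps no cache and does not follow redirects, so it reports exactly what the server returned for that user-agent.

```python
import http.client
from urllib.parse import urlsplit


def probe(url, user_agent):
    """Send one GET with the given User-Agent; return (status, Location)."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # Plain HTTP only in this sketch; an https site would need HTTPSConnection.
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=10)
    try:
        conn.request("GET", target, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


# Example usage (hypothetical old-site URL):
#   probe("http://www.example.com/page.htm",
#         "msnbot/0.1 (+http://search.msn.com/msnbot.htm)")
#   probe("http://www.example.com/page.htm", "Mozilla/5.0")
```

If the rules are working, the first call should report a 410 with no Location header and the second a 301 pointing at the new site.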

Jim

tigertom

9:06 am on Dec 11, 2006 (gmt 0)

10+ Year Member



Flushing my browser cache was a good idea.

Now, however, I'm getting a 404 error for the page '410.htm' on the site I'm redirecting _to_.

It looks like I'm being redirected to www.mynewsite.com/doodads/410.htm, which doesn't exist.

jdMorgan

9:16 am on Dec 11, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



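I should have seen that... The 410 response makes Apache issue an internal request for /410.htm, and that request skips the 'Gone' rule only to be caught by the redirect rule.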

One more try, one more flush:


ErrorDocument 410 /410.htm
#
Options +FollowSymlinks
RewriteEngine on
#
# Bypass all following rules for custom error document requests
RewriteRule ^410\.htm$ - [L]
#
# Serve MSNBot a 'Gone' response for any page on this site except error documents
RewriteCond %{HTTP_USER_AGENT} ^msnbot [NC]
RewriteRule .* - [G]
# Redirect everyone else to the new site
RewriteRule (.*) http://www.mynewsite.com/doodads/$1 [R=301,L]
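
For anyone wondering why the extra bypass rule matters, here is a deliberately simplified Python model of the two rule sets. It is a sketch under assumptions, not Apache's actual algorithm: the function names are invented, paths are written without the leading slash (as mod_rewrite sees them in .htaccess), and the only behaviour modelled is that a 'Gone' result causes the rules to run again for the ErrorDocument path.

```python
import re

NEW_SITE = "http://www.mynewsite.com/doodads/"


def is_msnbot(user_agent):
    # Models: RewriteCond %{HTTP_USER_AGENT} ^msnbot [NC]
    return re.match(r"msnbot", user_agent, re.IGNORECASE) is not None


def first_attempt(path, user_agent):
    # RewriteRule !410\.htm$ - [G]   (with the msnbot condition)
    if not re.search(r"410\.htm$", path) and is_msnbot(user_agent):
        return ("gone", None)
    # RewriteRule (.*) http://www.mynewsite.com/doodads/$1 [R=301,L]
    return ("redirect", NEW_SITE + path)


def second_attempt(path, user_agent):
    # RewriteRule ^410\.htm$ - [L]   (stop here for the error document)
    if re.search(r"^410\.htm$", path):
        return ("serve", path)
    # RewriteRule .* - [G]           (with the msnbot condition)
    if is_msnbot(user_agent):
        return ("gone", None)
    # RewriteRule (.*) http://www.mynewsite.com/doodads/$1 [R=301,L]
    return ("redirect", NEW_SITE + path)


def respond(ruleset, path, user_agent, error_document="410.htm"):
    """Return (status, target) as the client would see it in this model."""
    action, target = ruleset(path, user_agent)
    if action == "gone":
        # The ErrorDocument is fetched internally, so the rules run again.
        action, target = ruleset(error_document, user_agent)
        if action == "redirect":
            return (301, target)
        return (410, target)
    if action == "redirect":
        return (301, target)
    return (200, target)


bot = "MSNBOT/0.1 (http://search.msn.com/msnbot.htm)"
print(respond(first_attempt, "page.htm", bot))
print(respond(second_attempt, "page.htm", bot))
print(respond(second_attempt, "page.htm", "Mozilla/5.0"))
```

In this model the first rule set sends the bot to /doodads/410.htm on the new site, while the second serves the local 410.htm with a 410 status and still redirects ordinary visitors.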

Jim

tigertom

9:54 am on Dec 11, 2006 (gmt 0)

10+ Year Member



Aha!

Thanks a lot, Jim. That worked fine.