-----------
RewriteEngine On
RewriteBase /
ServerSignature Off
Options -Indexes
Options +FollowSymLinks
# Set long expire headers for better browser caching
<IfModule mod_expires.c>
ExpiresActive On
ExpiresDefault A604800
<FilesMatch "\.(jpg|jpeg|png|gif|swf)$">
ExpiresDefault A2419200
</FilesMatch>
</IfModule>
# The canonical thing
RewriteCond %{HTTP_HOST} !^www\.mysite\.co\.uk$
RewriteCond %{HTTP_HOST} !^$
RewriteRule (.*) [mysite.co.uk...] [R=301]
#Make all dross GONE and to be removed from indexes, with a 410 error page
RewriteRule !\.(html)$ - [S=17]
RewriteRule ^Home\.html$ - [G,L,NC]
RewriteRule ^Home-s\.html$ - [G,L,NC]
If www is requested, they will get 410 response. If non-www is requested they will be redirected to the www; and the new HTTP request will get 410 response.
For [G], the [L] is implied and can be omitted.
For the [R=301] you do need [R=301,L] here.
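A minimal sketch of the two flag points above. The redirect target is truncated in the post and is not reconstructed here, so www.example.com is a placeholder assumption, and only two of the obsolete paths are shown:

```apache
# Canonical redirect: [L] is added to [R=301] so that no later rule
# touches the request before the redirect is sent.
# www.example.com is a placeholder, not the poster's real hostname.
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteCond %{HTTP_HOST} !^$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# 410 Gone: [G] ends rule processing by itself, so [L] is left out.
RewriteRule ^Home\.html$ - [G,NC]
RewriteRule ^Home-s\.html$ - [G,NC]
```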
That was just the number of RewriteRules with 'html' in them (to skip if not with 'html') and it was correct.
I have created what could be a solution to my problem (except for a few URLs), as follows (I've shown the full Monty of 410s for clarity and for interest the extent of the potential 'duplication', but I've not shown the 410gone page link at the bottom). It all works for me for all 'canonical' permutations, without any problems and I'm just watching my stats now (your opinion would be most appreciated):
--------------------
# The canonical thing
RewriteCond %{HTTP_HOST} !^www\.mysite\.co\.uk$
RewriteCond %{HTTP_HOST} !^$
RewriteRule (.*) [mysite.co.uk...] [R=301,L]
#Make all dross GONE and to be removed from indexes, with a 410 error page
RewriteCond %{REQUEST_URI} ^/Home\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Home-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Home\+Page-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/About\+Us-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Advertise\+With\+Us-s-8-p-Standard-d\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Advertise\+With\+Us-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Property\+Search-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Testimonials-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Links-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Terms-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/FAQ-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Contact\+Us-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Search\+Results-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Register\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Register-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/There\+has\+been\+a\+problem-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/property\+search\.html
RewriteRule [.*] - [G]
RewriteRule ^About\+Us-s-21-p- - [G,NC]
RewriteRule ^Rental\+Property-s- - [G,NC]
RewriteRule ^Travel\+Services-s- - [G,NC]
RewriteCond %{QUERY_STRING} ^section=Testimonials$ [NC,OR]
RewriteCond %{QUERY_STRING} ^section=Terms$ [NC,OR]
RewriteCond %{QUERY_STRING} ^section=Links$ [NC,OR]
RewriteCond %{QUERY_STRING} ^section=FAQ$ [NC,OR]
RewriteCond %{QUERY_STRING} ^section=Contact\+Us$ [NC,OR]
RewriteCond %{QUERY_STRING} ^section=Advertise\+With\+Us [NC,OR]
RewriteCond %{QUERY_STRING} ^section=About\+Us [NC,OR]
RewriteCond %{QUERY_STRING} ^section=Search\+Results [NC,OR]
RewriteCond %{QUERY_STRING} ^section=Travel [NC,OR]
RewriteCond %{QUERY_STRING} ^type=adv$ [NC,OR]
RewriteCond %{QUERY_STRING} ^section=There\+has\+been\+a\+problem$ [NC,OR]
RewriteCond %{QUERY_STRING} ^section=Rental\+Property&advertid= [NC]
RewriteRule [.*] - [G]
RewriteCond %{QUERY_STRING} ^[^searchtype] [NC]
RewriteRule ^SearchResults-s\.html$ - [G,NC]
RewriteCond %{QUERY_STRING} ^section=
RewriteRule ^Travel\+Services\+Page[0-9]+\.html$ - [G,NC]
Thanks! Dave
RewriteRule [.*] - [G] -- The [.*] is a character class, so it matches any URL-path that contains a literal dot or a literal asterisk. You likely wanted (.*) or just .* here.
RewriteCond %{QUERY_STRING} ^[^searchtype] [NC] -- This says that the first character of the query string must not be any of the letters s, e, a, r, c, h, t, y or p. You likely needed just &?searchtype=[^&]*&? here.
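To make the difference concrete, here is a hedged sketch with the original patterns left as comments beside one possible correction for each. The `!(^|&)searchtype=` form is an assumption about the intent (apply the rule only when no searchtype parameter is present), not something taken from the thread:

```apache
# Original: [.*] is a character class. It matches one literal "." or "*"
# anywhere in the path, so "Home.html" matches (it contains a dot) but an
# extensionless path such as "Home" does not.
# RewriteRule [.*] - [G]

# Corrected: .* matches any path, leaving the decision to the
# RewriteCond lines that precede the rule.
RewriteRule .* - [G]

# Original: tests only the first character of the query string.
# "section=1" fails (starts with "s"), "foo=1" passes, and an empty
# query string fails because there is no first character to test.
# RewriteCond %{QUERY_STRING} ^[^searchtype] [NC]

# Assumed intent: true only when no "searchtype=" parameter appears
# at the start of the query string or after an "&".
RewriteCond %{QUERY_STRING} !(^|&)searchtype= [NC]
RewriteRule ^SearchResults-s\.html$ - [G,NC]
```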
I've edited the [.*] as you advised. The other is meant to apply the rule if the query string does not start with 'searchtype', so I've made it:
RewriteCond %{QUERY_STRING} !^searchtype [NC]
RewriteRule ^SearchResults-s\.html$ - [G,NC]
They work for me still (I'm sure more correctly now).
Regards, Dave
I hope my methods will avoid 301s (by making the conditions/rules independent of the canonicalisation).
Any thoughts on the use of 410 Gones in such circumstances? By the way, I don't see any evidence of the SEs removing references to these 410d URLs yet - maybe they're making sure first or checking if they are what they mostly are (alternative routes to the sitemap URLs).
What's the weather like there? It's a beautiful day here in Manchester and I'm dragging myself away from this terminal for a break now ;-)
It will take Google several weeks or more to remove those results. Don't worry about that.
You will see a 301 for those filenames if the non-www version is requested, because the Redirect is listed before the Gone. Do not worry about that either. Redirects are not indexed.
The absolutely-correct answer depends on whether those URL-paths were ever indexed by search engines using the 'wrong' domain. If so, then 410-Gone is the correct answer, regardless of requested hostname. If those URL-paths were never indexed in the wrong domain, then technically you'd want to return 410-Gone for the correct domain, and 404-Not Found for the incorrect domain. But either way, there'd be no need for a redirect.
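One way to return the 410 with no intervening redirect is simply to list the Gone rules above the canonical redirect. This is a sketch under that assumption; www.example.com is a placeholder hostname and only two of the obsolete paths are shown:

```apache
# Gone rules first: they do not test HTTP_HOST, so www and non-www
# requests for these paths both receive a 410 directly.
RewriteRule ^Home\.html$ - [G,NC]
RewriteRule ^Home-s\.html$ - [G,NC]

# Canonical redirect second: only requests not answered above reach it.
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteCond %{HTTP_HOST} !^$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

If those paths were never indexed under the wrong hostname, the first group could instead be split by hostname, returning 410 on the canonical host and 404 elsewhere; that variant is not shown here.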
Further confounding this answer is the fact that most (maybe all) search engine spiders do not differentiate between 410 and 404 responses. They will continue to request obsolete URLs, usually for a long long time (years), even though in the case of a 410 we're trying to tell them, "Yes, we removed this URL intentionally" and therefore, "there's no need to ask for it again." They treat the 410 the same way as the ambiguous "404-Oops we can't find that right now and we don't know why" response.
Jim
Anyway, your help to so many people more than compensates for this and I'm sure that the SEs will eventually read these posts and catch up with reality ;-)
regards, Dave
You then need to look and ignore the ones that are generating the response that you want.
Yeah, it's semantics; but they are important.