Google getting 301 and 410 at different times

         

eclipsetbs

1:28 pm on Jun 28, 2009 (gmt 0)

10+ Year Member



This is the start of my .htaccess. I had the canonical thing at the end of all the external redirects, with an [L], but Google was sometimes getting 301s instead of 410s, so I have moved it to the top as shown, without the [L]. I believe this should result in any canonical redirect being actioned before the forced 410s (and that this is a rare occasion where an [L] should not be used), allowing the 410s to be actioned correctly after a canonical redirect. Do you agree? Why the 410s? You may remember the pickle I got into in January with Google seeing my internal URLs. This is my answer (I know I will have to wait again for ranking, but that was pitifully low anyway). Thanks, Dave.

-----------

RewriteEngine On
RewriteBase /
ServerSignature Off
Options -Indexes
Options +FollowSymLinks

# Set long expire headers for better browser caching
<IfModule mod_expires.c>
ExpiresActive On
ExpiresDefault A604800
<FilesMatch "\.(jpg|jpeg|png|gif|swf)$">
ExpiresDefault A2419200
</FilesMatch>
</IfModule>

# The canonical thing
RewriteCond %{HTTP_HOST} !^www\.mysite\.co\.uk$
RewriteCond %{HTTP_HOST} !^$
RewriteRule (.*) [mysite.co.uk...] [R=301]

#Make all dross GONE and to be removed from indexes, with a 410 error page
RewriteRule !\.(html)$ - [S=17]
RewriteRule ^Home\.html$ - [G,L,NC]
RewriteRule ^Home-s\.html$ - [G,L,NC]

g1smd

4:21 pm on Jun 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They will still sometimes get 301 and other times a 410.

If www is requested, they will get a 410 response. If non-www is requested, they will be redirected to www, and the new HTTP request will then get a 410 response.

For [G], the [L] is implied and can be omitted.

For the [R=301] you do need [R=301,L] here.
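
As a rough sketch of the flag advice above: the fragment below shows the canonical redirect with [R=301,L] and two of the Gone rules with [G] alone. The hostname www.example.com and the redirect target are placeholders, assumed here only because the real target URL is truncated in the listing above.

```apache
# Canonical redirect: L stops further rule processing once the redirect is set.
# www.example.com is a placeholder hostname, not the poster's real domain.
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteCond %{HTTP_HOST} !^$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# Gone rules: G already ends rule processing, so no L is needed.
RewriteRule ^Home\.html$ - [G,NC]
RewriteRule ^Home-s\.html$ - [G,NC]
```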

eclipsetbs

5:09 pm on Jun 28, 2009 (gmt 0)

10+ Year Member



Thanks g1smd. So if I edit as you advised, is this a good solution (so that Google should not see 301s for the Gone group)?
Regards, Dave

eclipsetbs

5:41 pm on Jun 28, 2009 (gmt 0)

10+ Year Member



g1smd, minutes before I updated .htaccess, Google zapped me from two places, the first one getting 410s, the second one getting 301s!
From what you said, I think your updates may not make a difference in practice, so is there anything else I can do?
Regards, Dave

jdMorgan

10:30 pm on Jun 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What is the significance to you of the "17" in [S=17], and is that count correct?

Jim

eclipsetbs

12:21 am on Jun 29, 2009 (gmt 0)

10+ Year Member



Hi Jim

That was just the number of RewriteRules with 'html' in them (to be skipped if the request is not for an 'html' URL), and it was correct.
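
For readers unfamiliar with the skip flag, here is a minimal sketch of how [S=N] counts, cut down to two rules instead of seventeen. The count covers RewriteRule lines only; RewriteCond lines are not counted. The two filenames are taken from the first listing in this thread; the shortened count is an assumption for illustration.

```apache
# If the URL-path does NOT end in .html, skip the next 2 RewriteRules.
RewriteRule !\.html$ - [S=2]
RewriteRule ^Home\.html$ - [G,NC]
RewriteRule ^Home-s\.html$ - [G,NC]
# Processing resumes here for requests that were skipped past the rules above.
```

If rules are later added to or removed from the skipped group, the number has to be updated by hand, which is presumably why the count was being queried.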
I have created what could be a solution to my problem (except for a few URLs), as follows. I've shown the full Monty of 410s for clarity and to show the extent of the potential 'duplication', but I've not shown the 410gone page link at the bottom. It all works for me for all 'canonical' permutations without any problems, and I'm just watching my stats now (your opinion would be most appreciated):

--------------------

# The canonical thing
RewriteCond %{HTTP_HOST} !^www\.mysite\.co\.uk$
RewriteCond %{HTTP_HOST} !^$
RewriteRule (.*) [mysite.co.uk...] [R=301,L]

#Make all dross GONE and to be removed from indexes, with a 410 error page

RewriteCond %{REQUEST_URI} ^/Home\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Home-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Home\+Page-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/About\+Us-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Advertise\+With\+Us-s-8-p-Standard-d\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Advertise\+With\+Us-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Property\+Search-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Testimonials-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Links-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Terms-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/FAQ-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Contact\+Us-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Search\+Results-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Register\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/Register-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/There\+has\+been\+a\+problem-s\.html$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/property\+search\.html
RewriteRule [.*] - [G]

RewriteRule ^About\+Us-s-21-p- - [G,NC]
RewriteRule ^Rental\+Property-s- - [G,NC]
RewriteRule ^Travel\+Services-s- - [G,NC]

RewriteCond %{QUERY_STRING} ^section=Testimonials$ [NC,OR]
RewriteCond %{QUERY_STRING} ^section=Terms$ [NC,OR]
RewriteCond %{QUERY_STRING} ^section=Links$ [NC,OR]
RewriteCond %{QUERY_STRING} ^section=FAQ$ [NC,OR]
RewriteCond %{QUERY_STRING} ^section=Contact\+Us$ [NC,OR]
RewriteCond %{QUERY_STRING} ^section=Advertise\+With\+Us [NC,OR]
RewriteCond %{QUERY_STRING} ^section=About\+Us [NC,OR]
RewriteCond %{QUERY_STRING} ^section=Search\+Results [NC,OR]
RewriteCond %{QUERY_STRING} ^section=Travel [NC,OR]
RewriteCond %{QUERY_STRING} ^type=adv$ [NC,OR]
RewriteCond %{QUERY_STRING} ^section=There\+has\+been\+a\+problem$ [NC,OR]
RewriteCond %{QUERY_STRING} ^section=Rental\+Property&advertid= [NC]
RewriteRule [.*] - [G]

RewriteCond %{QUERY_STRING} ^[^searchtype] [NC]
RewriteRule ^SearchResults-s\.html$ - [G,NC]
RewriteCond %{QUERY_STRING} ^section=
RewriteRule ^Travel\+Services\+Page[0-9]+\.html$ - [G,NC]

Thanks! Dave

g1smd

12:53 am on Jun 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Some typos or misunderstandings?

RewriteRule [.*] - [G]
-- The [.*] is a character class that matches one character: a literal dot or an asterisk. Unanchored, it matches any URL that contains either. You likely wanted
(.*)
or just
.*
here.

RewriteCond %{QUERY_STRING} ^[^searchtype] [NC]
-- This says that the first character of the Query String must be something other than s, e, a, r, c, h, t, y or p. You likely needed just
&?searchtype=[^&]*&?
here.

eclipsetbs

11:18 am on Jun 29, 2009 (gmt 0)

10+ Year Member



Thanks g1smd. Neither (just ignorance!).

I've edited the [.*] as you advised. The other one is meant to apply the rule if the query string does not start with 'searchtype', so I've made it:
RewriteCond %{QUERY_STRING} !^searchtype [NC]
RewriteRule ^SearchResults-s\.html$ - [G,NC]

They work for me still (I'm sure more correctly now).

Regards, Dave

g1smd

11:51 am on Jun 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The [^searchtype] one would have stopped 'searchtype' and 's' and 'sample' and 'example' and 'anexample' and 'result' and 'chance' and anything else that began with any single letter in that list. :)
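
The character-class behaviour described here can be sketched as an annotated fragment. It reuses the SearchResults-s.html rule from earlier in the thread; the sample query strings in the comments are made up for illustration.

```apache
# ^[^searchtype] tests only the FIRST character of the query string:
# it must exist and must not be one of s, e, a, r, c, h, t, y, p.
#   "searchtype=1" -> condition fails (starts with "s")
#   "result=5"     -> condition fails (starts with "r")
#   "id=7"         -> condition passes (starts with "i")
#   ""             -> condition fails (no first character to test)
# RewriteCond %{QUERY_STRING} ^[^searchtype] [NC]

# !^searchtype negates a literal match on the whole word instead:
#   "searchtype=1" -> condition fails
#   "result=5"     -> condition passes
#   ""             -> condition passes
RewriteCond %{QUERY_STRING} !^searchtype [NC]
RewriteRule ^SearchResults-s\.html$ - [G,NC]
```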

eclipsetbs

12:21 pm on Jun 29, 2009 (gmt 0)

10+ Year Member



That's a big step for me actually, knowing how to use [] correctly; it's real wizardry and fascinating when you make it work.

I hope my methods will avoid 301s (by making the conditions/rules independent of the canonicalisation).

Any thoughts on the use of 410 Gones in such circumstances? By the way, I don't see any evidence of the SEs removing references to these 410d URLs yet - maybe they're making sure first or checking if they are what they mostly are (alternative routes to the sitemap URLs).

What's the weather like there? It's a beautiful day here in Manchester and I'm dragging myself away from this terminal for a break now ;-)

g1smd

12:32 pm on Jun 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sunny, and hot! The heatwave is coming. :)

It will take Google several weeks or more to remove those results. Don't worry about that.

You will see a 301 for those filenames if the non-www version is requested, because the Redirect is listed before the Gone. Do not worry about that either. Redirects are not indexed.

eclipsetbs

2:19 pm on Jun 29, 2009 (gmt 0)

10+ Year Member



Back again out of the sun! That makes me feel a lot more comfortable (your post as well as the sun). Would it be a good idea for me to push the redirect back down below the GONEs anyway? And aren't my changed 410s now independent of the presence or absence of the www (with the .* at the front of the URLs)?
It obviously takes years to learn everything about .htaccess.
Regards, Dave

jdMorgan

10:21 pm on Jun 29, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It is rather a waste of time to 301-redirect requests for URL-paths that are Gone, so moving the 301 redirect below the 410-Gone section would make sense.

The absolutely-correct answer depends on whether those URL-paths were ever indexed by search engines using the 'wrong' domain. If so, then 410-Gone is the correct answer, regardless of requested hostname. If those URL-paths were never indexed in the wrong domain, then technically you'd want to return 410-Gone for the correct domain, and 404-Not Found for the incorrect domain. But either way, there'd be no need for a redirect.
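
One way the ordering described above might look, as a hedged sketch rather than a tested configuration: the Gone rules sit above the canonical redirect, with an optional variant that answers 404 on the non-canonical hostname. The hostname www.example.com and the redirect target are placeholders, and only two of the thread's filenames are shown. The [R=404] form relies on mod_rewrite accepting a non-3xx status for R, which may depend on the Apache version in use.

```apache
RewriteEngine On

# Option A: the URL-paths were indexed under both hostnames.
# Gone rules come first, so they answer 410 whatever hostname was requested.
RewriteRule ^Home\.html$ - [G,NC]
RewriteRule ^Home-s\.html$ - [G,NC]

# Option B (instead of A): the URL-paths were never indexed under the
# non-canonical hostname. Answer 404 there and 410 on the canonical host.
# A non-3xx R status drops the substitution and stops rewriting.
# RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
# RewriteCond %{HTTP_HOST} !^$
# RewriteRule ^Home(-s)?\.html$ - [R=404,NC]
# RewriteRule ^Home(-s)?\.html$ - [G,NC]

# Canonical redirect last: only URL-paths that still exist reach it.
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

Either way, a request for a removed URL-path is answered directly and never sees the 301.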

Further confounding this answer is the fact that most (maybe all) search engine spiders do not differentiate between 410 and 404 responses. They will continue to request obsolete URLs, usually for a long long time (years), even though in the case of a 410 we're trying to tell them, "Yes, we removed this URL intentionally" and therefore, "there's no need to ask for it again." They treat the 410 the same way as the ambiguous "404-Oops we can't find that right now and we don't know why" response.

Jim

eclipsetbs

10:48 am on Jun 30, 2009 (gmt 0)

10+ Year Member



Thanks for your advice guys. I'll move the canonical thing back down. It should be easy for the SEs to follow the rules themselves and remove 410s from their indexes. In their webmaster info they also make it look as if sites have "ERRORS", wrongly implying that site owners should correct them, e.g. 404s for URLs they know have been removed (from their own 'removed' lists). This is after a thorough check has been done for the absence of external and internal links of course. They must enjoy putting the fear of God in us, like 'Big Brothers'! :-(

Anyway, your help to so many people more than compensates for this and I'm sure that the SEs will eventually read these posts and catch up with reality ;-)

regards, Dave

g1smd

1:13 am on Jul 1, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A search engine can't possibly know if a 404 it finds is one you intended or is a complete shock to you. So its bot will always flag it as an 'error'. It is an error. Nothing was found at this URL.

You then need to look and ignore the ones that are generating the response that you want.

Yeah, it's semantics; but they are important.