| 1:56 am on Dec 5, 2013 (gmt 0)|
Welcome to WebmasterWorld, coen!
This is a very interesting observation, and if true, it may have implications for many websites.
We had a very recent thread about what seems to be Google ignoring a 301 redirect:
Google returns both correct and incorrect versions of 301ed url
Interestingly, there were some issues with the redirect code in his case (such as returning content as well as setting cookies, details in the above thread).
I am more inclined to believe that Google is trying to second-guess whether this should really be a 301 or whether the webmaster made a mistake (perhaps based on the content of the response rather than just the status code) than to believe that Google has decided to ignore the 301 completely and go for the shorter URL. But I may be wrong!
What happens when you do "Fetch as Googlebot" in Google Webmaster Tools? Can you post the Googlebot response in this thread (replacing your domain with example.com where needed)?
| 2:13 am on Dec 5, 2013 (gmt 0)|
I assume you mean fetching the short url with Googlebot, because the long one just gives the full page (status 200).
The short one gives a 301, see code below:
HTTP/1.1 301 Moved Permanently
Date: Thu, 05 Dec 2013 02:14:07 GMT
Last-Modified: Thu, 05 Dec 2013 02:14:07 GMT
Expires: Thu, 05 Dec 2013 02:15:07 GMT
Set-Cookie: fp_sess=4481766337e7268814dabd3; expires=Fri, 05-Dec-2014 02:14:07 GMT; path=/; domain=.example.com
Keep-Alive: timeout=5, max=1000
Content-Type: text/html; charset=ISO-8859-1
[edited by: coen at 2:16 am (utc) on Dec 5, 2013]
|brotherhood of LAN|
| 2:16 am on Dec 5, 2013 (gmt 0)|
Have you checked the shortened 301 URLs to see if they've somehow acquired backlinks? Maybe there's an indication to Google that these shortened URLs are some kind of authority over the longer URL.
Though from your description, there shouldn't be too much room for confusion for Google.
| 2:19 am on Dec 5, 2013 (gmt 0)|
The website has existed for more than 10 years, so we are talking about 50.000+ article links that Google is massively changing.
Other websites that are having the same issue also have tens of thousands of articles where Google is changing the url to the shorter one.
| 2:28 am on Dec 5, 2013 (gmt 0)|
I can see something in common in both of these threads - you are both setting cookies and some other unnecessary headers with your 301 response.
They are not needed, and Google should ignore them. But maybe, because they are sent, Google decides that this is perhaps not meant to be a 301 and that the status code is the mistake. This is just speculation though.
There is really only one way to find out: when a 301 is required, remove all headers already set, send back only the response code and location, and see what happens.
Google would then need some time to re-crawl before you can tell whether the 301 is honoured.
I had in mind something like:
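// You are here if you are redirecting.
// Firstly remove all headers already set - so that cookies, session etc are not sent
header_remove();
// then send back only the response code and location headers and then exit!
// ($canonicalUrl is a placeholder for the full long-form URL)
header("HTTP/1.1 301 Moved Permanently");
header("Location: " . $canonicalUrl);
exit;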
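Once the redirect has been stripped down, the raw response headers can be checked mechanically rather than by eye. The following is only a rough sketch in Python (used purely for illustration; the function name `audit_redirect_headers` and the list of "unnecessary" headers are my own assumptions, not any standard). The sample input is a trimmed version of the headers posted earlier in this thread, which may themselves have been cut short when pasted.

```python
def audit_redirect_headers(raw):
    """Return a list of things that look off in a raw redirect header block."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    if not lines:
        return ["empty response"]
    status_line, header_lines = lines[0], lines[1:]

    headers = {}
    for ln in header_lines:
        # Split on the first colon only, so dates like 02:14:07 stay intact.
        name, _, value = ln.partition(":")
        headers[name.strip().lower()] = value.strip()

    problems = []
    parts = status_line.split()
    if len(parts) < 2 or parts[1] != "301":
        problems.append("status is not 301")
    if "location" not in headers:
        problems.append("no Location header")
    # Assumption: these are the headers a bare redirect does not need.
    for name in ("set-cookie", "last-modified", "expires"):
        if name in headers:
            problems.append("unnecessary header: " + name)
    return problems


sample = """HTTP/1.1 301 Moved Permanently
Date: Thu, 05 Dec 2013 02:14:07 GMT
Set-Cookie: fp_sess=4481766337e7268814dabd3; path=/; domain=.example.com
Content-Type: text/html; charset=ISO-8859-1
"""
print(audit_redirect_headers(sample))
```

An empty list means the response contains nothing beyond a 301 and a Location header as far as this check is concerned. It says nothing about how Google will treat it.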
Yes, it is possible that Google suddenly started to handle some edge cases differently. I saw this in May when an HTTP 500 on robots.txt caused a site to get deindexed, but according to the logs, HTTP 500 had been returned for the 18 months prior to deindexing. So nothing changed on the webmaster's side, but Google's handling changed. So possible.
| 10:46 am on Dec 5, 2013 (gmt 0)|
I removed the headers but I don't believe that will be the solution. I'll report back over a few days if the numbers are increasing or still decreasing.
| 3:45 pm on Dec 5, 2013 (gmt 0)|
@coen - have you checked your logs to ensure that you are always sending 301 and that there are no glitches that would on occasion send 200 OK?
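A log check like this can be scripted. Below is a minimal sketch in Python (for illustration only), assuming the access log is in Apache "combined" format and that the short URLs look like /123456/ (digits followed by a slash). Both are assumptions on my part, and the two log lines are made-up samples, not real data.

```python
import re

# Assumption: Apache "combined" log format.
LOG_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')
# Assumption: short URLs consist of a numeric ID and a trailing slash.
SHORT_URL_RE = re.compile(r"^/\d+/$")


def non_301_hits(log_lines):
    """Return (path, status) for every short-URL request not answered with 301."""
    hits = []
    for line in log_lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        path, status = m.group("path"), m.group("status")
        if SHORT_URL_RE.match(path) and status != "301":
            hits.append((path, status))
    return hits


# Made-up sample lines, for illustration only.
sample_log = [
    '66.249.66.1 - - [05/Dec/2013:02:14:07 +0000] "GET /123456/ HTTP/1.1" 301 20 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [05/Dec/2013:02:15:10 +0000] "GET /654321/ HTTP/1.1" 200 5120 "-" "Googlebot/2.1"',
]
print(non_301_hits(sample_log))
```

Any entry in the result would be a short URL that, at least once, did not redirect.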
I have done some digging and there are other reports on the web from the last couple of months describing pretty much what you have reported - which is that Google uses the shorter URL despite a 301 redirect being in place.
I saw two reports in the Google support forums. One is pretty much identical to coen's example, where the part of the longer URL after the /######/ id has been dropped and the shorter URL indexed, despite the redirect being executed. This particular post was made in mid November. The poster says they had the redirect in place for some time, the URLs were not in the index, and then they suddenly started to appear in the index a couple of weeks before he posted, which would make it the end of October. They are noticing this on two of their sites.
The second case that caught my eye is where the poster was purposely redirecting example.com/somedirectory/ to example.com/somedirectory/index.html and Google suddenly decided to index example.com/somedirectory/ rather than example.com/somedirectory/index.html
Let's put aside the fact that the poster was redirecting the wrong way around compared to best practice - if the report is correct and the 301 is set up properly, then somehow the 301 is being ignored.
It is difficult to be absolutely certain that the recent reports of Google not following 301s do not have some other technical issue that would explain the behaviour. It is however interesting that reports on the web say Google suddenly started to re-surface old (shorter) URLs because it stopped following redirects that had been in place for some time.
| 4:27 pm on Dec 5, 2013 (gmt 0)|
It may not just be Google getting 301s wrong - they could have decided to do this for other reasons best known to them.
These urls are fairly common to CMS systems where SEO is a 'bolt on'.
The ID number is all that's needed for the db lookup, and the rest of the url is usually irrelevant. The fact that there's a slash between the ID and the rest of the url (rather than a dash) gives the impression that there is a folder in place.
So, if Google finds the same content at a url that 'belongs' to a folder as it does at the folder alone, it may assume that 'your-friendly-url.htm' is similar to 'index.html', or 'default.aspx' or 'home.asp'.
Your 301s may be deliberately being ignored because the vast majority of people using this sort of url do not implement 301s, so Google is assuming that it can take it for granted that the short url will do, regardless of your stated preference.
| 4:38 pm on Dec 5, 2013 (gmt 0)|
@FranticFish if the things I changed earlier today don't have any effect, I think that will be my fix at the end. Changing the last / into a -.
Because it's easy for me to forward both urls to that format without losing visitors. The only downside is that all the Facebook likes and Twitter posts will be reset to zero.
But I do believe that having the article title in the url is good for the SERPs.
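The mapping of both old formats onto the new dash format could look roughly like the sketch below. It is written in Python purely for illustration. The URL shapes (/123456/ and /123456/some-title.htm going to /123456-some-title.htm), the function name `redirect_target` and the `slug_for_id` lookup (a stand-in for the database query by ID) are all assumptions.

```python
import re

# Assumption: old article URLs are /<id>, /<id>/ or /<id>/<anything>.
ARTICLE_RE = re.compile(r"^/(?P<id>\d+)(?:/(?P<slug>[^/]*))?$")


def redirect_target(path, slug_for_id):
    """Return the new dash-style path to 301 to, or None if no redirect applies."""
    m = ARTICLE_RE.match(path)
    if not m:
        return None  # already in the new format, or not an article URL
    article_id = m.group("id")
    # Always take the title from the lookup, so outdated titles in the
    # old URL end up at the same place as the short form.
    slug = slug_for_id.get(article_id)
    if slug is None:
        return None  # unknown ID: let the application answer with a 404
    return "/" + article_id + "-" + slug + ".htm"


# Hypothetical lookup table standing in for the database.
slugs = {"123456": "some-title"}
print(redirect_target("/123456/", slugs))
print(redirect_target("/123456/some-title.htm", slugs))
print(redirect_target("/123456-some-title.htm", slugs))
```

With this shape the ID no longer looks like a folder, and both the short and the long old-style URLs redirect in a single hop.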
| 5:13 pm on Dec 5, 2013 (gmt 0)|
|It may not just be Google getting 301s wrong - they could have decided to do this for other reasons best known to them. |
If the 301 is being ignored on purpose in some circumstances, then this would mean that Google has started to treat a 301 as a hint and not as a directive.
Which means that this may no longer always apply:
|A server-side 301 redirect is the best way to ensure that users and search engines are directed to the correct page. The 301 status code means that a page has permanently moved to a new location. |
It seems that this does change URL best practice - the ID must not appear to be a separate folder in the URL, otherwise Google may not index the URL you wish it to index.
What I am trying to establish is:
- is this in fact an error somewhere on webmaster's side and we are not seeing it yet
- or is this a Google glitch/bug which will then be fixed
- or is this a new way Google handles 301 redirects
| 9:41 pm on Dec 5, 2013 (gmt 0)|
|Let's put aside the fact that the poster was redirecting the wrong way around compared to best practice - if the report is correct and the 301 is set up properly, then somehow the 301 is being ignored. |
Question that rears its head at this point:
Can anyone point to a case where google ignores a canonical tag and/or a 301 when the site's preferred URL is the better one? The recent posts on this subject always seem to involve a redirect from a shorter to a longer form.
| 10:25 pm on Dec 5, 2013 (gmt 0)|
I can understand Google ignoring the canonical link element, as it is a "hint" and Google has always said they reserve the right not to take it into account. But a 301 redirect is not a hint, it is supposed to be a directive.
If Google is ignoring a technically sound 301 redirect just because the URL being redirected is shorter (and the target URL is longer), then this is something very new in Google's behaviour.
|So, if Google finds the same content at a url that 'belongs' to a folder as it does at the folder alone, it may assume that 'your-friendly-url.htm' is similar to 'index.html', or 'default.aspx' or 'home.asp'. |
Interesting and yes, I follow your thinking. And this would be perfectly logical if there is no 301 in place. But since the shorter URL responds with 301 redirect, Google would not get the content of that shorter URL and therefore could not know that the shorter URL would have the same content as the longer one in cases where shorter URL does not redirect.
|Your 301s may be deliberately being ignored because the vast majority of people using this sort of url do not implement 301s, so Google is assuming that it can take it for granted that the short url will do, regardless of your stated preference. |
With what we are seeing so far, this certainly seems so. But this means that Google treats a 301 response only as a hint, and that the best practices for URL creation need to be adjusted to take this into account if a site wishes not to suffer from duplicate content.
| 12:47 am on Dec 6, 2013 (gmt 0)|
according to fetch as googlebot your 301 response included 20 bytes of a gzipped text/html document.
what was that document?
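One possibility worth ruling out: a gzip stream with nothing in it is itself 20 bytes long, so those 20 bytes may simply be an empty body with gzip compression applied rather than a real document. This is only a guess about this particular response. A quick way to see the size, in Python (used here only for illustration):

```python
import gzip

# Compress a zero-length body and look at the size of the result.
empty_body = gzip.compress(b"")
print(len(empty_body))             # size of the compressed stream in bytes
print(gzip.decompress(empty_body)) # the original (empty) content
```

If the body really is empty, switching off compression for redirect responses should bring the reported content length down to zero.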
| 6:23 pm on Dec 6, 2013 (gmt 0)|
I 301 my root to /forums.
Google shows the root URL in SERPS and has done since day 0.
| 6:28 pm on Dec 6, 2013 (gmt 0)|
I also have /forums in the canonical tag
| 10:07 pm on Dec 6, 2013 (gmt 0)|
They probably think-- rightly-- that there ought to be a root so they're ### well going to show one, whether you like it or not.
| 10:28 pm on Dec 6, 2013 (gmt 0)|
Dave, might this be common with this kind of redirect? Our site main page redirects to default.aspx and the SERP shows the URL without the default.aspx.
| 10:36 pm on Dec 6, 2013 (gmt 0)|
|I 301 my root to /forums. |
Google shows the root URL in SERPS and has done since day 0.
Redirecting a "home page" [domain root] has had the same effect for years, as far as I can remember -- The first time I noticed it was when redirecting the "home page" of a site to a /page on another site. The "home page URL" of the redirected site continued to show in the SERPs in Yahoo and Google.
I have not seen the preceding apply in most situations with a proper 301 and nothing else served via header or content when a redirected URL is requested, but I have seen it for years with "home page to inner page" redirects, so I'm not surprised by what Dave_Hybrid is seeing in the specific situation mentioned.
Some of the other cases I'm seeing reported do appear to indicate a change in the handling of 301 redirects on the part of Google, and whether the difference in handling is intentional or unintentional, imo, remains a question.
| 10:44 pm on Dec 6, 2013 (gmt 0)|
|I 301 my root to /forums. |
Google shows the root URL in SERPS and has done since day 0.
google has long treated the document root directory as a special case when considering technically improper redirects from requests for the root.
there are also cases described in these forums where a 302 redirect from a non-canonical url to the root is treated by google as if it were a 301.
while your 301 may be technically correct for your application, google sees it otherwise and is also therefore ignoring your link rel canonical suggestion.
the problem described by the OP concerns a subdirectory, so i would consider this as a distinct issue from yours.
| 10:53 pm on Dec 6, 2013 (gmt 0)|
|The first time I noticed it was when redirecting the "home page" of a site to a /page on another site. The "home page URL" of the redirected site continued to show in the SERPs in Yahoo and Google. |
it would be interesting to know if this was a subtle way to penalize/deprecate what google perceived as a "spammy" redirect.
| 11:01 pm on Dec 6, 2013 (gmt 0)|
I definitely agree, and I'm really not completely sure what to think, especially since the first time I noticed it the same effect was present on two different search engines at a time when Yahoo wasn't using Google's results -- So, did they both think it was spammy, even though the root redirected to essentially the same page, or is there some other reason I haven't thought of for them to "essentially" not honor the redirect in place?
I haven't figured it out even after quite a bit of thought and finally decided to "just go with it" and understand redirecting a "root" to an "inner page" may have no effect at all on which "location" ranks, even with a proper 301 in place.
| 11:07 pm on Dec 6, 2013 (gmt 0)|
Yes, agree, the home page was always treated a bit differently with redirects (although most cases I saw where the home page redirected to an internal one were in fact 302 and not 301).
But technically well executed redirects of an internal page were, up to now, adhered to by Google.
|Some of the other cases I'm seeing reported do appear to indicate a change in the handling of 301 redirects on the part of Google, and whether the difference in handling is intentional or unintentional, imo, remains a question. |
Yes, I wonder this too. Because if this handling is new, avoiding these problems should be added to friendly URL best practices.
| 2:37 am on Dec 7, 2013 (gmt 0)|
|Our site main page redirects to default.aspx |
But why? Can't you just add "default.aspx" to the list of possible directory-index filenames? Nobody is ever going to type in "example.com/default.aspx" when aiming for the front page.
| 8:22 pm on Dec 9, 2013 (gmt 0)|
Lucy, not sure. This was done before my time. At least it's not like it used to be, as for a long time it was a 302 redirect. Now it's 301.
| 10:08 pm on Dec 9, 2013 (gmt 0)|
In this specific case, 302 may actually have been better, because then search engines will continue thinking of www.example.com/ as the "real" URL.
| 10:26 pm on Dec 9, 2013 (gmt 0)|
as lucy24 indicated 302 is more technically correct than a 301 in that situation.
however the best solution is to define default.aspx as the default directory index document and use a 301 to redirect all requests for default.aspx to the directory root (i.e. a trailing slash url)
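The redirect half of that setup can be sketched as a small rule. This is Python for illustration only; on IIS or Apache it would normally be a rewrite rule in the server configuration. The function name `canonical_redirect` and the `index_names` parameter are hypothetical.

```python
def canonical_redirect(path, index_names=("default.aspx",)):
    """Return the directory URL to 301 to, or None if the path is already canonical.

    Query strings are not handled in this sketch.
    """
    directory, _, filename = path.rpartition("/")
    if filename.lower() in index_names:
        # Strip the index document and keep the trailing slash.
        return directory + "/"
    return None


print(canonical_redirect("/default.aspx"))
print(canonical_redirect("/products/default.aspx"))
print(canonical_redirect("/"))
```

For this to work without a redirect loop, the server has to serve default.aspx internally when the directory URL is requested, and the rule must only fire on what the client actually requested, not on the internally rewritten path.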
| 11:13 pm on Dec 9, 2013 (gmt 0)|
Would doing that do more harm than good if almost half of our external homepage links already point to default.aspx? Most of which I will never, ever be able to get re-pointed to the directory root?
| 11:49 pm on Dec 9, 2013 (gmt 0)|
as long as your internal links are referring to canonical urls and you properly redirect all non-canonical requests to the canonical urls, you can't do any harm.
google is already actively ignoring your default directory index document, so you might as well send the signals google is assuming you intended to send if you could have or knew you should have.