| This 46 message thread spans 2 pages |
|301 Redirect Means "Some Loss of PageRank" - says Mr Cutts|
For quite a while now, I've been cautious about using 301 redirects instead of fixing the core issue - whether it's getting legacy backlinks changed or fixing website infrastructure problems. This advice was based on several things - avoiding chains of redirects, for example, or introducing a potential trust issue because 301s have been a spam tool so often. But mostly, it seems to me that some PR is lost in a 301.
Eric Enge just published a new interview with Matt Cutts that confirms this idea.
|Eric Enge: Let's say you move from one domain to another and you write yourself a nice little statement that basically instructs the search engine and, any user agent on how to remap from one domain to the other. In a scenario like this, is there some loss in PageRank that can take place simply because the user who originally implemented a link to the site didn't link to it on the new domain? |
Matt Cutts: That's a good question, and I am not 100 percent sure about the answer. I can certainly see how there could be some loss of PageRank. I am not 100 percent sure whether the crawling and indexing team has implemented that sort of natural PageRank decay, so I will have to go and check on that specific case. (Note: in a follow on email, Matt confirmed that this is in fact the case. There is some loss of PR through a 301). [my emphasis]
I wouldn't say that means never use a 301 - it is one of the useful tools in our toolkit. But it does mean don't throw 301 redirects around like confetti. Do get legacy backlinks changed when a domain changes. And it's better to fix a server infrastructure issue directly, whenever you can, instead of just doing a patch job.
Matt's further comments underscored how a 301 is useful for migrating to another domain (what other tool do you have, anyway?).
He also mentioned migrating within a site, for example if you change your CMS. Now there's a case where I prefer not to use a 301, but to retain the old URLs if at all possible. But sometimes the old URL scheme is so awful that it's important to change things. Always an informed trade off, isn't it.
|whether it's getting legacy backlinks changed or fixing website infrastructure problems |
In January, I signed up for WMT, and discovered numerous 404s on Googlebot requests for index.html (example.com/index.html). Since WMT shows you the referring URL leading to crawl errors, I examined those backlinks. NONE of them were linking to me using /index.html. EVERY backlink in the list linked correctly to my canonical. This led me to set up rules to 301 redirect all requests for index.html to the canonical '/'. My first concern is why Googlebot was requesting index.html to begin with, when no backlink existed in that format. My second concern now is that there may be a loss of PageRank from those links because of the 301 redirects. Googlebot should have been "seeing" the links correctly to begin with.
I've also recently dealt with a site where there were supposedly incoming external links pointing to /index.html but on inspection they were all found to point to / instead.
I am not sure if that is indicative of some other problem or not.
I use redirects quite a bit. One recent case was for a site where the internal links all pointed to non-www URLs, but all incoming external links pointed to the www version of those URLs.
Google had already listed mostly non-www URLs for the pages of the site, but had listed several www URLs for some of the content. Most importantly, the root index page was listed as ONLY a www URL.
All of the internal links were then changed to point to www URLs, and a redirect was also installed from non-www to www. It took less than two weeks for Google to add the www URLs to the index, and a number of non-www URLs remain in the SERPs some three months later.
So, in this case, redirects are an important part of the fix. However, I do see a large number of sites where poor redirect implementation adds to the problems rather than fixing them all.
The number one problem we see over in the Apache forum is the use of a URL-to-URL redirect when an internal URL-to-filepath rewrite could be used. This latter method is implicit in Tedster's comment about "migrating within a site, for example if you change your CMS. Now there's a case where I prefer not to use a 301, but to retain the old URLs if at all possible."
It is important to remember that URLs and filepaths are not at all the same thing, that they are associated but not equivalent "addressing methods," and that this association is a function of the server itself. Because of this, you could re-arrange and rename every single directory and file on your server, but retain the old URLs (even if this is an extreme hypothetical case). Nevertheless, we see dozens of Webmasters posting that they've changed their server file structure and therefore need to (301) redirect all the old URLs, when this is not in fact the case: All that is required is to internally rewrite the old URLs to the new filepaths.
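To make the redirect-versus-rewrite distinction concrete, here is a minimal Python sketch. It is not Apache configuration and not anyone's production setup; the URLs, file paths, and function names are all hypothetical. It only shows that a redirect answers with a new URL, while a rewrite keeps the public URL and quietly serves a different file.

```python
# Hypothetical table: old public URL path -> new on-disk file path.
REWRITE_MAP = {
    "/products.asp": "/srv/site/php/products.php",
    "/about.asp": "/srv/site/php/about.php",
}

# Hypothetical table: old public URL path -> new public URL path.
REDIRECT_MAP = {
    "/products.asp": "/products.php",
    "/about.asp": "/about.php",
}


def handle_with_redirect(url_path):
    """External redirect: the client is told to request a different URL."""
    new_url = REDIRECT_MAP.get(url_path)
    if new_url is None:
        return 404, {}, None
    return 301, {"Location": new_url}, None


def handle_with_rewrite(url_path):
    """Internal rewrite: the URL is unchanged; only the file served differs."""
    filepath = REWRITE_MAP.get(url_path)
    if filepath is None:
        return 404, {}, None
    return 200, {}, filepath
```

With the rewrite, a request for /products.asp gets a 200 straight away and the visitor (or bot) never learns that the file behind it changed, so there is no redirect for a search engine to discount.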
It is interesting that Google de-values redirected URLs. Hopefully, this is not true (or the devaluation is only slight) when the purpose of the redirect is to cure a problem for which no other solution is available.
I'd like to throw some perspective onto the PageRank "301 PR Decay" confirmation from MC with another clarification request.
That is the frequently communicated "301 redirect tangle", where multiple redirects have been applied to such an extent that a site can get lost from the SERPs for potentially many years or maybe forever.
I kind of see this the way duplicate content filtering would deceive webmasters into thinking a penalty existed, when it was only a filter at work.
Could this "301 redirect tangle" be related to the loss of PR [trust / authority / link juice] that goes with normal indexing?
|I kind of see this the way duplicate content filtering would deceive webmasters into thinking a penalty existed, when it was only a filter at work. |
It might not even be a filter, but rather the number of redirects they set their bot to follow... I'm working on one right now and will probably only follow one or two redirects at the most so if there are 3 in place I wouldn't know what links actually point to the final landing page, because it would be disconnected from the original since the number of followed redirects is limited.
It could be they follow 2, then request the second redirected to location and then follow 2 more and so on (or something to that effect) and then try to piece together what goes where, but I would personally try to get people to the correct location with 1 or 2 redirects at the most rather than relying on an association being made and not having the link weight severely discounted, if counted at all.
How and what to follow how far is actually one of the questions I was trying to figure out the answer to today, so right now it's only a theory and something I'm trying to work through myself for a 'niche specific' bot.
What they do is probably way more complicated than I'm making it sound (what I'm doing is), but I could see where there may be a 'disconnect' of links to landing page if there are too many redirects in place, or why you would want to disconnect the links to one page from the final landing page if they run through too many redirects... It's one of those things that's a bit tough to explain and easier to understand if you sit down and try to do it.
Here's one way to look at it:
You get the info from Page A, which has a link to Page B, and it's fairly easy to store and associate the fact that A links to B. But then you visit Page B and it redirects to Page C, so you have to change the association somehow from 'A links to B' to 'A links to C'. Then you open up Page C and find it redirects to Page D, so you have to redo the association again to say 'A links to D'. You either have to drop the redirected pages out of the middle, or find a way to associate the links through the redirects, and it turns into a really complicated question as far as storage, association and access are concerned, especially when you start asking yourself: what if Page B no longer redirects to C, but rather to F?
It's not as simple as a browser redirecting, because a browser simply takes you there, and if the redirects change between your visits it doesn't really matter: your browser takes you to the new location instead. It doesn't work that way in a database...
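The A/B/C/D bookkeeping can be sketched in a few lines of Python. This is only an illustration of the storage problem, not how any real search engine does it; the `resolve` function, the `max_hops` limit, and the single-letter page names are assumptions made for the example.

```python
def resolve(url, redirects, max_hops=2):
    """Walk a stored redirect table from url.

    Returns (final_url, hops), or (None, hops) when the chain is longer
    than max_hops or loops back on itself, i.e. the link is 'disconnected'.
    """
    hops = 0
    seen = {url}
    while url in redirects:
        if hops == max_hops:
            return None, hops
        url = redirects[url]
        hops += 1
        if url in seen:
            return None, hops
        seen.add(url)
    return url, hops


# Stored crawl data: A links to B; B redirects to C; C redirects to D.
links = [("A", "B")]
redirects = {"B": "C", "C": "D"}

# Re-associate each stored link with wherever its target finally lands.
effective = [(src, resolve(dst, redirects)[0]) for src, dst in links]
```

With a limit of two hops the link from A is credited to D. Drop the limit to one and the same link resolves to nothing. If B later redirects to F instead, every stored link that passes through B has to be recomputed, which is the part a browser never has to worry about.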
[edited by: TheMadScientist at 2:45 am (utc) on Mar 15, 2010]
I've been fortunate that I haven't had to use a ton of 301s. But similar to crobb305 and g1smd, I had incoming external links (valid ones from other websites) pointing to /index.html. I 301 all requests for /index.html to /.
|But similar to crobb305 and g1smd, I had incoming external links (valid ones from other websites) pointing to /index.html. I 301 all requests for /index.html to /. |
You say that those incoming links were valid and pointed to index.html. Can you clarify? Are those inbound links incorrectly linking to your homepage as "index.html" or do they link to the canonical '/' as you intended? If webmasters are incorrectly linking to your homepage, I'd contact them and ask for a change. Otherwise, your problem sounds identical to mine, whereby the inbounds are correctly linking to my canonical, but Googlebot is apparently trying to crawl /index.html (and subsequently reported 404s in Webmaster Tools, prior to my 301).
I can't find a single instance where another page/site has incorrectly linked to my homepage with /index.html (I don't even use .html for any of my pages), so I have no idea why those requests were being made by Gbot. The 301 has stopped the 404s, but at what cost?
|It's not as simple as a browser redirecting, because a browser simply takes you there and if the redirects change between your visits it doesn't really matter |
In your example (where clicking a link on page A redirected you to URL B then to C and then to D), if URL B was changed to redirect to F, from that time on clicking the link on A would take you to page F (via URL B). However if, later on, the redirect status of URL C or D was changed, the browser would never know that. The question is, do bots revisit the 'intermediate' URLs in a chain after the head of a chain is altered to redirect someplace else?
|But similar to crobb305 and g1smd, I had incoming external links (valid ones from other websites) pointing to /index.html. |
No. WebmasterTools said there were incoming external links pointing to /index.html but on inspection the links were found to be pointing to / instead. In any case I always install a redirect rule that redirects a selection of common index page names to root to 'catch' incoming linking errors. That rule was quickly added to this site, and we'll see how long it takes for WebmasterTools to start reporting the correct linking.
|The question is, do bots revisit the 'intermediate' URLs in a chain after the head of a chain is altered to redirect someplace else? |
I would guess they do, and I've been working on this for most of the evening and trying to make a decision on what to do for my 'mini-bot', but my goals are not the same as Google's, so I'll probably do some things different here, but what I would guess they do (or the theory I would work with as a starting point) would be:
Follow all the redirects through to the final location, but keep track of the number and diminish the value of inbound links by N% (could fluctuate based on number and circumstances) for every redirect along the way. IMO, when you are trying to organize the world's information and return a 'trusted', important resource, and I ask myself whether a highly valued, trusted resource would be moved through more than a couple of redirects, I come up with: no...
I can see one: www.example.com to example.com, or 'basically, it was here, but now it's over there', or maybe even 2 in some cases, because quite a few people don't know what they're doing, like redirecting from example.com/index.html to www.example.com/index.html and then from www.example.com/index.html to www.example.com, but when I ask why you would redirect me through 3 or more (especially across domain names) to get to the actual content I can't come up with a very good answer.
My guess is (keep in mind I'm guessing, so it's not 'tested' or 'fact') there could be a 'flexible dampening' in place with redirects, as with most things they do, so it might not be 'as dampening' on link weight passed to redirect from example.com to www.example.com and then from www.example.com/page.html to www.example.com/new-page.php (it doesn't look too sneaky or manipulative) as it would be to redirect from example.com to keyword-example.com to keyword2example.com.
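To put rough numbers on the 'N% per redirect' guess, here is a tiny Python sketch. Everything in it is assumed: the function name, the compounding model, and the 10% default are made up for illustration, and nothing here is a figure Google has published.

```python
def passed_weight(link_weight, hops, decay_per_hop=0.10):
    """Link weight left after `hops` redirects, assuming a fixed fraction
    is lost at every hop (a guess, not a known Google formula)."""
    return link_weight * (1 - decay_per_hop) ** hops


one_hop = passed_weight(1.0, 1)     # 0.9 of the original weight
three_hops = passed_weight(1.0, 3)  # 0.9 * 0.9 * 0.9 of the original weight
```

Under that assumption the loss compounds, so a chain of three hops costs noticeably more than a single redirect, which is one more reason to keep everything to one hop.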
Edited: Made a correction to the multiple redirect example noted by g1smd.
[edited by: TheMadScientist at 8:18 am (utc) on Mar 15, 2010]
One thing about search engine bots. They don't 'follow' links or redirects.
A user of a browser clicks a link, the browser requests the URL, the server sends a redirect status code and URL to the browser, the browser requests that new URL, the server responds, and so on.
A bot scans a page and adds any URLs it finds in links to its 'to be crawled' database. Minutes, hours, or days later it requests those URLs one at a time. For each URL requested, it either stores a page of content into the database to be scanned later, or it receives back both a redirect code and new URL and stores that URL in its database to be crawled later. So, there's lots of additional work to tie a chain of redirects together and 'count' the number in any chain.
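A toy version of that queue-driven process, in Python. The `FAKE_WEB` table stands in for real HTTP requests and every URL in it is hypothetical; this is a sketch of the idea, not of any actual search engine's crawler.

```python
from collections import deque

# Hypothetical stand-in for the web: URL -> (status, payload).
# For a 200 the payload is the list of links found on the page;
# for a 301 it is the redirect target.
FAKE_WEB = {
    "http://example.com/a": (200, ["http://example.com/b"]),
    "http://example.com/b": (301, "http://example.com/c"),
    "http://example.com/c": (200, []),
}


def crawl(seed):
    """Request queued URLs one at a time; never 'follow' anything inline."""
    queue = deque([seed])
    seen = {seed}
    pages = {}      # URL -> links found on that page
    redirects = {}  # URL -> redirect target, stored for later joining
    while queue:
        url = queue.popleft()
        status, payload = FAKE_WEB.get(url, (404, None))
        if status == 200:
            pages[url] = payload
            discovered = payload
        elif status == 301:
            redirects[url] = payload
            discovered = [payload]
        else:
            discovered = []
        for found in discovered:
            if found not in seen:
                seen.add(found)
                queue.append(found)
    return pages, redirects
```

Note that the crawl ends with two separate tables. Working out that the link on /a effectively lands on /c means joining `pages` against `redirects` afterwards, which is the additional work described above.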
[edited by: g1smd at 8:25 am (utc) on Mar 15, 2010]
The more I think about how they could store and associate redirects, the more the 'ranking lag' seems to make sense, because they would have to get from the spidering of the page (noticing the redirect) to updating the association (in the data they use for calculations) and finally to the re-calculation of the inbound link effect (PageRank calculation)... It would probably take some time to get all the way through the process, especially on a larger site.
Imagine just an 80 page domain (Site A) being moved to a different domain (Site B) and how it could happen on the ranking calculation side of the process through just a single redirect...
Site A is moved to Site B.
Site A pages 1 through 10 are Spidered and the 'inbound link' data is updated and sent to the calculation process.
Site A loses the weight from the top 10 pages during the process.
(Site A loses its most important inbound link weight, causing it to be lowered in the rankings.)
Site B gains the weight from the top 10 pages during the process, but only has 1/8th the content.
(Site B only has the link weight from the 10 pages previously on Site A and has very limited internal links and 'deep links' in the calculation process.)
Site A pages 11 through 40 are spidered and the 'inbound link' data is updated and sent to the calculation process.
Site A loses the weight from the next 30 pages.
(Now half of the pages and most of the inbound links are in the calculation process for Site B. Site A has very little 'link credibility', but still retains half the content.)
Site B gains the weight from the next 30 pages.
(Now Site B has most of the inbound links being associated to it in the calculation process, but only half the pages of Site A.)
The final 40 pages are spidered and the 'inbound link' data is updated and sent to the calculation process.
Site B finally replaces Site A in the rankings.
It's actually a bit more complicated when you think about internal links and other possible changes which may impact rankings and I could see where it would take some time to get all the way through the re-calculation and ranking process, some of which must depend on spidering frequency of pages.
The more redirects present the more complicated the process gets... Personally, I do wonder sometimes if they request redirect locations immediately? I think I would, but not follow links on the page personally, or that's my approach right now anyway.
Edited: Terminology... Technically, I wouldn't 'follow' redirects, but would rather store the location being redirected to and request that location before moving on to the next URL in the queue. (I think, right now today.) So technically I would not 'follow' the redirect, but would insert the location of the redirect into the crawl before moving on to the next location. It's a bit of a technical difference in the process, but still a difference.
It's disappointing that Matt Cutts has stated this as a "rule" instead of accepting this as an area where Google's algorithm needs rework.
What is the point in specifying to bots and users that the redirect is a "permanent" redirect if Google still takes links pointing to the old URL at face value?
There should be some sort of mechanism by which Google differentiates incoming links to example.com before and after a 301 redirect has been initiated and is able to transfer the value of all links earned before a 301 redirect to the new domain.
So if I have example.com redirect serverwide to its www.example.com I am losing PR?
[edited by: tedster at 3:40 pm (utc) on Mar 15, 2010]
[edit reason] switch to example.com - it can never be owned [/edit]
But do register both example.com and www.example.com as separate items in Google WebmasterTools and look at both reports, especially the 'crawl errors' and the 'internal links' and the 'sites that link to you' reports.
Try to find incoming external links pointing at the non-canonical version and over time work on getting those other sites to adjust their link.
Additionally, for any sites where their link to you includes an index page filename such as index.html or index.php or default.asp, get them to remove that filename from their link, ending the URL with just the trailing slash. That is, their link to
www.example.com/folder/index.php becomes a link to
www.example.com/folder/ and so on.
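When auditing a list of backlinks, the same clean-up can be applied mechanically to work out what each link should be changed to. A small Python sketch follows; the function name is made up, and the set of index filenames is just the three mentioned above.

```python
from urllib.parse import urlsplit, urlunsplit

# Index filenames named in this thread; extend as needed for your server.
INDEX_NAMES = {"index.html", "index.php", "default.asp"}


def strip_index_filename(url):
    """Return url with a trailing index filename removed from its path."""
    parts = urlsplit(url)
    head, _, last = parts.path.rpartition("/")
    if last.lower() in INDEX_NAMES:
        parts = parts._replace(path=head + "/")
    return urlunsplit(parts)


cleaned = strip_index_filename("http://www.example.com/folder/index.php")
```

Any URL that does not end in one of those filenames comes back unchanged, so the function is safe to run over a whole backlink report.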
I was wondering that too. The question and quote from MC above refers specifically to redirecting one domain to another domain.
Nothing is said about redirecting pages, folders or hosts within a domain.
I'd consider it stupid for Google to bleed PR through redirects that take place within a domain, but then I also consider it stupid to bleed PR via a 301 full stop (which as someone else has pointed out means 'this document has moved permanently' - not 'this document has moved permanently so it should be trusted less'). So who knows?
It's impossible for us to know G's algorithm, but the most straightforward line of logical thought would indicate that if 301 redirects now lose some fraction of the original URL's PR, then yes, you'd lose a little PR redirecting example.com to www.example.com.
But you'd be losing that fraction of PR instead of losing all of the PR that was previously assigned to the "wrong domain." And further, you would still be taking the remaining PR (that part which is not lost due to Google's newly-announced discounting) away from that "wrong domain" and giving it to the "right domain," thereby still making it less likely that that "wrong domain" would compete with your "right domain" in the SERPs.
Just a personal opinion, but I think it's obviously still well worth doing. And Google continues to recommend doing it.
The bottom line on all of this is that it's best to run a tight ship, so that *zero* external redirects are required for your site to function, to never change your domain name unless forced to by litigation, to ensure (by design and testing) that from the moment it first goes live, your site can be accessed by one and only one variant of your domain name (e.g. www- or non-www, not both), and to never change your URLs, even if you change your site's underlying file structure or technology.
This change only re-emphasizes the value of planning ahead and of designing your site's URL-architecture, rather than just assigning URLs in a cavalier manner, as many Webmasters do.
|So if I have example.com redirect serverwide to its www.example.com I am losing PR? |
According to this, yes. But that's still better than a canonical/duplicate site issue.
I've had the unfortunate experience of having to implement thousands of 301 redirects (mostly Apache rewrites) across a single site after a platform change. From my experience, yes, the 301s did result in the pages losing value and rankings. It took almost exactly 1 year for all of the link value to return back to the site which resulted in rankings and traffic returning. I really would not recommend having to do this.
The best practice here is to try to avoid rewriting URLs if at all possible when changing platforms/CMS systems. Spend the extra coin or take the time to ensure that your URLs are the same. The only exception here, is if you are going from really bad URLs to much better URL structures. Only then may it be worth it.
[edited by: tedster at 3:42 pm (utc) on Mar 15, 2010]
[edit reason] switch to example.com in the quote [/edit]
|The best practice here is to try to avoid rewriting URLs if at all possible when changing platforms/CMS systems. |
No. As jdmorgan said above, a rewrite would be a much better thing to use, instead of a redirect. With the rewrite in place, you would continue to use the same URLs 'out on the web' to access the content, even though the internals on the server are completely different.
I have a site that continues to use .asp URLs even though the site migrated to Apache hosting and PHP scripting many years ago.
I have used 301s mostly to show Bing the correct page. The MSN bot often comes looking for pages long gone (2007-2008) due to a succession of several rename-happy previous webmasters.
When I took over this one site I left the naming as is, adding the 301s to address 404 errors only. The sites with backlinks were notified and most changed their links to the current site page; a very small percentage of link sources never responded or changed.
FWIW I do have example.com wild card redirected to www.example.com
|No. As jdmorgan said above, a rewrite would be a much better thing to use, instead of a redirect. With the rewrite in place, you would continue to use the same URLs 'out on the web' to access the content, even though the internals on the server are completely different. |
You are correct, I meant to say "redirecting" instead of "rewriting". Typo on my part.
|You say that those incoming links were valid and pointed to index.html. Can you clarify? Are those inbound links incorrectly linking to your homepage as "index.html" or do they link to the canonical '/' as you intended? |
Sorry for not clarifying. My situation doesn't involve sites linking correctly to '/', but WMT showing 404s for "index.html".
I had external incoming links to "index.html"--which were valid. Meaning, hundreds of external incoming links were pointing to: www.example.com/index.html. Since some of those site owners are unresponsive to removing the "index.html," I simply 301 "/index.html" to '/'.
I don't think I've seen a situation where an external incoming link points to the canonical, but WMT shows 404s for "index.html." Not sure if that's an issue with Googlebot or not.
I am following this discussion with interest because I am in the process of changing from an ugly dynamic URL structure to a "friendly URL" structure. The site has around 3000 pages and we will probably change 300 of them in total, leaving the product data with the ugly dynamic URL structure.
We started slowly changing selected pages about 5 weeks ago, and so far only one page has dropped a few places in ranking; the others have retained or improved their positions.
I have to say that one of the reasons for the change is to be able to target specific countries using the folder structure (the site runs in a number of languages), and therefore it could easily be that any loss of ranking has been compensated by the advantage gained through geo-targeting.
We have also changed a few English-language pages (English is not geo-targeted) but have seen only one drop in ranking; the other pages have retained theirs (so far).
Perhaps it helps that the pages being changed have very few inbound links, and that the internal linking structure has been changed to point to the changed URLs?
Make sure that your old canonicalisation rules are modified to point incorrect old URL requests directly to the correct new URL, instead of having those rules send old incorrect requests to the old versions of the correct URLs and then redirect again onwards to the new correct URL.
As an example, you might have had a canonicalisation rule that redirected example.com/?id=123 to www.example.com/?id=123. If your new URL is www.example.com/page123, make sure that both example.com/?id=123 and www.example.com/?id=123 requests now redirect directly to the new URL, not from A to B to C in a chain.
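One way to check a rule set for this is to flatten it before deploying. The Python sketch below is only an illustration: `flatten` is a made-up helper, and the rule table assumes the old canonicalisation rule sent the non-www form to the www form.

```python
def flatten(rules):
    """Return a copy of rules in which every source URL points straight
    at its final destination, so no request needs more than one hop."""
    flat = {}
    for source, target in rules.items():
        seen = {source}
        # Keep walking while the target is itself redirected (stop on loops).
        while target in rules and target not in seen:
            seen.add(target)
            target = rules[target]
        flat[source] = target
    return flat


# Assumed old rule (non-www -> www) plus the new rule (old URL -> new URL).
rules = {
    "example.com/?id=123": "www.example.com/?id=123",
    "www.example.com/?id=123": "www.example.com/page123",
}
one_hop_rules = flatten(rules)
```

After flattening, both old forms of the URL map directly to www.example.com/page123, which is the A-to-C-in-one-step behaviour you want from the server.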
You can tell by the fact that Matt couldn't immediately answer Eric Enge's question that this is not a major issue - but it is still somewhat of an issue, I'd say. In fact, we should note that Matt was talking specifically about cross-domain redirects, and not necessarily about redirects within the same domain.
The best practice has always been as g1smd said above, and that's what many here have recommended for years. The major take-away is this -- don't be casual with 301 redirects. They are not html, they are server side actions and they shouldn't be treated lightly.
Thanks g1smd, yes, we do redirect in "one hit", i.e. there are no chained redirects :-)
Correct me if I'm wrong, but Matt and Eric are not talking about on-site 301 > 200 scenarios. The transfer of PageRank™ is typically 1:1 when an on-site 301 > 200 is at play. Aren't they talking about a chain that may occur between old TLD and new TLD, and existing links pointing to the old TLD, which may produce the 301 > 301 > 200 scenario? Oooh, goosebumps. ;)
As I read it, the context was not even about a chain of redirects:
|the user who originally implemented a link to the site didn't link to it on the new domain |
Matt began by mentioning a "page by page migration" and Eric came back with "you write yourself a nice little statement that basically instructs the search engine and, any user agent on how to remap from one domain to the other."
Does PageRank from inbound links transfer via 301 only within a domain, or can it also transfer to a new domain? The reason I ask is based on advice in the -50 thread wherein several members advised others hit with this filter to 301 their entire site to a new domain.