Forum Moderators: open
GoogleGuy - any thoughts or suggestions?
Most posts on this topic seem inconclusive - other than the fact that just about every type of redirect except a 301 will cause problems. Can anyone also shed more light on the worst case scenario for using a 302 or other type of redirect for otherwise legitimate purposes?
Would love to hear the latest wisdom on this from WW's resident gurus. :-)
caveman
I'm trying to think of how you would do that, and I can't think of a way that would make more sense than several other, much easier load-balancing methods, so I'd love to understand how this is set up.
Do you have one web server that 302's requests round-robin to other web servers using an SSI, or what?
Since, from what you describe, each visitor hits a dynamic page to start with and is then redirected to the "home" page on another server, everyone has to get at least one answer from the primary server. In that case, perhaps you should think about just serving up the home page from the primary server, but rewriting the links dynamically to lead visitors to the "other" servers when they follow them? That may save a little overhead, but you'd have to consider the effect on spidering of having the links constantly change. I can't see how it'd be worse than it is now, though.
A better solution would be to use DNS round-robin with a low TTL so that you can drop a server that goes down. That would have every server show up as the same domain name/URL, but the IP address would be different. That way, web clients don't all hit the same server for the first page and don't get a 302 or 301 or anything; they just get the pages from one of your servers.
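For reference, a round-robin setup like that is just multiple A records for the same name. A minimal BIND-style zone sketch, with hypothetical IPs and a 60-second TTL (so a dead server can be pulled from DNS quickly):

```
; Hypothetical round-robin A records for one name.
; The low TTL (60s) lets you drop a failed server fast.
www.example.com.  60  IN  A  192.0.2.10
www.example.com.  60  IN  A  192.0.2.11
www.example.com.  60  IN  A  192.0.2.12
```

Resolvers will rotate through the addresses, spreading first requests across the servers with no redirect involved.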
An even better solution would be to get a port on a real load balancer. If you are spending the money to host multiple servers and have so much traffic that you need to load balance it, then you can probably afford to use a port on a "shared" load balancer provided by your ISP. If it's a really big site, you could also buy your own. I'd suggest a Foundry ServerIronXL, in that case. It can also double as a nice switch.
If you need to load balance due to traffic levels, you should have better options available to you than the method you describe.
So I put in 302’s to see what would happen. Gbot (maybe, ummm, a different version) found the 302’s and fixed its database, so all was fine and well up until FL (the Florida update). Then I noticed that once again Gbot wasn’t getting the link page even though it already had the correct page for the redirect (red flag #2). So I was right back in the same boat, but I had not changed anything. Now I have dropped out of the top (used loosely) 100 pages. Not only that, but after Gbot read robots.txt, which it and no other SE has had a problem with, it touched one of the pages I disallow. It did not read any bytes, but it touched it.
I’m now going back and changing the 302’s back to 301’s, except where I had a large number of redirects, because this is the only reason I can think of that my rankings may have dropped so dramatically. Where I have the large number of redirects, silly me had article01, article02, article03, etc. I changed their names to something I could manage, but after that I changed the directory where they live. This is such a cluster-funk that I am removing all the redirects for all the articles until I figure out which SEs have which dir/article in their databases. I’ll add 301’s back in as I see they are needed. Looksmart just came through and found the article01, article02 stuff, so I know some legacy stuff exists in some of the SEs’ databases. But I bet I get tagged for 404’s then. It’s the way my luck runs. I did a whole bunch of 301’s and 302’s just when there were some discrepancies with the way at least one SE bot handled them.
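For what it's worth, a numbered-article cleanup like the one described above can usually be expressed as a single pattern rule instead of one redirect per file. A sketch, assuming hypothetical old names (article01.html, article02.html, ...) and a hypothetical new /articles/ directory:

```apache
# Hypothetical .htaccess sketch: permanently redirect the old
# numbered articles to their new home in one rule.
# /article01.html -> /articles/01.html, /article02.html -> /articles/02.html, ...
RedirectMatch 301 ^/article(\d+)\.html$ /articles/$1.html
```

One rule like this also means fewer places to forget when the directory moves again later.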
As far as I can tell, to add to Brett’s rules: above all else, really think hard about the structure of your web site, including the file names, so you won’t have to go through a bunch of work two or three times. Then, if there are any, umm, bugs in any of the SEs’ handling of redirects, you will not have to worry about it.
For example, if I had the courage to redo the site again, I would put most of the web site inside a folder called public under the public_html folder, so that I could control access better via .htaccess.
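A minimal sketch of the kind of per-directory .htaccess access control meant here (the directory name and rules are assumptions, not the poster's actual setup), using Apache 1.3/2.0-era syntax:

```apache
# Hypothetical .htaccess in public_html/private/ — block all
# web access to this directory while the files stay on disk.
Order Allow,Deny
Deny from all
```

Dropping a file like this into a subfolder protects everything under it without touching the main server config.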
Well that’s my experience with it for what it is worth.
We eventually 301 redirected all the internal pages to new file names that were more logical, etc. The 301's worked...kind of. It took G *six months* to get it all properly updated. That was not a fun six months...and I've read a lot of similar posts in here. Some get lucky and see it all resolve sooner, but not always.
This site has literally vanished from Google Index!
202.156.2.xx - - [02/Nov/2003:19:54:42 +0530] "GET /vtsamodehaveli HTTP/1.1" 301 340 "http://www.myxyzsite.com/hotels-tour.htm" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)"
PS - I replaced the exact URL and exact Ip due to TOS of Forum.
A 301 is for pages that have permanently moved to another location.
A 302 is for pages that have temporarily moved to another location.
So, if the purpose is load balancing, a 302 would be the right one to use given these two alternatives. I'd strongly suggest, though, that you work on sending status codes in the 200 range (page found, etc.) instead. This can e.g. be done by using an internal redirect instead of an external one (just omit the [R] flag in the rewrite rule).
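To illustrate the difference with the [R] flag — a sketch with hypothetical paths, not anyone's actual rules:

```apache
# Hypothetical .htaccess sketch.
RewriteEngine On

# External redirect: the [R] flag makes Apache send a 302 back
# to the client (or a 301 with R=301), exposing the target URL.
RewriteRule ^home$ /servers/home.html [R=302,L]

# Internal redirect: without [R], Apache serves /servers/home.html
# under the original URL with a normal 200 status.
# RewriteRule ^home$ /servers/home.html [L]
```

With the internal form, the visitor (and the spider) never sees a 30X at all.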
There's additional info here: Engelschall's Apache URL rewrite guide [engelschall.com] (the guy who invented mod_rewrite)
The normal and intended use for 30X status codes is for pages or websites that move from one URL/URI to another:
a) If you redirect "x.com" to "y.com" using a 301 redirect, Google will eventually merge the two domains in the SERPS so that only "y.com" will remain. It will take at least a couple of weeks for these changes to propagate to the SERPS.
b) OTOH, if you use a 302 redirect, both domains will be kept alive in the SERPS with separate listings, but Google will bury "x.com" so deep in the SERPS that it will only show up when you search for it - "y.com" will still be treated as the proper domain (given that you have only a few incoming links to "x.com" relative to links to "y.com"; if not, "x.com" might be interpreted as the right domain instead of "y.com"). If you have no incoming links to "x.com", then only "y.com" will remain in the SERPS.
It's the same thing for pages.
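For case a), the usual setup would be an .htaccess on "x.com" along these lines (the domains are just the placeholders from the post):

```apache
# Hypothetical .htaccess on x.com: permanently redirect every
# request to the same path on y.com.
RewriteEngine On
RewriteRule ^(.*)$ http://www.y.com/$1 [R=301,L]
```

Preserving the path in $1 means deep links to x.com pages land on the matching y.com pages rather than all piling onto the home page.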
Conventional wisdom suggests always using 301's (and Google does so on their webmaster pages as well), but this is not right, as those two codes do mean different things and should be used for different purposes. These are my personal rules-of-thumb for Google:
If used properly, neither of the two status codes should get you into duplicate trouble by themselves. They simply tell user agents that this or that page has moved. So, what the user agent (e.g. googlebot) will see is not a copy of the same page at another location, but the exact same page, only transferred to another URL. It's not a copy; it's the real thing ("whatever it was that used to be here is now found there").
So, essentially, a 30X is a "placeholder" or "shortcut", and not a real page. That's also the reason that Google can merge these kinds of URLs in the SERPS. If you use them right, you will avoid creating duplicates, but the .htaccess setup will not be the only thing to consider; you have to take incoming links into account too.
301: You redirect more than one page to the same new URL. If these pages all have incoming links, Google will have a hard time figuring out what the real URL is, as your website will appear to have a split personality. Effects are described in this thread: redesigns, redirects, & google -- oh my! [webmasterworld.com] - or in any of the "missing index page" threads (imho).
302: You redirect one or more pages to the same new URL. If these pages all have incoming links, Google will not be able to merge them (it is a temporary redirect). You risk it being considered duplicate content even though it isn't. Most likely, the page that you are redirecting from will be buried deep down in the SERPS. Theoretically, if it had a lot of backlinks it could even outrank the real page, and the real page would be buried instead, but in practice, as backlinks are "inherited" or transferred, this will not happen.
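The per-page versions of these two rules can be written as plain mod_alias directives (the file names here are hypothetical):

```apache
# Hypothetical per-page redirects in .htaccess.
# Permanent: old-page.html has moved for good.
Redirect 301 /old-page.html http://www.example.com/new-page.html
# Temporary: promo.html only points elsewhere for now.
Redirect 302 /promo.html http://www.example.com/current-promo.html
```

Per the distinction above, the 301 invites Google to merge the old URL into the new one, while the 302 keeps the old URL as the "real" listing.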
I would not expect Google to post "official" comments on this, although I would really welcome it. These matters might have been open for abuse at some points in time, but I personally feel that the worst you can do with them at this moment is to harm yourself.
/claus