Forum Moderators: open
I'm probably just being paranoid...but hoping I get a couple of backups on option 3...Thanks.
I would rather take the staging server completely offline than run the risk of being penalized by Google or inadvertently telling Google not to index my site. Hoping I don't have to do that.
You didn't mention anything about option #3. It is your opinion that this isn't a feasible option. If so, why?
[edited by: threecrans at 8:43 pm (utc) on Aug. 27, 2002]
If you do a 302 you will send yourself to the main server instead of seeing the staging server, which would make it pointless.
True, it does make it kind of pointless. But there are advantages.
Also, I think I have a solution to allow it to still be internally viewable.
main server name --> "main.com"
staging server name --> "staging.main.com"
The site is an ASP site, so I check the Request.ServerVariables("SERVER_NAME") in the Session_OnStart event something like:
' Check whether this request hit the staging server
If InStr(Request.ServerVariables("SERVER_NAME"), "staging.main") <> 0 Then
    ' It is staging -- redirect to main, preserving path and query string
    Dim strURL
    strURL = "http://www.main.com" & Request.ServerVariables("PATH_INFO")
    If Request.ServerVariables("QUERY_STRING") <> "" Then
        strURL = strURL & "?" & Request.ServerVariables("QUERY_STRING")
    End If
    Response.Redirect(strURL)
End If
I should be able to view internally still by using the internal name which is not the same as Request.ServerVariables("SERVER_NAME"). ASP people, with this methodology is there anything I am not accounting for?
Perhaps you could get around the risk of that robots.txt being inadvertently present on the live server by setting up a cron job to check on it?
And give up any chance of a decent night's sleep for the rest of my life? :)
It's living a little close to the edge...but not a bad idea.
So that's two votes against the 302 redirect. What is the reasoning though? What will Google (or any other Spider) not like about this?
You needed a user and pass to access the dev server.
our dev server was also setup with every domain as a sub of ours. So the live corp site was www.company1.com but the dev site was company1.ourdomain.com and required user and pass setup via .htaccess. Dev sites were never spidered and when I moved things to the live server it was only the server itself that was different to provide access.
just a thought.
Just change the Staging Server access rights: give yourself, and whoever else needs it, a username and password, and you're done. You can update your live server, no problem. And since Googlebot doesn't have the username and password, it will not get in, and thus not index the site.
BTW: How does google know of the site?? Do you have links pointing to the staging server?? Or did you submit it to SE's??
jatar_k: You needed a user and pass to access the dev server.
That's a great idea. The only drawback is users will not be able to get to the main server...but at least it stops the indexing.
PaulPaul: BTW: How does google know of the site?? Do you have links pointing to the staging server?? Or did you submit it to SE's??
It used to host a portion of our site, which has since been moved to the main site. As a result there are quite a few links on the Internet pointing to the staging server.
What about having a redirect from the old staging domain to the live site and applying a new sub domain to the actual staging server. You then keep all the traffic presently going to it and you get a clean staging server.
dev.main.com or some such, could be anything. Then you can set up the server to keep people out.
What about having a redirect from the old staging domain to the live site and applying a new sub domain to the actual staging server.
Another very good suggestion. This is probably what I will do if no better alternative arises.
Please! Can someone explain to me, why is a 404 or a DNS error better than a 302?
Put a robots.txt on the server blocking all spiders. Forgot to add: change your publishing setup so the robots.txt file is never published, so that you don't end up blocking the main site. The only other option I can see would be to take the staging server offline and only have it local.
Configure your webserver to pull robots.txt file from a separate location.
So you'll have something like /home/zzz/robots.txt and all hosts will be configured to pass this file on all domains.
And your sites will still be in their own directories.
That way you won't copy robots.txt to the production server by mistake.
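On Apache, a minimal sketch of this setup (the path /home/zzz/robots.txt comes from the post above; everything else is illustrative) would be an Alias placed in the global server config, outside any VirtualHost, so every domain on the staging box serves the same file:

```apache
# Serve one shared robots.txt for every virtual host on the staging server.
Alias /robots.txt /home/zzz/robots.txt

<Directory /home/zzz>
    Order allow,deny
    Allow from all
</Directory>
```

The shared /home/zzz/robots.txt would then simply block everything:

```
User-agent: *
Disallow: /
```

Because the file lives outside the site docroots, a normal publish from staging to production can't carry it across by accident.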
I like jatar_k's idea of a new subdomain. If this was my site, I would:
Put 301 permanent redirects on the current staging server domain to forward the existing-link traffic to the "published" domain.
Then create a new subdomain on the staging server, and move your in-development site content there. If you warn all development participants not to publish a link to the new subdomain content, then you won't have to have a different robots.txt or require username/password access, and that will obviate your worries about having an incorrect robots.txt published. It is very hard to keep domains and subdomains "secret" on the web, but if you don't publish links to the new staging server subdomain, and if you refrain from using Google's (and other SE's) toolbars in that subdomain, your new staging server subdomain should remain low-profile enough to be ignored by the Search engines.
Also, a new staging subdomain will allow you to place a robots.txt file on the old staging server domain in the future if you can get the majority of the high-traffic external links to the old staging server domain updated. Until then, you may not want to lose the link-pop and PageRank of those old incorrect links.
Until that time the 301 permanent redirects will prevent most search engines from listing your old staging server domain. AFAIK, 302 temporary redirects won't work for the SEs, only to redirect users. A 301 is required to get the SEs to drop the old staging server domain, even if there are still links to it on the web.
Jim
Ok, I went with the idea first put forth by jatar_k (new subdomain) and backed up by jdMorgan...also taking into account advice by Slade (what if new subdomain gets indexed?).
I changed the domain staging.main.com to point to a new server, one that doesn't have duplicate content. The server that was originally staging.main.com is completely inaccessible from the Internet (but still accessible internally). I created a custom 404 error page on the new server (the one that is now staging.main.com) with links pointing to appropriate pages on the main server. That way a 302 is never sent. Anyone see any potential problems here?
Nobody directly addressed the problem with sending a 302...so I'll put forth my own theory. Let me know what you think.
I have two servers with nearly identical content. Google thinks the one with a lower page rank (i.e. the staging server) is engaging in a less than wholesome tactic of duplicating page content, thus several PR0 penalties crop up on some of the pages of the staging server.
If I 302 redirect from the staging server to the main server, it appears my main server is now engaging in the spamming tactic as well, in which case the PR0 penalties could propagate to main.
This is based on absolutely no hard evidence. Please correct me if I am missing something.
I strongly recommend you use a 301-Moved Permanently redirect to avoid problems. A 301 effectively "tells" the search engine that you've moved the pages and are not attempting a duplicate-content exploit. The 301 causes the search engine to drop the old address (your staging server), update the URL it is using, and for Google, transfer the PR from the old URL to the new URL if it's higher. I believe that's what you want to accomplish.
A 302-Moved Temporarily will just make it keep what it's got, and check again the next time it spiders your domain, so your problem of having your staging server listed does not go away.
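For reference, the only difference on the wire is the status line; the rest of the response is the same. A hypothetical 301 response for a redirected staging page would look like:

```
HTTP/1.1 301 Moved Permanently
Location: http://www.main.com/somepage.asp
```

A 302 response is identical except the status line reads `HTTP/1.1 302 Moved Temporarily` (or `302 Found` in HTTP/1.1 terms), which is what tells the spider to keep the old URL and check back later.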
Just my opinion, though...
Jim
Finally, if your old URLs redirect to your new site using HTTP 301 (permanent) redirects, our crawler will know to use the new URL. Changes made in this way will take 6-8 weeks to be reflected in Google.
This is the optimal way to handle it: your PR will be merged correctly, Googlebot will be happy, etc., as GoogleGuy has confirmed in the past.
HTTP 302 redirects, on the other hand, may well result in the server being eliminated as a duplicate. And HTTP 404 responses will mean that you lose all the PR that would otherwise flow to the main server via these links to the staging server.
I could intercept the request in the global.asa (Session_OnStart) to send a "301 Moved" or a "301 Moved Permanently" to any requests.
Does Google take into account the response status text when indexing, or does it simply lump all "301" responses into the same category. If it does take the response status text into account, which is better: "301 Error", "301 Moved", or "301 Moved Permanently"? (I'm guessing the Moved Permanently).
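In classic ASP, Response.Redirect always sends a 302, so to send a 301 you have to set the status and Location header yourself. A sketch for Session_OnStart in global.asa, using the same illustrative hostnames as the earlier code:

```asp
' In global.asa, Session_OnStart: send a 301 instead of the 302
' that Response.Redirect would generate.
If InStr(Request.ServerVariables("SERVER_NAME"), "staging.main") <> 0 Then
    Response.Status = "301 Moved Permanently"
    Response.AddHeader "Location", _
        "http://www.main.com" & Request.ServerVariables("PATH_INFO")
    Response.End
End If
```

This assumes IIS passes the status text through as written; as noted below, clients and spiders should key on the numeric code 301 regardless of the accompanying text.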
This topic - of IIS sending a "weird" text string with the 301 code - came up here recently. You might try a site search - I would, but I'm off to a meeting. AFAIK, the meaning of 301 is "Moved Permanently", no matter what text is sent with the code. This is defined by the first digit of the code - 2xx is "OK", 3xx is "moved" or "not modified", 4xx is "error" or "authorization required", etc. I suspect that the text is for human eyes only. But, I'm not sure if all clients will respond correctly to the 301 code and ignore the text.
Jim