How to Handle CNAME URLs Indexed in Google
I have a CNAME set up for my domain, and the CNAME URLs are now indexed in Google, which they should not be. I want to remove all those cname.mydomain.com/index.jsp URLs from Google's index. How do I write the robots.txt rule so that it applies to all engines?
Disallow: [cname.mydomain.com...] is not working in robots.txt.
Thanks in advance.
I was thinking of telling you how to get www.example.com out of the SERPs in favor of example.com, but it seems you could do with help with your robots.txt file instead.
In the document root of the web service for 'cname.mydomain.com', place a text file named 'robots.txt' with the following written inside:
That'll tell honest bots like Googlebot not to bother with any URL on cname.mydomain.com that starts with 'index.jsp'. You should also make sure that content on the site doesn't link to 'cname.mydomain.com' at all, though that's really part of a more permanent fix.
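A rough sketch of how a compliant crawler would read such a file, using Python's standard urllib.robotparser. The rule text is an assumption reconstructed from the description above (a wildcard user agent with "Disallow: /index.jsp"), not a quote from the post. The second rule set is a hypothetical example of a full URL written into the Disallow line, which is one way a host name can end up there.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set: block every crawler from paths starting with /index.jsp.
PATH_RULES = """\
User-agent: *
Disallow: /index.jsp
"""

# Hypothetical broken form: Disallow expects a path, not a host name or URL.
HOST_RULES = """\
User-agent: *
Disallow: http://cname.mydomain.com/index.jsp
"""


def is_allowed(rules: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if a crawler honouring `rules` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://cname.mydomain.com/index.jsp"
    print("path rule allows page:", is_allowed(PATH_RULES, page))
    print("URL-style rule allows page:", is_allowed(HOST_RULES, page))
```

The path-based rule blocks the page, while the URL-style rule matches nothing, because crawlers compare Disallow values against the path part of the request only. Note that a blocked URL is only no longer crawled; this sketch says nothing about how quickly an engine drops it from its index.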
I can't help but think that robots.txt is a touch weak for what you really want, and it's not practical in the usual setup where two names, such as www.example.com and example.com, are served from the same place.
I'd search on the term '301 permanent' if I were you. Drop the apostrophes when you put it in the search box. Then try it again with the name of your web server added, e.g. '301 permanent apache' (again, without the apostrophes).
Please don't do that!
This problem cannot be solved using "robots.txt" alone. No way.
A CNAME is a DNS alias for a hostname, so that on the internet, e.g., the domains
"www.foo.com" and "www.bar.com"
- will resolve to the same server and, unless that server is configured with separate virtual hosts, to the same single document root. So, if you place a "robots.txt" file in the root of "www.foo.com", you automatically place that same file in the root of your other domain! Can you see how this is harmful? A disallow rule there will remove not just one, but *all* of your domains from the search engines.
There are two different ways to solve the problem:
(1) Remove the CNAMEs at your DNS service provider
(2) Use a "301 Redirect" to make sure all users (including SEs) end up at one domain only; the right one.
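Option (2) can be sketched as a small piece of WSGI middleware in Python. This is only an illustration of the idea, not a recipe for any particular server: the host name "www.example.com" and the function name canonical_host_redirect are made up for the example, and on Apache or IIS the same effect would normally be configured in the server itself.

```python
from urllib.parse import quote

CANONICAL_HOST = "www.example.com"  # hypothetical: the one host name to keep


def canonical_host_redirect(app, canonical_host=CANONICAL_HOST):
    """Wrap a WSGI app so requests for any other host get a 301 to canonical_host."""

    def middleware(environ, start_response):
        # HTTP_HOST may carry a port ("host:8080"); compare the name only.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host and host != canonical_host:
            scheme = environ.get("wsgi.url_scheme", "http")
            path = quote(environ.get("PATH_INFO", "") or "/")
            location = f"{scheme}://{canonical_host}{path}"
            if environ.get("QUERY_STRING"):
                location += "?" + environ["QUERY_STRING"]
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Length", "0")],
            )
            return [b""]
        # Already on the canonical host: hand over to the real application.
        return app(environ, start_response)

    return middleware
```

Every request that arrives under an alias is answered with a permanent redirect to the same path on the chosen host, so users and search engines both end up at one domain only.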
And I say right back: don't rush into that entirely, claus!
Now that searchbots like Googlebot know to look for cname, it's messy to remove name service for it. 301 redirecting www.example.com to example.com (or t'other way, it doesn't matter too much) is the neat and proper way to do this, but experienced asked about robots.txt in a robots.txt forum, so I reckon my answer is quite on track really.
P.S. robots.txt is really no way to do this at all, which is what I was hinting at with "search on the term '301 redirect'", among other bits.
No, your answer is potentially dangerous to the original poster!
There is no way he should put a disallow-all "robots.txt" file in his main domain root and have all his web content removed from the SEs.
>> searchbots like Googlebot know to look for cname
Please, don't even go there. You're opening yet another can of worms.
This is an affiliate site, and we use the CNAME for the user's benefit, so that he/she does not see the other site's name in the URL, which boosts his/her confidence when buying from the site. So we would rather not remove the CNAME. The mapping is along the lines of myaffiliate.myaffiliatedomain.com to mycname.mydomain.com, and I serve the content to my visitors through the CNAME. I searched Google for similar sites, and they are also listed in the same manner, CNAME and all, but I really don't want my CNAME in the engines.
claus, you seem to be reading the first few words of each of my posts and firing some sort of blind broadside at me. How does that help? This is the 'robots.txt' forum and the o/p asked about robots.txt, mate. I thought it was pretty nice of me to say in my original reply that it's a weak method and to give some search terms.
Have you got two (or more) unique domain names pointed at a single root folder of a web service? If so, and you want one site seen on one URL and another site (or section) seen on another URL, then that will require some scripting, which is possible in .htaccess, .jsp and lots of other places depending on the server setup, but never in robots.txt.
robots.txt is a file which is read off your server in plain text by 'web crawlers', and the honest ones will obey it as best they can interpret what you've written there. It is pointless trying to put a URL with your domain name in it, because knowing your domain name was how the crawler found the robots.txt file in the first place. Aside from the major SEs, the crawlers are pretty simple, and the major SEs give out info on what their crawlers can cope with in your robots.txt.
Mmh, there's a way to get Apache to process robots.txt as a PHP script and so make it dynamic: just look at "HTTP_HOST" and 'echo' out the desired version of robots.txt for that domain. The same thing can be done in IIS by changing the '404 not found' response in server admin to a file.asp, which should then check whether the document requested was robots.txt and either serve the appropriate version for the named "HTTP_HOST" or spit out the normal old '404' document if something other than 'robots.txt' was asked for.
I can show a working example in IIS; I expect the one I left in an Apache setup ages ago is gone, and I didn't keep a copy of the relevant files. But my boss charges fairly stiffly for my services anyway, and I probably shouldn't show anyone 'exactly how to do it' in either for free!
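The per-host idea can be sketched in a few lines. This is not the PHP or ASP setup described above but the same logic swapped into a Python WSGI app, purely for illustration. The names ROBOTS_BY_HOST, robots_for_host and robots_app are invented for the sketch, and the choice to block everything on cname.mydomain.com while allowing everything elsewhere is an assumption about what the original poster wants.

```python
# Hypothetical rule sets: one blocks all crawling, one allows everything.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# Assumed mapping: only the alias host is kept away from crawlers.
ROBOTS_BY_HOST = {
    "cname.mydomain.com": BLOCK_ALL,
}


def robots_for_host(http_host: str) -> str:
    """Pick the robots.txt text for the host name the client asked for."""
    host = http_host.split(":")[0].lower()  # drop any port, ignore case
    return ROBOTS_BY_HOST.get(host, ALLOW_ALL)


def robots_app(environ, start_response):
    """WSGI app: serve a host-specific robots.txt, 404 for anything else."""
    if environ.get("PATH_INFO") != "/robots.txt":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found\n"]
    body = robots_for_host(environ.get("HTTP_HOST", "")).encode("ascii")
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

Because the decision is made per request from the Host header, both names can keep sharing one document root without the alias's disallow rule leaking onto the main domain.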