How about placing a simple link from y to x, accompanied by a message explaining that the site has moved (or been deleted, or whatever), so whenever a user lands on the old site they follow the link? Then block all bots using robots.txt.
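For the robots.txt part, a minimal file on y.com that tells all compliant crawlers to stay away is just two lines:

    User-agent: *
    Disallow: /

(The trailing slash matters: an empty "Disallow:" means the opposite, i.e. allow everything.)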
Then email all the folks that link to site y and ask them to change their links to site x. That way Google will eventually stop being referred to y, and if it does get there it won't be able to index it.
I hope that helps :) It's a really simple approach, but I find that keeping things simple often works better.
301 redirect Y to X
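For anyone who hasn't set one up before, a minimal sketch in an Apache config (assuming mod_alias, with www.x.com standing in for the real target):

    Redirect permanent / http://www.x.com/

Apache's Redirect matches by prefix, so this single line sends every request on Y to the corresponding path on X with a 301 status.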
Thanks for that.
Mike - A simple link is in place from every page.
All links point to X.com, yet Y.com was still being indexed?
G was ignoring our robots.txt.
Coco - We tried that during the summer, but from reading on WW it seemed G was having bad problems with permanent 301 redirects at the time.
I've used a 301 redirect to move a site a couple of times, once in the last month and once about 3 months ago. Both times the old site was dropped and the PR transferred without a problem. Perhaps you could give it another try?
If you use a 301, do you use a RedirectMatch with a regexp for all requests (RedirectMatch permanent /.* ...)? Or do you just redirect "/" and thus let all other pages issue 404s?
Just wondering whether it could be a problem for Google.
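Concretely, the two variants would look something like this in an Apache config (www.x.com as a placeholder):

    # catch-all: every request is redirected, path preserved
    RedirectMatch permanent ^/(.*)$ http://www.x.com/$1

    # root only: anything other than "/" falls through to a 404
    RedirectMatch permanent ^/$ http://www.x.com/

(Note that a plain "Redirect permanent /" matches by prefix and therefore already redirects everything, so the anchored regexp is needed to get the root-only behaviour.)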
I'm not sure about that, dirkz.
I'll ask the techie & see what I find out.
Thanks all for your help.
We have a very similar problem with a slightly different setup: we had two different virtual domains serving the same pages, and we "split" them 6 weeks ago (i.e. some content stayed on y.com and some moved to x.com, with 301 redirects from y.com to x.com).
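For what it's worth, a partial split like that is usually done with path-specific rules rather than a catch-all; roughly, in Apache config terms (the /moved-section/ path is invented for illustration):

    RedirectMatch permanent ^/moved-section/(.*)$ http://www.x.com/moved-section/$1

Anything not matched by such a rule keeps being served from y.com as before.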
|Doing a G "site search" for X.com displays all y.com URLs.|

That applies in our case as well. Both sites are obviously regarded as the same thing by G.
Unfortunately, we have not yet figured out why this is happening and what to do about it. G has just now started visiting *one* page on x.com (which has very good inbound links), but apart from that only the server root is visited by googlebot.
A 301 Moved Permanently [w3.org] response is the way to go. RFC 2616 [w3.org] describes this server response and what it means. If you have problems with it, then either it's not implemented correctly (check it [webmasterworld.com]) or you haven't given the search engines enough time to pick it up. There have been rare cases where a search engine did not correctly handle robots.txt, but that does not mean you should stop doing it correctly; let them fix their problem. (You could apply a temporary work-around if you were able to infer the specific problem.)
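If you want to sanity-check the redirect yourself, request any old y.com URL with a header checker; a correctly configured server answers with something like this (the path is only an example):

    HTTP/1.1 301 Moved Permanently
    Location: http://www.x.com/page.html

A 302 response, or a 200 with a meta refresh, is treated quite differently by the engines, so make sure it really says 301.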
Similarly, if Google is not interpreting your robots.txt file correctly, then it's likely your robots.txt syntax [robotstxt.org] is incorrect, so check it [searchengineworld.com]. I've seen minor problems with other robots misinterpreting robots.txt, but Google's parser is one of the most sophisticated ones.
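Two slips that commonly break parsing, purely for illustration:

    Disallow: *

(wildcards are not part of the robots.txt standard; use "Disallow: /" to block everything)

    Disallow: /
    User-agent: *

(a record must start with the User-agent line, with its Disallow lines following, not the other way around)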
Do things "by the book" for best overall results and minimal headaches. Then apply work-arounds when and if necessary. Check your work and be patient -- it can take 60 days before some search engines catch on, and even longer for others. :o
|only the server root is visited by googlebot |
This looks like the behaviour Googlebot shows with fresh sites, which goes on for some time until the site gets deep-crawled. So to me it all looks good; it will just take more time.