Forum Moderators: open
Because of this glitch, I've been watching log files on both the old and new hosts, waiting until all humans and spiders stop going to the old one.
Googlebot found the new site on the 7th and deep crawled several times this past week. Same for several other spiders.
However, googlebot is still going back to the old site and pulling a few files which are the ones that usually get a fresh tag. This happened as late as yesterday.
Inktomi seems really confused: it found the new IP on the 7th and has pulled robots.txt almost every day since - but only robots.txt! Meanwhile, every day it also goes to the old IP, pulls robots.txt there, and keeps deep crawling files. FAST is doing the same thing.
Should I remove all files from the old host now or wait until all humans, googlebot and other spiders stop going to the old IP? I think my google penalty was finally lifted last update, and after 11 months of being in googlelimboworld, I don't want another penalty.
A site was moved (not a domain change, though) on July 21st. Google got it right the first time, and all links, even those pointing to the old location, were treated as though they went to the new one. Inktomi's got it, FAST has it, but Teoma and AltaVista still don't have the new URL - even with the 301.
Do you mean put the new IP in the .htaccess? And if so, is this how to do that?
RedirectPermanent /mydomain.com/ [IPnumber...]
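For what it's worth, that's not quite the directive's shape: RedirectPermanent (Apache mod_alias) takes a URL *path* as its first argument and a full destination URL as its second - not a bare domain or IP. A minimal .htaccess sketch, using placeholder domains purely for illustration:

```apache
# Hypothetical .htaccess on the OLD server; example.com domains are placeholders.
# RedirectPermanent maps a URL path prefix to a full destination URL.
RedirectPermanent /old-section http://www.example.com/new-section

# To send an entire old domain to a new one with a 301,
# mod_rewrite is the usual tool:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?old-example\.com$ [NC]
RewriteRule ^(.*)$ http://www.new-example.com/$1 [R=301,L]
```

Note, though, that in your situation (same domain, new IP) there is nothing sensible to redirect *to* - which is why the advice below focuses on DNS caching instead.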
Your problem appears to be purely one of DNS caching. Just leave the old site (IP address) live long enough to get deep-spidered by the SE's that are important to you. Then if you are in a hurry, you can take down the old site.
If leaving it up for a couple of months is not a burden, then doing so would be good insurance against a search engine spider glitch that could cause problems. Waiting to be sure that you suffer no losses in the SERPs before shutting down the old IP is the most prudent approach, IMO.
<rhetorical>This DNS caching issue sure causes a lot of problems... The SE spiders request robots.txt quite often, so what's the big deal about getting a fresh DNS translation more than once a month?</rhetorical>
Jim
Some threads here suggest that an old site should be taken down as soon as spiders find the new one, to avoid duplicate content penalties, while other threads suggest that it is a good idea to leave the old site up for at least a month or so - as you suggest.
Since googlebot has deep crawled my site at the new host, I am concerned that she is still going back to the old IP, too.
Is it a DNS cache issue that Slurp only grabs the robots.txt at the new IP but keeps deep crawling the old IP? The same thing is happening with the FAST spider.
No expert here... but I wouldn't think Google would screw people like that. I think only duplicate content found on multiple domains will trigger Google's alarm. Since you are only dealing with one domain here, Google will always see the same content under one domain, as directed by its DNS server.
However, always make sure you have a base href on each page if you don't hardcode your full domain URL in each href tag.
[webmasterworld.com...]
I used to have base href on my pages but I forgot why that was a good idea. Can you explain or is it better to have each url absolute?
I'm paranoid here about getting another penalty from google so don't know whether I should go add the base href to the pages on the old host too?
Thanks
The slurp problem is likely just a manifestation of the fact that most search engines use multiple computers, and each of those computers does not always know what the others are doing. So, the one grabbing robots.txt repeatedly without grabbing anything else on the new server is using a more updated DNS than the one that continues to spider your old server, and they just haven't compared notes recently.
Moving sites to new servers is a normal occurrence on the web. I agree with irock that Google and the other SEs have an interest in supporting this activity without "extra penalties". A duplicate content penalty by IP address just doesn't make sense, and therefore I trust that they wouldn't waste the time to develop an algorithm to try to detect or enforce it.
I can't say what the absolute "right thing to do" is. But it just makes sense to wait for all your correspondents to note your new address before you stop picking up mail at your old post office.
If some of the second-tier search engines haven't caught on within a few weeks, you may just want to cut and run, based on how important those engines are to your site's traffic, and how much work is involved maintaining two servers. Only you can decide which ones are the most important, based on your raw traffic, conversions, or whatever controls your bottom line.
From your description, it sounds to me like you've done it right, which can't be said about some of the situations in the thread that Marcia cited above. And as a final bit of encouragement, remember that GoogleGuy said that their policy is not to penalize sites for issues outside the webmaster's control.
Jim
Using absolute URLs would solve everything. It's usually what people do anyway. However, if you don't use absolute URLs or a base href, and some crazy guys link to your site by its IP address, then the domain-name part will be replaced with the IP in the eyes of spiders and your visitors.
If you use absolute URLs, then you don't have to worry. If not, use a base href tag, which resolves all relative URLs against the given domain.
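To illustrate, a base element in the document head makes every relative link resolve against the stated domain rather than against whatever host the page happened to be fetched from (example.com is a placeholder here, not your actual domain):

```html
<!-- Sketch of a page using <base>; www.example.com is a placeholder domain. -->
<html>
<head>
  <base href="http://www.example.com/">
  <title>Example page</title>
</head>
<body>
  <!-- This relative link now resolves to http://www.example.com/about.html,
       even if someone reached this page via the bare IP address. -->
  <a href="about.html">About</a>
</body>
</html>
```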
Trust me. I've tripped over ALL the mines Google has to offer. I don't SPAM search engines, but I've had EXTENSIVE encounters triggering their penalties ACCIDENTALLY.
Like my dad said: do NOTHING and your site will be fine. If you want to try something weird, THINK TWICE.