If your site is not also on the old IP address you may be out of the index for a month or two. Google's DNS updates have historically been sporadic.
Sorry to be the bearer of bad tidings...
Nick
<edited to correct speeling mistaks>
[edited by: Nick_W at 4:35 pm (utc) on Feb. 8, 2003]
maybe it's just me, but that's annoying.
i'm working on my own search engine, that works similar to google on links, but balances the act with old school search engines like altavista. it will only run on one category of sites (like a directory, one search engine for say...automotive sites, which is what i'll first do since i maintain several) because i can't afford the space/bandwidth. already with almost no available resources the spider i've coded has no problem with session ID tags (it finds them and ignores them), and crawls during off-peak hours of every day of the month. this not only keeps it updated, it doesn't bombard servers with 50 instances of googlebot at peak hours, overloading it (which is what happened to me). and as far as DNS goes, every joe schmoe with a dial-up has access to updated DNS records within about a day. i'm sorry to hear Google does not. maybe they're not making enough money off adwords or something?
that said, the new 'players' in the search engine world are all pretty stupid. you don't need a gimmick. google is nowhere near perfect, and perfect is nowhere near unachievable. with hard work, a lot of resources, you can make a simple search engine that will grow in popularity simply on merit. because it works damn well. the money and sponsors/advertisers come later after you've got a loyal base of internet users/searchers who will keep coming back.
there is no success like that success which is deserved.
[/rant]
Google finds a site by following URLs, but once it finds a new site it stores the site's IP. From that point on, it locates your website via its IP address.
If you move to a new IP address Google will more often than not continue crawling the old IP until it updates its DNS database.
There was a DNS update a few days ago.
There are lots of discussions on this in the archives; try the site search to find more info on Google and IP addresses.
Nick
Could it be a DNS issue if they are hitting on the old IP address?
My ISP is really hopeless when it comes to holding on to an IP address. They virtually reset the IPs of all hosts every one or two weeks, and they like to choose Sundays. But so far it has not prevented Google from reaching us. They would fail to locate us only if the change happened while they were midway through a crawl. That is as far as I know.
kwngian
If Google simply requested sites by IP address, then it could not index sites that are "virtually" hosted.
A web-hosting server with one IP address has a "default" site and one or more "virtual hosts." An HTTP request is sent to the IP address but also passes along the site name (in the Host header), because the IP address alone would make for an ambiguous request. Requesting a site by IP address only is not how HTTP was designed to work.
Let's say a particular site, www.example.com, is hosted (along with 50 other sites) at 192.168.0.1. If I request "www.example.com" in my browser, that page will be displayed. If I request "192.168.0.1" in my browser, it will show me the default site. If www.example.com happens to be the default site (out of 51), then it coincidentally would be displayed. If not (and this would be 50 out of 51), then it would not be displayed.
Because of this, requesting a page by IP address does not work.
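To make the 50-out-of-51 point concrete, here is a minimal sketch of how a name-based virtual host setup picks a site. It is a toy model, not how any particular web server is implemented; the site names, the page strings, and the choice of default site are all made up for illustration.

```python
# Toy model of name-based virtual hosting: one IP address, several sites.
# Site names, page contents and the default site are hypothetical.
VHOSTS = {
    "www.example.com": "example.com home page",
    "www.example.org": "example.org home page",
}
DEFAULT_SITE = "www.example.org"  # assumed to be the server's default vhost


def pick_site(host_header: str) -> str:
    """Return the content served for a given Host header value."""
    # Drop an optional ":port" suffix and normalise case.
    name = host_header.split(":")[0].lower()
    # Unknown names, including a bare IP address, fall back to the default site.
    return VHOSTS.get(name, VHOSTS[DEFAULT_SITE])


print(pick_site("www.example.com"))  # the site that was asked for by name
print(pick_site("192.168.0.1"))      # the default site, not example.com
```

A request that names www.example.com gets that site; a request that only carries the IP gets whatever the default happens to be.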
I could see the possibility (one that I didn't think of when writing the previous message) that Google caches the IP addresses, and uses them in combination with the names when requesting a page. This would have the small advantage of saving a tiny amount of bandwidth by not having to do a DNS lookup; and the large disadvantage of making it difficult or impossible to port a website to a new server.
Once my machine rebooted while one of my friends was browsing through it, and his machine rebooted too.. don't know why.
Actually, when my IP changes, it gets updated within about 5 minutes. All other new connections get through except Googlebot, which does not appear until the next time Freshbot comes around. Maybe, like you say, they cache their DNS. However, lately it seems to update faster, or was it just my imagination?
kwngian
Google does cache the IP address, and it is not a "small advantage" to them. A true, non-cached DNS lookup is at least 2 separate requests, usually more. Three billion pages can generate a hell of a lot of DNS lookups. Often a DNS lookup can take more time than downloading the actual page.
And as you note, ALL pages are referenced by their IP addresses in HTTP. But they are also referenced by domain name if one is given. If google keeps the old DNS cache around for 3 months, then they will reference it by the old IP address along with the domain name.
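Roughly what "old IP address along with the domain name" looks like on the wire, as a minimal sketch. Nobody outside Google knows how their crawler is actually written; `build_request` and `fetch` are hypothetical helpers, and the cached IP is simply whatever address the crawler remembered from an earlier lookup.

```python
import socket


def build_request(hostname: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request that names the site in Host."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {hostname}",      # the domain name travels inside the request
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(cached_ip: str, hostname: str, path: str = "/") -> bytes:
    """Connect to a remembered IP (no fresh DNS lookup) and request the page."""
    with socket.create_connection((cached_ip, 80), timeout=10) as sock:
        sock.sendall(build_request(hostname, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


print(build_request("www.example.com", "/index.html").decode("ascii"))
```

If the cached IP is stale, the request still reaches the old server, and the old server answers for that Host name as long as it still has the site configured.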
There are over three billion pages indexed, but these are not three billion separate DNS lookups. A website with 1,000 pages, when crawled, would require one DNS lookup (which may be two or three requests), and 1,000 page requests.
I think I'll stick with my statement that making it difficult to change IP addresses is not worth the small advantage of saving a few extra DNS requests :-).
The information should be cached for an hour, not multiple months.
It's an established fact and oft-repeated instruction that the best way to handle an IP change is to leave the old site up and delete it when Google stops crawling it and picks up the new IP.
Google crawled my old IPs for 6 weeks before their last DNS update a few days ago.
Nick
What that means - for people who are 'non techie' like myself, is that if you change your server's IP address, Google will still come around to the old IP because that is what they have stored in their DNS cache.
One way to fix it is to make sure that your old host deletes your entry from their local zone file -> because if they don't, Google will still get 'some kind' of response, I believe. Even then it will take until the next time Google refreshes their DNS cache to have your site's info there.
Try doing a site search [searchengineworld.com]; many, many people have reported this problem with Google. Use 'dns /forum3/' as the keywords for the search.
Happy searching.
Of course Google, like just about every other PC on the internet, caches DNS information.
A DNS master file on a name server looks something like this:
@ IN SOA domain.com. hostmaster.domain.com. (
100 ; serial
7200 ; refresh
3600 ; retry
604800 ; expire
21600 ; minimum
)
.
.
And then it goes on from there. Notice the fourth parameter down, "expire".. which indicates to the client (like your PC) how long to cache this particular domain's DNS information. This is in seconds, so basically the above indicates 12 hours. After 12 hours, the DNS cache expires, and it is re-requested next time it's needed. Apparently, Google does not play by the rules when it comes to DNS caching :-).
If your old web-hosting company deletes your DNS record, and Google requests your site, they are going to get a different site hosted by your old web-hosting company, and you wouldn't want someone else's content being indexed in place of your own.
If your IP address changes, and the new IP is dedicated to you, then you can set up a 301 redirect to the IP address from your old web-hosting company. Unless they break any of the IETF's other rules, this should work without a problem.
If your new IP is being shared with other websites as a virtual host, I'm open to suggestions, because redirecting to an IP won't work.
Caching DNS information is fine, but for hours, not weeks or months. The resources that are saved by caching the information for such a long period of time are very small.
There are hundreds of DNS changes every day, and caching this information for an excessively long period of time makes it difficult for these sites with new DNS information to be reached.
Regardless of how it *should* be done, it's obvious that Google does something different, yes? So...it's helpful to know how to deal with the issue at hand, instead of simply saying, "Google should do it different."
I agree, it does make it hard for sites that change their DNS info -> it's just something I've come to expect from Google, however. They still do it better than any other search engine out there -> else you might not be missing their traffic.
Cheers.
It actually says to expire the entry after one week not 12 hours.
I think everyone here agrees with you that it would be nice if they updated their cache according to the schedule set by your DNS server, but they don't.
Having your old host delete your DNS entry will make no difference in Google's ability to crawl your old site, as long as Apache still has the entry for that virtual host domain name. Google will send the HTTP request to the old IP, including the domain name. Apache will behave the same way that it always did.
Using a 301 redirect is not what you really want to do, as you will have to redirect to an IP address instead of to a domain name. Good luck running an ecommerce site that shows up in Google as an IP address.
Maintaining a copy of your site on the old server until Google stops crawling it is the safest solution.
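One way to tell when Google has stopped crawling the old server is to watch the old server's access log. The following is only a sketch: it assumes the log is in Apache's combined format (user agent as the last quoted field) and that the crawler identifies itself with "Googlebot" in its user agent. The sample lines and the `last_googlebot_hit` helper are made up for illustration.

```python
import re
from datetime import datetime

# Captures the bracketed timestamp and the last quoted field (the user agent).
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\].*"(?P<agent>[^"]*)"\s*$')


def last_googlebot_hit(log_lines):
    """Return the datetime of the most recent Googlebot request, or None."""
    latest = None
    for line in log_lines:
        match = LINE_RE.search(line)
        if not match or "googlebot" not in match.group("agent").lower():
            continue
        seen = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        if latest is None or seen > latest:
            latest = seen
    return latest


# Hypothetical sample lines; on a real server you would read the log file.
SAMPLE = [
    '192.0.2.5 - - [01/Feb/2003:10:00:00 +0000] "GET / HTTP/1.0" 200 512 "-" "Mozilla/4.0"',
    '192.0.2.9 - - [03/Feb/2003:02:15:30 +0000] "GET /a.html HTTP/1.0" 200 900 "-" "Googlebot/2.1"',
    '192.0.2.9 - - [02/Feb/2003:23:59:59 +0000] "GET /b.html HTTP/1.0" 200 700 "-" "Googlebot/2.1"',
]

print(last_googlebot_hit(SAMPLE))
```

Once the last hit on the old IP is comfortably older than the first Googlebot hit on the new IP, it should be reasonably safe to take the old copy down.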
A major performance stress is DNS lookup...
The paper goes on to mention that each crawler maintains its own DNS cache. It is believed that Google has added a central DNS cache since then (though I wonder if the deepbot and Freshbot each have their own).
GoogleGuy mentioned (about a year ago I think) that their DNS retries would become more frequent. There do seem to be fewer problems with long DNS caching nowadays.
kwngian, you might be lucky like ChrisXenon and have Google refresh their DNS for your domain at just the right time, but it's better to take the advice of NickW and BigDave; keep the site working for a while on the old IP and you'll be safe.
Oops.. you're right, 604800 / 60 / 60 = 168 hours :-).
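For anyone who wants to check the arithmetic, here is a small sketch that converts the timer values from the sample SOA record into hours and days. The `describe` helper is just something made up for this post. As a side note, and as far as I understand RFC 1035, "expire" is really aimed at secondary name servers, while resolvers cache according to each record's TTL; the conversion itself is the same either way.

```python
# Timer values copied from the sample SOA record earlier in the thread.
SOA_TIMERS = {"refresh": 7200, "retry": 3600, "expire": 604800, "minimum": 21600}


def describe(seconds: int) -> str:
    """Express a duration given in seconds as hours and days."""
    hours = seconds / 3600
    days = hours / 24
    return f"{seconds} s = {hours:g} h = {days:g} d"


for name, seconds in SOA_TIMERS.items():
    print(f"{name:8} {describe(seconds)}")
```

So "expire" comes out at 168 hours, or one week, and none of the four timers in that record is 12 hours.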
<Having your old host delete your DNS entry will make no difference in google's ability to crawl your old site, as long as apache still has the entry for that virtual host domain name.>
I was making the assumption that if your old web-hosting company deleted your DNS record that their standard procedure would be to thoroughly remove everything of yours off of their system, including the web server entries. Having run my own servers for a long time, I haven't hosted with anyone else, I don't know their standard procedures, so I may be wrong of course :-).
The 301 redirect is not the best way to go.. but I don't see any issues directly related to eCommerce. If your old web-hosting company will remove your DNS record but maintain your Apache configuration, that would be great!
<Maintaining a copy of your site on the old server until google stops crawling it is the safest solution.>
That is the best way to go, assuming everyone will cooperate with you :-).
<A major performance stress is DNS lookup...>
It can be, if not done correctly.
When I go to www.domain.com in my web browser, it performs a DNS lookup (which is usually a couple of requests), then that information is cached according to the DNS record. Let's say the DNS record tells me to cache that information for 12 hours. For the next 12 hours, when I go to www.domain.com, or *any* page hosted at www.domain.com, my system does not do a DNS lookup.
There is one lookup per domain, not per page.
When Google crawls a site.. let's say it has 100 pages (or 1,000 pages, or 10,000 pages, or whatever), there is one DNS lookup performed.
If a crawler tries to crawl the web and doesn't cache any information, it's going to put a major drain on the crawler, with DNS lookups taking 20% of its resources. If it does one DNS lookup and caches it for the length specified in the DNS record, it can crawl a good portion of the site without having to make another DNS request. At this point, DNS requests would make up <1% of the crawler's total resource usage.
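Here is a minimal sketch of the kind of cache I mean: one lookup per domain, kept only for as long as the record says. This is not Google's code or anyone's production resolver; `DnsCache`, `fake_resolver`, the IP address, and the 12-hour TTL are all assumptions for the sake of the example, and the resolver is faked so the sketch runs without touching the network.

```python
import time


class DnsCache:
    """Cache hostname -> IP answers and honour each answer's TTL."""

    def __init__(self, resolver, clock=time.monotonic):
        self._resolver = resolver  # callable: hostname -> (ip, ttl_seconds)
        self._clock = clock        # injectable so expiry can be simulated
        self._entries = {}         # hostname -> (ip, expires_at)
        self.lookups = 0           # number of real lookups performed

    def resolve(self, hostname):
        now = self._clock()
        entry = self._entries.get(hostname)
        if entry and now < entry[1]:
            return entry[0]        # still fresh: no lookup needed
        ip, ttl = self._resolver(hostname)
        self.lookups += 1
        self._entries[hostname] = (ip, now + ttl)
        return ip


def fake_resolver(hostname):
    """Stand-in for a real DNS query: fixed address, 12-hour TTL."""
    return "192.0.2.10", 12 * 3600


cache = DnsCache(fake_resolver)
for _ in range(1000):              # crawl 1,000 pages on the same site
    cache.resolve("www.domain.com")
print(cache.lookups)               # one lookup for the whole crawl
```

The cost of respecting the TTL is one extra lookup per domain each time the record expires, which is tiny next to the page requests themselves.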
Again, I understand that Google caches this information for a long time.. and of course I will play by their rules because their traffic is important to me. I'm just fantasizing about a more ideal circumstance :-).