| 4:54 am on Jan 28, 2004 (gmt 0)|
Glad you found out the answer--I remember being stumped by this one. I'll mention this to a crawl person the next time I run into one, so we can check if there's anything we can do at our end to make that work better if anyone else is in that situation. Think you can hunt down links to the other two threads? In case I can point a crawl person here, it would help if they can see everything from one page.
|Mr Bo Jangles|
| 4:58 am on Jan 28, 2004 (gmt 0)|
I see that at Googleplex they're known as 'crawl person' and 'crawl people' and not 'crawlers' - for obvious reasons *_*
| 5:09 am on Jan 28, 2004 (gmt 0)|
wow, cool post Crow_Song!
Your method might have just answered my friend's question! I'll redirect him to this thread.
| 5:12 am on Jan 28, 2004 (gmt 0)|
What would one do if it was impossible to re-create the old site because of moving from the old ISP to a new one, etc.? I think this could be affecting me too, but I'm not able to do what you did [resurrect the old site]. I'd bet many sites have run into this situation over time.
| 8:47 am on Jan 28, 2004 (gmt 0)|
Someone once pointed me to a page on the Google site that lets you request that indexed pages be forcibly removed from the index (you had to be the owner, of course). I can't find that link now, but you could possibly use it to remove your home page, thus restarting the entire process on the next crawl.
| 9:20 am on Jan 28, 2004 (gmt 0)|
>> Think you can hunt down links to the other two threads?
Google refuses to spider site. It has been more than a year! - Google hits the index page and goes no further. [webmasterworld.com]
Google thinks old server = new server! Google is messed up... [webmasterworld.com]
| 11:44 am on Jan 28, 2004 (gmt 0)|
I'll mention this to a crawl person the next time I run into one, so we can check if there's anything we can do at our end to make that work better if anyone else is in that situation.
Tell them to use DNS the way it was designed to be used. It has been quite obvious for some time that:
a/ the spiders rely too much on IP addresses stored in the indexes rather than hostnames
b/ the interpretation of ANAME and CNAME records is broken
Yes, I harp on this every chance I get in the hope that someone is actually going to do something about it. Especially point b.
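To illustrate the distinction plumsauce is harping on, here is a hypothetical zone-file fragment (all names and addresses invented): a spider that caches the A record's IP instead of re-resolving the hostname, or that ignores where a CNAME actually points, will keep hitting the wrong box after a move.

```
; illustrative zone fragment -- hypothetical names and IPs
www.example.com.   3600  IN  A      203.0.113.10      ; A record: hostname -> IP address
old.example.com.   3600  IN  CNAME  www.example.com.  ; CNAME: alias -> canonical hostname
```

If `www.example.com` moves to a new IP, only the A record changes; a well-behaved crawler that resolves the hostname on each visit picks up the move automatically, while one that stored `203.0.113.10` in its index does not.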
| 5:53 am on Jan 29, 2004 (gmt 0)|
How do you put "a good robots page in place telling Google that the old server was dead"?
Put Disallow: / in robots.txt?
| 4:25 pm on Jan 29, 2004 (gmt 0)|
Johnlim: that's exactly what I did. For a month or so after reviving the DNS name, I just observed the traffic on the box. Then I put redirects in place to the other (real) server. This didn't actually change the fact that Google was confused, so I put a robots file up that disallowed everything. It was almost exactly two months later that suddenly, one day, Google began spidering the real server like crazy. It hit the site about 90 times the first day, and 400 times the next! After that, we were golden. A couple of weeks later, pages began showing in the index.
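The disallow-everything robots file described above is only two lines. A minimal sketch using Python's standard `urllib.robotparser` (with a made-up hostname) shows how any compliant crawler would interpret it:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt that blocked everything on the dead server:
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot (or any other agent) is refused for every path on the old host.
print(parser.can_fetch("Googlebot", "http://old.example.com/index.html"))  # False
```

Once Google honored this and dropped the old host, it finally started spidering the real server, as described above.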
| 7:48 am on Jan 30, 2004 (gmt 0)|
I just moved a site to a new ISP (the IP changed, the DNS also changed).
Do I then need to put Disallow: / in a robots.txt at the old IP (ISP)?
| 8:21 am on Jan 30, 2004 (gmt 0)|
Hi, I wonder if I have a similar problem.
I moved my (information) site from a personal hosting server to its own domain on a new server a year ago. I didn't have access to the old server's robots.txt file, so I used META redirects and links. When I realized that might have incorrectly triggered a duplicate content penalty, I just removed the old site. Problem is, the old server does not properly serve 404 pages, just a generic "this page does not exist" page with a 200 status.
Google does spider the new site, but PR is 0 and no backlinks show (there are more than 200). Needless to say, the site is perfectly clean, no SEO tricks (I take the long-term view and focus on quality content). It's been about a year now.
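The "200 status with a not-found body" behaviour described above is what's often called a soft 404: the server claims success while the page says failure. A toy sketch (invented heuristic and sample responses, not Google's actual logic) of how a crawler might detect it:

```python
def is_soft_404(status: int, body: str) -> bool:
    """Heuristic: a 200 response whose body reads like an error page."""
    if status != 200:
        return False
    markers = ("does not exist", "page not found", "404")
    text = body.lower()
    return any(marker in text for marker in markers)

# The old server answers 200 with an error message: a soft 404.
print(is_soft_404(200, "Sorry, this page does not exist"))  # True
# An honest 404 and a normal content page are both fine.
print(is_soft_404(404, "Not Found"))                        # False
print(is_soft_404(200, "<h1>Welcome to my site</h1>"))      # False
```

A crawler with no such heuristic simply sees a live, successful page, which is why the old server can linger in the index indefinitely.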
| 9:34 am on Jan 30, 2004 (gmt 0)|
>> Problem is, the old server does not properly serve 404 pages, just a generic "this page does not exist" page with a 200 status.
For Google it's also possible to put a 'robots.txt' at a level lower than the root.
Source: Remove Content from Google's Index [google.com]
|If you do not have access to the root level of your server, you may place a robots.txt file at the same level as the files you want to remove. Doing this and submitting via the automatic URL removal system will cause a temporary, 90 day removal of your site from the Google index. (Keeping the robots.txt file at the same level would require you to return to the URL removal system every 90 days to reissue the removal.) |
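For example (hypothetical ISP hostname and user path), someone without root access whose pages live under a personal directory could serve:

```
# placed at http://members.example-isp.com/~user/robots.txt -- not at the server root
User-agent: Googlebot
Disallow: /
```

and then submit that URL through the removal system. Per the quoted text, this only buys a 90-day removal, so the submission has to be repeated while the file stays at that level.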
| 11:11 am on Jan 30, 2004 (gmt 0)|
Yes, I did just that some months ago, but it did not seem to help. I still seem to have a penalty and I don't know why (no answer from Google to my emails). So my best guess is that it's this mess of changing servers.
| 11:15 am on Jan 30, 2004 (gmt 0)|
Thanks for the post, flagged it for future reference.
This whole place is full of great advice!
| 12:03 pm on Jan 30, 2004 (gmt 0)|
Wow, this is a symptom I have been puzzling over for a month or so. Fantastic post!
Problem is, for me the "old IP address" is out of my control, because it was bought from a domain company.
Is there any other way round this? I have tried contacting the domain company, to no avail so far.
Therefore I cannot close this back door. Or can I? Do we have to go to Google with this?
| 1:45 pm on Jan 30, 2004 (gmt 0)|
Can anyone take a guess why this doesn't happen consistently?
I just moved some sites to a different IP and DNS and I seem to have no problem with googlebot.
Great thread. It's one for the library.
| 2:03 pm on Jan 30, 2004 (gmt 0)|
That's a problem with the DNS config, not with G.
| 2:28 pm on Jan 30, 2004 (gmt 0)|
Ulkari: if the old server is still serving pages (even page not found errors) I think you may end up in the same boat I did. Google may still think the server exists, and continue to hit it.
For me, the only way I could figure out what was going on was to resurrect the old server and watch the logs. I couldn't believe it when - a year after the server had been taken offline - Google crawled all over it within minutes of putting it back up.
Part of the problem in our case may have been in the change of names: we changed from old.server.name.ca to server.name.ca. The old.server.name.ca was actually a node of the zone server.name.ca, and as far as Google was concerned, the node answered for the zone. The zone then disappeared and the new site took the name server.name.ca - but it was now a node and not a zone. Google must have thought that server.name.ca was still a zone, and continued to try to contact old.server, which used to answer for the zone.
I'm sure I just explained this horribly! In a nutshell, I agree with plumsauce about Google's interpretation of ANAME records.
| 7:43 pm on Jan 30, 2004 (gmt 0)|
Thanks Crow_Song for sharing what you learned. I too believe I'm having a problem similar to yours, but unfortunately the old server is still serving dummy pages without an error code, and I have no control over it. I cannot even look at its logs (it was a basic hosting service bundled with a dialup Internet offer).
In my case though, DNS has nothing to do with the problem, since the new server is in a different domain.
Google is not eternal, and since I have no short-term commercial pressure, I prefer to focus on building quality content for the users I get via links or other SEs, rather than trying too hard to understand and work around the beast's mistakes.
| 10:36 pm on Jan 30, 2004 (gmt 0)|
As far as I recall, Google had pages indexed from the old server. These ghost pages showed up in the SERPs as "headline = URL only", and clicking them gave a 404 error.
Now, this is a systematic error, and it should be looked into.
Google has had problems with "URLs that are in the index but can't be validated for some reason" - for a long time. In this thread [webmasterworld.com] Yidaki made me aware of previous threads on the subject:
1) Indexed AlltheWeb pages causing Google duplicates - Aug 14, 2003 [webmasterworld.com]
2) click.fastsearch.com shows instead of my url? - Oct 8, 2002 [webmasterworld.com]
This might not appear to be the exact same situation, but from a "spider viewpoint" it is the same - some URLs are indexed and the spider is not able to go back and validate them, as the linking page is not spiderable. What happens then, is that these "Ghost URLs" remain in the index, and in some cases this leads to de-indexing of the "real" sites (the ones being linked to) somehow - aka. "slow death".
In this thread from December 2003 (msg #37) [webmasterworld.com] I dubbed it a "302 Google bug" for lack of better words. As shown by Crow_Song in this thread, it's more general than just 302 redirect links.
So, Gbot must be told to ignore (i.e. forget about) links and domains that it cannot spider (or isn't allowed to spider). These types of data should be removed from the index. Erroneous links and domains should not be allowed to corrupt other data.
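The cleanup being proposed can be sketched as a toy model (an invented dictionary standing in for an index; nothing here reflects Google's internals): after a validation pass, any URL the spider could not re-fetch or was not allowed to fetch gets dropped instead of lingering as a ghost.

```python
# Toy index: URL -> whether the last validation fetch succeeded.
# Purely illustrative data; hypothetical hostnames.
index = {
    "http://real.example.com/page":  True,   # spiderable -> keep
    "http://old.example.com/ghost":  False,  # unreachable "ghost URL" -> drop
    "http://blocked.example.com/x":  False,  # disallowed by robots.txt -> drop
}

def purge_ghosts(index: dict) -> dict:
    """Keep only URLs the spider could actually re-validate."""
    return {url: ok for url, ok in index.items() if ok}

cleaned = purge_ghosts(index)
print(sorted(cleaned))  # ['http://real.example.com/page']
```

The point of the sketch is only the policy: unvalidatable entries are removed rather than left to corrupt results for the real sites they point at.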
| 11:20 pm on Jan 30, 2004 (gmt 0)|
Crow_Song, it must have been a hard time for you, and I am glad you shared all this info here; it may help many of us out of strange situations like these.
Many times, when I face a problem I search WebmasterWorld and find discussions which not only upgrade my knowledge but also let me solve problems fast. Thanks for your post. I just bookmarked your thread.