After the September 2002 update, the site dropped from the index, except for one internal page. All pages had no PageRank (greyed out) except for the one page still in the index which had a 0 PageRank. When I did an allinurl search for the site in Google, the one internal page came up but without a page title or description... just the page URL.
A couple of days after the full update, 36 pages out of several hundred showed up during the Fresh update. When I used the Google toolbar to check the cached view of those pages, I got the message "Your search... did not match any documents," which indicates to me that they were not in the regular index. It stayed like this, with pages indexed only by Freshbot, until February 18th, when the "Fresh" pages stopped showing up.
Now only 5 page URLs show up in an allinurl search on the site. There are no titles or descriptions except for the home page, which shows the DMOZ info and currently has a PageRank of 4.
So in summary, the site has dropped out of Google after about 1 1/2 years, and Google says it is not being penalized (in response to a reinclusion request). It hasn't been in the full index for the last 6 months, but every month Googlebot is at the site grabbing pages.
Does anyone have any clue as to why this might be happening? Any suggestions would be greatly appreciated!
In msg#8 of this recent thread [webmasterworld.com] GG wrote:
In practice, there's lots of reasons that Google might not have the content of the page. There could be a robots.txt file, or the server could have been down, or we might have seen references to that page but not crawled it, or there could have been redirects, meta tags, etc. Personally, I think it's actually one of the strengths of Google that you can do a search like "Colorado virtual library" and we can return something like the first result. It turns out that www.aclin.org forbids all spiders, but Google is still able to pull descriptions from the Open Directory, for example.
That confirms that Google will use the information from the ODP if no other information is available in the full index.
In msg#10 of the same thread he wrote about the same site:
Maybe we saw references to it; it could have really good PageRank
And that is also what happened in your case where the homepage has a PR4. So I'm afraid getting more links won't help in this case. Maybe you could try to get a link to a subpage instead of the homepage.
You would think that robots.txt is OK, since Freshbot can spider 36 pages. Still, it is worth checking the robots.txt. If there is a syntax error in it, Freshbot might handle it differently than Deepbot. That is the only suggestion I have.
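As a rough way to run that robots.txt check, here is a minimal sketch using Python's standard urllib.robotparser. The sample robots.txt text, the example.com URLs, and the blocked_paths helper are all made up for illustration. This is Python's parser, not Google's, so it cannot show how Freshbot and Deepbot might differ; it only flags rules that a standard parser reads as blocking the "Googlebot" user agent.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, user_agent, urls):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt content and URLs, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

SAMPLE_URLS = [
    "http://www.example.com/",
    "http://www.example.com/private/page.html",
]

if __name__ == "__main__":
    for url in blocked_paths(SAMPLE_ROBOTS, "Googlebot", SAMPLE_URLS):
        print("blocked:", url)
```

To try it on a real site, paste the contents of your own robots.txt into the first argument and list a few of the pages that dropped out. If pages you expect to be crawlable come back as blocked, the file is worth a closer look.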