The pages haven't been totally removed; instead they still exist in the Google SERPs, but with no title and no snippet, and therefore they no longer appear for any searches.
At first I thought this was some sort of penalty / filter to remove some of the controversial search sites from its index, but it seems this applies to other large sites, e.g. dmoz. I would estimate that dmoz has had around 200,000 pages "nuked".
Has anyone noticed this phenomenon on any other sites?
I disagree. I observe the behaviour now even with a very new site. It launched around March 20, and all 90 deep pages were indexed 2 weeks later (no top rankings though). This morning 90% of the pages don't have a snippet or title: url-only. Referrer traffic dropped for the few phrases that had good rankings. This site is clean: nothing to filter, nothing to penalize.
I wouldn't call it a Google problem, just the way Google currently behaves. When recrawling already-known pages, Googlebot seems to remove them from the index first and add them back in later. I'm sure my pages will show up again shortly.
I'd say no filter, no penalty.
[edited by: Marcia at 5:23 pm (utc) on April 27, 2004]
I don't believe this is related to index capacity. Why should my index page get PR5, and why should my index page be removed from the SERPs, if the problem is capacity? These are 2 things which have nothing to do with each other. If they had a capacity problem, I would expect them to remove a certain percentage of the sub-pages of every site. But why would they stop counting backlinks or decrease PR?
None of these sites are mine, and most of them (but not all) have set up links to my site.
They are all quality sites related to the topic of my site. Some of them are regional offerings, some of them offer things I can't give my users on my own site.
Furthermore, there are only 40 links to other pages. I know plenty of sites which have more links on their index pages.
I uploaded a new site about five or six weeks ago. This has only 15 pages. It appeared after a few days with great results then almost immediately disappeared. Yesterday when I checked, the only page that had both title and cache was the home page. Today all the pages are once more back in but it is still not ranking.
I also had the problem with an 80 page site. This has also now recovered its PR, page titles and descriptions but it is also not ranking?!?
It would certainly make me laugh if this turned out to be the cause of the problem.
PS I added about a dozen pages to my site about a month ago. They were duly indexed and dropped to url-only listings about ten days ago. My site is a tiddler so I can honestly say that size doesn't matter.
it's not a program compile problem, though it's related: it's an index problem, also limited to 32 bits. to overcome this, google seems to have several buckets (dbs):
1 - main or primary index that contains the fully indexed pages (capped at 2**32 entries). once it's full, for every new page that gets in, another has to be dropped. google has to keep adding new pages, otherwise it becomes obvious that it has this capacity problem.
2 - supplemental index (to relieve the main index capacity problem) that google refers to only if too few results are found from the main index
3 - the "url-only" bucket that google "never" got around to indexing (because they're out of index space!). note that the "url-only" pages seem to be dynamic from update to update. i would say google simply randomly decides which pages to leave out. this bucket is used by google only for site: or url: queries.
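to make the capacity argument concrete, here is a minimal sketch (purely hypothetical, assuming document IDs are stored as unsigned 32-bit integers, which is this thread's speculation and not anything google has confirmed):

```python
# Hypothetical: an index keyed by unsigned 32-bit doc IDs can address
# at most 2**32 distinct pages. A counter stored in such a field would
# wrap around once the limit is reached.
MAX_DOCS = 2**32  # 4,294,967,296 addressable pages

def assign_doc_id(next_id: int) -> int:
    """Wrap a doc-ID counter the way an unsigned 32-bit field would."""
    return next_id % MAX_DOCS

print(MAX_DOCS)                  # 4294967296
print(assign_doc_id(2**32))      # wraps back to 0
print(assign_doc_id(2**32 + 7))  # 7 -- a "new" ID collides with an old one
```

if the main index really worked this way, adding page 4,294,967,297 would reuse an existing slot, which is consistent with the "one in, one out" behaviour described above.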
note that all the google problems that have been reported recently can be explained by this model. It's a totally random process which has us all scratching our heads!
does anybody have any evidence that the total number of pages in the main index (i.e. not including supplementals and url only) is over 2**32?
since i cannot reproduce your query, could you please check whether it includes supplementals or "url only" pages?
i've done regular queries before where google unpredictably also includes supplementals or "url only" pages. i say the total numbers shown by google are unreliable.
i still maintain that all the weird google behaviour we're observing is attributable to this.
i have safesearch off and i still get the following:
"Results 1 - 10 of about 28,400,000 English pages for the [definition]. (0.14 seconds)"
is there a way of specifying safesearch off through the query instead of through the preferences?
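for what it's worth, google's web interface of that era accepted a "safe" query parameter in the URL (safe=off to disable SafeSearch), so you can bypass the preferences page by building the URL yourself. a minimal sketch (the parameter name is taken from the public URL format, not an official API, so treat it as an assumption):

```python
# Build a Google search URL with SafeSearch controlled per-query via
# the "safe" parameter instead of the stored preferences cookie.
from urllib.parse import urlencode

def google_query_url(terms: str, safe: str = "off") -> str:
    """Return a search URL with SafeSearch set explicitly ("off"/"on")."""
    params = {"q": terms, "safe": safe}
    return "http://www.google.com/search?" + urlencode(params)

print(google_query_url("definition"))
# -> http://www.google.com/search?q=definition&safe=off
```

comparing the counts from such a URL against your preferences-based results would at least rule SafeSearch in or out as the cause of the differing totals.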
i also tried advanced search with filtering off and i get the following:
"Results 1 - 10 of about 29,100,000 English pages for the [definition]. (0.16 seconds)"
note the different totals.
i know it to be a fact that the serps can, but not necessarily do, include supplementals and/or "url only" entries. this means that for serps > 1000, you cannot conclude that the serps exclude supplementals and/or url-only entries. so we still cannot disprove that google's main index is limited to 2**32!