The pages haven't been totally removed. Instead, it seems that they still exist in the Google SERPs but have no title and no snippet, and therefore no longer appear for any searches.
At first I thought this was some sort of penalty / filter to remove some of the controversial search sites from its index, but it seems this applies to other large sites, e.g. dmoz. I would estimate that dmoz has had around 200,000 pages "nuked".
Has anyone noticed this phenomenon on any other sites?
I removed all links from a site recently (switched domains) and over a period of about 8 weeks the pages are being removed from Google, starting with the title and description (listed as similar pages). It's a 900 page site, so it is taking a fair amount of time for them all to be removed.
Since Google is making its updates on an ongoing basis, it could be that it's not finding the links to the specific pages that are losing their descriptions and so is treating them as orphaned pages.
If you track this long enough you'll also find the descriptions do sometimes come back. This would be explained by a new round of spidering in which the links to the pages were followed, and hence the description returned.
Since deep content pages tend to have just one link to them from the same site, it's not hard to see that Googlebot could miss multiple pages of a site and so not list their descriptions etc...
That does make sense and I agree that for my site the pages that appear in this manner are ones that have not been picked up from the recent deep(ish) crawl experienced on my sites this weekend.
The site in question is weak on backlinks (other sites just as weak had a good crawl - oh well).
However, I have not seen this lack of title/description as widespread as is currently being seen in the Google SERPs.
I used to think this too. However, with more and more sites being affected, including good quality sites, it is starting to point more towards being a bug. :(
"good quality sites" is a subjective opinion. If they're getting hammered by this penalty, they should put titles on their pages like they're supposed to.
Google broke a year ago. Read the thread about update Cassandra [webmasterworld.com]. Back then it was called "crawl lite." Now there are layers of analysis over exactly the same symptoms.
It broke a year ago, and is still broken. Get used to it.
As a developer I google a lot. If all forums (such as this one) are missing a third of their pages from the SERPs, then the chance of me finding a solution is decreased significantly. If I then jump to Yahoo, do a search and find my solutions, then I am going to eventually switch for good.
- site has PR 7
- has 16k backlinks according to ATW
- was in fresh crawl
- had 120k pages indexed, now "site:domain.com" brings 91k
- "site:domain.com kwdoneverypage" gives 132k
- about 70% of listings for "site:domain.com" bring URL only
- GB (Googlebot) hits: Mar 16k, Feb 69k, Jan 311k
- visits have been dropping every day since Feb 9th
- no changes except adding links and adding some more content into the cms
It's a site whose stats I have access to but which I don't SEO, so it's very natural. :)
One other thing is that if that single site you mention has lost nearly 300,000 visitors over the last two months, where are they now going for their information? Would this be an alternative site or an alternative engine?
I am totally convinced that it is not a problem with the sites experiencing this bug and that it is a problem at Google's end.
I guess that the only way this will be fixed is to wait for the next crawl (however long that will be).
What confuses a lot of people, I think, is that it comes while Google have been changing their spidering patterns over the last nine months or so.
I think there's a difference between Google being broken and Google trying something new. From the results for affected sites that I've been sent, I get the impression that this is a tweak, not a bug.
Ciml, I really cannot see how you can come to this conclusion. Are you really dismissing the dropping of all of these long established, authoritative, non-spammy sites as the result of a "tweak"?
My original background was in electrical and electronics engineering and you can take it from me that if you tweak something too far or tweak it in the wrong direction you can break it.
I'm not sure that the many people who are being forced to struggle against this problem will be reassured by your submission. I think that this is a very serious problem that will probably attract far more activity on this thread. Watch this grow!
[edited by: BallochBD at 9:38 am (utc) on Mar. 16, 2004]
>>>One other thing is that if that single site you mention has lost nearly 300,000 visitors over the last two months
The numbers you're referring to are just GB hits, nothing else. And a decrease by 90% is significant imho, especially on a PR 7 site that should have enough power to be deep-spidered quite frequently.
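As a rough check on the size of that drop, here is a minimal sketch that works out the percentage decrease from the Googlebot hit counts quoted earlier in the thread (Jan 311k, Feb 69k, Mar 16k). The helper name `percent_drop` is just for illustration, and it is an assumption that the March figure may cover only part of the month, since the thread is dated mid-March.

```python
# Googlebot hits per month as quoted in the thread, in thousands.
# Assumption: the March figure may be a partial month.
gb_hits = {"jan": 311, "feb": 69, "mar": 16}

def percent_drop(before, after):
    """Return the percentage decrease going from `before` to `after`."""
    return (before - after) / before * 100

months = list(gb_hits)
# Month-over-month change.
for prev, cur in zip(months, months[1:]):
    drop = percent_drop(gb_hits[prev], gb_hits[cur])
    print(f"{prev} -> {cur}: {drop:.1f}% fewer hits")

# Overall change from the first to the last month.
overall = percent_drop(gb_hits["jan"], gb_hits["mar"])
print(f"jan -> mar: {overall:.1f}% fewer hits")
```

On these figures each month shows roughly a 77-78% drop on the one before, and January to March comes out at about 95%, so "a decrease by 90%" is, if anything, slightly understated.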
I personally think that they made a tweak that was not completely successful. And I'm trying to understand what kind of tweak it was, as is everybody affected. So let's stop discussing the broke/tweak thing and let's look at which sites are caught by the filter.
Unfortunately, we cannot post URLs - I have plenty of examples.
Dmoz and WW are the more obvious ones, but for these I can think of a reason why they might possibly be listed in this manner. Dmoz obviously has the problem of lots of mirrors (duplicate site penalty?), and WW has a structure that can cause mirror URLs to be linked to and listed twice, and therefore treated as duplicates.
Then there are the UK sites which link to their own SERPs to get exposure in search engines; some of these have been affected.
However, caught up in this are sites which have unique content, good backlinks etc.
We have about 4 sites constructed in the same way using very similar structures and only one of them has been hit.
Surely if this was a tweak the others would have suffered too?
The only difference between the site that has been hit and the others is that a large number of pages can only be found via the index page. This has never been a problem in the past and they still exist in the supplemental index.
I would guess that Google's definition of an orphaned page is a page with no internal or external links to it. This is a guess, and I would be interested in what the forum thinks the Google definition of an orphaned page is.
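To make that guessed definition concrete, here is a minimal sketch that flags pages with no inbound links in a list of (source, target) link pairs. It is not a claim about how Google actually decides this; the function name `find_orphans` and the example URLs are hypothetical, and ignoring a page's links to itself is an extra assumption.

```python
def find_orphans(pages, links):
    """Return the pages that no other page links to.

    pages: iterable of URLs known to exist
    links: iterable of (source_url, target_url) pairs, internal or external
    """
    # A page linking to itself does not count as an inbound link (assumption).
    linked_to = {target for source, target in links if source != target}
    return sorted(p for p in pages if p not in linked_to)

# Hypothetical example site.
pages = ["/", "/a", "/b", "/old"]
links = [("/", "/a"), ("/a", "/b"), ("/b", "/"), ("/old", "/old")]
print(find_orphans(pages, links))  # ['/old']
```

Under this definition a page only counts as orphaned relative to the links the crawler actually found, which would fit the earlier suggestion that a missed link can make a page look orphaned for one crawl and not the next.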
They were all built entirely by myself using similar techniques. I now notice that some of them have escaped. Others have had some pages suffer and some now only have the home page showing a title.
Perhaps we should be trying to find a common link between the sites that are being damaged. Or perhaps someone from GOOGLE could let us know what the h@ll is going on?
Pete, I wonder if you could clarify.
Is the site with the missing title/description the one with the index page links, or is this the design of the sites which are still in the index?
It's an important distinction, as obviously one site's pages are deeper than the others'.
Sorry, I think I've been unclear. I meant that as well as the general changes in Google (fresher, but with more URL-only listings), there are additional sites that do not appear to fit that pattern.
For the orphan pages that HocusPocus mentions, this has been the case for a long time. Over the last nine months or so, there have been deep URLs uncrawled even though they have links and a little PageRank. Now, I can also see some high level pages with many backlinks and plenty of PR that are no longer fully indexed. With 99% certainty, those would have been fully indexed in the past.
This is why the 'slow death' penalty idea does not seem daft to me, but let's be clear that there are other (probably more common) reasons for URL-only listings or non-fully crawled domains (just as PR0 is not normally due to the heavy-crosslinking penalty).
Hi Dayo_UK - the pages with the missing titles/descriptions are only accessible via a single page of the site.
All the other sites are structured slightly differently to this one, in the sense that you can find most of the pages from almost any page.
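The difference between the two structures can be measured from a site's own link list. The following is a minimal sketch, assuming you can export internal links as (source, target) pairs; the function names and the example URLs are hypothetical. It computes each page's click depth from the home page with a breadth-first search and counts how many distinct pages link to it, so pages reachable through only one other page stand out.

```python
from collections import Counter, deque

def click_depths(links, start="/"):
    """Breadth-first search: minimum number of clicks from `start` to each page."""
    graph = {}
    for source, target in links:
        graph.setdefault(source, []).append(target)
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in graph.get(page, []):
            if nxt not in depths:
                depths[nxt] = depths[page] + 1
                queue.append(nxt)
    return depths

def inlink_counts(links):
    """Count distinct pages linking to each target, ignoring self-links."""
    return Counter(target for source, target in set(links) if source != target)

# Hypothetical "hub" site: content pages are linked only from one list page.
hub_links = [("/", "/list"), ("/list", "/p1"), ("/list", "/p2"),
             ("/p1", "/"), ("/p2", "/")]

depths = click_depths(hub_links)
single_entry = sorted(p for p, n in inlink_counts(hub_links).items() if n == 1)
print(depths)        # {'/': 0, '/list': 1, '/p1': 2, '/p2': 2}
print(single_entry)  # ['/list', '/p1', '/p2']
```

If a crawler skips or fails on the one linking page, every page in `single_entry` behind it becomes unreachable for that crawl, which is the weakness the cross-linked sites avoid.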
I am about to re-launch the site with a much smaller and more accessible structure under a new domain (I will of course kill the old domain) to see what effect it has. At the moment many of the positions have been destroyed so the way I see it there is nothing to lose.
Hopefully Google will let go of the original pages then and I can start again with the original domain. A real pain but I am bored of waiting now....
If I had deep pages that were not being crawled I would understand this a bit better but there should be no problem for the bots on this or, for that matter, any of my sites. Is anyone aware if there has been any comment from Google about this or any similar problems?
Does anyone know if there is a chance that my site will recover or will I have to get a new domain name and relaunch? This would cause me a lot of pain because my site has been there for two and a half years and it has repeat visitors and links from many other sites.
The chances are the existing site could be fine, but the whole situation has made me take a better look at the site and how it is structured.
The reason for the new domain is I can't get Google to drop all the old pages.... of which there are thousands.
Is Google simply expiring old index data before it's had a chance to recrawl the page?
That might explain the difference between your sites, with the no description/no title pages being deeper and therefore getting "crawl lite".
>>>Is Google simply expiring old index data before it's had a chance to recrawl the page?
That would fit in with what I am seeing. Mmmm - I wonder.