Forum Moderators: open
The pages haven't been totally removed; instead, it seems the pages still exist in the Google SERPs but have no title and no snippet, and therefore no longer appear for any searches.
At first I thought this was some sort of penalty or filter to remove some of the more controversial search sites from the index, but it seems to apply to other large sites too, e.g. dmoz. I would estimate that dmoz has had around 200,000 pages "nuked".
Has anyone noticed this phenomenon on any other sites?
No title / no snippet in my eyes just means that Google knows about the page but is not considering it for the SERPs at present. There are many possible reasons for it:
- Google is not able to crawl the page anymore (or was not able to crawl it)
- page has content that is very similar to other content
- technical problem of any kind
...
There must be a valid reason why this happens. It isn't random - that would be ridiculous. There must be some factor that stops Google from spidering these pages, or from indexing them.
I think it is a combination of needing a higher PR value overall throughout the site (more inbound links) and possibly reducing the similarity of the pages throughout the site.
Dan
Perhaps Google has a duff robot running amok (either a hardware or software fault). Each robot presumably has a list of URLs to visit. All that would be required to create this effect is for one of those robots to report (incorrectly) that the pages are offline.
Just a thought.
Kaled.
this is a lot more plausible than some complicated theories of filters.
in fact let's do a survey:
your total number of pages
total number of pages in the online serp
total number of url-only pages
That's a totally different issue. Google has never completely forgotten about them and never will. Nor about disallowed pages either. But they will never show in the results for standard queries. They are just NOT INDEXED, and they never will be. That's what the meta tag says: DO NOT INDEX ME. That's totally different from the observations of url-only listings for pages that should normally be indexed.
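To make the distinction concrete: the "DO NOT INDEX ME" instruction is a robots meta tag in the page itself, and you can check for it with a few lines of code. A minimal sketch using Python's stdlib HTML parser (the function names here are my own, purely illustrative):

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the content of any <meta name="robots"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if d.get("name", "").lower() == "robots":
                self.directives.append(d.get("content", "").lower())

def is_noindex(html):
    """Return True if the page asks crawlers not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return any("noindex" in c for c in parser.directives)
```

A page returning True here is deliberately excluded, which is a completely different situation from a crawlable page that shows up url-only.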
>google just runs out of space in the online index.
Yawn ... [webmasterworld.com]
>Perhaps Google has a duff robot running amok
>report (incorrectly) that the pages are offline.
Amazing. Where do you get those funny ideas?
are you acknowledging or denying that google is running out of index space? this is a lot more plausible than some of the funny theories you've been posting. let's see some of your proof. how do you explain the stats above on msn, yahoo, cnn, amazon? i can show you more if you want.
don't yawn and be intellectually lazy. think!
Amazing. Where do you get those funny ideas?
Daft as it seems (and I was not being serious), it would explain the problem.
Clearly, this phenomenon is either by accident or by design (or perhaps the result of problem management, e.g. if some index capacity had to be taken offline, but that doesn't seem likely).
If by design, then I for one am at a loss to understand the logic.
If by accident (i.e. a bug), a faulty robot is plausible except for one factor: it ought to have been spotted and fixed within 24 hours.
I suppose there is another possibility. Perhaps someone has let a virus loose that is blocking Googlebot. Again, not likely and, in any case, Google should have spotted it immediately.
Kaled.
Mod note:
There can be absolutely no quotes from emails posted at the board. Paraphrased, it indicated that they are pages included in the index but not fully crawled by robots and only partially indexed.
[edited by: Marcia at 4:03 am (utc) on May 21, 2004]
It could be a convenient explanation for a big problem or an intentional way to keep their index large while cutting costs.
this is evidenced by google creating a separate supplemental index, url-only entries in the serps, and mysteriously disappearing pages.
can the constraint be time as implied by GG and the canned replies as well as the google faqs? hardly, since this can easily be solved by simply keeping the old info of existing pages.
is it a space constraint? hardly, since memory and disk are cheap.
is it a docid/index problem? GG has vehemently denied this in one of the posts.
whatever the constraint is, the problem must be insidious and very difficult to solve, particularly as this has been happening for several months now.
GG - this problem has not been mentioned in the risk section of the IPO papers. tell the gods in the plex that if this problem is uncovered after the IPO has been launched, this is very serious grounds for stock fraud and manipulation!
Why don't you want to be told the obvious answer? Google hasn't recently crawled pages that it either crawled a while ago or merely saw the link to but didn't crawl through.
Duplicate content, relative links, and poorly constructed websites are the ones mostly being hit by this. Huge sites with pages that only have one or a few super-deep links to them also get hit for exactly the reason Google says: they haven't crawled the pages since they last dumped the master cache.
Big sites do have a hard time keeping a crawlable structure consistent throughout their sites, but that gets more important as Google is apparently depending more on us to tell them what is important via linking.
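The "super deep links" point can be illustrated with a toy model, not anything Google has confirmed: a breadth-first crawl from the home page with a depth budget leaves deep pages discovered (URL seen on a crawled page) but never fetched, which is exactly the url-only situation. A hypothetical sketch:

```python
from collections import deque

def crawl(links, start, max_depth):
    """BFS over a site's link graph with a depth budget.
    Returns (crawled, url_only): pages fetched within the budget,
    and pages whose URL was discovered but never fetched."""
    crawled, seen = set(), {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        crawled.add(page)
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                if depth < max_depth:
                    queue.append((nxt, depth + 1))
                # else: URL is known but the page is never fetched
    return crawled, seen - crawled

# a daisy chain home -> a -> b -> c: with a depth budget of 2,
# page c ends up known-but-uncrawled
site = {"home": ["a"], "a": ["b"], "b": ["c"], "c": []}
crawled, url_only = crawl(site, "home", 2)
```

Under this model, flattening the structure (linking deep pages from higher up) moves pages from the url-only set into the crawled set without Google changing anything.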
i disagree. almost all sites are affected. just look at any large enough site. a few more examples:
site/fully indexed pages/url-only pages
webmasterworld.com/116,000/29,000
google.com/196,000/29,300
mtv.com/162,000/86,500
cisco.com/187,000/107,000
guardian.co.uk/309,000/235,000
whitehouse.gov/36,800/38,100
as i said, almost all sites are affected. and my speculation is that google chooses pages randomly as the fairest way to allocate its precious space in the full index!
Webmasterworld has thousands of pages that can only be accessed via a daisy chain of linking. Many of the URL pages are years old, PR0 pages like [webmasterworld.com...]
Google is crawling more, but less deeply. Perhaps it isn't unreasonable to think that they should crawl every ancient page with one link to it every month, but it's no surprise that they don't.
I have added NOINDEX to these pages, and now I have only got about 20 URL-only listings for pages that haven't been crawled. So I have reduced the number of URL-only listings on my site by marking these pages noindex; 20 out of 990 is OK to me, compared to 200 out of 1,100 or so three weeks ago...
i truly believe that google has just added a few more factors into the mix of its crawlers that determine whether a page is even 'worth' indexing properly, including dupe content and long parameter urls, i.e. ?product=2222&department=345344..
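the "long parameter url" idea can be made concrete with a small heuristic. to be clear, the thresholds below are my own guesses for illustration, not anything google has published:

```python
from urllib.parse import urlsplit, parse_qs

def looks_parameter_heavy(url, max_params=2, max_digits=5):
    """Crude heuristic for catalogue/session-style URLs that crawlers
    were said to be wary of: many query parameters, or parameters
    with very long numeric values. Thresholds are guesses."""
    params = parse_qs(urlsplit(url).query)
    if len(params) > max_params:
        return True
    return any(v.isdigit() and len(v) > max_digits
               for values in params.values() for v in values)
```

with these (made-up) thresholds, the example url above trips the check because of the long numeric department id, while a plain static page does not.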
A space problem that isn't publicly known just before they go public... I think they are a little smarter than to try to rip off their investors; it's not like they are selling a dodgy used car here...
I don't know why the disagreement either. There is not just one reason is there?
Pages can be URL only when they're first discovered, and they can ALSO turn URL only when they're being removed for penalties or otherwise.
If you track a site getting pages removed you can watch the number of URL only pages gradually increase over a number of days, especially if you keep watch on a couple of different data centers. And no, they are not being removed for lack of room.
steveb,
i am disagreeing with the above statement. all sites are being hit, not just sites with duplicate content, relative links, or poor construction. do you have any facts to support this assertion? or are you just speculating?
<snip>
[edited by: Marcia at 3:52 am (utc) on May 21, 2004]
[edit reason] No pointing out specific sites, please. [/edit]
marcia,
look at the evidence:
site/fully indexed pages/url-only pages
webmasterworld.com/116,000/29,000
google.com/196,000/29,300
mtv.com/162,000/86,500
cisco.com/187,000/107,000
guardian.co.uk/309,000/235,000
whitehouse.gov/36,800/38,100
msn.com (1,580,000 / 1,830,000)
yahoo.com (5,460,000 / 3,290,000)
cnn.com (501,000 / 207,000)
amazon.com (2,590,000 / 2,350,000)
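for what it's worth, the raw counts above are easier to compare as url-only shares. a quick sketch using the figures quoted in this thread (these are the posted numbers, not fresh queries):

```python
# (fully indexed, url-only) counts as quoted in the thread
counts = {
    "msn.com": (1_580_000, 1_830_000),
    "yahoo.com": (5_460_000, 3_290_000),
    "cnn.com": (501_000, 207_000),
    "amazon.com": (2_590_000, 2_350_000),
}

def url_only_share(indexed, url_only):
    """Fraction of all listings for a site that are url-only."""
    return url_only / (indexed + url_only)

for site, (indexed, url_only) in counts.items():
    print(f"{site}: {url_only_share(indexed, url_only):.0%} url-only")
```

on these numbers the url-only share ranges from roughly a quarter (cnn) to over half (msn), which is at least consistent with the effect hitting big sites across the board rather than only badly built ones.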
you think these are newly discovered pages? are they being removed for penalties?
if we propose any theories, let's try to support them with facts. otherwise we're just spinning old wives' tales.
This phenomenon occurs on lots of large sites, but certainly not on all sites on the Internet. That's just silly.
Forget this running-out-of-space junk. Google is crawling more actively than ever before, with fresh tags appearing every single day.