I had 20,300 pages showing for a site:www.example.com search yesterday and for the past month. Today it dropped to 509, but my traffic is still pretty constant. I normally get around 4,500 to 5,000 visitors to that site per day, and today I've already had 4,000.
So either Google doesn't account for even a small percentage of my traffic (which I doubt), or the way Google stores information about my site has changed. That is, the 20,300 pages are still there, but Google will only tell me about 509 of them. As far as I can tell, the other pages have gone supplemental.
That resonated with something that I was talking about with the crawl/index team. internetheaven, was that post about the site in your profile, or a different site? Your post aligns exactly with one thing I've seen in a couple ways. It would align even more if you were talking about a different site than the one in your profile. :) If you were talking about a different site, would you mind sending the site name to bostonpubcon2006 [at] gmail.com with the subject line of "crawlpages" and the name of your site, plus the handle "internetheaven"? I'd like to check the theory.
Just to give folks an update, we've been going through the feedback and noticed one thing. We've been refreshing some (but not all) of the supplemental results. One part of the supplemental indexing system didn't return any results for [site:domain.com] (that is, a site: search with no additional terms). So that would match with fewer results being reported for site: queries but traffic not changing much. The pages are available for queries matching the supplemental results, but just adding a term or stopword to site: wouldn't automatically access those supplemental results.
I'm checking with the crawl/index folks whether this might factor into what people are seeing, and I should hear back later today or tomorrow. In the meantime, interested folks might want to check if their search traffic has gone up/down by a major amount, and see if there are fewer/more supplemental results for a site: search for their domain. Since folks outside Google couldn't force the supplemental results to return site: results, it needed a crawl/index person to notice that fact based on the feedback that we've gotten.
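For anyone who wants to run that traffic check against their own logs, here is a rough Python sketch. It assumes an Apache-style combined access log and counts, per day, the hits whose referrer is a Google host. The function name, the sample lines and the "any host with a google label" rule are illustrative assumptions, not anything Google has specified.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Pulls the day and the referrer out of a combined-format log line
# (assumed layout: ... [day:time zone] "request" status size "referrer" "agent").
LOG_LINE = re.compile(
    r'\[(?P<day>\d{2}/\w{3}/\d{4}):[^\]]*\] "[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"'
)

def google_referrals_per_day(lines):
    """Count hits per day whose referrer host has a 'google' label."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # not a combined-format line, skip it
        host = urlparse(match.group("referrer")).hostname or ""
        if "google" in host.split("."):
            counts[match.group("day")] += 1
    return counts

# Made-up sample lines, only to show the expected input shape.
sample = [
    '10.0.0.1 - - [08/May/2006:10:00:00 +0000] "GET /a.html HTTP/1.1" 200 512 "http://www.google.com/search?q=widgets" "Mozilla/4.0"',
    '10.0.0.2 - - [08/May/2006:11:30:00 +0000] "GET /b.html HTTP/1.1" 200 734 "-" "Mozilla/4.0"',
    '10.0.0.3 - - [09/May/2006:09:15:00 +0000] "GET /a.html HTTP/1.1" 200 512 "http://www.google.co.uk/search?q=widgets" "Mozilla/4.0"',
]
daily = google_referrals_per_day(sample)
for day, hits in daily.items():
    print(day, hits)
```

Comparing the daily counts from before and after the drop in site: results should show whether search traffic really moved or only the reported page count did.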
Anyone that wants to send more info along those lines to bostonpubcon2006 [at] gmail.com with the subject line "crawlpages" is welcome to. So you might send something like "I originally wrote about domain.com. I looked at my logs and haven't seen a major decrease in traffic; my traffic is about the same. I used to have about X% supplemental results, and now I hardly see any supplemental results with a site:domain.com query."
I've still got someone reading the bostonpubcon email alias, and I've worked with the Sitemaps team to exclude that as a factor. The crawl/index folks are reading portions of the feedback too; if there's more that I notice, I'll stop by to let you know.
[edited by: Brett_Tabke at 8:07 pm (utc) on May 8, 2006]
Thanks for sharing your information with us.
You said that some of the sites have a spam penalty.
For us webmasters, could you please outline what counts as spam? I don't mean in detail.
Maybe you can just answer these questions by adding a yes/no comment?
1. Are pages with mostly similar content (about 80%) spam?
2. Are stand-alone product pages spam?
3. Is linking from deeper pages to top pages spam?
4. Is there a spam factor based on PageRank (meaning the higher the PageRank, the lower the danger of being caught by a spam penalty)?
Maybe someone can add more questions?
Thanks in advance, GG.
I believe that if we webmasters know more about how spam is defined, we can do some work and help make your index more relevant again.
So that would match with fewer results being reported for site: queries but traffic not changing much. The pages are available for queries matching the supplemental results, but just adding a term or stopword to site: wouldn't automatically access those supplemental results.
That is not what is happening - the pages are gone on query matches as well as site:domain.com matches.
However, I can believe that for some of us this may be related to a cleanup of supplementals.
GG - what do you mean by a penalty? If you do a site:domain.com search and your homepage is not top, is that likely to be caused by a penalty? How are we supposed to know if we are penalized when we are in the SERPs, and whenever we contact Google they reply that we are not penalized because we can be found with a search like site:domain.com?
Is the penalty that Google applied to sites that had/have canonical problems grouped into the above? Is Google still looking into a way of fixing that problem? (The penalty that went with the issue, as much as improved canonicalization.)
Tying together all of the evidence from my own experience, and that of others gleaned from the forums, erroneous or out-of-date backlinks would explain all of the missing pages.
The erroneous, or simply out-of-date, backlink information (which we cannot see) leads to insufficient PR (which we cannot see) and hence deep pages are not indexed.
We all know that a "link:www.mysite.com" does not show you the complete picture. But, since Big Daddy, it now shows just a tiny proportion of backlinks. Way less than it used to show before Big Daddy. Why? Because either the backlink index hasn't been updated (and now dates back to mid 2005), or else because it has been updated, but the update process is buggy. Only a small handful of Google employees know which of these two possibilities is the case.
We know that the missing pages problem cannot be due to any kind of duplicate content filter, as some people are suggesting. If this were the case, then affected sites would see a proportion of their pages disappear. Some would lose 10%, some would lose 40%, and some would lose 95%. But that's not what we see. We see sites losing the vast majority of their pages or else losing no pages at all. The reason affected sites lose such high percentages of their pages is the hierarchical nature of a site. The number of pages increases with depth, and the artificially low PRs (based on inaccurate and/or out-of-date backlink data) prevent the deeper content from being indexed.
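To put rough numbers on the hierarchy argument, here is a small Python sketch. It assumes a site shaped like a uniform tree (every page links to the same number of child pages) and a hard depth cutoff below which nothing is indexed. Both are simplifying assumptions for illustration, and the function names are made up.

```python
def pages_per_level(branching, depth):
    """Number of pages at each level of a uniform tree, homepage at level 0."""
    return [branching ** level for level in range(depth + 1)]

def fraction_indexed(branching, depth, cutoff):
    """Share of all pages that sit at or above the cutoff level."""
    levels = pages_per_level(branching, depth)
    return sum(levels[: cutoff + 1]) / sum(levels)

# A site with 10 links per page and 3 levels below the homepage:
# 1 + 10 + 100 + 1000 = 1111 pages in total.
all_levels = fraction_indexed(10, 3, 3)   # nothing cut off
one_level_lost = fraction_indexed(10, 3, 2)  # deepest level not indexed
print(f"{all_levels:.0%} indexed with no cutoff")
print(f"{one_level_lost:.1%} indexed when the deepest level is dropped")
```

Under these assumptions, losing just the deepest level leaves roughly 10% of the pages, which is the "all or nearly nothing" pattern described above.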
The fact that Big Daddy was kick-started from an index dating back to the middle of last year not only explains why the backlink data might be stale, but also explains why ancient pages keep popping up on various data centres.
As further evidence: try a "link:www.mysite.com" and compare it to a search for "www.mysite.com". In my case, the "link:" search shows just 6 results, only one of which is external to my site. The one external backlink probably pre-dates when Big Daddy's index was seeded. The "www.mysite.com" search, on the other hand, finds hundreds of results representing hundreds of internal and external backlinks. Why aren't these showing up in the "link:" search? Is it because "link:" searches are well known for not showing you the complete picture? Or, has that well-known fact simply been obscuring the true cause of all of the problems? Namely, that the backlinks are simply missing from Google's backlink index.
Sorry for waffling on...I think I've finally run out of steam now.
For at least one of my sites the number of supplemental results has increased dramatically and traffic is about 50% what it was before all the pages went supplemental. This happened in the middle of April.
- Traffic post May 2 is only 30% of previous
- Pages began to drop on May 1-2; now between 25% and 40% of pages are left in the index, depending on the DC
- Supplemental Results began to show around May 5th, previously no Supplementals showed
- has had a 301 www/non-www redirect for years, doesn't appear to be an issue with the site: command
- allintext: searches have our site missing, other searches fine
- searches for unique text from our homepage have many scraper sites coming before ours for many searches; for some we also top the scrapers! (sarcastic-cheer!)
Anyone else have any similar problems? I'm thinking this website's problem is due to scrapers and how G's reworking their BD index. Part of me would like to stay hands off and let it work itself out (over the coming weeks? months?), however, I'm wondering if re-writing a lot of the site's content would perhaps lift the problem that we seem to have due to the scrapers... it's a small site so a rewrite isn't out of the question.
I mean if you have a hierarchical architecture with 3 or more levels, like an internet shop. Normally your product pages are at the lowest level of the architecture. Imagine you have a shop with 15,000 products and each of the pages has random links to second-level pages or the front page, simply to show the user some related products or products to use with it, e.g. digicam and SmartCard! (Remember: we should build pages for users, not for SEs, and that is a good thing for users.) Is it spam? Because you are linking back up the architecture.
The target pages gain a lot of PR from that!
Please don't tell me that doesn't work. I was always wondering why one page, "Basket", had a higher PageRank than all the other pages. All the product pages pointed to that page, and they were the only ones.
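The "Basket" effect is easy to reproduce with the textbook PageRank iteration. The Python sketch below uses a made-up ten-page shop (one homepage, two categories, six products, one basket) and the commonly quoted damping factor of 0.85. It follows the published formula only and makes no claim about what Google actually runs.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook power-iteration PageRank over a dict of page -> outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[page] for page in pages if not links[page])
        new_rank = {
            page: (1.0 - damping) / n + damping * dangling / n for page in pages
        }
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

# Hypothetical shop: home -> categories -> products -> basket -> home.
links = {"home": ["cat1", "cat2"], "basket": ["home"]}
for c in (1, 2):
    products = [f"cat{c}-prod{i}" for i in range(1, 4)]
    links[f"cat{c}"] = products
    for product in products:
        links[product] = ["basket"]  # every product page links only to the basket

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:12s} {score:.3f}")
```

In this toy graph the basket comes out on top, ahead of the homepage, simply because every product page sends it all of its outgoing link weight.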
1. Because I advertise my site with Adwords, a large number of MFA sites have managed to get pages indexed with Google that link to my site (through the cached Adwords ad). This makes it look like I've been hanging out in bad link neighborhoods.
Am I being penalized by Google for being a good Adwords customer?
2. A slight variation of (1): A good inbound link to my site appears on a well established authority site. This page has been scraped numerous times by MFAs, which repeat the link as well as the surrounding copy.
Am I being penalized because I have multiple inbound links from MFA pages with duplicate copy?
Kind of makes you wonder whether it's really worth writing original, useful pages for actual readers, since it's impossible to attract any of those readers.
My sites are basically being deindexed. Every day for the past 2 weeks about 5-10k pages have been dropped. I am now at about 10% of where I was at the beginning of April in terms of pages indexed. A friend of mine runs a small niche site that ranked number 1 for its keyword for 3 years. It's now number 7 on some DCs, without the title of the site showing, and on other DCs it's 21 with the title showing. I also see my sites being displayed without a title or description, but only on some DCs. The strange thing is that the cache date is May 7th.
Am I being penalized because I have multiple inbound links from MFA pages with duplicate copy?
That's been a theory for some time: links outside a webmaster's control causing bad neighborhood associations. Yes, the association is believed to be passed from the scraper sites, and there's nothing you can do. Ever tried to contact a scraper site?
One belief is that associations of incoming links can make or break a site's rankings. Again, the conspiracy theory about no one being able to hurt another site's rankings comes into play here...
It's a common practice for scrapers to link to good quality sites, so there can be many arguments for or against this theory. It would be pretty stupid to penalize sites for MFA page links that webmasters obviously have no control over, but the algorithm would suggest that these links would come into play since it's all automated.
It has to be about backlinks. Duplicate filter thing is bogus due to the massive page drops on relevant sites. I am so anti dup content and I am still losing pages on new sites.
EVERYONE: Are the sites/domains that are dropping pages fairly new?
Have you used link directories as your main source of link development?
Do you know if your friend has changed his registrar info recently? If so, the domain can then be seen as newish again.
Plus, even if you don't get directory links, that doesn't mean there isn't a web of directories linking to your friend's sites, and therefore the links from their sites are not as powerful anymore.
[edited by: Relevancy at 5:31 pm (utc) on May 9, 2006]
As of today G got rid of our supplemental results.
However, they are not indexing most of our pages.
Whichever way I do site: (w/ or w/o www) it would come up 24 pages. (as of today)
Now that G has got rid of the supplementals, should I expect them to start indexing more pages of our site?
nothing has changed
>The dropped pages were all added within about a three-month period starting in December
That's about the only thing I'm picking up from this: the dropped pages are pages that have been added over the last 4 months, and that applies to both our sites. It's almost like anything newish has been dropped.
I hear some sites are only now starting to lose pages; mine have been gone for almost 2 months! They have been up and down over the last month by about 5% each way. Currently on a high of 432; last week it was 375; it should be 990-ish.
Seems like a stage thing to me, our moans and groans have been the same, just at different times over the last 2 months or so.
Another site, 8 months old, pages gone from 60,000 down to 252. Pages are coming back at a rate of 10 a week...at this rate I might have to get a job!
As I don't want to get a job, I have been doing some testing. I find that I can get small new sites to rank well within a week and stay there. OK, I would have to make a lot of new sites to get over this Google indexing problem, but I've got to do something. How can new sites have pages indexed before established sites? It just doesn't make any sense.
Come on Google, sort it out.
The "same snippet for every page" appears to be happening in most (if not all) datacentres.