
Google SEO News and Discussion Forum

This 254 message thread spans 9 pages.
Pages Dropping Out of Big Daddy Index

 6:11 am on Apr 25, 2006 (gmt 0)

Continued from: [webmasterworld.com...]

One thing to bear in mind is that Bigdaddy will have different crawl priorities. That can account for some of it. If you've run into any spam problems in the past, you might also want to do a reinclusion request. Otherwise, please send an email to bostonpubcon2006 at gmail.com with the subject line "crawlpages" (all one word), and I'll ask someone to see if they notice any commonalities.



 6:18 am on Apr 25, 2006 (gmt 0)

Thanks GG, that email address is going to get busy.


 9:06 am on Apr 25, 2006 (gmt 0)

Unless this is a glitch within a major glitch :) we are still dropping pages all over the place. Looks like all of the supplementals have been dumped.
Our big site is down to just 2 pages from over 11,000 two weeks ago... Our next largest dived from 10K to just 44 pages today. Next in line went from 105 pages to 44.
We still have 2 sites holding - one at over 10k which is odd because it only has maybe 3 - 4k in pages anyway. The other at 26,900 is right off the charts as it only has at best 5k in total page count. Traffic from these two sites is steady... So the fun continues. Interesting to note that the pages staying in have the highest % of unique text on average. This might be our answer.


 10:04 am on Apr 25, 2006 (gmt 0)

That's the same problem I've been finding, kidder. Mine dropped from 600 to 123, so when I hear back I'll let you know what was said.


 11:02 am on Apr 25, 2006 (gmt 0)

I have been considering this issue all weekend. This morning I did a quick site:www.mysite.com check and saw it had risen by 10 or so pages.

After catching up on this thread I've come to the conclusion that my pages just look too similar to Google, though to the visitor they won't. (Not because of any underhand tactics, just how the pages are arranged and the type of site it is.)

The pages that are doing well have slightly different or additional on-page text. Slight as it is, this seems to be a factor. Today I will experiment with 3 sections of the site and take it from there. I will happily report back any positives I find.


 12:01 pm on Apr 25, 2006 (gmt 0)

This whole "maybe my pages are too similar" concept is, in my opinion, just a red herring. How on earth do you accurately measure "too similar"? Too similar to what? How many words do you have to change before something is dissimilar enough? Can you imagine how similar a 50000 page real estate website is? 50000 pages, all describing a different house for sale: X bedrooms, Y bathrooms, Garden, Garage...and so on. Even Google aren't big-headed enough to think that they can accurately judge this level of similarity algorithmically.

There are some far more obvious candidates for the cause of the Google bug (or bugs) that are entirely in line with the Big Daddy changes that Google have confessed to.

We know for example that their crawlers just don't crawl like they used to. The code has been completely changed (loads of opportunities for fundamental bugs), the crawl-priority metrics have been changed (loads of opportunities for fundamental bugs), and they have introduced a new cache to try and reduce bandwidth usage (again, ample opportunity for fundamental bugs).

The only hope for progress is if Google can somehow be made aware of the problem. At the moment it is all too easy for them to just dismiss all of our grumblings as the death cries of a bunch of evil spammers.

The other hope would be for the Press to finally wake up and start highlighting some of the issues. They can't ALL be shareholders, surely?


 12:08 pm on Apr 25, 2006 (gmt 0)

"The pages that are doing well have slightly different or additional on page text"

Exactly the same here. I'm now going through and adding anything I can to try to push my missing pages over the google threshold - whatever that might be.

Sad thing is, I'm now not coding for the users - just for google, but what can you do? - Can't hang around for 6 months waiting for google to either fix its bugs or change its 'too similar' threshold, that's even if the problem is either of those, and not that we have simply been deliberately excluded.


 12:11 pm on Apr 25, 2006 (gmt 0)

Thanks GG! Sent you two examples to the specified email address.


 1:16 pm on Apr 25, 2006 (gmt 0)

Perhaps the bugs are out of our control like ClintFC suggests, perhaps they aren't.

I ran several tools over my pages and the outcome was a high % of similarity.

So I have just under 10% of my site showing up in the index, and over 95% of these are marginally different from each other; the differences are very small.

If I change 100 or so pages in a similar style to those indexed, then over the coming weeks I will know if that is the problem or not.

What is there to lose? After just under 6 months of zero traffic from Google to a hundred positions returned at the expense of 1000 pages of my site I personally have nothing to lose. The additional and altered text will fortunately be informative so it won't look out of place.
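
For anyone wanting to put a number on "a high % of similarity", here is a minimal sketch of one common way to compare two pages: break each page's text into overlapping three-word "shingles" and take the Jaccard ratio of the two sets. This is only an illustration. The function names, the shingle size and the sample text are all assumptions, and nothing in this thread says Google measures similarity this way.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a, text_b, size=3):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical listing text, in the spirit of the real estate example.
page_one = "3 bedrooms, 2 bathrooms, garden, garage, close to schools"
page_two = "4 bedrooms, 2 bathrooms, garden, garage, close to schools"
print(f"{similarity(page_one, page_two):.0%} similar")
```

Running this over the visible text of an indexed page and a dropped page gives a rough, comparable figure, but where any "too similar" threshold sits is still a guess.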


 1:21 pm on Apr 25, 2006 (gmt 0)

I am pretty sure it is more than a similarity problem, as I have a site with many unique pages and it's been reduced to rubble. Are there similar pages? Sure, but that does not account for the total reduction in pages to almost nothing.


 1:28 pm on Apr 25, 2006 (gmt 0)

I cannot confirm the duplicate content theory. With my pages it seems to be pure coincidence.


 1:31 pm on Apr 25, 2006 (gmt 0)

Hopefully once we get some replies to these emails we may have a better insight into what's going on.


 1:52 pm on Apr 25, 2006 (gmt 0)

On one of my sites, which has been reduced to 60 pages from many thousands, I did see 1000+ Mozilla Googlebot hits in the last 72 hours. Praying this is good news (and that it isn't 1 of the 2 I reported to the GG email addy).


 2:56 pm on Apr 25, 2006 (gmt 0)

Lost mine, now gaining again (back up to around 6000 pages).

Big drop in traffic! Scary!


 4:14 pm on Apr 25, 2006 (gmt 0)

I think many webmasters are missing the latest Google trend. If a site has a high number of similar pages (product pages) it must be an e-com site. Google is inclined to demote e-com sites in organic SERPs since the preferred place for them is in AdWords. The free ride is finally coming to an end.


 4:57 pm on Apr 25, 2006 (gmt 0)

Look thru the earlier thread posts. It's not just EC.


 5:00 pm on Apr 25, 2006 (gmt 0)

I agree with gford. This is a bug or combination of bugs, and not a conspiracy to force people to AdWords. If that was the game plan, why even introduce the free Google Base?


 5:05 pm on Apr 25, 2006 (gmt 0)

Nobody said it is a conspiracy. It is a trend in Google's evolution.


 5:39 pm on Apr 25, 2006 (gmt 0)

Over the past couple of days we have seen our "similar" pages starting to be indexed again, but only at the rate of 10-20 a day...long way to go still!

GG...thanks...have sent you my email.


 6:25 pm on Apr 25, 2006 (gmt 0)


Do try and keep up. This isn't a ranking issue. The pages are missing altogether. Are you seriously suggesting that Google are deliberately "trending" towards not listing any commercial sites at all? Are they a search engine or not?


 4:39 pm on Apr 26, 2006 (gmt 0)

Thanks GG.

> different crawl priorities

So, as Matt mentioned in Boston, the total number of spiderlings we will see per day on high value pages should drop by about 2/3?


 5:25 pm on Apr 26, 2006 (gmt 0)

My ecom site, and the one most likely to have dup content, is unaffected, page count as per pre bigdaddy.

One of my non-commercial sites, all unique pages, has lost about 30% from index.
My blog has lost all its indexed pages bar the categories, archives and feed etc. All posts have disappeared.
I have numerous sites and I can see no pattern at all. It cannot be a planned result; it is either a fault or they are gradually rebuilding the index.


 6:37 pm on Apr 26, 2006 (gmt 0)

I am seeing reduced crawling (practically nil on one site) and there are definitely NO pages in them that are similar.


 6:51 pm on Apr 26, 2006 (gmt 0)

ClintFC: that has been said before, indeed the dynamic duo that run Google have said as much.

However, I have several sites, three of which I watch: one is info based and now supplemental, one is ecom and now supplemental, and one is info and ecom.

That last one is interesting, as this site vanished and has now started to make a comeback.


 6:57 pm on Apr 26, 2006 (gmt 0)

I've been watching the results counts closely for one of my sites over the last few weeks and depending on when I happen to look I may see anything from 67,000 results to 3,000 and once just 500 something results.

Over 90% of these pages are generated on the fly from a database which feeds into one of two different templates, so they would probably fail any loosely bounded content similarity test. The other 10% have the benefit(?) of several years' worth of effort at providing a 'consistent user experience throughout the site'.

Throughout it all the inbound has pretty much stayed the same, rising steadily from Monday through Friday and then dropping off to slightly lower levels on the weekends.

So, with this one site at least, I can't see where anything has really changed other than the different reported result counts from time to time.

It makes me believe that everything is still indexed properly but my query may be hitting different servers showing cached results in different stages of being updated with new results.

It would be interesting to see how results differ between servers but I am not sure how to select any different ones by IP address. Has anyone already tried this?
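
One way to set up that comparison is to address the same site: query to each server by IP and note the result count each one reports. The sketch below only builds the URLs and makes no requests. The IPs are placeholders from the reserved documentation range, and the `/search?q=` path is an assumption about how a server would accept a query addressed directly by IP; neither comes from this thread.

```python
from urllib.parse import urlencode

# Placeholder addresses from the reserved documentation range (RFC 5737).
# Substitute the IPs of the servers you actually want to compare.
SERVER_IPS = ["192.0.2.10", "192.0.2.20"]


def site_query_urls(domain, server_ips):
    """Build one site: query URL per server IP (assumed /search?q= path)."""
    query = urlencode({"q": f"site:{domain}"})
    return [f"http://{ip}/search?{query}" for ip in server_ips]


# Print the URLs so each can be opened and its result count noted.
for url in site_query_urls("www.mysite.com", SERVER_IPS):
    print(url)
```

Logging the count from each URL at the same time of day would show whether the swings come from different servers or from the index itself.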


 9:21 pm on Apr 26, 2006 (gmt 0)

Mine dropped from 600 to 123

Phah! 500,000 down to 44,300 right here! All turned supplemental after two years of good rankings. No I ain't spam, no I ain't scraper, no I ain't MFA and no I ain't an espotting affiliate .... even those are still ranking better than me! ....

... unless Google has raised their unique content filter to "must have at least 90% unique content" I don't have an explanation.


 10:31 pm on Apr 26, 2006 (gmt 0)

One site I monitor had most of its main pages dropped from the index (totally unique content) and other pages on the site (inventory and info pages re widget parts) are still listed. If it was a duplication problem it would have affected this site in the opposite way.

I suspect it is a bug in a new algorithm that will iron itself out soon.

I also sent an email


 10:35 pm on Apr 26, 2006 (gmt 0)

One of my very consistent sites has been affected, dropping from 400+ pages to 3 (plus a couple of supplementals which have been 301'd for at least a year).

Some points of interest perhaps?

- Pages have no meta desc/keywords tags, just unique title tags.
- Pages use full title tag as URL
- Recently secured my previously Open dns servers. (long shot, but thought I'd mention it)

[edited by: lawman at 5:58 am (utc) on May 6, 2006]


 11:46 pm on Apr 26, 2006 (gmt 0)

I am seeing thru my own sites and people I know:
o blogs
o ecommerce
o articles (unique and distributed)
o forums
o financials
o and more..

everything being hit and hit really hard in the "G"ut.


 12:06 am on Apr 27, 2006 (gmt 0)

I can't complain as one of my competitors was made 99.99% supplemental.

Thanks GG ;)

