Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 249 message thread spans 9 pages; this is page 4.
Pages Dropping Out of Big Daddy Index
Part 2
GoogleGuy




msg:716524
 7:59 pm on May 8, 2006 (gmt 0)

Continued from: [webmasterworld.com...]


internetheaven, you said:

I had 20,300 pages showing for a site:www.example.com search yesterday and for the past month. Today it dropped to 509 but my traffic is still pretty constant. I normally get around 4,500 - 5,000 to that site per day and today I've already got 4,000.

So, either Google doesn't account for even a small percentage of my traffic (which I doubt) or the way Google stores information about my site has changed. i.e. the 20,300 pages are still there, Google will only tell me about 509 of them. As far as I can tell, I think the other pages have been supplemented.

That resonated with something that I was talking about with the crawl/index team. internetheaven, was that post about the site in your profile, or a different site? Your post aligns exactly with one thing I've seen in a couple of ways. It would align even more if you were talking about a different site than the one in your profile. :) If you were talking about a different site, would you mind sending the site name to bostonpubcon2006 [at] gmail.com with the subject line of "crawlpages" and the name of your site, plus the handle "internetheaven"? I'd like to check the theory.

Just to give folks an update, we've been going through the feedback and noticed one thing. We've been refreshing some (but not all) of the supplemental results. One part of the supplemental indexing system didn't return any results for [site:domain.com] (that is, a site: search with no additional terms). So that would match with fewer results being reported for site: queries but traffic not changing much. The pages are available for queries matching the supplemental results, but just adding a term or stopword to site: wouldn't automatically access those supplemental results.

I'm checking with the crawl/index folks on whether this might factor into what people are seeing, and I should hear back later today or tomorrow. In the meantime, interested folks might want to check whether their search traffic has gone up/down by a major amount, and see whether there are fewer/more supplemental results for a site: search for their domain. Since folks outside Google couldn't force the supplemental results to return site: results, it needed a crawl/index person to notice that fact based on the feedback that we've gotten.
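The traffic check GoogleGuy suggests can be done from server logs. A minimal sketch, assuming an NCSA/Apache combined log format and that Google-referred visits carry a google.* search page in the Referer field (both are my assumptions, not anything GoogleGuy specified):

```python
import re
from collections import Counter

# Matches the date and the quoted Referer field of an NCSA combined log line.
LINE_RE = re.compile(
    r'\[(\d{2}/\w{3}/\d{4}):[^\]]+\]\s+"[^"]*"\s+\d+\s+\S+\s+"([^"]*)"'
)

def google_referrals_per_day(lines):
    """Count hits per day whose Referer is a Google page."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m and "google." in m.group(2):
            counts[m.group(1)] += 1
    return counts

# Tiny fabricated log for illustration:
log = [
    '1.2.3.4 - - [10/May/2006:13:36:02 +0000] "GET /page.html HTTP/1.1" 200 512 "http://www.google.com/search?q=widgets"',
    '1.2.3.4 - - [10/May/2006:13:37:10 +0000] "GET /other.html HTTP/1.1" 200 512 "http://example.com/"',
    '5.6.7.8 - - [11/May/2006:09:01:44 +0000] "GET /page.html HTTP/1.1" 200 512 "http://www.google.co.uk/search?q=widgets"',
]
print(google_referrals_per_day(log))
```

Comparing these daily counts before and after a site: count drop is a quick way to tell whether the drop is cosmetic (reporting only) or real (lost traffic).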

Anyone that wants to send more info along those lines to bostonpubcon2006 [at] gmail.com with the subject line "crawlpages" is welcome to. So you might send something like "I originally wrote about domain.com. I looked at my logs and haven't seen a major decrease in traffic; my traffic is about the same. I used to have about X% supplemental results, and now I hardly see any supplemental results with a site:domain.com query."

I've still got someone reading the bostonpubcon email alias, and I've worked with the Sitemaps team to exclude that as a factor. The crawl/index folks are reading portions of the feedback too; if there's more that I notice, I'll stop by to let you know.

[edited by: Brett_Tabke at 8:07 pm (utc) on May 8, 2006]

 

montefin




msg:716614
 1:32 pm on May 10, 2006 (gmt 0)

Well, I've reached the point of nostalgia now.

Two or three times a day, I go to Yahoo, MSN, and Ask.com so I can see my previously highest earning, best Google referral URL in the top 10. Because it isn't even in Google anymore since the robots.txt "issue" of April 11-12.

But maybe that's not relevant.

jam2005




msg:716615
 1:36 pm on May 10, 2006 (gmt 0)

>>>Meanwhile Yahoo, MSN and Ask all list all of my pages. As has Google in the past. If giving a Google a sitemap is going to help, then they need to explain why.

My site has a Google sitemap. It is downloaded, but the pages are not indexed. I don't think this is a problem that sitemaps can solve.

moftary




msg:716616
 1:46 pm on May 10, 2006 (gmt 0)

May I add to this thread that I have a site with a few thousand pages, all of which I deleted a year or so ago, and around 8,000 pages are still indexed in Google?

OK, go delete all your sites so their pages can get back into the BD index.

tigger




msg:716617
 1:57 pm on May 10, 2006 (gmt 0)

>My site has a Google sitemap. It is downloaded, but the pages are not indexed. I don't think this is a problem that sitemaps can solve

I agree. The idea of G replying to a webmaster by telling them to use Google Sitemaps is almost insulting, considering the problems we know they currently have.

Right Reading




msg:716618
 2:15 pm on May 10, 2006 (gmt 0)

Matt Cutts mentioned spam penalties prominently as a factor in dropping pages, and as I recall, GoogleGuy recommended reinclusion requests. These statements make me think that G has ratcheted up its spam definitions, and a good percentage of the newly excluded pages may be ones that are now getting labelled in the index as spam when previously they were considered clean. Duplicate content and questionable backlinks have been mentioned as possible factors. Does anyone have any ideas of other criteria they could be using that would be new? (My site, BTW, is completely noncommercial -- I don't even use AdWords. I'm fairly sure I was hit with a duplicate content penalty. I've tried to address that, with little result to this point.)

ClintFC




msg:716619
 2:16 pm on May 10, 2006 (gmt 0)

PR isn't important in regards to the issue, no difference in the problem from our pr7 or pr6 site

I completely disagree. I believe PR is absolutely central to the missing pages issue. You have no idea what PR your sites currently have. You may see PR7 for one and PR6 for the other, but in reality the actual PR for these sites might now be completely different. You have no way of knowing.

It's worth bearing in mind a simple fact (if anything can truly qualify as a "fact" in regards to Google):

Missing/Buggy Backlinks => Lower PR => Shallower Indexing => Loads of Missing Pages
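ClintFC's chain can be illustrated with the textbook PageRank power iteration. This is a toy sketch on a made-up four-page link graph (the graph, the damping factor, and the iteration count are all my assumptions), showing only that losing an inbound link lowers a page's score:

```python
def pagerank(links, iters=50, d=0.85):
    """Plain power-iteration PageRank over a dict of page -> list of outlinks."""
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {}
        for p in pages:
            # Each page q that links to p passes on pr[q] split among q's outlinks.
            inbound = sum(pr[q] / len(links[q]) for q in pages if p in links[q])
            new[p] = (1 - d) / n + d * inbound
        pr = new
    return pr

# Hypothetical site: B has backlinks from A and C...
full = {"A": ["B"], "B": ["A"], "C": ["B"], "D": ["A"]}
# ...same graph with C's backlink to B lost:
pruned = {"A": ["B"], "B": ["A"], "C": ["A"], "D": ["A"]}
print(pagerank(full)["B"], pagerank(pruned)["B"])
```

With fewer inbound links, B's score drops; if crawl depth is budgeted by PR, as ClintFC suggests, shallower indexing and missing pages would follow.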

Web_speed




msg:716620
 3:40 pm on May 10, 2006 (gmt 0)

- Authority systems seem to be irrelevant. We have scrapers by the tens of thousands weekly crawling and spamming our content out, and now these PR 1 or 0 sites with little more than 1 IBL are crushing our 2 sites in the SERPs for almost all positions and our company name. This despite having PR7, thousands of IBLs, clean SEO, and no changes to the properties in years.

LOL, does anyone remember Google's mantra, "organizing the world's information" (yeah, right)? A couple of years later, check out what they did to the web. The web today is the largest pile of MFA junk the world has ever seen. Even Google themselves are choking on this crap.

"Please use our sitemaps... please put nofollow tags... please send re-inclusion requests... help us out a little here, we are choking on our own spam. Our algo is so severely busted even our best technicians can no longer make any sense of it." That's what I (and I'm sure many other veterans) can clearly see.

The wheels are coming off Google search boys and girls, time to move on, and don't forget to let your clients know about it.

Start actively promoting other search engines for your own good and better future.

g1smd




msg:716621
 4:04 pm on May 10, 2006 (gmt 0)

Yes. Their influence has made the web a worse place than it was, and I am not surprised if they have problems figuring out where the 1% of useful stuff is located.

walkman




msg:716622
 4:30 pm on May 10, 2006 (gmt 0)

For some reason, more and more of my pages are being added to the index. GB visits have increased too. Now all I need is a jump in rankings. I hope the activity is an indicator of an expiring penalty, but it probably isn't :)

[edited by: walkman at 4:34 pm (utc) on May 10, 2006]

F_Rose




msg:716623
 4:33 pm on May 10, 2006 (gmt 0)

Have you done anything specific that may have helped Google's indexing?

LuckyGuy




msg:716624
 6:21 pm on May 10, 2006 (gmt 0)

These statements make me think that G has ratcheted up its spam definitions and a good percentage of the newly excluded pages may be ones that are now getting labelled in the index as spam when previously they were considered clean.

But I can show you some sites in top-ten positions that have duplicate content on different domains and work with doorways, redirects, and keyword stuffing. Those sites have not been affected by this "filter"; instead, they gained from the white-hat "good" pages falling out of the index.
Yesterday I found a site with 12,300 pages in the index. ALL of them were filled with keywords and a JavaScript redirect.
How does that fit into your idea of Google having ratcheted up the spam definitions?

Maybe they turned the switch in the wrong direction?
Or they did a "not xnor" and can't find it anymore.

There are too many errors that do not fit a spam theory: old pages in the index, no new pages in the index, too many good pages were hit, bad pages obviously not...

Relevancy




msg:716625
 6:52 pm on May 10, 2006 (gmt 0)

Two completely unconnected people I know received their Google Analytics invite codes today. Capacity is being freed up so more Analytics accounts can be opened?

Pages drop so Analytics can run, so they can better track AdWords and site trends? That = more $$ for Google.

Anyone else get invite codes today?

Play_Bach




msg:716626
 8:34 pm on May 10, 2006 (gmt 0)

When I search my sites using
site:example.com keyword

all I get are supplemental pages. Anybody else?

g1smd




msg:716627
 8:37 pm on May 10, 2006 (gmt 0)

Which datacentre?

There are FOUR different versions of the Google Index out there.
They are all very different.

kamikaze Optimizer




msg:716628
 8:44 pm on May 10, 2006 (gmt 0)

Play_Bach: Not me. I just tried it on all DCs and I am showing healthy numbers.

Liane




msg:716629
 9:03 pm on May 10, 2006 (gmt 0)

These statements make me think that G has ratcheted up its spam definitions and a good percentage of the newly excluded pages may be ones that are now getting labelled in the index as spam when previously they were considered clean.

This thread is remarkably similar to one here at WebmasterWorld after the Florida update in November, 2003. Webmasters were at a complete loss as to why they were being penalized or dumped by Google.

I was amongst those at a loss. My site is very clean and (I thought) always had been up until that point in time. I knew the only thing I could be accused of was perhaps overdoing the keyword density.

Not knowing what to do (but forced to do "something" or starve) ... I rewrote every single page of the site and reduced my keyword density across the board. Worked like a charm! :)

Don't know if this helps, but Google has been tweaking those spam filters slowly but surely for the past two and a half years. Only you know in your gut if your KW density might be perceived as "too much"... but it can't hurt to try reducing it on a few select pages to see!

There doesn't seem to be one common problem which you all share. I wouldn't discount keyword density offhand if I were you. Try it and see! You've got nothing to lose and everything to gain.
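Liane's before-and-after rewrite can be measured rather than guessed at. A rough sketch (the tokenizer and the density definition, occurrences over total words, are my assumptions; there is no official formula):

```python
import re

def keyword_density(text, keyword):
    """Fraction of words in `text` that are `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

# Fabricated copy, before and after a rewrite:
before = "widgets widgets blue widgets cheap widgets best widgets here"
after = "quality blue widgets at fair prices with fast shipping today"
print(round(keyword_density(before, "widgets"), 2))  # heavy repetition
print(round(keyword_density(after, "widgets"), 2))   # much lighter
```

Running this on a few select pages, as Liane suggests, gives a concrete number to track while testing whether density is the factor.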

fred9989




msg:716630
 10:42 pm on May 10, 2006 (gmt 0)

Well, Liane, that's the first time in the current shambles (I think) that anyone's mentioned keyword density, and it may be as plausible as any other theory.
The problem, it seems to me, is that everyone has equally compelling suggestions about what to do to put matters right - which indicates that there are many factors in play, and that what would put one site right will have no impact on another.
Nonetheless, your comments about keyword density did strike some kind of chord. Has anyone else got proof that this is a factor of importance?
Meanwhile I'll do my own test and report back later.....
Rod

softplus




msg:716631
 10:53 pm on May 10, 2006 (gmt 0)

These statements make me think that G has ratcheted up its spam definitions and a good percentage of the newly excluded pages may be ones that are now getting labelled in the index as spam when previously they were considered clean.

Perhaps:
- they're moving to an automatic spam system, based on their Mozilla crawler (hidden text, hidden links, etc.) (compared to a previously mostly manual-review based system)
- they're moving towards partial spam-penalties (compared to full bans)

Especially if they're moving towards an automated spam-reporting system, they are without question going to run into sites left and right that are perfectly legitimate but where a look at the code could be misleading for a "simple crawler". "White text on a white background? Oh, didn't spot the black background image." :-) If this is happening, it would look pretty much like what's happening now: lots of simple spammy sites are reporting problems, and lots of normal sites are reporting problems. Since Google by no means can look at all sites in their index before putting something like this live, they can only compare to known-spam and known-nonspam sites, accepting a possible false-positive rate in order to get a higher rate of spam sites automatically penalized / banned. Great, only 0.5% false positives (just pulling a number out of my hat)? What would that be? A few million sites? :-(

Add to that some possible issues with their proxy/cache system, and you're in for a ride....
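softplus's back-of-the-envelope can be made explicit: the number of clean sites wrongly flagged scales linearly with the size of the index. All figures here are illustrative, as in his post (the clean-site count is a made-up number):

```python
def false_positives(clean_sites, fp_rate):
    """Expected number of legitimate sites wrongly flagged by an automated filter."""
    return clean_sites * fp_rate

# Hypothetical: 400 million clean sites, 0.5% false-positive rate.
print(f"{false_positives(400_000_000, 0.005):,.0f} sites wrongly flagged")
```

Even a filter that is 99.5% accurate on clean sites would, at that scale, hit millions of legitimate sites, which is exactly why "only 0.5%" is not reassuring.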

fred9989




msg:716632
 10:54 pm on May 10, 2006 (gmt 0)

Just read this on Matt Cutts' blog. It's priceless. How I laughed.
****
"And now Eric Schmidt is up. Eric points out that Google is putting even more effort into core search quality."
****
Does anyone else here have a severe attack of smugness-phobia?

softplus




msg:716633
 11:10 pm on May 10, 2006 (gmt 0)

If you have seen a change in your site's coverage in our index, please post the details here, including site name and changes in coverage.

Post in the Google Sitemaps group:
[groups.google.com...]

I love how they seem to not have a clue as to why it is happening:
We've been closely following the reports of significant changes in index coverage for some sites, and have been looking into many possible causes.

walkman




msg:716634
 11:27 pm on May 10, 2006 (gmt 0)

>> Have you done anything in specific that may of helped Google indexing?

No, but my site was, and still is, in the sandbox. I think it may be coming out, though. This is the second time I have been sandboxed (both times from sitewide links from other sites, IMO) and I notice a pattern in crawling and having files listed. When I'm sandboxed, I get very few Googlebot visits and many pages are either missing or supplemental. Now Googlebot visits are at ~200 or so a day, which is "normal" for my 1,400-1,500 page site.

There is nothing wrong with my site otherwise: definitely NO spam or dupes, and plenty of backlinks.

walkman




msg:716635
 11:34 pm on May 10, 2006 (gmt 0)

>> Great, only 0.5% false positives

I think if they could get that low a margin of error, they'd do it in a heartbeat :)

arubicus




msg:716636
 12:13 am on May 11, 2006 (gmt 0)

Relevancy

" 2 completely unconnected people I kwow recieved thier Google Analytics invite codes today. Capicty is being freed up so more analytics accounts can be opened?"

What do ya know I got my invite also!

Kangol




msg:716637
 12:27 am on May 11, 2006 (gmt 0)

I've also got one. What does this mean?

Whitey




msg:716638
 7:07 am on May 11, 2006 (gmt 0)

Play_Bach :
When I search my sites using site:example.com keyword all I get are supplemental pages. Anybody else?

g1smd : Which datacentre? There are FOUR different versions of the Google Index out there.

Wow - you're right.

try :

[64.233.185.104...] site:website.com and
site:website.com associated misspelling
site:website.com associated correct spelling

However, some of the other DCs are producing different results, so it could just be part of the variances being experienced overall.

Relevancy




msg:716639
 4:15 pm on May 11, 2006 (gmt 0)

The invites mean that they have freed up capacity to allow more Analytics users during the "machine crisis". The only way to free up capacity without actually adding more machines (we all established that they have not added new machines yet) is to delete something else. So they killed off tons of pages, calling it dup content, bad links, whatever...

All this adds up to more AdWords revenue for Google. More Analytics accounts mean more AdWords testing and spending.

kamikaze Optimizer




msg:716640
 6:57 pm on May 11, 2006 (gmt 0)

Hi GG:

I have posted this before in several of the “Reseller DC Watching Threads”.

Unrelated to the issues I see being discussed here, but an issue nonetheless:

What I have been noticing is far too much weight being given to .co.uk and .ca sites in the USA SERPs.
So a common search would produce:

Corp-Site.com (valuable site)
Corp-Site.co.uk (Of no value in the USA)
Corp-Site.ca (Of no value in the USA)

I would expect this on google.co.uk and google.ca, but not here.
This tends to push other sites with value way down below the fold.

montefin




msg:716641
 8:33 pm on May 11, 2006 (gmt 0)

Thanks GoogleGuy.

I received a response out of <<bostonpubcon2006 at gmail.com>>:

"Given the positive changes (including no supplemental listings) and the fact that I show no penalties against your site listed in Google, I recommend that you *do* create a sitemap and give it a week or two

"If you continue to see drastically different traffic patterns *and* inexplicable restrictions noted by our sitemaps tool, please write back and I'll escalate your inquiry to one of our sitemaps engineers."

I'm still mulling over submitting a sitemap.

Since my highest earning AdSense URL disappeared from the Google index concurrent with the sitemap/googlebot/robots.txt malfunction _for_that_specific_URL_, I'd really like some specific guidance on whether using Google sitemaps would help or hinder getting my "lost URLs" refound and re-included.

And it's good to know that penalties are not a part of this particular equation.

Thanks again, GG.

Lorel




msg:716642
 12:32 am on May 12, 2006 (gmt 0)

The site I reported on at the beginning of part 1 of this thread is beginning to come back:

Google Index:
Mar 03 178 pages
Apr 26 158
May 04 96
May 06 181
May 07 36
May 09 78
May 11 202

This site has a total of 276 pages. The day the index dropped to 36 pages the traffic dropped also but came back the next day --the graph of traffic looks similar to the graph of indexing.

Some of the dropped pages could be considered duplicates because they are parts for different models and thus the text is very similar for the different models. However there are just as many other pages that are totally unique on every page that also dropped out.

Also, all the main pages except the home page dropped out of the index until I sent the email to Google as requested, and they came back the next day; but then the fluctuation of the other pages started.

This site (and the affected pages) have been online for several years. The affected pages are not in a database and are all at the root level of the domain, although all the recently dropped pages are linked from other pages, so the theory that any page not directly linked from the home page gets dropped fits this situation.

tigger




msg:716643
 8:56 am on May 12, 2006 (gmt 0)

I'm still seeing a recovery, but talk about slow! Moved from 148 at its lowest to 215, but that's over a 4-week period.

mathia




msg:716644
 9:04 am on May 12, 2006 (gmt 0)

Thanks, GoogleGuy.
I received a response from bostonpubcon2006 at gmail.com.
I have filed the reinclusion request as you told me, and I hope, as you said, that my page will get a review from the team.

Again, thanks a lot.
rachel

[edited by: tedster at 4:54 pm (utc) on May 13, 2006]

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved