Inflated Page Counts in Google
Hitting the 1,000 page mark caused quite a jump...
BillyS · msg:751768 · 4:11 pm on Sep 28, 2005 (gmt 0)

My website has been nearing a milestone for the past couple of weeks as the site approached the 1,000 page mark in Google. I had been watching this closely (with some pride) as the number in Google's index climbed.

Over the past couple of days, the number stood at 999 using the site:www.foo.com command. Today I noticed something interesting. The same command that returned 999 pages over the last several days now shows 9,140 pages. Quite a jump. Now I understand why everyone's been talking about inflated page numbers. I was wondering if this happens only for sites in excess of 1,000 pages in Google's index; for someone who just crossed this milestone, that was my experience.

 

g1smd · msg:751798 · 10:34 pm on Oct 7, 2005 (gmt 0)

>> I have had a rewrite for the non www pages for over 6 months, but google recently brought back in cached versions of non-www pages from last year. <<

I see this a lot. There are a great many irrelevant Supplemental pages with cache dates from a year ago all over the SERPs. It has been like this for several months.

>> By the way, at least HotBot can count. Checking in Hotbot indicates that Google has exactly 1,000 pages for my website - not 9,100. <<

No idea about HotBot, but the page counts in DogPile do NOT include any supplemental results - they only include normal pages.

nickied · msg:751799 · 11:08 pm on Oct 7, 2005 (gmt 0)

allinurl: reports 10,500 pages; actual pages just under 1,000.

Searching for old page copies, I have supplementals with 5 different cache dates: January, February, June, August, September, plus the current month, October. On the current stuff (Sept/Oct), about 400 of the 1,000 are URL-only, fwiw.

theBear · msg:751800 · 4:41 am on Oct 8, 2005 (gmt 0)

If your server is sending out headers that include something like this:

Last-Modified: Wed, 05 Oct 2005 21:59:51 GMT

then that appears to be inflating counts by a large factor.

Add to this the supplemental pages and non-canonical pages, and you get what Google reports.

Just do a check on apache.org using Google, MSN, and Y!
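
For anyone who wants to run theBear's check themselves, here is a minimal Python sketch of a header check (www.example.com is a placeholder; a web-based header checker does the same job):

# Minimal sketch: see whether a server sends a Last-Modified header.
# "www.example.com" is a placeholder; substitute the site you want to check.
import http.client

def check_last_modified(host, path="/"):
    conn = http.client.HTTPConnection(host)  # use HTTPSConnection for https
    conn.request("HEAD", path)
    resp = conn.getresponse()
    last_mod = resp.getheader("Last-Modified")
    conn.close()
    if last_mod:
        print(host + path, "sends Last-Modified:", last_mod)
    else:
        print(host + path, "sends no Last-Modified header")

check_last_modified("www.example.com")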

paradoxos · msg:751801 · 7:42 pm on Oct 8, 2005 (gmt 0)

I don't know what this means, but the site: command brings back over 90,000 pages for our site, while querying through Google's API lists 6,500 pages indexed.

nickied · msg:751802 · 7:58 pm on Oct 8, 2005 (gmt 0)

Bear -

>> If your server is sending out headers that include something like this: Last-Modified: Wed, 05 Oct 2005 21:59:51 GMT, then that appears to be inflating counts by a large factor. <<

Can you elaborate on this please?

theBear · msg:751803 · 8:22 pm on Oct 8, 2005 (gmt 0)

For some reason, Google is counting each page that a server delivers with Last-Modified in its header as a "page".

If you do a site: command in Google, the count returned seems to be "inflated" whenever the server the site is hosted on sends the Last-Modified date as part of the header.

Paste your home page into a header checker and take a looksee.

Take your site and do site commands in each of Y!, Google, and MSN.

A very good example is the Apache website.

This was mentioned by a couple of others in stickies.

I just thought you might want to know what appears to cause wildly "inflated" page counts.

There are also supplemental pages and, in the case of Windows servers, mixed-case duplicate page names, etc...

In short it is a number of factors.

Now, what Google does with the "pages" with Last-Modified dates on them, other than add them to a counter, I haven't a clue.

2by4 · msg:751804 · 9:15 pm on Oct 8, 2005 (gmt 0)

theBear doesn't have this quite right; close, but not quite. It's not the Last-Modified header per se, it's scripted Last-Modified headers, at least from what I can tell. In other words, a static HTML page will send out Last-Modified headers but will not trigger this event; at least it didn't on any site I checked.

A dynamic site that does send Last-Modified headers, however, does appear to trigger this bug. I have enough sites built both ways to determine this without much doubt, and the apache.org site theBear mentioned is an excellent example.

There also appear to be other factors that contribute to the inflated page count; steveb notes that sites with > 1000 pages also seem to trigger this error, although it would be nice to know how many of those are sending Last-Modified headers as well.

So we have a new indexing system in Google, and that new system has bugs in it. From now on I'm assuming I'm looking at a new algo that was implemented around last December, which means I'm going to be spending my time learning how this new beast works. The bugs I'm seeing all point to a new system; these are not mature bugs, it's something new, this stuff is too basic.

This will make it a bit harder to know what's causing events. For example, yesterday I saw another new bug in action: we made a tweak, the bug was revealed, something that shouldn't happen happened, and the site rose. All very interesting.

2by4 · msg:751805 · 10:53 pm on Oct 8, 2005 (gmt 0)

From the feedback I'm getting, it appears that dynamically generated headers in general may be causing issues. It's too hard to say for certain which cases will trigger this event; as I noted, Last-Modified definitely seems to do it. We're definitely seeing a significant bug live. That's quite unusual for Google, but it's easy to see how this slipped under the early testing radar.

theBear · msg:751806 · 11:12 pm on Oct 8, 2005 (gmt 0)

2by4, I think it also depends on how the server itself is set up.

I haven't gone playing with anything enough to say for certain under what conditions this appears; however, one static site that serves that data in its headers is also showing wildly inflated counts.

In any event, the counts are, shall we say, not 'xactly spot on?

2by4 · msg:751807 · 11:22 pm on Oct 8, 2005 (gmt 0)

Interesting, definitely interesting. How many pages does the static site have? It's definitely static, right? No processing at all? No AddType application php, etc.? IIS or Apache?

Curiouser and curiouser, as the book said... I wonder just how wide-ranging this is.

BillyS · msg:751808 · 12:46 am on Oct 9, 2005 (gmt 0)

I hit the 1,000-page mark and that seemed to trigger the bug. However, my site is dynamic (.php) and it does send out a 200, not a 304 (Not Modified).
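
For context on the 200 vs. 304 point: which code you see depends on the request, since a 304 only comes back when the client sends a conditional request. A rough Python sketch of such a probe (the URL is a placeholder):

# Rough sketch: probe whether a URL answers a conditional request with
# 304 (Not Modified) or serves the full page (200) again.
# The URL is a placeholder.
import urllib.request
import urllib.error

url = "http://www.example.com/"

resp = urllib.request.urlopen(url)
last_mod = resp.headers.get("Last-Modified")
print("Plain GET:", resp.status, "Last-Modified:", last_mod)

if last_mod:
    req = urllib.request.Request(url, headers={"If-Modified-Since": last_mod})
    try:
        resp2 = urllib.request.urlopen(req)
        print("Conditional GET:", resp2.status)  # 200 = full page served again
    except urllib.error.HTTPError as err:
        print("Conditional GET:", err.code)  # 304 = Not Modified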

Rick_M · msg:751809 · 4:04 am on Oct 9, 2005 (gmt 0)

Not sure about anyone else, but my page counts have been steadily decreasing over the past week, with a huge drop in the past 24 hrs, from 11k pages on one site to around 900 pages. 2 weeks ago, the site was showing 26k pages. I'm guessing that Google is cleaning things up. So far, though, this hasn't changed (helped) my rankings any - as I dropped about 30-40 spots (from top 10 for many searches) on September 22nd. I just keep my fingers crossed that with the next "major everflux" (somewhere between everflux and an update?) my rankings will return to somewhere near where they were.

g1smd · msg:751810 · 7:10 am on Oct 9, 2005 (gmt 0)


Beware of a 301 redirect from non-www to www where the defaultsitename is domain.com and where you are linking to a folder, and where you forget to add the trailing / to the URL in the link.

If you forget the trailing / then your link to www.domain.com/folder will first be redirected to domain.com/folder/ {without www!} before arriving at the required www.domain.com/folder/ page.

The intermediate step, at domain.com/folder/, will kill your listings. Luckily, this effect is very easy to see if you use Xenu LinkSleuth to check your site: when you generate the sitemap, it reports double the number of pages you actually have, with half of the pages having a title of "301 Moved".
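
One way to see that intermediate hop is to follow the redirect chain one step at a time. A rough Python sketch (the domain and folder are placeholders; it handles plain http only and assumes absolute Location headers):

# Rough sketch: walk a redirect chain one hop at a time to expose any
# intermediate non-www step. Domain and folder are placeholders; plain
# http only, and absolute Location headers are assumed.
import http.client
from urllib.parse import urlsplit

def trace(url, max_hops=5):
    for _ in range(max_hops):
        parts = urlsplit(url)
        conn = http.client.HTTPConnection(parts.netloc)
        conn.request("GET", parts.path or "/")
        resp = conn.getresponse()
        print(resp.status, url)
        location = resp.getheader("Location")
        conn.close()
        if resp.status not in (301, 302) or not location:
            break
        url = location

# A link missing the trailing / may show two hops instead of one:
#   301 http://www.example.com/folder   (to http://example.com/folder/)
#   301 http://example.com/folder/      (to http://www.example.com/folder/)
trace("http://www.example.com/folder")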

longen · msg:751811 · 7:27 am on Oct 9, 2005 (gmt 0)

Until recently Google had 15K pages indexed for my site; that recently jumped to 54K. The real pages number 5,000+.
I mentioned in another thread some time ago that I went through all the folders for my URL in the Google index to check which folders had incorrect page totals.

To check this for yourself, type this into the Google search box:
site:www.yourdomain.tld/folder/

All my folders had the correct number of pages indexed apart from one, which was the only folder with 1,000+ real pages. So there may be some kind of 1K problem.
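
For a site with many folders, it can help to generate the queries to paste in by hand. A tiny Python sketch (the domain and folder names are made up):

# Tiny sketch: print one site: query per folder, to be checked by hand
# in Google. The domain and folder names are made up.
domain = "www.yourdomain.tld"
folders = ["articles", "news", "archive"]

for folder in folders:
    print("site:%s/%s/" % (domain, folder))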

2by4 · msg:751812 · 7:50 am on Oct 9, 2005 (gmt 0)

longen, good point; that clarifies it even more. All signs point to > 1000, plus some other glitches that appear to bring it up to that level. Looks like steveb had this one right. I tried it and he's right: it's not the headers. Oh well, it sounded good when I typed it, LOL...

taps · msg:751813 · 8:59 am on Oct 9, 2005 (gmt 0)

I see my page count in the SERPs steadily going down, but very slowly. On some DCs I've reached 290k entries, having started at 320k. The goal is < 100k.

Still a long way to go, but there's hope.

Concerning the 1,000-page barrier: maybe Google considers dupe content on smaller sites to be no problem and applies the dupe filter only to sites with > 1000 articles.

phantombookman · msg:751814 · 12:37 pm on Oct 9, 2005 (gmt 0)

I suspect the 1K barrier is simply a blind. Google wanted to inflate their index, and since people who check are unlikely to click through 1,000+ results, the 1K barrier/trigger is as good a number as any.

Where it has happened to me there are no extra or duplicate pages. Do the site: search and click through the results (if viable to do so); in every case I find the results actually stop at an accurate page count!
It is simply a case of Google claiming the index is bigger than it really is; no technical mystery.

I mentioned a while ago, in another thread, that this was such an obvious ruse to see through that it might well backfire as bad publicity. Now we see the index-size claim has gone from G's homepage. Perhaps this is a vindication?

freaky · msg:751815 · 3:09 pm on Oct 9, 2005 (gmt 0)

I agree with HERB's comment:

[Have a site with about 12,000 pages. Google for 4-5 months reported 24,000 and was sending visitors. Beginning of this month it thinks we have 72,000 and stopped sending visitors.]

My situation: I have a site with over 100,000 (1 lakh) pages, yet until last month Google had spidered only around 22,500 of them, and visitors from Google ran around 2,500 to 3,000 per day. The great news: Google recently spidered our site again in depth, and we now have 60,000+ pages in its index. The bad news: for the last 2 weeks we have not been able to get more than 400 Google visitors per day. What do you all have to say about that? Does Google spider more pages and then stop sending visitors? Funny, huh... GRRRR

theBear · msg:751816 · 3:53 pm on Oct 9, 2005 (gmt 0)

Another site using Last-Modified that had cleaned up its problems now again has a "_FEW_" more pages counted than it should have.

Results 1 - 10 of about 15,300,000 from dmoz.org

Anyone from the ODP who cares to comment?

Maybe Google is cleaning up its collection of oddball pages and that is what we are seeing.

g1smd · msg:751817 · 5:12 pm on Oct 9, 2005 (gmt 0)

This is not an official statement of any sort: merely an observation.

Half a million category pages, each with a link to the "suggest a site" page, the "update listing" page, the "category description" page, the "category edit" page, the "apply to be an editor" page, the "edit description" page, the "report abuse" page, etc, makes about 4 to 5 million "real" pages. Then there are less than half a million other pages (mostly informational) on this site, and other dmoz.org sub-domains. The true total should be well under 6 million.

Notice that site:www.dmoz.org returns zero results. A year ago that result showed several million of the "302 hijack" URLs. Google has since filtered them out of the results.

For site:dmoz.org I guess there are still millions more 302 hijack pages - Google are not showing them, but they are still counting them.

BillyS · msg:751818 · 2:21 pm on Oct 27, 2005 (gmt 0)

I'm throwing this one out there again because I continue to be punished (for unknown reasons) in this Google update.

Even for a term that no one optimizes for (my name), I continue to sink in Jagger2. At the same time, my site (which has around 1,025 pages now) continues to rise in page count using the site:www.sitename.com query. Right now the count stands at 9,700 pages.

Is anyone else who is suffering from a recent run-up in page numbers (as reported by Google) also sinking in Jagger2?

As mentioned, I'm afraid Google is using this incorrect count in their calculations and that makes it look like the site has grown from 1,000 pages to nearly 10,000 overnight.

steveb · msg:751819 · 10:09 pm on Oct 27, 2005 (gmt 0)

A few people have mentioned getting lost when breaking the 1,000-page threshold in the past six weeks.

daveVk · msg:751820 · 7:04 am on Nov 3, 2005 (gmt 0)

Also seeing roughly 10x jumps at 1,000 pages. You can get the true page count by adding a carefully chosen keyword, e.g. counting pages with and without 'widget', provided both queries return under 1,000 pages. The 1,000 limit is per query rather than per site.
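
The trick works because the keyword splits the site into two disjoint sets, each small enough to be counted accurately. A small Python sketch of the arithmetic (the counts are illustrative placeholders, not live results):

# Sketch of the counting trick: a keyword partitions the site into two
# disjoint sets ('widget' pages and non-'widget' pages). If each query
# stays under the 1,000-result cap, the true total is their sum.
# The counts below are illustrative placeholders, not live results.

count_with = 430     # results for: site:example.com widget
count_without = 510  # results for: site:example.com -widget

true_total = count_with + count_without
print("True page count:", true_total)  # 940, despite site: reporting ~10x more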
