
Site:www.example.com Returns 5x Greater Number of Pages Than on the Site

I should have 3,000 pages, Google reports 18,000.


howiejs

9:31 pm on Dec 18, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I should have 3,000 or so pages in G. But site:www.example.com reports 18,000+

Why would it be showing so many more pages?

And is this hurting me?

I have noticed in my logs this past week or so that
domain.com (without the www.) is showing some hits for my homepage

[edited by: ciml at 6:09 pm (utc) on Dec. 20, 2004]
[edit reason] Examplified [/edit]
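The non-www hits mentioned above were commonly fixed by 301-redirecting the bare domain to the canonical www hostname, so search engines only ever see one version of each URL. A minimal sketch for Apache mod_rewrite (a hypothetical `.htaccess`, assuming the site lives at www.example.com; not from the thread itself):

```apache
# Hypothetical .htaccess fragment: send example.com to www.example.com
# with a permanent (301) redirect, preserving the requested path.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

The 301 status matters: it tells crawlers the non-www URLs have moved permanently, so the duplicate entries should eventually be consolidated.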

ciml

6:09 pm on Dec 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Could it be that your pages can be found using multiple URLs? e.g. due to partner sites or session Ids on URLs?
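One way to check ciml's theory is to canonicalize the URLs in your own logs and count distinct pages: if many raw URLs collapse to one canonical URL, session IDs or hostname variants are likely inflating the index count. A rough Python sketch (the session-parameter names here are hypothetical; real sites vary):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of session-ID parameter names; adjust for your site.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}

def canonicalize(url):
    """Strip session-ID query parameters and lowercase the host, so
    URLs that serve the same page collapse to a single key."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() not in SESSION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path,
                       urlencode(query), ""))

urls = [
    "http://www.example.com/page?sid=abc123",
    "http://WWW.example.com/page?sid=def456",
    "http://www.example.com/page",
]
print(len({canonicalize(u) for u in urls}))  # prints 1: one real page
```

Three "different" URLs, one actual page: multiplied across a site, that gap is exactly the kind of inflation the site: command was showing.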

ogletree

6:19 pm on Dec 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The site: command has been showing a lot of weird things lately. When you run it, do you see any listings that don't have www in front of them?

RoadTrips

8:35 pm on Dec 20, 2004 (gmt 0)

10+ Year Member



I have been seeing this on my site as well for the past month or so. I am wondering if it has something to do with Google displaying old results... I do not redirect pages that I delete from the site... maybe this has something to do with it?

The Contractor

8:48 pm on Dec 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



See message #168 [webmasterworld.com...]. Is there any chance Google is counting pages, script files, etc. that are blocked via robots.txt?

Rugles

9:12 pm on Dec 20, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This has been going on for some time. With our sites, I suspect it is happening because of the multiple ways you can access our pages.
I can't get too descriptive because I might give away sensitive information.

MLHmptn

9:16 pm on Dec 20, 2004 (gmt 0)

10+ Year Member



It looks like Google's method of increasing the number of pages they have indexed: duplicate their own results so the stockholders can draw erroneous conclusions....

abates

11:22 pm on Dec 20, 2004 (gmt 0)

10+ Year Member



Old pages sometimes drift back into the index as "Supplemental Results", and then drift back out after a couple of months...

lizardx

12:34 am on Dec 21, 2004 (gmt 0)

10+ Year Member



Google's index currently appears to count links as pages. Dumb? Yes. Deceptive? Maybe. Indicative of more profound issues with the way they run their index? I'd say yes.

I see the same thing: Google is counting hundreds of pages as indexed that are blocked by robots.txt to avoid duplicate-content problems, and to avoid exactly what Google currently seems unable to do: differentiate between a link to a URL and an actual HTML file.

beta.search.msn.com shows the same pattern, but it handles the site's pages much better when you run a site: query; the junk wasn't obvious in the first 100 results when I checked, unlike Google. Only Yahoo currently appears to list the actual allowed URLs on my site, but it has other problems, like dropping pages and failing to index new ones.

The blocked links point to things like sections of pages, not entire pages, and so on. There is also old junk: pages that haven't been online for a year or more. They shouldn't have mixed their junk/sandbox index with the main index; haste makes waste. Maybe they were too busy hacking in their teens to learn some basic truisms, or maybe they just wanted to create the illusion of not having an indexing problem. Whatever it is, it didn't fool enough people this time around. Nice try, Google.

This leaves us with a crippled giant, a limping contender, and a new kid with a lot of issues to work out before he joins the big boys. Fun times in SEO land.

RoadTrips

4:19 am on Dec 21, 2004 (gmt 0)

10+ Year Member



Also, my site shows 30,000 pages most of the time, but sometimes 10,000 when I run a site: query. The accurate number is closer to 10,000...

freegk

4:35 am on Dec 21, 2004 (gmt 0)

10+ Year Member



Same here: my site has 300,000 pages and Google shows 1,450,000.

TinkyWinky

5:43 pm on Dec 21, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yep, Google is still 'fraudulently' counting pages that are not supposed to be crawled per robots.txt....

My site recently underwent a complete re-alignment of its architecture, as well as a major change to a jump-tracking page, in order to prevent Googlebot et al. from following and storing the tracking link in the SERPs.

As I have just found out, our page count has jumped massively: all the old tracking pages are still there (which is fine; they will take a while to fall off at current rates), but there is now an almost identical number of entries for the new pages, which sit behind a folder forbidden by robots.txt.

These links mean nothing once they have been clicked, as we use a one-time link constructed from the URL, IP, time, and date. So if you follow the link after 20 minutes, you are directed to a blank page, because one or more factors have changed.
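A link tied to URL, IP, and timestamp like this is usually built as a keyed hash that the server re-computes on each click. A minimal sketch of the idea in Python (the secret key, token format, and 20-minute window are assumptions for illustration, not TinkyWinky's actual scheme):

```python
import hashlib
import hmac
import time

SECRET = b"example-secret"  # hypothetical server-side key

def make_token(url, ip, now=None):
    """Build a one-time token from the URL, client IP, and a timestamp."""
    now = int(time.time()) if now is None else now
    msg = "{}|{}|{}".format(url, ip, now).encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return "{}.{}".format(now, digest)

def check_token(url, ip, token, max_age=1200):
    """Reject the link if any factor changed or it is older than
    max_age seconds (1200 s = the 20-minute window above)."""
    try:
        ts_str, digest = token.split(".", 1)
        ts = int(ts_str)
    except ValueError:
        return False
    msg = "{}|{}|{}".format(url, ip, ts).encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, digest):
        return False
    return (int(time.time()) - ts) <= max_age
```

Since the token is worthless once the IP or timestamp no longer matches, any copy a crawler stores leads to the blank page the post describes, even though the URL itself still appears in the index.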

There is no title or description - just the entry within the google database.

Is this a permanent entry against the site, or just a temporary 8-billion-page glitch..... until MSN launches?

Who knows, but it is a bit puzzling. But hey, haven't the shareholders got value: a whole load more duff links. IMHO.

howiejs

2:13 am on Dec 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's back to normal. Google site: now reports almost the exact number of pages.

NO change in the poor rankings since last week's update, though.