Forum Moderators: Robert Charlton & goodroi


Google showing triple the number of pages on our site

Problem for a while - can we do anything?

         

Undead Hunter

6:46 pm on May 23, 2005 (gmt 0)

10+ Year Member



Hi Folks:

I don't know where to begin with this problem, or even whether it IS a problem, although our site was badly burned by Google in the Bourbon update. (Eight-year-old general-interest content site, adding new content over the past year across a diverse range of topics, no SEO tricks, 300 inbound links reported by Google, more than 1,000 listed by hotbot.com and other sites.)

When I search site:www.example.com I get 6,100 pages. The "Next" button only lets you page through the first 1,000, so I don't know what makes up the remaining 5,000 or so listed by the Big G... or how to browse them.

All of those first 1,000 pages show as www.example.com, etc., and NOT example.com. I've heard people have problems with this. When I search site:example.com, it also shows 6,100 pages, but the URLs are listed as "www.etc".

The actual page count is about 1,900. Google had this wrong last summer or so, then had it right for a while, around Nov/Dec, maybe January. Then it started showing triple the pages again.

So, does anyone have any idea why it may be showing this? If Google is indexing duplicate pages of our site, how could I find them in a search, and what, if anything, should I do about it?

Thanks for your help.
Hunter

[edited by: ciml at 4:14 pm (utc) on May 24, 2005]
[edit reason] Examplified [/edit]

g1smd

11:42 pm on May 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Make sure you set up a 301 redirect, and it must be a 301 redirect, not a 302, from non-www to www URLs for every page of the site. It will take Google a couple of months to sort your listings out. Ranking and PR may come later still.
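The non-www to www rule can be illustrated with a minimal sketch. It is written as Python WSGI middleware purely to show the logic; on a typical 2005-era site this would normally live in the web server's configuration (for example Apache's mod_rewrite) instead. The host name `www.example.com` and the function name `redirect_to_www` are placeholders, not anything from the original post.

```python
# Sketch only: 301-redirect requests on any other host to the same path on the www host.
# Assumption: "www.example.com" is the canonical host you want Google to keep.
from urllib.parse import quote

CANONICAL_HOST = "www.example.com"  # hypothetical placeholder


def redirect_to_www(app, canonical_host=CANONICAL_HOST):
    """Wrap a WSGI app so requests on a non-canonical host get a permanent (301) redirect."""

    def middleware(environ, start_response):
        # Drop any port number and compare host names case-insensitively.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host and host != canonical_host:
            # Rebuild the requested path and query string on the canonical host.
            path = quote(environ.get("SCRIPT_NAME", "") + (environ.get("PATH_INFO") or "/"))
            query = environ.get("QUERY_STRING", "")
            location = f"http://{canonical_host}{path}" + (f"?{query}" if query else "")
            # Must be 301 Moved Permanently, not 302 Found.
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        # Already on the canonical host: serve the page normally.
        return app(environ, start_response)

    return middleware
```

The key design point matches the advice above: the redirect preserves the full path, so every page of the bare domain maps to its own www counterpart rather than to the home page.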

I fixed a friend's site in mid-March so that everything redirected to non-www (the opposite of how I normally set things up) and so that all internal links ended in a trailing / on the URL (every page is an index page in a folder). It took Google 6 weeks to drop all the unwanted URL variants and list only those without www and with a trailing / on the URL. A week later (about 2 weeks ago) it started adding the other URLs back in (as URL-only listings), just a few at a time, every few days. Three days ago, it suddenly added ALL four variations of the URL back into the index, and did so for every one of the 116 pages of the site.
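The "four variations" are www/non-www combined with with/without a trailing slash. A small sketch of the mapping rule described above (non-www host, every path ending in `/`) shows all four collapsing to a single canonical URL. The helper names and the host `example.com` are hypothetical; a real site would issue a 301 to `redirect_target(...)` whenever it is not `None`.

```python
# Sketch of the canonicalization rule: non-www host, folder-style URLs ending in "/".
# Assumption: every page is a folder index, as in the setup described above.
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url, host="example.com"):
    """Map any www/non-www, slash/no-slash variant to one canonical form."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if not path.endswith("/"):
        path += "/"
    return urlunsplit((parts.scheme or "http", host, path, parts.query, ""))


def redirect_target(url, host="example.com"):
    """Return the URL to 301-redirect to, or None if the request is already canonical."""
    target = canonical_url(url, host)
    return None if target == url else target


def url_variants(path, host="example.com"):
    """The four variants: www/non-www x with/without a trailing '/'."""
    stripped = path.rstrip("/")
    return [f"http://{h}{p}" for h in (host, "www." + host) for p in (stripped + "/", stripped)]
```

For example, all four entries of `url_variants("/widgets/blue/")` canonicalize to `http://example.com/widgets/blue/`, and only that one returns `None` from `redirect_target`.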

I think that if Google has fixed the 302 problem (and I don't see any evidence that the problem is fixed) then they have done it by totally destroying some part of the algo that deals with 301 redirects.

Reid

4:18 am on May 25, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Use a header checker and make sure that you are not getting a 302 from www to non-www (or vice versa).
It must be a 301.
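A basic header check can be sketched with Python's standard `http.client`, which does not follow redirects, so the raw status code and `Location` header are visible. The function names `fetch_status` and `verdict` are hypothetical, and the URL in the usage line is a placeholder.

```python
# Sketch of a header checker: send HEAD, do not follow redirects, report status + Location.
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """Return (status, Location header) for a single HEAD request."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def verdict(status, location):
    """Classify the response per the advice above: only a 301 is acceptable."""
    if status == 301:
        return f"OK: permanent redirect to {location}"
    if 300 <= status < 400:
        return f"PROBLEM: {status} redirect to {location}; it should be a 301"
    return f"No redirect (status {status})"
```

Usage (requires network access): `print(verdict(*fetch_status("http://example.com/")))`, then repeat for the www form and a few inner pages.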

besides that:
If you use a .php script or similar that generates pages or temporary URLs, Google will index every one of them as a separate page: id=2, id=3, etc.
Anything that can generate URLs can cause this.
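If you can collect a list of the URLs Google shows (or pull them from your server logs), a quick way to spot script-generated duplicates is to group them by host and path and flag any path reached through more than one query string. This is a sketch; the function name `group_by_path` and the sample URLs are hypothetical.

```python
# Sketch: find paths that appear under several query strings (id=2, id=3, ...).
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_path(urls):
    """Group URLs sharing a host and path but differing only in their query string."""
    groups = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        groups[(parts.netloc.lower(), parts.path)].add(parts.query)
    # Keep only paths reached through more than one distinct query string.
    return {key: sorted(queries) for key, queries in groups.items() if len(queries) > 1}
```

Each flagged path is a candidate for the kind of URL multiplication described above; whether those variants are real duplicates depends on what the script actually serves for each parameter.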

Do you see anything funny in the first 1000 pages?

I'm not sure, but maybe you can use a Boolean search to filter out folders etc. to see what's beyond the first 1,000.

TravelSite

8:55 am on May 25, 2005 (gmt 0)

10+ Year Member



Hunter,

Just look at the pages Google is listing for your site. You can easily see all 6000 pages by using a combination of negative (and positive) terms.

Start by choosing a word that appears in roughly 50% of your pages, e.g. "contact us", "specials", "welcome", "click here" (find one by trial and error).

Then do...
"site:yoursite.com -chosenword" to give roughly 3,000 pages
and
"site:yoursite.com chosenword" to give the other 3,000 or so

Then repeat the step e.g.
"site:yoursite.com -chosenword -differentchosenword"
"site:yoursite.com -chosenword differentchosenword"
"site:yoursite.com chosenword -differentchosenword"
"site:yoursite.com chosenword differentchosenword"
...to give roughly 1,500 pages each

Using this method you will eventually be able to see all pages listed.
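The method above is a binary split: each chosen word roughly halves the result set, so k words give 2^k queries. Assuming even splits, 6,100 results need k = 3 words, since 6,100 / 2^3 ≈ 763 stays under the 1,000-result browsing limit. The sketch below automates the arithmetic, generates every include/exclude combination, and (assuming you have local copies of your own pages) helps with the "trial and error" step by picking the candidate closest to a 50% share. All function names are hypothetical, and substring matching on local text only approximates how Google matches words.

```python
# Sketch of the split-word method: pick ~50% words, then enumerate all +/- combinations.
import math
from itertools import product

RESULT_CAP = 1000  # the observed limit on how many results "Next" lets you browse


def words_needed(total_results, cap=RESULT_CAP):
    """Minimum number of split words, assuming each one halves the results evenly."""
    if total_results <= cap:
        return 0
    return math.ceil(math.log2(total_results / cap))


def best_split_word(pages, candidates):
    """Pick the candidate whose share of local pages is closest to 50% (case-insensitive)."""
    lowered = [p.lower() for p in pages]

    def share(word):
        return sum(word.lower() in p for p in lowered) / len(lowered)

    return min(candidates, key=lambda w: abs(share(w) - 0.5))


def partition_queries(site, words):
    """Every include/exclude combination of the split words: 2**len(words) queries."""
    queries = []
    for signs in product(("", "-"), repeat=len(words)):
        terms = " ".join(sign + word for sign, word in zip(signs, words))
        queries.append(f"site:{site} {terms}".rstrip())
    return queries
```

With two words, `partition_queries("yoursite.com", ["chosenword", "differentchosenword"])` yields the same four queries listed above, in a different order. Real splits are rarely even, so any query that still reports more than 1,000 results needs one more word of its own.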

Reid

8:43 am on May 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How would this work?

"allinurl:yoursite.com -chosendirectory"