Forum Moderators: open
The way I found it was by going to: [services.google.com...]
Then enter your domain under the Free WebSearch plus SiteSearch (first option) and click "Continue"...
On the next page, Google tells you how many of this site's pages are in the Google index.
Not sure if this is new, but I thought it was pretty neat to find that 330 pages of my site are in the Google index, considering we only have about 330 pages! :)
allinurl:www.domain.com
the www gets ignored, so you end up with all URLs that contain any part of the domain name.
If you put the domain in quotes, like allinurl:"www.domain.com", it works a little better, but you also end up getting all the pages that are running redirection URLs.
The best way is to do a negative search on a word that you know won't be on your site.
Something like:
-wxyz site:www.domain.com
allinurl:www.domain.com
site:www.domain.com -wxyz
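For anyone who wants to check several domains, the three query variants above are easy to build programmatically. This is just a sketch that assembles the query strings discussed in the thread (the helper name and the default nonsense word are illustrative, not anything Google provides):

```python
# Build the three Google query strings for estimating how many of a
# site's pages are indexed, as described in the thread.

def index_count_queries(domain: str, nonsense: str = "wxyz") -> dict:
    """Return the three query variants for a given domain."""
    return {
        "negative_site": f"-{nonsense} site:{domain}",  # negative word + site:
        "allinurl": f"allinurl:{domain}",               # matches parts of the URL
        "site_negative": f"site:{domain} -{nonsense}",  # same trick, reversed order
    }

for name, query in index_count_queries("www.domain.com").items():
    print(f"{name}: {query}")
```

You would then paste each string into Google and compare the "about N results" counts, which, as the posts below show, do not always agree.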
and the [services.google.com...] way gave me the same result for my site, 330 pages in the Google Index. :)
This brings up another question: was anyone surprised to see how many pages of their site were in the index?
Like I said earlier, our site only has around 330 pages, so I was definitely pleasantly surprised to find that most of them are searchable in Google! :)
Unfortunately all of these give me different results :(
[services.google.com...]
- approximately 73000 pages
-wxyzwxyzwxyxyz site:sitename.com
- 68,200
allinurl:sitename site:www.sitename.com
- 65,800
allinurl:"www.sitename.com"
- 81,200
Now which one is right?
It's hard to say what goes on, since Google groups the results by domain, so in the "show related results" view I can't see what comes after my pages. But without related results showing, I see two of my own pages, then a page on another domain with a strange address that includes a long, complex address from my site verbatim, script name and all. No? Just slashes. I wonder what they are up to and where that came from.
S. N.
PS: it happened to me before that my script screwed up and sent Google off on a weird mangled URL including script code, which it promptly followed... In fact, I linked a new domain yesterday, and it has already been hit 1,400 times by googlebot and scooter...
You just have too many pages. You will notice that the number they give you is "about xxxx" so they are making a very quick estimate.
I think they are extremely accurate for anything under 500 pages. But the maximum number of results they will return for a search is 1,000, so it doesn't really matter if their estimate is a few percent off for really large sites. And they never give you more than 3 significant digits.
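If the counts really are capped at three significant digits, as observed above, that alone explains why the figures look like 68,200 or 81,200 rather than exact numbers. A quick sketch of that kind of rounding (the three-digit cap is this thread's observation, not documented Google behavior):

```python
from math import floor, log10

# Round a count to a fixed number of significant digits, mimicking the
# "about N results" style figures discussed in the thread.
def round_sig(n: int, digits: int = 3) -> int:
    """Round n to the given number of significant digits."""
    if n == 0:
        return 0
    magnitude = floor(log10(abs(n)))          # position of the leading digit
    factor = 10 ** (magnitude - digits + 1)   # size of the last kept digit
    return round(n / factor) * factor

for exact in (68231, 81177, 473):
    print(exact, "->", round_sig(exact))
```

Small counts (under three digits' worth of precision) pass through unchanged, which fits the observation that the figures are "extremely accurate for anything under 500 pages".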
[services.google.com...] :70,500 pages
-wxyzwxyzwxyxyz site:sitename.com : 141,000 (we have some subdomains)
-wxyzwxyzwxyxyz site:www.sitename.com : 61,000
allinurl:sitename site:www.sitename.com : 65,900
allinurl:"www.sitename.com" : 61,000
I have seen our indexed page count grow from about 13,000 up to 60,000-odd with the addition of new content and services, and new content relationships with other sites.
Personally, it doesn't seem to matter a lot, as the PR for our site dropped a point last update!
There are too many factors relating to the traffic that you attract to worry about how many pages you have in the index, and these measurements seem to be a quick estimation anyway for sites with larger page numbers.
We do have a new site, which as of right now has 102 pages in the index. Possibly because of its tie-in with our publishing network, it gets the freshbot every day, even checking older articles, and it has been a PR5 from day one.
The reason I DO check pages indexed for our larger sites is simply to get an idea of how many pages are 'crawlable', and whether the figure moves up or down by a large amount after each update. This month I think I can be happy with 60,000 pages. I might spend some time optimising them all!
I used to like it a few months ago when you could do a search for site:www.sitename.com a and it would say "a" is a common word so it was removed from the query, but it still did the search... It doesn't do that anymore, and I don't really know why, since getting all pages for a site was the only use for that, and a legitimate one I think.
site:www.getcited.org sitename
Since I have my site's name on every page, I figured this would work just fine. However, reading this thread led me to try a negative search and, when I did so, I was shocked to see that the result was approximately half of the "positive" search result. Thus, I'm now wondering how this could be. Can anyone shed some light on this?