Forum Moderators: open
The way I found it was by going to: [services.google.com...]
Then enter your domain under the Free WebSearch plus SiteSearch (first option) and click "Continue"...
On the next page, Google tells you how many of this site's pages are in the Google index.
Not sure if this is new, but I thought it was pretty neat to find that 330 pages of my site are in the Google index, considering we only have about 330 pages! :)
allinurl:www.domain.com
the www gets ignored, so you end up with all URLs that contain any part of the domain name.
If you put the domain in quotes, like allinurl:"www.domain.com", it works a little better, but you also end up getting all the pages that are running redirection URLs.
The best way is to do a negative search on a word that you know won't be on your site.
Something like:
-wxyz site:www.domain.com
allinurl:www.domain.com
site:www.domain.com -wxyz
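For anyone who wants to check several domains, the three query variants above are easy to build programmatically. This is just a sketch that assembles the query strings discussed in the thread (the helper name and the default nonsense word are illustrative, not anything Google provides):

```python
# Build the three Google query strings for estimating how many of a
# site's pages are indexed, as described in the thread.

def index_count_queries(domain: str, nonsense: str = "wxyz") -> dict:
    """Return the three query variants for a given domain."""
    return {
        "negative_site": f"-{nonsense} site:{domain}",  # negative word + site:
        "allinurl": f"allinurl:{domain}",               # matches parts of the URL
        "site_negative": f"site:{domain} -{nonsense}",  # same trick, reversed order
    }

for name, query in index_count_queries("www.domain.com").items():
    print(f"{name}: {query}")
```

You would then paste each string into Google and compare the "about N results" counts, which, as the posts below show, do not always agree.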
and the [services.google.com...] way gave me the same result for my site, 330 pages in the Google Index. :)
This brings up another question: was anyone surprised to see how many pages of their site were in the index?
Like I said earlier, our site only has around 330 pages, so I was definitely pleasantly surprised to find that most of them are searchable in Google! :)
Unfortunately all of these give me different results :(
[services.google.com...]
- approximately 73000 pages
-wxyzwxyzwxyxyz site:sitename.com
- 68,200
allinurl:sitename site:www.sitename.com
- 65,800
allinurl:"www.sitename.com"
- 81,200
Now which one is right?
It's hard to say what goes on, since Google groups the results by domain, so in the "show related results" view I can't see what comes after my pages. But without related results showing, I see two of my own pages, then a page on another domain with a strange address that includes a long, complex address from my site verbatim, script name and all. No? Just slashes. I wonder what they are up to and where that came from.
S. N.
PS: it happened to me before that my script screwed up and sent Google off on a weird mangled URL including script code, which it promptly followed... In fact, I linked a new domain yesterday, and it has already been hit 1,400 times by googlebot and scooter...
You just have too many pages. You will notice that the number they give you is "about xxxx" so they are making a very quick estimate.
I think they are extremely accurate for anything under 500 pages. But the maximum number of results they will return for a search is 1,000, so it doesn't really matter if their estimate is a few percent off for really large sites. And they never give you more than 3 significant digits.
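If the counts really are capped at three significant digits, as observed above, that alone explains why the figures look like 68,200 or 81,200 rather than exact numbers. A quick sketch of that kind of rounding (the three-digit cap is this thread's observation, not documented Google behavior):

```python
from math import floor, log10

# Round a count to a fixed number of significant digits, mimicking the
# "about N results" style figures discussed in the thread.
def round_sig(n: int, digits: int = 3) -> int:
    """Round n to the given number of significant digits."""
    if n == 0:
        return 0
    magnitude = floor(log10(abs(n)))          # position of the leading digit
    factor = 10 ** (magnitude - digits + 1)   # size of the last kept digit
    return round(n / factor) * factor

for exact in (68231, 81177, 473):
    print(exact, "->", round_sig(exact))
```

Small counts (under three digits' worth of precision) pass through unchanged, which fits the observation that the figures are "extremely accurate for anything under 500 pages".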
[services.google.com...] :70,500 pages
-wxyzwxyzwxyxyz site:sitename.com : 141,000 (we have some subdomains)
-wxyzwxyzwxyxyz site:www.sitename.com : 61,000
allinurl:sitename site:www.sitename.com : 65,900
allinurl:"www.sitename.com" : 61,000
I have seen our indexed page count grow from about 13,000 up to 60,000-odd with the addition of new content and services, and new content relationships with other sites.
Personally, it doesn't seem to matter a lot, as the PR for our site dropped a point last update!
There are too many factors relating to the traffic that you attract to worry about how many pages you have in the index, and these measurements seem to be a quick estimation anyway for sites with larger page numbers.
We do have a new site, which as of right now has 102 pages in the index. Possibly because of its tie-in with our publishing network, it gets the freshbot every day, even checking older articles, and it has been a PR5 from day one.
The reason I DO check pages indexed for our larger sites is simply to get an idea of how many pages are 'crawlable', and whether the figure moves up or down by a large amount after each update. This month I think I can be happy with 60,000 pages. I might spend some time optimising them all!
I used to like it a few months ago when you could do a search for site:www.sitename.com a and it would say "a" is a common word so it was removed from the query, but it still did the search... It doesn't do that anymore, and I don't really know why, since getting all pages for a site was the only use for that, and a legitimate one I think.
site:www.getcited.org sitename
Since I have my site's name on every page, I figured this would work just fine. However, reading this thread led me to try a negative search and, when I did so, I was shocked to see that the result was approximately half of the "positive" search result. Thus, I'm now wondering how this could be. Can anyone shed some light on this?