Forum Moderators: phranque

How many pages estimated on the Internet?

If Google is indexing over 4 billion pages, how many pages are estimated?

         

hazardtomyself

8:44 pm on Apr 9, 2004 (gmt 0)

10+ Year Member



I have searched the web and WW. I am trying to find a reputable source for an estimation of pages on the Internet today.

pmkpmk

9:24 pm on Apr 9, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'd say there is a googleplex of pages :-)

OK, just kidding. Netcraft [news.netcraft.com] does a very renowned survey on the number of servers on the internet. How many pages each server delivers is hard to tell, especially when you take the increasingly popular dynamically generated pages into account.

claus

9:43 pm on Apr 9, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The Netcraft site mentions 49,261,052 web sites(*) - set the average number of pages per site to 100 and you get roughly 4.9 billion pages, so Google lacks somewhat less than a billion, or around 640 million pages.

(*) "Search Web by Domain"

pmkpmk

9:47 pm on Apr 9, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I strongly doubt that you can set the average to 100 pages per server. There are monoliths like microsoft.com, which has 644,000 pages listed in Google, but there are also specialty sites with only a handful of pages, or private homepages with only 1 page.

I would assume the average to be around the 10-20 page number.

But again, given the increasing "dynamicity" of modern websites, one is more inclined to start a discussion on "WHAT is a webpage?"
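To make the disagreement about pages per site concrete, here is a minimal sketch that multiplies the Netcraft site count by each proposed average (10, 20 and 100). The Google figure used for comparison is the 4,285,199,774 quoted elsewhere in this thread; the function and variable names are made up for illustration.

```python
NETCRAFT_SITES = 49_261_052     # Netcraft site count quoted in this thread
GOOGLE_INDEX = 4_285_199_774    # Google homepage figure quoted in this thread


def estimate_pages(sites, pages_per_site):
    """Total pages if every site held the same average number of pages."""
    return sites * pages_per_site


for avg in (10, 20, 100):
    total = estimate_pages(NETCRAFT_SITES, avg)
    gap = total - GOOGLE_INDEX  # positive: pages Google would be missing
    print(f"{avg:>3} pages/site -> {total:,} pages (estimate minus Google index: {gap:,})")
```

Note that at 10 or 20 pages per site the estimate comes out below what Google already reports indexing, which suggests that either the true average is higher or the site count misses a lot.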

claus

10:30 pm on Apr 9, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> private homepages with only 1 page

The Netcraft study is for individual domains and subdomains:

1) Microsoft.com (G: 644,000) is listed with 360 sites (microsoft.com and subdomains thereof)

2) Bbc.co.uk (G: 1,350,000) is listed with 95 sites (bbc.co.uk and subdomains thereof)

3) Tripod.com (G: 1,790,000) is listed with more than 500 sites (as they provide subdomains for their users).

4) Aol.com (G: 874,000) is listed with more than 500 sites as well.

By the way, i recall that i have done that bbc search before and got a much larger number of pages... here it is (msg #6): [webmasterworld.com...]

October 12, 2003: "bbc site:bbc.co.uk" returned 3.1 million pages, now it's down to 823,000. I don't suppose they've deleted two thirds of their pages since October, so Google must have dropped a lot of them from the index.

SlowMove

10:39 pm on Apr 9, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Anyone that does a lot of spidering could probably give you a good estimate of the average number of pages per site. I wouldn't count on using something like G's "site:www.domain.com" because Google often won't crawl through all the pages of larger sites unless there is sufficient link popularity. A lot of pages are not getting spidered by the major search engines.

claus

11:07 pm on Apr 9, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> Google often won't crawl through all the pages of larger sites unless there is sufficient link popularity

And/or high PR. The above examples are all PR9 except tripod that redirects to a lycos subdomain (it's "only" PR8). Recent threads seem to indicate that the googlebot still does go deep, but the deep pages don't always make it to the index.

That BBC would lose around 2 million indexed pages in six months tells me that google indexes a smaller percentage of large sites now than it did before.

Still, the google.com homepage statistic went from 3-something to 4-something billion pages during this period, so if this is not due to deeper spidering it must be broader spidering. So, what's new - pages with eastern character sets? more blogs? Froogle catalog pages? News sources? email lists? forums? dynamic pages? I don't know.

>> A lot of pages are not getting spidered by the major search engines.

If that 66% gap (assuming that bbc.co.uk is no larger than 3.1 million pages) is valid across sites of different size, then google would index around one third of the pages available, which sets the total figure to around 12-15 billion pages.

Personally i think the Google index of 4,285,199,774 holds no more than 10-20% of the total number of individual pages available, but your guess is as good as mine.

Added: Some percentage, perhaps a large one, of the pages currently not indexed will be duplicates or near-duplicates due to session IDs, customization, and the like - another percentage will be password-protected pages, e.g. dating sites.
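The extrapolation in this post boils down to dividing the index size by an assumed coverage fraction. A rough sketch of that, with a hypothetical helper `total_from_coverage` and the coverage values taken from the post (one third, the literal BBC ratio, and the 10-20% guess):

```python
GOOGLE_INDEX = 4_285_199_774  # Google homepage figure quoted in this thread


def total_from_coverage(indexed_pages, coverage):
    """Scale an index size up to a whole-web estimate, given the fraction indexed."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be a fraction in (0, 1]")
    return indexed_pages / coverage


# Coverage as stated in the post: about one third of available pages are indexed.
one_third_estimate = total_from_coverage(GOOGLE_INDEX, 1 / 3)

# Coverage taken literally from the BBC counts (823,000 now vs 3.1 million in October).
bbc_estimate = total_from_coverage(GOOGLE_INDEX, 823_000 / 3_100_000)

# The personal guess of 10-20% coverage gives a range.
guess_low = total_from_coverage(GOOGLE_INDEX, 0.20)
guess_high = total_from_coverage(GOOGLE_INDEX, 0.10)

for label, value in [
    ("one third indexed", one_third_estimate),
    ("BBC ratio", bbc_estimate),
    ("20% indexed", guess_low),
    ("10% indexed", guess_high),
]:
    print(f"{label}: {value / 1e9:.1f} billion pages")
```

A single site's ratio is a thin basis for a web-wide figure, so the spread between these numbers matters more than any one of them.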

sidyadav

2:01 am on Apr 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



According to Whois.sc, there are approximately 38,115,793 [Whois.sc] active domains and 2,069,188,504 [Whois.sc] IPs in the World.

If we divide the 2,069,188,504 IPs by the 38,115,793 active domains, we get about 54 - so let's keep that as the total pages per site (even though I don't know how I got it.).

Now taking the 38,115,793, if we take out a quarter (due to a quarter of them being "duplicatish" domains), it equals 28,586,845. Taking the 28,586,845, if we use my theory and multiply it by 54, it will equal 1,543,689,630. Taking that 1,543,689,630, we could also take out a quarter (due to duplicate web pages), and it will equal 1,157,767,222.

So according to my theory, there are approximately 1,157,767,222 "unique" web pages in the world, even though this may not be anywhere close to the "actual" amount, which only God knows.

Sid
PS; I'm no Maths geek - I just wanted to show-off ;)
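For anyone who wants to re-run Sid's numbers, here is a small sketch of the same chain of steps. The variable names are invented for illustration, and the two "take out a quarter" steps are the post's own assumptions, not measured values.

```python
ACTIVE_DOMAINS = 38_115_793   # Whois.sc figure quoted above
TOTAL_IPS = 2_069_188_504     # Whois.sc figure quoted above

# IPs per active domain, used in the post as a stand-in for pages per site.
pages_per_site = round(TOTAL_IPS / ACTIVE_DOMAINS)

# Assumption from the post: a quarter of the domains are "duplicatish".
unique_domains = round(ACTIVE_DOMAINS * 0.75)

raw_pages = unique_domains * pages_per_site

# Assumption from the post: a quarter of the pages are duplicates.
# Integer arithmetic drops the trailing half page.
unique_pages = raw_pages * 3 // 4

print(f"pages per site: {pages_per_site}")
print(f"unique domains: {unique_domains:,}")
print(f"raw pages:      {raw_pages:,}")
print(f"unique pages:   {unique_pages:,}")
```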

pmkpmk

6:52 pm on Apr 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



claus:
So, what's new - pages with eastern character sets? more blogs? Froogle catalog pages? News sources? email lists? forums? dynamic pages? I don't know.

I've certainly seen an immense increase in eBay-redirect-pages. There seems to be almost ALL remotely popular keywords covered! Only today I tried to estimate the value of an old Denon amplifier by searching the web for its specs. Doing so, I stumbled across slightly less than 20(!) linklist-to-eBay "sites".

Also, Kelkoo-and-the-like sites have certainly increased in number.

sidyadav:

According to Whois.sc, there are approximately 38,115,793 active domains and 2,069,188,504 IPs in the World.

Small mistake. In domains, they only count COM/NET/ORG/INFO/BIZ/US domains - all the country specific domains like .uk, .de, .nz, ... are NOT counted.

The IP-List however DOES list the countries as well!

claus

10:28 pm on Apr 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> 2,069,188,504 IPs in the World

Ah, then let's assume that there are only 20 pages per IP. That's probably a low figure. This assumption would imply:

20 x 2,069,188,504 = 41,383,770,080 pages in total.

This means that Google has indexed around 10% of the available pages.
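As a quick check of that figure, a tiny sketch (names are illustrative only; the 20 pages per IP is the assumption stated in the post):

```python
TOTAL_IPS = 2_069_188_504      # Whois.sc figure quoted in this thread
GOOGLE_INDEX = 4_285_199_774   # Google homepage figure quoted in this thread
PAGES_PER_IP = 20              # assumption from the post

total_pages = PAGES_PER_IP * TOTAL_IPS
google_share = GOOGLE_INDEX / total_pages

print(f"{total_pages:,} pages in total, Google share {google_share:.1%}")
```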

sidyadav

11:11 pm on Apr 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yeah, but the question is, how many of those IPs are from ISPs - and don't host sites?

Sid

claus

12:36 pm on Apr 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's a good point, Sid. The low number of pages per IP was meant to take that sort of thing into account, so the figure should be higher if you remove the non-web related IP's. For web-serving IP's, multi-hosting (shared IP) seems to be the rule and dedicated IP the exception. Also, sites on dedicated IP's should tend to be larger on average than sites on shared IPs.

Anyway, this seems like a nice thing to do on a sunday morning:

1) Let's assume that 80% are for email, ftp, dialup, and various ISP-stuff. That figure might be too high, but it leaves us with:

(a): 0.20 x 2,069,188,504 = 413,837,701 web-page holding IP's (shared and dedicated)

  • Let's then assume that 20% of these IPs are dedicated IP's (for one customer, not necessarily one site) - and that 80% are for shared hosting.
  • Let's also assume that the really small sites tend to be on shared IPs and the really big ones tend to be on dedicated ones.
  • For simplicity in calculation, let's just say that because of this, the average number of pages per dedicated IP is double the average number of pages per shared IP.
  • Then, let's work with low numbers, so as not to over-estimate the amount of pages. As we have now removed the non-web content it would be fair to assume that the average number of pages per IP is greater than 20. (remember that it is an IP, not a web site)
  • Let's work with two scenarios: A pessimistic one (very low figures) and a not-so pessimistic one (low figures). Let's just try to establish a baseline and be content to declare that "the real figure is very probably higher".

2) Let's calculate the estimated number of shared IP's and dedicated ones first:

(b): 0.80 x 413,837,701 = 331,070,161 shared web page holding IP's
(c): 0.20 x 413,837,701 = 82,767,540 dedicated web page holding IP's

3) So, how many web pages per IP on shared IP's? First, let's recalculate our 20 pages from before, to see what that would correspond to, as we have removed 80% of the IP's in step one:

(d): 20 / 0.20 = 100 pages on average per "web IP" (when 80% of IP's are not hosting web pages).

That still seems like a fair number to me, ie. not too high. It corresponds to around 83 pages per shared IP and around 167 pages per dedicated IP, given the assumption of double size (*).

But, let's be careful. Let's say:

  • Case 1: 50 pages per shared IP
  • Case 2: 100 pages per shared IP

4) Get that calculator of yours. Never mind, i got it already:

  • Case 1 (very low estimate):
    (e): (331,070,161 x 50) + (82,767,540 x 100) = 24,830,262,050 web pages total
  • Case 2 (low estimate):
    (f): (331,070,161 x 100) + (82,767,540 x 200) = 49,660,524,100 web pages total

And, "the real figure is very probably higher" - but i definitely don't think it's lower than case 1.

5) Solving for the Google index of 4,285,199,774 pages:

  • Case 1: 4,285,199,774 / 24,830,262,050 = 17%
  • Case 2: 4,285,199,774 / 49,660,524,100 = 9%

So, i still think that Google has no more than 10-20% of the available pages indexed.

(If you find errors in the above, blame me)


(*) Note: Solve for x: ((x*80)+(2x*20))/100 = 100. The answer is 83.3333
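The whole calculation above can be re-run with a short script. This is only a sketch of the steps as posted: every percentage and page count is an assumption taken from the post, and the names are invented for illustration.

```python
TOTAL_IPS = 2_069_188_504      # Whois.sc figure quoted in this thread
GOOGLE_INDEX = 4_285_199_774   # Google homepage figure quoted in this thread

# Step 1: assume only 20% of all IPs hold web pages.
web_ips = round(0.20 * TOTAL_IPS)

# Step 2: assume 80% of those are shared-hosting IPs and 20% are dedicated.
shared_ips = round(0.80 * web_ips)
dedicated_ips = round(0.20 * web_ips)


def total_pages(pages_per_shared_ip):
    """Total pages, assuming a dedicated IP holds twice the pages of a shared one."""
    return shared_ips * pages_per_shared_ip + dedicated_ips * 2 * pages_per_shared_ip


# Steps 4 and 5: the two scenarios and the share of each that Google has indexed.
for label, per_shared in (("Case 1", 50), ("Case 2", 100)):
    total = total_pages(per_shared)
    print(f"{label}: {total:,} pages, Google share {GOOGLE_INDEX / total:.0%}")

# Footnote (*): pages per shared IP that would give an overall average of 100.
# 0.80 * x + 0.20 * 2x = 100  =>  x = 100 / 1.2
per_shared_for_avg_100 = 100 / (0.80 + 2 * 0.20)
print(f"pages per shared IP for an average of 100: {per_shared_for_avg_100:.4f}")
```

Changing the 20%/80% splits or the doubling factor is the quickest way to see how sensitive the totals are to these guesses.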

sidyadav

2:24 pm on Apr 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Wow, excellent theory claus! - really got my brain working.

But I think the best and most accurate way to calculate this is to buy a good server (a great server, actually) that is on a high-speed internet connection and has a decent amount of space, install Larbin, and leave the computer on for, let's say, 2 or 3 months.

Check then, and you'll find the real amount :)

You could actually do that and also make a website about it, name it the "Internet Size-Calculation Project (ISCP)" and then, when you've got the information - sell the results to whoever wants them and also take donations through PayPal - whoever donates (during the buying/crawling period) more than $100 gets the info for free ;)

Not a bad idea to earn money I guess, also, when you've done it:
a) give me credit: As the guy who gave you the crazy idea to download the web.
b) share the info with me for free - for giving you the idea ;)

hehehe... like this will ever happen.. ;)

Sid

mack

2:38 pm on Apr 11, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



ahhh but what about the servers who don't allow robots..

I think this probably can be done by gathering data from ISPs. Use surfers as the spiders and log pages: if a page is already recorded, drop it; if not, add it. They can then gather all the different data into one source.

Then the problem is, there are probably millions of pages out there that never get any visits.

Mack.
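The "drop it if already recorded, add it if not" idea is essentially a set union over visit logs. A minimal sketch, assuming each ISP could hand over a plain list of visited URLs; the function name and the example.com URLs are hypothetical, and real logs would need URL normalisation and far more storage than an in-memory set.

```python
def merge_visit_logs(*isp_logs):
    """Combine visit logs from several ISPs into one set of distinct pages."""
    seen = set()
    for log in isp_logs:
        for url in log:
            if url in seen:
                continue       # already recorded: drop it
            seen.add(url)      # not recorded yet: add it
    return seen


# Hypothetical logs from two ISPs, with one page visited through both.
isp_a = ["http://example.com/", "http://example.com/about", "http://example.com/"]
isp_b = ["http://example.com/about", "http://example.org/"]

distinct_pages = merge_visit_logs(isp_a, isp_b)
print(len(distinct_pages), "distinct pages seen")
```

As the post goes on to say, this only counts pages that somebody actually visits.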

blaze

2:46 pm on Apr 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The definition of a web page is always shaky. For example, if there was a database driven dictionary online and each entry had its own web page, would you count that?

Google has a caching web page for each of the web pages. Do you count all those?

What about email .. if a user can click and view all of his emails, each as their own web page, do you count all those?

The deep web becomes a very very big place when you start to think about each web page as a database entry.

For example, usenet alone is probably a few billion pages when you count the fact that it is mirrored in a lot of different places.

mack

3:01 pm on Apr 11, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I think the answer is infinite.

Think of the google website... enter a query and you get a results "page"... how many possible queries are there? Then think of all the sites with dynamic content.

Mack.

pmkpmk

6:58 pm on Apr 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



And if somebody really finds out the exact number of pages on the web, it will vanish instantly and be replaced by something even more bizarre...

Take care, Douglas, wherever you might be at the moment...

mixd

2:05 am on Apr 12, 2004 (gmt 0)

10+ Year Member



[google.com...]
+
[google.com...]
+
[google.com...]
+
[google.com...]
+
[google.com...]
+
[google.com...]

Add a billion to that, there's my best guess. :)

victor

8:47 am on Apr 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Several more than a billion, I'd guess.
  • Don't forget usenet.
  • Don't forget all the pages you can see if you have the right passwords.
  • Don't forget Archive.org which is all the pages you could have seen if you'd looked at the time, and you can still see them now.
  • Don't forget all the pages that Google doesn't index (opinions vary, but low PR and zillions of pages gets only a subset indexed)
  • Don't forget all the pages from which Google is banned (robots.txt and Nofollow/NoIndex)
  • Don't forget all the other pages that Google hasn't indexed yet (assume a growth rate for the web and estimate the number of pages added in the last month or so).
  • etc.

mixd

4:03 am on Apr 13, 2004 (gmt 0)

10+ Year Member



I think the real answer to this post is that this question is quite purposeless. And even if you knew the answer it would change before you could tell anyone.

hazardtomyself

5:36 am on Apr 13, 2004 (gmt 0)

10+ Year Member



Claus,
I appreciate the intellect that has gone into your evaluation...Off the charts. Thank you.

>>I think the real answer to this post is that this question is quite purposeless<<

On the contrary. When selling an SEO service or a type of website that connects buyers with sellers, and informing the client or potential client that they are up against billions of pages, or conservatively "24,830,262,050", having first-page exposure all of a sudden becomes a phenomenal value.

I know, I know...but there are only 607,800 results in your query. That's not the point. The point is that out of easily over 24 billion web pages on the Internet, I made the world find yours. That's a great value proposition. That's powerful marketing!

That's the point, for me, of this thread.

decaff

5:41 am on Apr 13, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The surface web is probably in the neighborhood of 20 billion pages...the deep web is over 900 billion pages and counting rapidly.