How many pages on a web site?

Forum Moderators: phranque

Is there a way to tell?

kittykatt

7:41 pm on Mar 6, 2003 (gmt 0)

Forgive the newbie question, but is there a tool out there that will tell me how many pages are on a web site?

Thanks!

tedster

9:17 pm on Mar 6, 2003 (gmt 0)

It's a hard question to answer because there's no easy definition of "web site". In fact, there's some talk at the W3C about adding a site variable to the HTTP header spec just to clear that up.

Here's one issue - what appears to be a single "site" can consist of page elements drawn from more than one domain. Check out the URL for the images on this page, for a basic example.

I have one client whose website has become so entangled with their parent company's site that it's hard to say whether it's one site or two. And it's served from six different domains and three different servers in three different locations.

There are site grabber tools that will spider links beginning on a certain page and then save all the pages they find on the crawl. But they would miss orphaned pages, doorway pages that only link inbound, and so on.
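To make the spidering idea concrete, here is a minimal sketch in Python of what such a tool does, not the code of any particular product. The names `count_pages`, `LinkParser`, and the in-memory `SITE` dictionary are made up for illustration, and "same site" is assumed to mean "same host name", which is exactly the definition problem described above.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def count_pages(start_url, fetch, max_pages=1000):
    """Breadth-first crawl from start_url; return the set of pages fetched.

    fetch(url) is a caller-supplied function (an assumption of this sketch)
    that returns the page's HTML as a string, or None if it cannot be read.
    Only links on the same host as start_url are followed.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}          # URLs already queued
    pages = set()               # URLs that actually returned HTML
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages.add(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc != host or link in seen:
                continue
            if len(seen) >= max_pages:
                continue
            seen.add(link)
            queue.append(link)
    return pages


# Hypothetical four-page site held in memory; orphan.html has no inbound links.
SITE = {
    "http://example.com/": '<a href="/a.html">A</a> <a href="b.html">B</a>',
    "http://example.com/a.html": '<a href="/">home</a> <a href="http://other.example/x">ext</a>',
    "http://example.com/b.html": '<a href="/a.html#top">A</a>',
    "http://example.com/orphan.html": '<a href="/">home</a>',
}

pages = count_pages("http://example.com/", SITE.get)
print(len(pages))  # 3: the orphaned page is never reached
```

Against a live site, `fetch` could wrap `urllib.request.urlopen`, but the limitation stays the same: the count covers only pages reachable by links from the starting page.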

kittykatt

9:39 pm on Mar 6, 2003 (gmt 0)

Could you tell me where I could find the site grabber tool you refer to?

That would give me a place to start. Thanks!

gopi

10:50 pm on Mar 6, 2003 (gmt 0)

You can use a very simple but crude method using Google.

Just type "site:<site.com> <some-common-word>" into Google.

Here site.com is the site in question, and the common word is some word or phrase that appears on every page (like a copyright notice).

Of course, it will only count the pages indexed by Google :)
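If you want to run that query for several sites, a small Python helper can build the search URL for you. This is only a sketch: the function name `site_query_url` is made up, and the `https://www.google.com/search?q=` address is assumed to be the usual search endpoint. The result count still has to be read off the results page by hand.

```python
from urllib.parse import urlencode


def site_query_url(domain, common_word):
    """Build a Google search URL for a site:-restricted query."""
    query = f"site:{domain} {common_word}"
    # urlencode escapes the colon and turns the space into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_query_url("example.com", "copyright"))
# https://www.google.com/search?q=site%3Aexample.com+copyright
```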

gethan

10:56 pm on Mar 6, 2003 (gmt 0)

Alltheweb have some nice site investigation tools too:

(See tedster's caveats though - these are only the things that FAST knows about)

WebmasterWorld investigated [alltheweb.com] - it lists the number of pages as 114,000+

Or type www.webmasterworld.com into the search box and find out about the 42,000 incoming links :)