I'd like to discuss how to find the total number of pages a site has when every page is dynamic. A link checker like Xenu relies on every page of the site being linked from somewhere, which I believe is the case for us, but I can only guess when I don't know the actual page inventory.
The challenge with using a log analyzer is that log entries depend on a page being requested. If a page on my site receives no traffic (human or bot), it never appears in the logs and cannot be counted.
Has anyone else had this problem, and if so, what did you do to solve it?
This is something that only a site engineer can solve properly, by generating tables for the various parameters available to each template. Then you'd have to do some multiplying to get an estimate of possible pages, if parameters apply in combinations. You'll still have issues if some combinations produce a null page.
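To make the multiplying concrete, here is a rough Python sketch. The parameter names, their values, and the `is_valid` rule are all made up for illustration; the real tables would have to come from your engineer or the site database. It computes the straight product as an upper bound, then enumerates the combinations and skips the ones assumed to produce a null page.

```python
from itertools import product
from math import prod

# Hypothetical parameter tables for one template. Real values would come
# from the site's database or CMS, not from this list.
template_params = {
    "category": ["books", "music", "film"],
    "region": ["us", "uk"],
    "sort": ["price", "date"],
}

def upper_bound(params):
    """Multiply the number of values per parameter (assumes all combine freely)."""
    return prod(len(values) for values in params.values())

def count_valid(params, is_valid):
    """Enumerate every combination and count those that yield a real page."""
    names = list(params)
    return sum(
        1
        for combo in product(*(params[name] for name in names))
        if is_valid(dict(zip(names, combo)))
    )

# Hypothetical rule: assume "film" has no UK listings, so those pages are null.
def is_valid(page):
    return not (page["category"] == "film" and page["region"] == "uk")

print(upper_bound(template_params))            # every possible combination
print(count_valid(template_params, is_valid))  # combinations minus null pages
```

The gap between the two numbers is the error you would carry if you stopped at the multiplication. You would repeat this per template and add up the results.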
Any tracking procedure will, as you say, capture only pages that have been requested.
A link follower will not be able to deal with dropdown lists. (I could be wrong; if so, I'd like to know which link tracker can do it.)
My concern about an estimate is that I could be 10-15% off, and that could have huge implications for a financial model built on top of this number.
Beyond the financial model, it would be very valuable to know the actual number of pages on my site, because comparing it with the number of indexed pages would show how well search engines are indexing us.
Mine is surely not the only organization with this problem; I imagine most larger companies run into it, especially those with blogs, community areas, and directories, where page growth comes from users as opposed to a centralized development cycle.
At this point I am open to any kind of idea, even brainstorming.
If page growth is coming from the outside, as in blogs or community stuff, then those pages will have been requested, right? And logs will then be recording them as new, unique URLs. Feeding raw logs to a database tool or script should give you pretty good counts and you won't need an analytics tool.
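As a rough idea of what such a script could look like, here is a minimal Python sketch. It assumes your server writes Common Log Format (or the combined variant), and the sample lines are invented for the example, so adjust the pattern to your own logs. It keeps the query string, since on a dynamic site the parameters are what distinguish one page from another, and it counts only requests answered with status 200.

```python
import re

# Matches the request and status fields of a Common Log Format line.
# Assumption: the server logs in Common or combined format.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')

def unique_urls(lines):
    """Return the set of distinct URLs that were served with status 200."""
    urls = set()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) == "200":
            urls.add(match.group(1))  # query string kept on purpose
    return urls

# Made-up sample lines, for illustration only.
sample_log = [
    '203.0.113.5 - - [10/Oct/2008:13:55:36 -0700] "GET /blog?id=1 HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2008:13:56:01 -0700] "GET /blog?id=1 HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2008:13:56:40 -0700] "GET /blog?id=2 HTTP/1.1" 200 1804',
    '203.0.113.7 - - [10/Oct/2008:13:57:12 -0700] "GET /blog?id=999 HTTP/1.1" 404 512',
]

print(len(unique_urls(sample_log)))
```

For real logs you would pass an open file instead of the list, for example `unique_urls(open("access.log"))`, where the file name is a placeholder. The count is still a floor, not the full inventory, since pages nobody has requested never show up.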
But doesn't your CMS generate page counts for you? Most do.