

theory on dynamic pages

why i believe that they don't get PR at all (even if indexed)

         

muesli

3:56 pm on Aug 2, 2002 (gmt 0)

10+ Year Member



hi,

there were some controversial opinions on this board about dynamic links and how G treats them. i admit that my level of SEO know-how deserves the label "dangerous halfway wisdom" so please don't be too harsh with me.

the facts (hope everyone agrees):

  • many SE don't index dynamic pages (i.e. pages with a "?" in their URL) because of the danger that the spider/bot gets trapped in a loop caused by session-IDs in URLs (a sketch of this trap follows after this list)
  • G does index (some) dynamic pages
  • dynamic pages don't seem to make it high up in G rankings
  • dynamic pages seem less likely to be indexed
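
    a minimal sketch of that trap (the URLs and the "sid" parameter are made up for illustration):

```python
# Minimal sketch of a session-ID crawler trap: every response embeds
# its links with a fresh session ID, so the same logical page yields a
# brand-new URL on every visit and the crawl frontier never empties.
import itertools

def fetch(url, _sid=itertools.count(1)):
    sid = next(_sid)  # the server mints a new session ID per request
    return [f"/catalog?item={i}&sid={sid}" for i in range(3)]

seen, frontier = set(), ["/catalog?item=0&sid=0"]
for _ in range(10):  # a real bot would run far longer
    url = frontier.pop()
    seen.add(url)
    frontier += [u for u in fetch(url) if u not in seen]

print(len(seen), len(frontier))  # both keep growing; the loop never ends
```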

    the theory most people here seem to hold:

  • G doesn't "like" dynamic pages
  • G decides based on PR (probably meant: the linking page's PR) if it indexes or not
  • dynamic pages get "punished", so they get less PR than they would if they were static

    i strongly disagree. my theory:

  • G tries to index whatever it can, regardless of URL formatting (it could hardly care less!). it is limited by technical issues (the risk of getting trapped) and feasibility (to keep the web indexable it probably - everyone says so, i don't know myself - has to limit the crawling depth for sites)
  • to avoid getting trapped G indexes dynamic pages but doesn't follow any links on them. this way only pages that G can find via a link on a static page get indexed.
  • as ignoring links would totally change a closed system's PR structure (e.g. dynamic pages don't get the chance to pass back PR to their frontpage), google simply doesn't calculate any PR for dynamic pages. from a PR perspective, dynamic pages are totally ignored.

    this would explain why dynamic pages are less likely to be indexed (G doesn't find those that cannot be reached via static pages) and why dynamic pages get worse ranking (the page has to fight its way up the ranking only by means of keyword density and such, without the help of PR).
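
    a minimal sketch of this theory on a made-up five-page site (the URLs and link graph are invented; PR here is the standard damped PageRank iteration):

```python
# Sketch of the "no PR for dynamic pages" theory: drop every URL
# containing "?" (and all links to or from it) before computing
# PageRank over what remains. Site and link graph are made up.
DAMPING = 0.85

links = {
    "/":            ["/about.html", "/list?page=1"],
    "/about.html":  ["/", "/list?page=1"],
    "/list?page=1": ["/", "/list?page=2"],  # dynamic: ignored entirely
    "/list?page=2": ["/"],                  # dynamic: ignored entirely
}

static = {u: [v for v in out if "?" not in v]
          for u, out in links.items() if "?" not in u}

pr = {u: 1 / len(static) for u in static}
for _ in range(50):  # power iteration; reads the previous pr throughout
    pr = {u: (1 - DAMPING) / len(static)
             + DAMPING * sum(pr[v] / len(static[v])
                             for v in static if u in static[v])
          for u in static}

print(pr)  # the dynamic pages get no PR and pass none back to "/"
```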

    i think the common misunderstanding comes from the guess the google toolbar makes for dynamic pages. such toolbar PR is supposedly based on directory depth in combination with the site's frontpage's PR. no real PR at all!

    conclusion (my personal opinion):

  • if you want to use dynamic-looking pages you have to make sure that they are reachable from static ones.
  • you also have to be aware that these pages will never contribute to the PR within your site. if your only static page is the frontpage but you have thousands of dynamic subpages all linking back to it, you won't be able to benefit from their "votes".
  • so try everything to get rid of the question mark in your URLs. (at least the same conclusion as everyone else ;-)

    however i did not try to verify (or, speaking with karl popper, falsify) this theory. it would probably be quite easy to set up a test site and submit it to G to find out if the bot follows the links. finding out if the pages have real PR or not is probably harder.

    muesli

    guezo2

    4:22 pm on Aug 2, 2002 (gmt 0)

    10+ Year Member



    I disagree with you, muesli, when you say "G indexes dynamic pages but doesn't follow any links on them".
    I have some dynamic pages [A1..An] containing links to other dynamic pages [B1..Bn]. These links do not exist anywhere else in my site.
    Pages [B1..Bn] are fully indexed by Google.
    The same process makes links from [B1..Bn] pages to [C1..Cn], but those are not indexed.
    All these pages {Ak, Bk, Ck} are processed within the same PHP file; the only difference is the number of "&" in the URL.
    My conclusion is that "sometimes" Google follows the links on dynamic pages. I don't know if the condition is having a great PR or reducing the number of "&" in the URL...

    Grumpus

    4:27 pm on Aug 2, 2002 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    You've got some very good points! Let's look at a few that are definitely incorrect, though...

    many SE don't index dynamic pages

    Nah. Only a few don't. Most do just fine with mine.

    dynamic pages seem to be indexed less likely

    It's more like, dynamic pages get indexed less deeply. And, I'm not certain it has as much to do with being dynamic (i.e. with a ? in the URL) but that there's an extra 1-3 seconds between when the request goes out from the bot and when the first response comes. The bot's got a limited amount of time, so if it hits 1000 pages on my dynamic site, it could hit 3000 pages on another site due to the extra 1000-3000 seconds it takes to fetch the pages on my dynamic site. I believe that PR also affects some sort of "clock" that google has. My site is PR5 so it's got a 100 hour limit. The other site has PR8 so it gets a 300 hour limit. (I made up those numbers, but you know what I mean...)
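
    A back-of-envelope sketch of that "clock" idea (every number below is made up, just like the ones in the paragraph above):

```python
# Back-of-envelope sketch of the invented "crawl clock": a fixed time
# budget divided by per-page fetch time gives pages crawled per visit.
# Every figure here is made up, as in the paragraph above.
def pages_crawled(budget_hours, seconds_per_page):
    return int(budget_hours * 3600 / seconds_per_page)

# Hypothetical: PR5 site, 100-hour budget, ~3s per dynamic page...
print(pages_crawled(100, seconds_per_page=3))  # -> 120000
# ...vs a PR8 site, 300-hour budget, ~1s per static page.
print(pages_crawled(300, seconds_per_page=1))  # -> 1080000
```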

    dynamic pages don't seem to make it high up in G rankings

    Not really. I have a PR5 and my top competition (a dynamic site with a four letter domain name) has a PR8 or 9 depending upon the server you access it on.

    dynamic pages get "punished", so they get less PR than they would if they were static

    I don't see that, either. I have 46 inbound links. PR5. My design site has 37 inbound links. PR5. My design site is static. Inbound links are relatively equal, as is PR.

    ...G indexes dynamic pages but doesn't follow any links on them...

    Nah, I've got 17K pages in the index. 90% of those aren't linked anywhere on the web. Google found 'em by going through my homepage and following links.

    google simply doesn't calculate any PR for dynamic pages

    Now THAT is a good possibility. I've thought of that. It also seems to me that it might not be able to effectively tell the difference between page.asp?ID=1 and page.asp?ID=47982, which means it's just going to boggle and forget the whole calculation.

    ...thousands of dynamic subpages all linking back to it, you won't be able to benefit from their "votes".

    You used to get credit for them. The first month I was in the index (March) I had credit for about 20K inbound links (mostly from my internal pages). The next month, they all went away. I'm not so certain that this was done as a matter of "dynamic" vs. "static", though. I suspect that large sites don't get internal link benefits simply because it would greatly unbalance the whole scale. Look at it this way (and again, static or dynamic wouldn't matter)...

    I've got roughly 1 million pages on my site and this month it's PR2, so Google comes and grabs 10K pages from it. Now, I've got 10K more "link to's" next month, enough to move me to PR3. As PR3, google comes and gets 20K pages, so next month, I've got an additional 10K inbound links. Enough to move me to PR4. Etc. You see how that would go, right? Every site with lots of pages would eventually make it to PR9 or PR10 (with very high crawl depth) simply because they had lots of links to themselves.
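
    A toy version of that loop (the crawl-per-PR and links-to-PR rules below are pure invention, only meant to show the ratchet):

```python
# Toy sketch of the runaway loop described above. Both rules -- pages
# crawled per PR point, and how inbound links map to PR -- are pure
# invention, only meant to show the ratchet effect.
def pr_from_links(links):
    pr = 2
    while links >= 10_000 and pr < 10:  # invented: +1 PR per doubling past 10K
        links //= 2
        pr += 1
    return pr

pr, inbound = 2, 0
for month in range(1, 13):
    crawled = pr * 5_000   # invented: crawl depth scales with PR
    inbound += crawled     # every crawled page links back internally
    pr = pr_from_links(inbound)
    print(month, pr, inbound)  # PR ratchets upward on internal links alone
```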

    so try everything to get rid of the question mark in your URLs.

    I doubt it'd help all that much in the long run. If they are truly dynamic, no matter how static they look, there's still a clock running and those extra seconds to fetch the page all add up.

    By doing an actual conversion to static, you lose what makes a dynamic site interesting - it's alive. Add data, and the site changes. Google LOVES fresh.

    I could possibly see some PR benefits deep, but as I've mentioned before in another topic, my homepage is getting "link-to" credit for a DMOZ link going to another page deeper in the tree. Based on that alone, while the deep page may not have any real PR, my homepage MUST be getting the benefit from it in one way or another.

    Finally, I've got my own clock. If I was to write a routine to dump the data from my database into static pages, it'd probably finish up sometime in the middle of the September crawl. By the time it got into the Google directory, it'd be 2 months old instead of 1.

    You do bring up some interesting points, though. I really like the "no real page rank" for dynamic pages theory! It makes a lot of sense.

    G.

    Dinkar

    4:40 pm on Aug 2, 2002 (gmt 0)

    10+ Year Member



    muesli,

    G follows links on dynamic pages, but there is some limitation for technical reasons. G also calculates PR for dynamic pages, and a site can get a PR benefit from a dynamic page that links to it.

    I think you won't believe me if I tell you that Google sees a redirecting link (Redirect.asp?target=www%2Esite%2Ecom) to a site as a back link for that site.
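
    For what it's worth, the encoded target in such a redirect URL is trivial to recover, which may be how G can count it as a backlink (the host below is hypothetical; decoding is standard library):

```python
# Sketch: recovering the real target from a redirect-style link, the
# way a crawler might. The Redirect.asp URL mirrors the example above;
# the example.com host is hypothetical.
from urllib.parse import urlparse, parse_qs

url = "http://example.com/Redirect.asp?target=www%2Esite%2Ecom"
target = parse_qs(urlparse(url).query)["target"][0]  # parse_qs %-decodes
print(target)  # -> www.site.com
```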

    muesli

    5:31 pm on Aug 2, 2002 (gmt 0)

    10+ Year Member



    well,

    i still favour the theory that the bot has to put dynamic pages at a disadvantage for technical reasons and that they are therefore excluded from PR to avoid bringing the system out of balance. it just seems that the parameters are more complex than i thought.

    so step by step.

    (1) it seems to be agreed that dynamic pages are treated differently from static ones, though opinions on what the exact differences are vary.

    as the differences are probably the key to explaining the effects (given that google technology is rational, which i do suppose) we should collect all observations. so far i have seen:

  • "dynamic pages get less PR". (as real PR is not 100% distinguisable from toolbar-PR we should maybe re-word this to "dynamic pages get worse ranking".)
  • "dynamic pages are indexed less likely" or "get indexed less deeply"
  • "a longer parameter-section and/or a higher number of questionmarks maybe plays a role for the decision not to index a page"

    any other observations on different treatment by google and the bot?

    (2) the three reasons given for the differences so far were (i refuse to take into account missing "sympathy" as we are talking about software, not a human):

  • the bot must avoid getting trapped
  • dynamic pages take longer to load, therefore the bot indexes fewer pages on the site in the same time
  • G "might not be able to effectively tell the difference between page.asp?ID=1 and page.asp?ID=47982" (though i don't know why a piece of software wouldn't be able to see that difference)

    any other theories on why G would treat dynamic pages differently?

    all i'm trying to do is find out how to design and structure my site. i have no access to any logfile information (as the site produces 180 million+ page impressions a month, most of them dynamic, we decided to turn logging off to save resources) so i must rely on theory and others' experiences.

    grumpus, i found yours very interesting, especially the point about the "google clock". G evidently has to be extremely resource-efficient, so it would make sense to punish slow servers with less indexing.

    i don't understand though why you believe the "no real PR for dynamic pages" theory is worth considering if you don't agree with the other points. what sense would it make then to ignore dynamic pages?

    an unrelated topic that i think is worth discussing came up in this thread:

    I suspect that large sites don't get internal link benefits simply because it would greatly unbalance the whole scale.(..) Every site with lots of pages would eventually make it to PR9 or PR10.

  • isn't it called "pagerank", in contrast to "siterank"?
  • and if so, what would a "site" be?
  • is www.domain.com a different site from www1.domain.com?
  • if not, are user1.cjb.net and user2.cjb.net the same site, too?
  • might even bbc.co.uk and yahoo.co.uk be the same site (*.co.uk)?

    i can't believe that internal linking is not being credited. doesn't anyone have a tool or excel sheet where a huge site could be simulated, to see if grumpus is right and its frontpage reaches PR10?
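
    no excel sheet needed; a minimal simulation is easy to sketch (made-up topology: one frontpage and n subpages, each subpage linking only back to the frontpage, the frontpage linking to all of them):

```python
# Minimal PageRank simulation for the question above: one frontpage
# plus n subpages; every subpage links only to the frontpage and the
# frontpage links to every subpage. Topology and n are made up.
DAMPING = 0.85

def frontpage_pr(n, iterations=200):
    front = sub = 1 / (n + 1)
    for _ in range(iterations):
        new_front = (1 - DAMPING) / (n + 1) + DAMPING * n * sub
        sub = (1 - DAMPING) / (n + 1) + DAMPING * front / n
        front = new_front
    return front

for n in (10, 1_000, 100_000):
    print(n, frontpage_pr(n))
```

    in this closed toy system the frontpage's share of total PR saturates near DAMPING / (1 + DAMPING), about 0.46, however large n gets. so internal links alone don't grow it without bound - whether that saturating share would already show as a high toolbar value is another question.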

    muesli

    Rugles

    5:45 pm on Aug 2, 2002 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    I have a couple thousand dynamic pages in the google database. Several dozen of these pages rank number 1 for their keywords and bring a boatload of traffic.
    I think that google avoids dynamic pages with long strings or too many parameters.

    Grumpus

    6:07 pm on Aug 2, 2002 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    isn't it called "pagerank" in contrast to "siterank"?

    I know, I'm sloppy with the terms. Depth of crawl seems to be calculated from the PR of the site's root/index page. Therefore, I tend to call it siterank, because there are certainly calculations that come from the rank of that front page, if only the depth/time allowed per crawl factor.

    As far as simulating the PR10 theory, there are variables which come into play. For example, I'm on a shared server and my pageload times are fine, but the app calls tend to be slower than if I'd spent the dough on a dedicated server with gobs of ram and a lightning fast processor. I doubt it'd be possible for google to get every page from my site even if it dedicated 10 IPs to me and had them hit me twice a minute all month long. It'd still only get 864,000 pages indexed. At that point, if I had 0 links from any other site, my index page would probably cap out at 8 or 9, regardless. I doubt even a PR10 site gets 10 bot threads running all month long. Though, as you can see, large sites would definitely get a nice kick if internal links counted.

    There's also the factor that Google's crawl depth overall varies from month to month.

    i don't understand though why you believe the "no real PR for dynamic pages" theory is worth considering if you don't agree with the other points. what sense would it make then to ignore dynamic pages?

    I think Google uses "guessed" PR in its own calculations. All of my main subject pages have the same PR in the toolbar. Look at it this way. If you've got a site, dynamic or not, how many links do you have going to your "links" page? Probably none. Therefore, it has no page rank except for the links going to it from within your site. If you've got 15 pages on your site and every one of them links to your links page, then you might be lucky enough to have a "real" PR of 1 or 2. Yet, your links page is "guessed" at one less than your homepage per "/" in the URL and that "guessed" value is what is passed along to the sites you link to. If that isn't a fact, then every one of those PR calculation tools is wrong.
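
    A sketch of that guessing rule as stated (one toolbar point below the homepage per "/" of path depth; the rule is conjecture from this thread, not documented behaviour):

```python
# Sketch of the conjectured "guessed PR" rule: one toolbar point below
# the homepage for each "/" of path depth. The rule and the URLs are
# illustrative only.
from urllib.parse import urlparse

def guessed_pr(url, homepage_pr):
    depth = urlparse(url).path.strip("/").count("/") + 1
    return max(homepage_pr - depth, 0)

print(guessed_pr("http://example.com/links.html", homepage_pr=5))       # -> 4
print(guessed_pr("http://example.com/links/page.html", homepage_pr=5))  # -> 3
```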

    G.

    scareduck

    10:23 pm on Aug 2, 2002 (gmt 0)

    10+ Year Member



    Grumpus said:
    I'm not certain it has as much to do with being dynamic (i.e. with a ? in the URL)

    Our sites dropped "?" and went from hundreds of pages being indexed to millions. There's definitely a link between CGI-form URLs and poor Google page breadth.

    bcc1234

    8:06 am on Aug 3, 2002 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    I've got roughly 1 million pages on my site and this month it's PR2, so Google comes and grabs 10K pages from it. Now, I've got 10K more "link to's" next month, enough to move me to PR3. As PR3, google comes and gets 20K pages, so next month, I've got an additional 10K inbound links. Enough to move me to PR4. Etc. You see how that would go, right? Every site with lots of pages would eventually make it to PR9 or PR10 (with very high crawl depth) simply because they had lots of links to themselves.

    And that's what makes the PageRank concept great.
    If you drop the spam filters and other real-world biases, a site with 1M pages is most likely "better" than a site with 10 pages.

    bcc1234

    8:14 am on Aug 3, 2002 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    I don't have much info to argue about dynamic vs static pages, but I submitted a new site with about 1K pages 2 months ago.

    At that time, only 7 pages looked static and the rest looked dynamic.
    ...7 pages got indexed.

    I modified the site to make all pages look static - got about 700 pages crawled so far (it's still going). Don't know how many will actually get into the index.

    Some pages are 5 or 6 clicks away from the home page.

    The home page has PR 5 as of now.

    By making pages look static, I mean creating a proxy app that converts something like X100M300.htm to page.jsp?par1=100&par2=300 and serves the request.
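
    A minimal sketch of such a rewrite (the X100M300.htm pattern is from the post above; the regex and target URL are illustrative):

```python
# Minimal sketch of the proxy idea above: translate a static-looking
# URL such as X100M300.htm back into the real dynamic request. The
# pattern and parameter names mirror the example in the post.
import re

def rewrite(path):
    m = re.fullmatch(r"X(\d+)M(\d+)\.htm", path)
    if m is None:
        return path  # not a rewritten URL; pass it through untouched
    return f"page.jsp?par1={m.group(1)}&par2={m.group(2)}"

print(rewrite("X100M300.htm"))  # -> page.jsp?par1=100&par2=300
```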

    Just my $0.02.

    Grumpus

    11:39 am on Aug 3, 2002 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    And that's what makes PageRank concept great.
    If you drop the spam filters and other real-world biases - a site with 1M pages is most likely "better" than a site with 10 pages.

    Ahh, but that's the point. You don't get that because big sites aren't getting credit for their internal links to themselves. Yahoo seems to get one for each subdomain, so that helps. IMDb has credit for 3 or 4 pages. I get credit for one. (My default.asp links to my / - which is also my default.asp)

    G.

    muesli

    12:40 pm on Aug 3, 2002 (gmt 0)

    10+ Year Member



    You don't get that because big sites aren't getting credit for their internal links to themselves. Yahoo seems to get one for each subdomain, so that helps. IMDb has credit for 3 or 4 pages. I get credit for one.

    grumpus, (potentially stupid question) how do you measure what a page gets credit for and what it doesn't?

    muesli

    JohnKing1

    1:09 pm on Aug 3, 2002 (gmt 0)

    10+ Year Member



    My theory is that webmasters themselves are not willing to put permanent, well placed links to dynamic content on their sites, and that Google is simply mirroring this bias.

    As a webmaster I almost never put permanent links to .asp and .php pages, especially when they contain ? and & characters in the URL. Experience has taught me that these usually turn into missing pages within a few months.

    If you look at Yahoo, you will find that the majority of links that have been there for more than a year are to static web pages.

    If Google is running machine learning algorithms on their databases to help them rank pages (they have been advertising for positions), then this is one bias that surely has been noticed and included in the ranking algorithm.

    Most of the reliable, relevant, well placed links are to static pages. Thus these are the type of links that the Google ranking program assigns high PR to.

    I am not sure how long this will continue. In my opinion dynamic web languages and databases enable the webmaster to produce timely and relevant content in greater quantities than would have been possible by hand. These technologies just need to find a solution to the problem of shifting/disappearing content.

    [edited by: JohnKing1 at 1:26 pm (utc) on Aug. 3, 2002]

    Grumpus

    1:23 pm on Aug 3, 2002 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    how do you measure what a page gets credit for and what it doesn't?

    Check inbound links at Google. Obviously, some low ranked pages aren't going to show in that list, but it'll give you an idea. If it shows in that list, the page is getting credit.

    Experience has taught me that these usually turn into missing pages within a few months.

    That's an interesting point. This is quite true (usually) for news based sites, but I believe that if Google is making a generalization about this, then it's making a major mistake. Speaking for sites in my field (movies), I don't know of any that actually have content that vanishes over time. I'm sure this is almost certainly the case for most other "information" sites (as opposed to "breaking news" sites) as well.

    I agree that it makes sense that Google algos would feel this way, though. Of the high PR'd sites (erm, sites whose pages tend to get high PR), a good many of them are news sites (presumably because people link to the news). A good many of these purge stories once they lose relevance.

    But, I highly doubt many a webmaster says, "Drat, that URL has a ? in it so I'm not adding a link." They're more likely to say, "That's something that'll be there for a week and vanish," if that, in fact, is how the page looks.

    Interesting theory, and there's a foothold there for us to explore, but I ain't quite buying it, yet.

    G.

    muesli

    10:20 pm on Aug 6, 2002 (gmt 0)

    10+ Year Member



    hi grumpus,

    the user pageoneresults suggested that only links on PR4+ pages are being credited, not any from PR3 and below. (he deduced this from the fact that the search "link:www.domain.com" only yields PR4+ results.)

    if this is true it could be an answer to our "internal linking" discussion. you say:

    I suspect that large sites don't get internal link benefits simply because it would greatly unbalance the whole scale.(..) Every site with lots of pages would eventually make it to PR9 or PR10.


    even huge sites usually don't have that many PR4+ pages, so all the internal credit they get would be well deserved - and wouldn't get them to PR10.

    muesli

    ps. is the "only PR4+" theory new, or is it common knowledge that i just hadn't yet stumbled across?

    Grumpus

    1:08 am on Aug 7, 2002 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    The theory about not getting credit for links from pages less than PR4 isn't new. Only problem here is that on my site, there are at least a couple hundred pages that come up with PR4 in the toolbar, and I'm only getting credit from my default page linking to my default page.

    The answer lies somewhere in between these two theories, most likely.

    G.