there were some controversial opinions on this board about dynamic links and how G treats them. i admit that my level of SEO knowhow deserves the label "dangerous halfway wisdom", so please don't be too harsh with me.

the facts (hope everyone agrees):
- many SEs don't index dynamic pages
- dynamic pages seem less likely to be indexed
- dynamic pages don't seem to make it high up in G rankings

the theory most people here seem to take:
- dynamic pages get "punished", so they get less PR than they would if they were static

i strongly disagree. my theory:
- G indexes dynamic pages but doesn't follow any links on them
- google simply doesn't calculate any PR for dynamic pages
- so even if your frontpage has lots of dynamic subpages all linking back to it, you won't be able to benefit from their "votes"

this would explain why dynamic pages are less likely to be indexed (G doesn't find those that cannot be reached via static pages) and why dynamic pages get worse rankings (the page has to fight its way up the rankings by keyword density and such alone, without the help of PR).

i think the common misunderstanding comes from the guess the google toolbar makes on dynamic pages. such toolbar PR is supposedly based on directory depth in combination with the frontpage's PR. no real PR at all!

conclusion (my personal opinion): try everything to get rid of the question mark in your URLs.

however i did not try to verify (or, speaking with karl popper, falsify) this theory. it would probably be quite easy to set up a test site and submit it to G to find out whether the bot follows the links. finding out whether the pages have real PR or not is probably more difficult.

muesli
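A minimal sketch of the test muesli proposes above, added here as a neutral illustration: generate a few target pages that each carry a unique marker string and are linked only from a dynamic (query-string) page, then search for the markers later to see whether the bot followed the dynamic links. The file names and the marker scheme are invented for illustration, and the snippet is Python even though the sites discussed here are ASP/JSP.

```python
# Sketch of a throwaway test site: target pages reachable only via a link on a
# dynamic (?-style) URL, each carrying a unique marker you can later search for
# in Google to see whether the bot followed the link. Names are made up.
import os
import uuid

def build_test_site(root="testsite", n_targets=3):
    os.makedirs(root, exist_ok=True)
    markers, links = [], []
    for i in range(n_targets):
        marker = f"marker-{uuid.uuid4().hex[:12]}"   # unique, otherwise-unindexed string
        markers.append(marker)
        filename = f"target{i}.html"
        links.append(f'<a href="{filename}">target {i}</a>')
        with open(os.path.join(root, filename), "w") as f:
            f.write(f"<html><body><p>{marker}</p></body></html>")
    # The "dynamic" page would be served as e.g. list.asp?cat=1 and is the ONLY
    # page linking to the targets; here we just write out its HTML body.
    with open(os.path.join(root, "dynamic_page_body.html"), "w") as f:
        f.write("<html><body>" + " ".join(links) + "</body></html>")
    return markers

if __name__ == "__main__":
    print("search Google for these markers in a few weeks:", build_test_site())
```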
I suspect that large sites don't get internal link benefits simply because it would greatly unbalance the whole scale. I've got roughly 1 million pages on my site and this month it's PR2, so Google comes and grabs 10K pages from it. Now, I've got 10K more "link to's" next month, enough to move me to PR3. As PR4, Google comes and gets 20K pages, so next month I've got an additional 10K inbound links. Enough to move me to PR5. Etc. You see how that would go, right? Every site with lots of pages would eventually make it to PR9 or PR10 (with very high crawl depth) simply because they had lots of links to themselves.
so try everything to get rid of the question mark in your URLs.
By doing an actual conversion to static, you lose what makes a dynamic site interesting - it's alive. Add data, and the site changes. Google LOVES fresh.
I could possibly see some PR benefit for deep pages, but as I've mentioned before in another topic, my homepage is getting "link-to" credit for a DMOZ link that points to another page deeper in the tree. Based on that alone, while the deep page may not have any real PR, my homepage MUST be getting the benefit from it in one way or another.
Finally, I've got my own clock. If I was to write a routine to dump the data from my database into static pages, it'd probably finish up sometime in the middle of the September crawl. By the time it got into the Google directory, it'd be 2 months old instead of 1.
You do bring up some interesting points, though. I really like the "no real page rank" for dynamic pages theory! It makes a lot of sense.
G.
G follows links on dynamic pages, but there are some limitations for technical reasons. G also calculates PR for dynamic pages, and a site can get a PR benefit from them if it has a link from such a dynamic page.
You may not believe it, but Google sees a redirecting link (Redirect.asp?target=www%2Esite%2Ecom) to a site as a backlink for that site.
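To make the redirect point concrete: the target of such a link is trivially recoverable from the query string, so a crawler could credit it to the target site. Whether Google actually does this is exactly the claim above; this purely illustrative Python snippet only shows the decoding.

```python
# Decode the destination of a Redirect.asp?target=... style link.
from urllib.parse import urlparse, parse_qs

def redirect_target(url):
    """Pull the decoded target out of the query string (parse_qs undoes %2E -> .)."""
    params = parse_qs(urlparse(url).query)
    return params.get("target", [None])[0]

print(redirect_target("http://example.com/Redirect.asp?target=www%2Esite%2Ecom"))
# prints: www.site.com
```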
i still favour the theory that the bot has to put dynamic pages at a disadvantage for technical reasons, and that they are therefore excluded from PR to avoid bringing the system out of balance. it just seems that the parameters are more complex than i thought.
so step by step.
(1) it seems to be agreed that dynamic pages are treated differently from static ones, though opinions differ on what exactly those differences are.
as the differences are probably the key to explaining the effects (given that google's technology is rational, which i do suppose), we should collect all observations. so far i have seen:
any other observations on different treatment by google and the bot?
(2) the three reasons given for the differences so far were (i refuse to take into account missing "sympathy", as we are talking about software, not a human):
any other theories why G would make those unspecified differences?
all i'm trying to do is find out how to design and structure my site. i have no access to any logfile information (as the site produces 180 million+ page impressions a month, most of them dynamic, we decided to turn logging off to save resources), so i must rely on theory and others' experiences.
grumpus, i found yours very interesting, especially the point about the "google clock". G evidently has to be extremely resource-efficient, so it would make sense to punish slow servers with less indexing.
i don't understand though why you believe the "no real PR for dynamic pages" theory is worth considering if you don't agree with the other points. what sense would it make then to ignore dynamic pages?
an unrelated topic that i think is worth discussing came up in this thread:
I suspect that large sites don't get internal link benefits simply because it would greatly unbalance the whole scale.(..) Every site with lots of pages would eventually make it to PR9 or PR10.
i can't believe that internal linking is not being credited. doesn't anyone have a tool or excel sheet where a huge site could be simulated, to see if grumpus is right and its frontpage reaches PR10?
muesli
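A minimal sketch (Python) of the simulation muesli asks for, assuming the textbook formula PR(p) = (1 - d) + d * sum(PR(q) / outlinks(q)) with d = 0.85 and a site shaped like a star: one frontpage linking to N dynamic subpages, each linking only back to the frontpage, and no external links at all. The log-scale mapping from raw PR to a toolbar-style 0-10 value, and its base of 8, are pure assumptions.

```python
# Star-shaped site: frontpage <-> N subpages, nothing else. The update rule is
# the classic PR(p) = (1 - d) + d * sum(PR(q) / outlinks(q)); the toolbar
# mapping at the end (log scale, base 8) is guesswork, not a known formula.
import math

def simulate_star_site(n_subpages, d=0.85, iterations=100):
    """Return (frontpage_pr, subpage_pr) after iterating the PR update."""
    front, sub = 1.0, 1.0
    for _ in range(iterations):
        new_front = (1 - d) + d * n_subpages * sub   # every subpage votes for the frontpage
        new_sub = (1 - d) + d * front / n_subpages   # the frontpage splits its vote N ways
        front, sub = new_front, new_sub
    return front, sub

def toolbar_guess(pr, base=8):
    """Map raw PR to a 0-10 toolbar-style value, assuming a log scale (assumption!)."""
    return min(10, max(0, int(math.log(max(pr, 1.0), base))))

for n in (10, 10_000, 1_000_000):
    front, sub = simulate_star_site(n)
    print(f"{n:>9} subpages: frontpage PR ~{front:,.1f} "
          f"(toolbar guess ~{toolbar_guess(front)}), subpage PR ~{sub:.2f}")
```

Under these assumptions a million-page site with zero inbound links ends up with a large raw frontpage PR, but whether that would really show as PR9 or PR10 depends entirely on the assumed log base of the toolbar scale.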
isn't it called "pagerank" in contrast to "siterank"?
I know, I'm sloppy with the terms. Depth of crawl seems to be determined by the PR of the site's root/index page. Therefore, I tend to call it siterank, because there are certainly calculations that come from the rank of that front page, if only the depth/time allowed per crawl.
As far as simulating the PR10 theory, there are variables which come into play. For example, I'm on a shared server and my page load times are fine, but the app calls tend to be slower than if I'd spend the dough and get a dedicated server with gobs of RAM and a lightning-fast processor. I doubt it'd be possible for Google to get every page from my site even if it were to dedicate 10 IPs to me and have them hit me twice a minute all month long. It'd still only get 864,000 pages indexed. At that point, if I had 0 links from any other site, my index page would probably cap out at 8 or 9, regardless. I doubt even a PR10 site gets 10 bot threads running all month long. Though, as you can see, large sites would definitely get a nice kick if internal links counted.
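The 864,000 figure checks out; a one-liner for anyone who wants to vary the assumed number of crawler IPs or the fetch rate:

```python
# Back-of-the-envelope check of the crawl-budget numbers above:
# 10 crawler IPs, each fetching 2 pages a minute, non-stop for a 30-day month.
ips, pages_per_minute, days = 10, 2, 30
pages_per_month = ips * pages_per_minute * 60 * 24 * days
print(pages_per_month)   # 864000, short of the ~1 million pages on the site
```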
There's also the factor that Google's crawl depth overall varies from month to month.
i don't understand though why you believe the "no real PR for dynamic pages" theory is worth considering if you don't agree with the other points. what sense would it make then to ignore dynamic pages?
I think Google uses "guessed" PR in its own calculations. All of my main subject pages have the same PR in the toolbar. Look at it this way. If you've got a site, dynamic or not, how many links do you have going to your "links" page? Probably none. Therefore, it has no page rank except for the links going to it from within your site. If you've got 15 pages on your site and every one of them links to your links page, then you might be lucky enough to have a "real" PR of 1 or 2. Yet, your links page is "guessed" at one less than your homepage per "/" in the URL and that "guessed" value is what is passed along to the sites you link to. If that isn't a fact, then every one of those PR calculation tools is wrong.
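A small sketch (Python) of the "guessed PR" folk theory described here: one toolbar point less than the homepage per level of URL depth. This is only the theory under discussion, not a documented Google formula, and the depth counting is an assumption about what "per '/' in the URL" means.

```python
# "Guessed" toolbar PR per the folk theory: homepage PR minus one per level of
# path depth, floored at zero. Illustrative only; not a known Google formula.
from urllib.parse import urlparse

def guessed_toolbar_pr(url, homepage_pr):
    path = urlparse(url).path.strip("/")
    depth = len(path.split("/")) if path else 0   # "/" -> 0, "/links.html" -> 1
    return max(homepage_pr - depth, 0)

print(guessed_toolbar_pr("http://example.com/links.html", homepage_pr=5))        # 4
print(guessed_toolbar_pr("http://example.com/movies/reviews/x.html", 5))         # 2
```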
G.
I've got roughly 1 million pages on my site and this month it's PR2, so Google comes and grabs 10K pages from it. Now, I've got 10K more "link to's" next month, enough to move me to PR3. As PR4, google comes and gets 20K pages, so next month, I've got an additional 10K inbound links. Enough to move me to PR5. Etc. You see how that would go, right? Every site with lots of pages would eventually make it to PR9 or PR10 (with very high crawl depth) simply because they had lots of links to themselves.
And that's what makes the PageRank concept great.
If you drop the spam filters and other real-world biases - a site with 1M pages is most likely "better" than a site with 10 pages.
At that time, only 7 pages looked static and the rest - dynamic.
....7 pages got indexed.
I modified the site to make all pages look static - got about 700 pages crawled so far (it's still going). Don't know how many will actually get into the index.
Some pages are 5 or 6 clicks away from the home page.
The home page has PR 5 as of now.
By making pages look static, I mean creating a proxy app that converts something like X100M300.htm to page.jsp?par1=100&par2=300 and serves the request.
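A rough sketch of the mapping such a proxy app would do, shown in Python for illustration (the real thing presumably sits in front of the JSP layer): turn the static-looking name into the underlying dynamic request.

```python
# Map a static-looking name like X100M300.htm onto the real dynamic request
# page.jsp?par1=100&par2=300, as in the proxy approach described above.
import re

STATIC_URL = re.compile(r"^X(\d+)M(\d+)\.htm$")

def to_dynamic(static_name):
    m = STATIC_URL.match(static_name)
    if not m:
        return None
    return f"page.jsp?par1={m.group(1)}&par2={m.group(2)}"

print(to_dynamic("X100M300.htm"))   # page.jsp?par1=100&par2=300
```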
Just my $0.02.
And that's what makes the PageRank concept great.
If you drop the spam filters and other real-world biases - a site with 1M pages is most likely "better" than a site with 10 pages.
Ahh, but that's the point. You don't get that because big sites aren't getting credit for their internal links to themselves. Yahoo seems to get one for each subdomain, so that helps. IMDb has credit for 3 or 4 pages. I get credit for one. (My default.asp links to my / - which is also my default.asp)
G.
You don't get that because big sites aren't getting credit for their internal links to themselves. Yahoo seems to get one for each subdomain, so that helps. IMDb has credit for 3 or 4 pages. I get credit for one.
grumpus, (potentially stupid question) how do you measure what a page gets credit for and what not?
muesli
As a webmaster I almost never put permanent links to .asp and .php pages, especially when they contain ? and & characters in the URL. Experience has taught me that these usually turn into missing pages within a few months.
If you look at Yahoo, you will find that the majority of links that have been there for more than a year are to static web pages.
If Google is running machine learning algorithms on their databases to help them rank pages (they have been advertising for positions), then this is one bias that surely has been noticed and included in the ranking algorithm.
Most of the reliable, relevant, well placed links are to static pages. Thus these are the type of links that the Google ranking program assigns high PR to.
I am not sure how long this will continue. In my opinion, dynamic web languages and databases enable the webmaster to produce timely and relevant content in greater quantities than would have been possible by hand. These technologies just need to find a solution to the problem of shifting/disappearing content.
how do you measure what a page gets credit for and what not?
Check inbound links at Google. Obviously, some low ranked pages aren't going to show in that list, but it'll give you an idea. If it shows in that list, the page is getting credit.
Experience has taught me that these usually turn into missing pages within a few months.
That's an interesting point. This is quite true (usually) for news based sites, but I believe that if Google is making a generalization about this, then it's making a major mistake. Speaking for sites in my field (movies), I don't know of any that actually have content that vanishes over time. I'm sure this is almost certainly the case for most other "information" sites (as opposed to "breaking news" sites) as well.
I agree that it makes sense that Google algos would feel this way, though. Of the high PR'd sites (erm, sites whose pages tend to get high PR), a good many of them are news sites (presumably because people link to the news). A good many of these purge stories once they lose relevance.
But, I highly doubt many a webmaster says, "Drat, that URL has a ? in it so I'm not adding a link." They might likely say, "That's something that'll be there for a week and vanish" if that, in fact, is how the page looks.
Interesting theory, and there's a foothold there for us to explore, but I ain't quite buying it, yet.
G.
the user pageoneresults suggested that only links on PR4+ pages are being credited, not any from PR3 and below. (he deduced this from the fact that the search "link:www.domain.com" only yields PR4+ results.)
if this is true, it could be an answer to our "internal linking" discussion. you say:
I suspect that large sites don't get internal link benefits simply because it would greatly unbalance the whole scale.(..) Every site with lots of pages would eventually make it to PR9 or PR10.
muesli
ps. is the "only PR4+" theory new, or is it common sense and i just didn't stumble across it yet?
The answer lies somewhere in between these two theories, most likely.
G.