Does the size of your site matter?

Forum Moderators: open

Message Too Old, No Replies

Does the size of your site matter?

aggie12thman

4:43 am on Apr 3, 2003 (gmt 0)

We all know relavent backward links and solid content are key to PR. What about the number of pages within your site? Does size matter?

deejay

12:48 am on Apr 4, 2003 (gmt 0)

re 'random surfer'

Thanks Chris_R.. You explained it well. :) yep, on that basis it makes sense.

GrinninGordon

12:50 am on Apr 4, 2003 (gmt 0)

freejung

"Given that the Google PR algo simply counts the amount of time a random surfer stays on your page"

Are you saying click pop is a factor on Google? I did not think it was or could be as there is no string in the SERP to suggest this!

But is anyone knows differently, I would love to hear about it :-)

swerve

2:00 pm on Apr 4, 2003 (gmt 0)

Keep in mind you get your PR from EXTERNAL sources

The EVIDENCE is the EQUATION that has been on the web since google has been out.

Chris_R,

I'm still a newbie here, but much of what I have read here runs contrary to your statement above. Can you point to the section of the Google/PR paper where it states that links from internal pages are excluded from the PR calculation?

Thanks,

swerve

killroy

2:42 pm on Apr 4, 2003 (gmt 0)

Actually the paper states quite clearly that each page has an inherent PR 1 of OUTGOING votes, no differentiation of internal or extarnal links. In fact the basic simplyfied equation does not make and DISTINCTION between domains, it treats every page euqual.

So a site of 100 pages has a possible PR of 100. If they are unlinked the dampening will reduce that. if one page is linked to the remaining 99 pages and all those back that the site will have a total PR of 100 skewed towards the index page. If all pages link to all others each page will have exactly PR 1.

Sites and domains only come in for cross-link penalisation and topic/context grouping.

At least that is my opinion, which may be wildly inaccurate since the algo details aren't published.

<added>I wish this was a wiki, then I wouldn't ahve to fix my own crappy spelling all the time</added>

[edited by: killroy at 3:48 pm (utc) on April 4, 2003]

dwilson

2:49 pm on Apr 4, 2003 (gmt 0)

Right on, Killroy. Glad I read the entire thread before I jumped in with that.

takagi

3:48 pm on Apr 4, 2003 (gmt 0)

It seems there is a maximum of PR transfer from many identical links on the same site. If you look at the bottom of this page you see a link to BestBBS. Searching

site:www.webmasterworld.com bestbbs

with Google will show you 44,100 pages. Searching

link:www.bestbbs.com

gives 8,100 pages. That is 8,100 pages with at least a PR4 (up to PR7). But still the PR of this site (which consist of only 1 page) is PR4!

killroy

3:52 pm on Apr 4, 2003 (gmt 0)

Speaking of which, how can I get google count my internal links back to my homepage? It currently only counts 240 links, all external. I have over 60000 pages, all linking back to my homepage, many with PR 5 and even more with PR 4. why don't they show up?

Chris_R

3:55 pm on Apr 4, 2003 (gmt 0)

Chris_R,
I'm still a newbie here, but much of what I have read here runs contrary to your statement above. Can you point to the section of the Google/PR paper where it states that links from internal pages are excluded from the PR calculation?
Thanks,
swerve

Links from internal pages are not excluded from PR calculations - sorry if I gave that impression.

However, the pages must have some PR to begin with - you can't put up a site and link it all together to each other and hope it will create PR.

Kilroy is right in that each page has a small PR, but this is not what people think it is - this is raw PR - before the normalization.

You NEED PR from some external source for it to get into your site. This is not up for debate. Gogle has said as much - it is obvious from reading their pages - their webmaster section and - statements they have made at conferences.

The amount of PR that is created by your own pages is in direct correlation to the number of pages on the web.

If you have 1 page - you have 1/3,000,000,000 of the webs PR. This is a RAW PR of one which is practically 0 on the toolbar scale AFTER the pages are normalized. If you have no links to any of your pages - you aren't even put into the equation. There is only one web and one PR equation - if you don't have external links to your website - you are not part of the web - you are an intranet for all intents and purposes.

People will believe what they want - but notice that I wasn't taken up on my $100 offer.

I hope my statement about it having to be a site and not a subpage didn't confuse - that was merely to prevent confusions as to the pretent PR that appears on some sub pages.

It is correct for all intents and purposes for the PR calculation - a page is not different if it isn't the main page. ALL PAGES COUNT - as long as they have incoming links from other websites (that have incoming links from somewhere - that has incoming links - and so on).

rogerd

4:02 pm on Apr 4, 2003 (gmt 0)

Does size matter?

Of course it does, a million spammers can't all be wrong, can they?

My own experience is that size improves traffic, but doesn't necessarily boost PR. Of course, indirect PR effects could include greater linkage to the site as a whole, more deep links to content, etc.

takagi

4:09 pm on Apr 4, 2003 (gmt 0)

> why don't they show up?

Killroy, there are a few possible reasons why they don't show up in your case. The page with the internal link to the homepage ...

1. has a low PR4 or lower.

2. is bigger 101 KB and the link is at the end

3. didn't have the link when the page was indexed

4. shows a guessed PR but the page was never indexed

5. has a link in such a way that G doesn't find it(e.g. using JavaScript, frames)

I will look at it, if you stickymail me the URL.

freejung

5:17 pm on Apr 4, 2003 (gmt 0)

OK, deejay, GrinninGordan and Chris_R,

What I'm referring to is the article linked to by French Tourist in this thread:

[webmasterworld.com...]

Google has probably extended and modified this algo somewhat, but this paper describes the way the original Google PR algo works, which is according to a simple iterative equation which can also be expressed as an eigenvalue problem.

Another way of expressing it, one which makes more intuitive sense, is the random surfer model. It goes like this:

Suppose you start a random surfer somewhere at random on the web. The surfer clicks on links totally at random, and continues this for an infinite number of cycles. What percentage of the time (meaning number of cycles, not actual clock time) does this surfer spend on your page? That is your PR.

In order to keep the surfer from getting stuck on pages or sites with no outbound links, they introduce a random factor E. Every now and then the surfer gets bored and jumps to some random page somewhere else on the web. This keeps it from getting stuck in a loop.

This has nothing to do with actual traffic, it's only a model.

In this model, a larger number of pages does increase your PR in two ways. One is that your pages are tightly connected to each other, so that the random surfer, once it hits your homepage, might go to a subpage and then back to your homepage, increasing the frequency of hits to the homepage. Of course, this would be accomplished just as well by having only one subpage, and having only one link on the subpage which points back to the homepage. The other way is that if you have more pages, there is a higher probability of the random surfer jumping to one of your pages when it jumps. Both of these effects are probably pretty small, but they do count for something.

Yes, most of your PR will come from external sources, because most of your visits from the random surfer will come from it clicking on a link to you from another site. But some might also come from the random surfer randomly landing on a subpage, and then clicking through to your homepage. This corresponds to a PR source, and is used to counteract the PR sink effect caused by sites that loop back on themselves.

With the combination of this effect and the broader keyword range that you have with more pages, I think it is a good idea to have more pages. This seems to be what Brett thinks too.

swerve

5:35 pm on Apr 4, 2003 (gmt 0)

You NEED PR from some external source for it to get into your site.

Thanks for the information and clarification. Before your reply, I went back to the PR paper and came to the same conclusion.

Based on this fact, in both scenarios I described, the pages would have zero PR, since there are no external links. So let's change each scenario such that there is one external link (the same link in both scenarios) to the home page. Would the PR of the home page be the same in the 12 page and the 52 page scenarios?

The PR paper discussed this very scenario, though I don't fully understand it. Perhaps someone can help me understand better. Here are some quotes:

There is a small problem with this simplified ranking function. Consider two web pages that link to each other but to no other page. And suppose there is some web page which points to one of them. Then, during iteration, this loop will accumulate rank but never distribute any rank (since there are no outedges). The loop forms a sort of trap which we call a rank sink. To overcome this problem of rank sinks, we introduce a rank source:

They then list an equation, which I don't understand. In the description, they state:

Note that if E is all postive, c must be reduced to balance the equation. Therefore, this technique corresponds to a decay factor...

Can someone help me understand this? Does this help answer my question of the 2 scenarios?

Thanks,

swerve

killroy

5:55 pm on Apr 4, 2003 (gmt 0)

I must disagree to the external link requirement. There is nothing in th eequation that requires you to e connected somehow to all pages of the net (if you were not you would be what you called a "intranet").

In fact the equation applies simply to a page and its incoming links. If you set up a domain with two pages, each linking to the other, you have a perfectly walid page with PR 1 on each subpage and PR 2 in total. If you submit it to google and it spiders your page it's in. converse, if you have a HUGE page (like mine) with a high PR due to its own linking, and all sites linking to you remove the links, your site wonT' magically disappear. In fact google already knows about your site, it'll come back to check if they still exist and spider any new internal pages. Of course you don't ahve any PR boost from thee outside, but you can still have a high PR website.

Regarding my own page and backlinks to my howapage. All internal pages are small (<10k), have a plain text link back to the root, have PR from 1 to 5 (the PR 5 at LEAST should show), have been hit many times by google, have been there for over 3 years, and show up in search results.

I suspect some sort of penalisation, since I had a GREAT URL SOUP that might have been seen as duplication. I hope I've fixed that now.

annej

6:10 pm on Apr 4, 2003 (gmt 0)

freejung, Thanks for the random surfer explanation. I think I finally have a clue about how it all works.

freejung

6:22 pm on Apr 4, 2003 (gmt 0)

Swerve, that's exactly what I'm trying to explain. The equation is actually pretty simple. It just says that the PR of a page is the sum of the PR passed to it by all pages which link to it, plus a factor determined by E, which corresponds to a PR source.

A loop like that described in the paper does accumulate PR, which is why it causes a problem. This is why the random jump factor has to be introduced, otherwise the random surfer will just go around and around forever. But even with the random factor, the loop will accumulate some PR, which is why spammers use it.

I strongly suspect that Google is using some more complex algo now, plus there are many other factors besides PR. So this doesn't tell the whole story. There may be other benefits of having a larger number of pages.

It does make sense that in a natural structure, (one not messed up by spammers, in other words), a larger site is likely to be more important, since it can be expected to have more information on it. Thus Google may have other ways to give a bonus to larger sites, I don't know.

But, according to the original PR paper, a larger site will give you a larger PR because of the random jump factor E, and any loop will give you higher PR due to the PR sink effect. However, this bonus is small compared to the PR you will get from a link on a high PR page with few other links.

The greatest benefit from having lots of content is that people are more likely to: a) link to you and b) stay on your site longer. This is pretty intuitive. Having the content spread across a large number of pages gives you the small PR benefit plus the greater benefit of increased keyword range.

That's all I'm saying.

swerve

6:52 pm on Apr 4, 2003 (gmt 0)

But, according to the original PR paper, a larger site will give you a larger PR because of the random jump factor E, and any loop will give you higher PR due to the PR sink effect. However, this bonus is small compared to the PR you will get from a link on a high PR page with few other links.

Thanks, freejung. Your posts have been very helpful, I understand this a lot better now.

Another post that I found helpful, from another thread, is this one:

[webmasterworld.com ]

GrinninGordon

12:00 am on Apr 5, 2003 (gmt 0)

freejung

OK, thanks. So to summarize it down to a single sentance. PR is not surfer click popularity, but virtual (algo) click popularity. Is that about right?

huppy99

7:58 am on Apr 5, 2003 (gmt 0)

Size matters. As long as the content on each page is unique. The more pages we added (we provide a automotive classifieds and listing service) meant more traffic because fo the variety and frequency of keywords on our pages.

We just ensured that all of our classifieds could be spidered. Many sites with a datavase of searchable information do not provide a way for robots to enter and spider the lot.
Our indexed page count more than doubled, up to 65,000 now or something. PR for main page actually dropped '1' point last update, but our total site traffic from google rose.

Eveb on our quietest day of the week, our Google traffic has increased 20-30% in the last 2 months, and we receive about 6000 visits a day.

killroy

12:03 pm on Apr 5, 2003 (gmt 0)

Right on.

I've been creating spiderable large databases for years, and found that using logs and amazon-style realtionalitya is a GREAT boon.

I ahve a site in which the actaully index is completely inaxxessible, as it uses JS compression to cram 700 formatted links into a few k size for fast naviagation. google simply follows the "related categories" and "other visitors who found this info usefull also visited..." style links. Through these only, 3-5 related category in each entry page lets google spider over 3000 categories and 10000 entrypages without problem.

Its about links, link relevancy, and ultimately usability... because in a way that is what google strives for.

freejung

9:05 pm on Apr 5, 2003 (gmt 0)

>PR is not surfer click popularity, but virtual (algo) click popularity. Is that about right?

Yup, that's how I understand it. The algo, unlike a real surfer, clicks on links totally at random.

This 50 message thread spans 2 pages: 50