Pitfalls in Analyzing SERP Data

establishing cause is not an easy thing

         

tedster

11:56 am on Dec 2, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In a discussion on the "Keywords" forum, a point came up that really got me thinking. There is a widespread idea that Google lowers PageRank the farther a page gets into a site's directory structure. This idea comes into play because people notice that (in general) PR drops as you drill down into a site.

However, just noticing that two factors tend to occur together does not mean that one causes the other — and in this case the conclusion is just plain wrong, IMO.

As I see it, what is happening on Google is that for most sites, the deep pages have fewer internal links pointing to them. This lack of internal linking is what drops the PR, not the position in the directory structure. There is nothing in the math for PR that examines directory structure. In fact, with intensive linking, a deep internal page can have great PR.
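To make that concrete, here is a minimal sketch of the iteration described in the published PageRank papers, run on a made-up five-page site. The URLs, the link graph, and the 0.85 damping value are illustrative assumptions, not data from any real site, and this is certainly not Google's production code. The point is only that the function never looks at the URL path; it sees links and nothing else.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simple PageRank iteration: PR(p) = (1 - d) + d * sum(PR(q) / outlinks(q)).

    `links` maps each page to the list of pages it links to.
    The URL strings are opaque labels; directory depth is never inspected.
    """
    pages = list(links)
    rank = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            inbound = sum(
                rank[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_rank[page] = (1 - damping) + damping * inbound
        rank = new_rank
    return rank


# Hypothetical site: two pages sit at the same depth (three directories down).
# The glossary is linked from every other page; the "rare" page from only one.
site = {
    "/": ["/a/", "/a/b/c/glossary.html"],
    "/a/": ["/", "/a/b/", "/a/b/c/glossary.html"],
    "/a/b/": ["/", "/a/b/c/rare.html", "/a/b/c/glossary.html"],
    "/a/b/c/glossary.html": ["/"],
    "/a/b/c/rare.html": ["/", "/a/b/c/glossary.html"],
}

ranks = pagerank(site)
for url, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{score:.3f}  {url}")
```

In this toy graph the heavily linked glossary page ends up with more PR than the shallower /a/ page, while the rarely linked page at the very same depth ends up with the least. Same directory level, very different results, and the only thing that differs is the linking.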

This kind of error in logic is also one of the pitfalls of certain popular SEO software: the kind that spiders the daylights out of the search engines every month and then updates the software with new rules. The software users assume that the relationships between different factors (kw density, prominence, etc) are actually the real causes of good ranking. Well, maybe yes and maybe no. WPG, for example, is careful in their language not to claim this, but most users still make the assumption anyway.

Logically, "post hoc ergo propter hoc" (after this, therefore because of this) is a fallacy. "Along with this, therefore the cause of this" is also a fallacy.

When we're looking for clues about an algo, we need to remember that spidering data from SERPs gives us exactly that — clues. With a big enough sample it even gives us good, strong clues. But it doesn't give us rules.

click watcher

4:21 pm on Dec 2, 2001 (gmt 0)



However, just noticing that two factors tend to occur together does not mean that one causes the other — and in this case the conclusion is just plain wrong, IMO.

i'm also convinced that directory depth is not the cause. i feel that the relevance of incoming links to a page is of greater importance than its directory position. i have an experimental site right now which has only one incoming link, from a dmoz listed site to the homepage (this is to ensure a listing in the google index), and otherwise no incoming links. now as we know, new and unindexed pages show the drop of one PR per directory level ... but with pages i know to have been indexed (eg they are findable using specific search terms on google) this is absolutely not the case. with careful structuring and linking, i have been able to get sub directory pages with a PR 2 higher than the index page.

some caveats: i'm talking low PRs here on this site (due to lack of incoming links), so increasing PR from 1 to 2 is considerably easier than 5 to 6... however if the principle applies then it must be part of the algo. also this is on a shared ip address. to the best of my knowledge this is not affecting the pages of the experimental site, but i will concede this could be an influence.

i'm sure that directory structure has no relevance to PR. it is merely a common coincidence, as tedster has suggested. i'm more convinced that the "quality" of incoming links is the overwhelming deciding factor, but of course i'm only repeating what i've read written by brett and others here.

one of the biggest clues to this is the thread by the webmaster of the "glossaries" site - forgive me for not finding the thread and forgetting who wrote it, but thanks for the inspiration !!! it was a clear blueprint of how a PR can jump, even if there is hand tweaking by google which i don't believe, it must be only to touch up what the algo is trying to achieve anyway.

in all i'd say that it's linking source and structure but not directory structure that is the "imperator romana" of the google PR mystery.

bird

9:59 pm on Dec 2, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Evidence clearly supports the above statements.

I have a third level page (www.site.com/first/second/third.html) that comes up on top for several relevant two word phrases, usually pulling in the toplevel index page just behind it. For other word combinations, the situation is reversed. The page in question is itself the index of a very popular subsection of my site (incidentally a glossary ;)), and has a PageRank similar to the top level.

ciml

6:35 pm on Dec 3, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yep, PageRank trickles down via links, not URLs (and trickles whether the IPs are the same or not).

"The Anatomy of a Large-Scale Hypertextual Web Search Engine" and "The PageRank Citation Ranking: Bringing Order to the Web" are much more reliable than the Google ToolBar. :)

The variables are the PageRank of the linking pages and the number of links on those pages. The 'rank source' appears to be static across each PageRank build.
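For reference, the simplified formula given in "The Anatomy of a Large-Scale Hypertextual Web Search Engine" is shown below. Here T_1 ... T_n are the pages linking to page A, C(T) is the number of links going out of page T, and d is the damping factor, which the paper says is usually set to 0.85. Whether the live index still uses exactly this form is an assumption.

```latex
PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)
```

Nothing in it refers to the URL or the directory depth of A, only to the linking pages and their link counts.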

Calum

tedster

6:53 pm on Dec 3, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> ...and the number of links on those pages.

I think this is an under-utilized factor in getting good PR for "deep" pages. If you can find a way to get links from pages that have very few other links, it really helps the PR of the deep page. I've been working on natural ways to have pages with only one or two links total, and the results so far are excellent.

Those Google papers that ciml mentioned give us good reason to feel we know the cause of good PR. Their algo for on-page factors is the mysterious part.

ciml

7:27 pm on Dec 3, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



tedster
>If you can find a way to get links from pages that have very few other
> links, it really helps the PR of the deep page.

That's absolutely true, but it's important not to be misled by the logarithmic nature of the ToolBar/GoogleDirectory PageRank indicators.

Assuming log10, a PR6 page with 200 links is as good as a PR5 page with 20 links or a PR4 page with 2 links.

I think it is close to log10, but note that Chris_R (who's expended much effort on google) thinks that the ToolBar's closer to log6 than log10. That would make the result different but the idea the same.
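The arithmetic behind that comparison can be sketched as follows. It assumes the toolbar number is a plain logarithm of the underlying score and that a page splits its score evenly across its links; the damping factor is ignored, and the function name is made up for illustration.

```python
def per_link_value(toolbar_pr, links_on_page, base=10.0):
    """Rough share of PageRank passed by each link on a page.

    Assumes underlying score = base ** toolbar_pr, split evenly
    across all links on the page. The damping factor is ignored.
    """
    return base ** toolbar_pr / links_on_page


# (toolbar PR, number of links on the page) from the comparison above
examples = [(6, 200), (5, 20), (4, 2)]

for base in (10.0, 6.0):
    print(f"log base {base:g}:")
    for toolbar_pr, link_count in examples:
        share = per_link_value(toolbar_pr, link_count, base)
        print(f"  PR{toolbar_pr} page with {link_count} links -> {share:g} per link")
```

With base 10 the three pages pass the same amount per link. With base 6 the PR4 page with 2 links passes the most and the PR6 page with 200 links the least, so the exact numbers shift but the idea holds: the link count on the linking page matters as much as its toolbar PR.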

> Their algo for on-page factors is the mysterious part.

As well as 'theming'. We know that link text helps _lots_, (the #1 listings in Google for gm and bill clinton both bring up pages that don't mention the words) but how much effect does the TITLE, H1 and body text of a linking page have? Whatever the answer, it's likely to increase.

Calum

tedster

8:00 pm on Dec 3, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Good point, Calum.

And that's one reason not to worry if a PR sticks at 5 no matter what you do. Inbound links are still boosting the PR. It just doesn't show in the single digit toolbar version, because the boost is going from 5.001 to 5.375 or something like that.
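A quick sketch of how much can hide behind an unchanged toolbar digit. It assumes a log10 scale and that the toolbar simply drops the fractional part; neither is confirmed, and the 5.001 and 5.375 figures are just the illustrative values from above.

```python
import math

BASE = 10.0  # assumed log base of the toolbar scale


def displayed_pr(log_score):
    """Toolbar digit for a log-scale score, assuming the fraction is dropped."""
    return math.floor(log_score)


before, after = 5.001, 5.375

# Ratio of the underlying (non-log) scores before and after the new links
growth = BASE ** (after - before)

print(displayed_pr(before), displayed_pr(after))
print(f"underlying score grew about {growth:.2f}x")
```

Under those assumptions the underlying score more than doubles while the toolbar keeps showing the same digit.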

If your page climbs up the SERP, then you've got what you want, even if you don't know the exact numbers.

ciml

2:17 pm on Dec 4, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



tedster:
> If your page climbs up the SERP, then you've got what you want, even if you don't know the exact numbers.

Excellent point. It's easy to forget the objective of our efforts.

I think I'll print that and put it over my desk.

Calum

ebgreen

12:43 pm on Dec 11, 2001 (gmt 0)



A long directory structure will not cause high rankings on pages several folders deep, but it will help the site's PageRank and will increase the value of your top level doorway pages and themes. Just place pages that are not that important for ranking purposes about three levels deep. Link these pages with a text link at the very top and bottom of the page to your home page. Also, build a hallway page of all of your three-levels-deep pages and submit this to the search engines.

This has worked very well for me.