This is my conjecture, but what if Google is now working with domain-wide semantic factors a bit more strongly than in the past?
Some are seeing a strange trend toward SERPs that rank higher level pages that are actually one click away from the real "meat" - now what's that all about?
Also I've looked at enough troubled rankings recently to realize that some of the domains involved have developed into a kind of crazy-quilt of topics. As long as the scoring was very strong on just the individual url, these sites were doing great. But just maybe the disconnectedness of their "theme" is now being detected and seen as a negative.
I'm talking here about sites that throw up lots of varied pages to catch different kinds of keyword traffic, you know? They usually have "real" content, not scraped, but it's either MFA or (dare I coin another acronym?) MFO, made for organic. What Google says they want is MFV, made for the visitor.
Now obviously general news sites are also a crazy quilt of a kind, so it shouldn't just be any wide ranging diversity of topics that is problematic - that's not precise enough. But Google probably knows that their end user is often happier when the SERP sends them to a domain filled with relevant information, and not just a one-off page or even a small section.
Something about this feels like it's lurking in the back of my brain somewhere trying to break through. I am thinking more about domain-wide positive relevance signals here, rather than penalties.
Have my babblings triggered anyone's brain cells?
You may just have hit my issues on the head here, tedster, though I would argue that made for organic and made for visitor could easily be one and the same. After all, if I need a telephone number in Ghana, I do not have a local Yellow Pages, and therefore the internet is a perfect way to find it (and maybe the only way).
So by providing me with that number, the web site I have found is very definitely made for the visitor.
Whilst wiki does buck this trend a bit, I suspect that G would love to be rid of it - fairly concrete proof that their algos don't always work for them either.
How might Google measure the site, and not just a page?
So how, and based on what factors, would they be ranking sites as a whole rather than by individual pages?
This is years old, but it separates things into the different facets:
Google Optimization Basics [webmasterworld.com]
The part about "Sitewide" and a ton more could be added to that now. Not "penalties" but "signals of quality" that contribute to rankings for a site. What would they be using for measuring?
(1) Anyone think they may be using semantic analysis (not LSI) to determine themes and keyword sets/phrases for sites?
(2) Anyone think PageRank per page is still king - or could there be a quality and/or IBL and/or neighborhood score for a site as a whole?
(1) Anyone think they may be using semantic analysis (not LSI) to determine themes and keyword sets/phrases for sites?
Entirely and totally. I theorize they are examining the page, examining inbound links to that page, and comparing that page in relation to the entire website, including the theme of the website (content, titles, internal linking, number of themed AND unique pages), as well as doing semantic analysis of other pages linked to and linked from the page in question. If the page in question fails the test, it acts like a broken pillar page, and page-level trust is broken.
I like the silo explanation, which supports this theory. Wikipedia does well because in a way it has many supporting pages that bolster each other. Of course, many links to many of these pages and their subpages give high scores to longer phrases, which in turn breadcrumb back to or directly support the hierarchy.
(2) Anyone think PageRank per page is still king - or could there be a quality and/or IBL and/or neighborhood score for a site as a whole?
Quality inbound links per page are king, regardless of PageRank. Quality is the key, in my experience.
There must be few if any sites with as many deep links as Wikipedia. If I were looking at site-wide factors, I would consider many thousands of links to many thousands of internal pages to be a sure sign of trust. It would be very difficult (impossible?) to get anywhere near this level of deep links artificially.
If I wanted to trust a site as opposed to a page this would be the way to go.
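To put a rough number on that idea, here's a quick Python sketch - the scoring formula and the example links are entirely invented, just to illustrate how a "share of deep links" signal could be computed, not anything Google has confirmed:

```python
# Hypothetical sketch of the "deep links as trust" idea above.
# The scoring formula and example data are invented for illustration.

def deep_link_trust(inbound_targets, homepage_url):
    """Crude site-wide signal: how many inbound links point at internal
    (deep) pages rather than the homepage, and what share of the total."""
    if not inbound_targets:
        return 0.0
    deep = [t for t in inbound_targets if t != homepage_url]
    # Weight the raw count of deep links by their share of all inbound links.
    return len(deep) * (len(deep) / len(inbound_targets))

# Example: 8 of 10 inbound links land on deep pages.
links = ["example.com/"] * 2 + ["example.com/widgets/blue"] * 8
print(deep_link_trust(links, "example.com/"))  # 6.4
```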
I think this is part of it. I have one site in a certain industry that was developed around the same time as another that I monitor. My site has about 500 pages of original content written by me and a couple of employed freelancers. The other has about 30,000 pages of cut and paste stuff. He now sits at position number 998. I'm guessing he took a bad shortcut.
Anyone think they may be using semantic analysis (not LSI) to determine themes and keyword sets/phrases for sites?
Well, whether semantic or not, I'm pretty sure sites are themed or clustered. I think that's reflected in the Adwords recommended sites thingie when you're setting up a site targeted campaign.
I like BeeDeeDubbleU's idea that many deep links can be a reflection of overall site quality. It might be something to look into using Y!'s site explorer or something similar.
And maybe consider the inverse of that: would many links out to external resources be an indication of quality?
As for the BlockRank paper, isn't that just to speed up the overall PR calculation? I don't think that in itself would affect anything, but it might possibly give G a better snapshot of a site's overall quality. Maybe.
Now if it's block analysis, weighting links and other elements according to which section of a page they reside in (nav, content, footer, etc.), I'd assume that would affect individual pages and not the site across the board.
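Just to show what that block-level idea might look like mechanically, here's a tiny sketch - the section weights and example links are made up, not anything documented:

```python
# Hypothetical block-analysis weighting: a link in the main content block
# counts for more than one in the nav or footer. Weights are invented.

SECTION_WEIGHTS = {"content": 1.0, "nav": 0.3, "footer": 0.1}

def weighted_link_score(page_links):
    """page_links: list of (target_url, section) pairs found on one page."""
    return sum(SECTION_WEIGHTS.get(section, 0.5) for _, section in page_links)

example = [("example.com/blue-widgets", "content"),
           ("example.com/widgets", "nav"),
           ("example.com/about", "footer")]
print(weighted_link_score(example))  # 1.4 - the content link dominates
```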
I've come across a couple of other Stanford papers. Not sure if they've been mentioned before. Haven't had a chance to delve too deeply into them as yet. (All PDFs)
Adaptive Page Rank [stanford.edu]
Anti-Trust Rank [stanford.edu]
Combined Rank [i.stanford.edu]
average page views per visit per keyphrase
average time spent on-site per visit per keyphrase
after-visit query termination
Just thoughts based on data they have access to via toolbar & analytics, and algo behavior.
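If anyone wants to eyeball those three numbers on their own analytics exports, here's a back-of-the-envelope sketch - the visit records and field names are invented, it's just the arithmetic:

```python
# Rough calculation of the three per-keyphrase signals listed above,
# from hypothetical analytics-style visit records (field names invented).
from collections import defaultdict

visits = [
    {"keyphrase": "blue widgets", "pageviews": 5, "seconds": 240, "requeried": False},
    {"keyphrase": "blue widgets", "pageviews": 1, "seconds": 15,  "requeried": True},
    {"keyphrase": "red widgets",  "pageviews": 2, "seconds": 60,  "requeried": False},
]

stats = defaultdict(lambda: {"visits": 0, "pageviews": 0, "seconds": 0, "terminated": 0})
for v in visits:
    s = stats[v["keyphrase"]]
    s["visits"] += 1
    s["pageviews"] += v["pageviews"]
    s["seconds"] += v["seconds"]
    s["terminated"] += 0 if v["requeried"] else 1  # query ended after this visit

for phrase, s in stats.items():
    print(phrase,
          "avg pageviews/visit:", s["pageviews"] / s["visits"],
          "avg seconds/visit:", s["seconds"] / s["visits"],
          "after-visit termination rate:", s["terminated"] / s["visits"])
```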
And that kind of makes sense . . . if you really are the best place for blue widgets wouldn't that phrase occur more often than once across the entire site? Certainly this is a brainstorming notion, but I'm kicking it around a little bit.
My own page titles are interesting and unique, designed to be on topic and to attract visitors browsing through SERPS. But upon considering recent rankings, I don't think the page titles (not page names) were supporting my targeted keywords as well as they could be. Instead they were creative and interesting but not focused for a word crunching machine looking at the total site. Of course, all of the other basics are important for overall balance of a site, but page titles, that one got a little away from me.
This would begin to explain why multi-phrase targeted sites are experiencing a drop: writing page titles like headlines to interest the reader rather than to repeat the underlying targeted phrase.
the phrase blue widgets should also be supported overall by pages within the site that include blue widgets in the page title
Bingo! While having a beer a few months ago with a local WebmasterWorld member we both brought up the point that we've observed this over the past few years and had both been using it for quite some time to very good effect. It works very well.
Now, the WebmasterWorld member is going to beat me about the head and shoulders with a McSorley's beer stein for saying too much, but you might also consider extending this same principle from a different direction.
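For anyone who wants to sanity-check their own site against the title-support idea, a trivial sketch - the page titles and target phrase below are example data only:

```python
# Tiny sketch: how many of a site's page titles "support" a target phrase.
# Titles and phrase are invented examples.

def title_support(titles, phrase):
    """Return (count, fraction) of page titles containing the phrase."""
    phrase = phrase.lower()
    hits = sum(1 for t in titles if phrase in t.lower())
    return hits, hits / len(titles)

site_titles = [
    "Blue Widgets: A Buyer's Guide",
    "How to Clean Blue Widgets",
    "Ten Ways to Save Money This Spring",
    "Widget History and Trivia",
]
print(title_support(site_titles, "blue widgets"))  # (2, 0.5)
```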
As long as the scoring was very strong on just the individual url, these sites were doing great. But just maybe the disconnectedness of their "theme" is now being detected and seen as a negative.
On the contrary, I can't see it in the SERPs.
"Trusted" sites still have a tendency to rank high, far beyond their themes.
For a long time, Google has had a bias towards the "rich & famous" regardless of relevancy.
E.g. Mr. Bush mentioning 'humanity' will very soon occupy the first page for the keyword, even before Gandhi or Mother Teresa, not to mention an ordinary good guy.
Although I am (a kind of) fan of Wikipedia, I still cannot support it being on the top with the snippet: "There is no page with the name 'keyword' here".
"Trusted" sites still have a tendency to rank high, far beyond their themes.
Note about authority scoring: Not referring in the sense of a site with information that's authoritative and valuable - a crawler has no way to know that - except by the link profile. Rather, classic authority definition (as in hubs and authorities per Jon Kleinberg) - having authority status by reason of high profile IBLs and many links from respected hubs.
George Bush is NOT an authority on showing mercy to the poor; the others named were. But George Bush gets mentions and an IBL profile from sites that carry a lot of "authority conferring" weight, far above what any humanitarian would get - even though humanitarians are the authorities on mercy to the poor.
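For anyone who hasn't read the Kleinberg paper, here's the bare mechanics of hubs and authorities in a few lines of Python - the little link graph is invented, just to show how authority accrues purely from being linked by strong hubs:

```python
# Minimal hubs-and-authorities (HITS) iteration, per Kleinberg.
# The toy link graph is invented for illustration.
from math import sqrt

links = {                       # page -> pages it links out to
    "bighub1": ["charity.org", "relief.org"],
    "bighub2": ["charity.org", "gov.example"],
    "charity.org": [],
    "relief.org": [],
    "gov.example": [],
}
pages = list(links)
hub = {p: 1.0 for p in pages}
auth = {p: 1.0 for p in pages}

for _ in range(20):
    # Authority: sum of hub scores of the pages linking to you.
    auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
    # Hub: sum of authority scores of the pages you link to.
    hub = {p: sum(auth[t] for t in links[p]) for p in pages}
    # Normalize so scores stay bounded.
    a_norm = sqrt(sum(v * v for v in auth.values())) or 1.0
    h_norm = sqrt(sum(v * v for v in hub.values())) or 1.0
    auth = {p: v / a_norm for p, v in auth.items()}
    hub = {p: v / h_norm for p, v in hub.items()}

# charity.org comes out on top: it is linked from both strong hubs.
print(sorted(auth.items(), key=lambda kv: -kv[1]))
```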
the phrase blue widgets should also be supported overall by pages within the site that include blue widgets in the page title
That appears to be what I'm seeing on my site, at least to a limited degree.
For instance, a category index page is now ranking higher than the category pages it feeds. That's happened before on my site, but it was usually a short lived situation. This time it seems to be sticking around longer, but I'm not ready to say this is the way it's going to stay.
Yes, there is page worthiness. If a page is about blue widgets, is on a site about widgets, the site itself has incoming links about widgets, and the page has links in about blue widgets... there's a good chance you will rank for widgets.
But if that same site is say, 1000 pages in size, of which 900 are exact duplicate content from manufacturer info, then the site will attract a "just because you've got links, does not mean you are all that... we can see you aren't" penalty from Google.
Google will penalize the WHOLE site, not just the pages that relate to the duplicate content... that is, the duplicate content might be about red whatchamacallits... have nothing to do with blue widgets... but still the blue widgets page is caned.
Really gives a big bonus to the subdomainers, who can still pass PR through links but not attract such sitewide penalties.
I'm glad you brought that up, because I agree that the thread is getting muddied and drifting off topic with posts about spam and penalties, which have nothing to do with this discussion. That kind of drift can make a thread very confusing.
>>it seems like the same argument as always, put differently.
No, actually there's no argument here, there's just a bit of a conflict going on between those who want to keep talking about penalties again, which this thread is NOT about, and those who want to discuss what the thread IS about. There are hundreds of thousands of posts in many dozen threads about penalties - but thankfully, finally, this is not one of them.
This thread is not about penalties (which is nice for a change); it's about:
How might Google measure the site, and not just a page?
So how, and based on what factors, would they be ranking sites as a whole rather than by individual pages?
One thing we tend not to discuss here very often - and that I have heard Google reps mention over the past year or so - is that Google is devising more ways of looking at the entire domain, and not only assessing relevance url by url. This is my conjecture, but what if Google is now working with domain-wide semantic factors a bit more strongly than in the past?
Some are seeing a strange trend toward SERPs that rank higher level pages that are actually one click away from the real "meat" - now what's that all about?
What if they're getting more into second-order co-occurrence and word sense disambiguation, which may be more tamper-proof? Isn't that, to a very large degree, what those phrase-based patents (which is NOT new technology), particularly in the area of clustering, are about?
A quote from this link, [stanford.edu...]
"...Recent approaches to personalized and topic-sensitive PageRank schemes... require computing many PageRank vectors, each biased towards certain types of pages..."
Does that mean PageRank varies based on the topic of the page? So that if a visitor is looking for topic A, your page would rank at one level and if a visitor is looking for topic B, your (same) page would rank at another level?
I'm not sure how or if this ties into the possibility of Google assessing what a site's theme is, other than the thought that it could be combined somehow with the anti-trust factors in the other links jimbeetle posted.
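To make the "many PageRank vectors, each biased towards certain types of pages" idea concrete, here's a stripped-down sketch of a topic-biased (personalized) PageRank - the graph, topic sets, and damping value are invented, and dangling pages aren't handled:

```python
# Simplified topic-sensitive PageRank: the same link graph is ranked once
# per topic, teleporting only to pages labelled with that topic.
# Graph, topic labels and parameters are invented examples.

def topic_pagerank(graph, topic_pages, damping=0.85, iters=50):
    pages = list(graph)
    rank = {p: 1.0 / len(pages) for p in pages}
    teleport = {p: (1.0 / len(topic_pages) if p in topic_pages else 0.0)
                for p in pages}
    for _ in range(iters):
        rank = {p: (1 - damping) * teleport[p]
                   + damping * sum(rank[q] / len(graph[q])
                                   for q in pages if p in graph[q])
                for p in pages}
    return rank

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print(topic_pagerank(graph, {"A", "B"}))  # vector biased to one topic
print(topic_pagerank(graph, {"D"}))       # same pages, different scores
```

So yes, in that scheme the same page can carry a different score depending on which topic's vector is consulted, which is what the quote seems to be describing.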
From this link: [stanford.edu...]
"...We could classify a page as a spam page if it has Anti-Trust Rank value more than a chosen threshold value. Alternatively, we could choose to merely return the top n pages based on Anti-Trust Rank which would be the n pages that are most likely to be spam, as per our algorithm."
When they say "we could choose to merely return the top n pages," do they mean that they will put those pages in the search results and hold back the remainder of the pages, ensuring that the pages most likely to be spam will be calculated first and a minimum number of pages for that site will be included?
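Read literally, those are just two ways of consuming the same score - flag everything above a threshold, or take the n most suspicious pages. A short sketch (the per-page scores are placeholder numbers, not real Anti-Trust Rank output):

```python
# The two options from the quote: threshold the score, or take the top n.
# The per-page scores are invented placeholders.

anti_trust = {"pageA": 0.91, "pageB": 0.12, "pageC": 0.55, "pageD": 0.78}

def flag_by_threshold(scores, threshold):
    return [p for p, s in scores.items() if s > threshold]

def top_n_suspects(scores, n):
    return sorted(scores, key=scores.get, reverse=True)[:n]

print(flag_by_threshold(anti_trust, 0.7))  # ['pageA', 'pageD']
print(top_n_suspects(anti_trust, 2))       # ['pageA', 'pageD']
```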
If a site is primarily about Topic A, then as Google comes across pages about Topic B, the Topic B page might be included but subsequent Topic B pages could be suppressed.
[webmasterworld.com...]
I've noticed that sites with a strong Information Architecture do not seem to suffer from the extreme and inscrutable ranking problems that affect many other sites. I'm now considering the importance of strongly themed silos in the linking structure, as opposed to a more mesh-like interlinking.
Not that some interlinking between silos is wrong - on occasion it's quite valuable for the user. But it should be quite minimal, IMO. Sometimes the SEO thinker is so in love with links that they go into excess and blur the natural semantic theming for various parts of the site.
I'm now considering the importance of strongly themed silos in the linking structure
Can you describe how this would look? I thought that was what I had but I still lost a couple of sections.
I'd been linking the homepage to the contents of each section. Each section had related information within a larger topic.
Everything in each subsection linked back to the subsection's contents page. Also everything in each subsection linked to each other. No more than 15 pages per subsection.
All pages link to the root level homepage.
Is this mesh or Information Architecture style that I've been doing?
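One way to answer that for yourself is to put the link structure into data and count the links that cross section (silo) boundaries - roughly the mesh vs. silo distinction being discussed. The URLs and section names below are invented, not annej's actual site:

```python
# Hypothetical site in the shape annej describes: sections link within
# themselves and back to the root. Cross-silo links are the "mesh" part.
site_links = {  # page -> pages it links to
    "/": ["/knitting/", "/weaving/"],
    "/knitting/": ["/knitting/socks", "/knitting/hats", "/"],
    "/knitting/socks": ["/knitting/", "/knitting/hats", "/"],
    "/knitting/hats": ["/knitting/", "/knitting/socks", "/", "/weaving/looms"],
    "/weaving/": ["/weaving/looms", "/"],
    "/weaving/looms": ["/weaving/", "/"],
}

def silo_of(url):
    # First path segment is the silo; the root page belongs to no silo.
    return url.split("/")[1] if url.count("/") > 1 else ""

cross_silo = [(src, dst) for src, targets in site_links.items() for dst in targets
              if silo_of(src) and silo_of(dst) and silo_of(src) != silo_of(dst)]
print(cross_silo)  # only /knitting/hats -> /weaving/looms crosses a silo
```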
From what you say, annej, you don't fall into that category.