This is my conjecture, but what if Google is now working with domain-wide semantic factors a bit more strongly than in the past?
Some are seeing a strange trend toward SERPs that rank higher level pages that are actually one click away from the real "meat" - now what's that all about?
Also I've looked at enough troubled rankings recently to realize that some of the domains involved have developed into a kind of crazy-quilt of topics. As long as the scoring was very strong on just the individual url, these sites were doing great. But just maybe the disconnectedness of their "theme" is now being detected and seen as a negative.
I'm talking here about sites that throw up lots of varied pages to catch different kinds of keyword traffic, you know? They usually have "real" content, not scraped, but it's either MFA or (dare I coin another acronym?) MFO, made for organic. What Google says they want is MFV, made for the visitor.
Now obviously general news sites are also a crazy quilt of a kind, so it shouldn't just be any wide ranging diversity of topics that is problematic - that's not precise enough. But Google probably knows that their end user is often happier when the SERP sends them to a domain filled with relevant information, and not just a one-off page or even a small section.
Something about this feels like it's lurking in the back of my brain somewhere trying to break through. I am thinking more about domain-wide positive relevance signals here, rather than penalties.
Have my babblings triggered anyone's brain cells?
You may just have hit my issues on the head here, tedster, though I would argue that made for organic and made for visitor could easily be one and the same. After all, if I need a telephone number in Ghana, I do not have a local Yellow Pages, and therefore the internet is a perfect way to find it (and maybe the only way).
So by providing me with that number, the web site I have found is very definitely made for the visitor.
Whilst wiki does buck this trend a bit, I suspect that G would love to be rid of it - fairly concrete proof that their algos don't always work for them either.
How might Google measure the site, and not just a page?
So how, and based on what factors, would they be ranking sites as a whole rather than by individual pages?
This is years old, but it separates things into the different facets:
Google Optimization Basics [webmasterworld.com]
The part about "Sitewide" and a ton more could be added to that now. Not "penalties" but "signals of quality" that contribute to rankings for a site. What would they be using for measuring?
(1) Anyone think they may be using semantic analysis (not LSI) to determine themes and keyword sets/phrases for sites?
(2) Anyone think PageRank per page is still king - or could there be a quality and/or IBL and/or neighborhood score for a site as a whole?
(1) Anyone think they may be using semantic analysis (not LSI) to determine themes and keyword sets/phrases for sites?
Entirely and totally. I theorize they are examining the page, examining inbound links to that page, and comparing that page in relation to the entire website, including the theme of the website (content, titles, internal linking, number of themed AND unique pages), as well as doing semantic analysis of other pages linked to and linked from the page in question. If the page in question fails the test, it acts like a broken pillar page, and page-level trust is broken.
I like the silo explanation, which supports this theory. Wikipedia does well because in a way it has many supporting pages that bolster each other. Of course, many links to many of these pages and their subpages give high scores to longer phrases, which in turn breadcrumb back to or directly support the hierarchy.
(2) Anyone think PageRank per page is still king - or could there be a quality and/or IBL and/or neighborhood score for a site as a whole?
Quality inbound links per page are king, regardless of PageRank. Quality is the key, in my experience.
There must be few if any sites with as many deep links as Wikipedia. If I were looking at site-wide factors, I would consider many thousands of links to many thousands of internal pages to be a sure sign of trust. It would be very difficult (impossible?) to get anywhere near this level of deep links artificially.
If I wanted to trust a site as opposed to a page this would be the way to go.
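To put a rough number on that idea, here's a quick Python sketch - the scoring formula and the example links are entirely invented, just to illustrate how a "share of deep links" signal could be computed, not anything Google has confirmed:

```python
# Hypothetical sketch of the "deep links as trust" idea above.
# The scoring formula and example data are invented for illustration.

def deep_link_trust(inbound_targets, homepage_url):
    """Crude site-wide signal: how many inbound links point at internal
    (deep) pages rather than the homepage, and what share of the total."""
    if not inbound_targets:
        return 0.0
    deep = [t for t in inbound_targets if t != homepage_url]
    # Weight the raw count of deep links by their share of all inbound links.
    return len(deep) * (len(deep) / len(inbound_targets))

# Example: 8 of 10 inbound links land on deep pages.
links = ["example.com/"] * 2 + ["example.com/widgets/blue"] * 8
print(deep_link_trust(links, "example.com/"))  # 6.4
```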
I think this is part of it. I have one site in a certain industry that was developed around the same time as another that I monitor. My site has about 500 pages of original content written by me and a couple of employed freelancers. The other has about 30,000 pages of cut and paste stuff. He now sits at position number 998. I'm guessing he took a bad shortcut.
Anyone think they may be using semantic analysis (not LSI) to determine themes and keyword sets/phrases for sites?
Well, whether semantic or not, I'm pretty sure sites are themed or clustered. I think that's reflected in the Adwords recommended sites thingie when you're setting up a site targeted campaign.
I like BeeDeeDubbleU's idea that many deep links can be a reflection of overall site quality. It might be something to look into using Y!'s site explorer or something similar.
And maybe consider the inverse of that: would many links out to external resources be an indication of quality?
As for the BlockRank paper, isn't that just to speed up the overall PR calculation? I don't think that in itself would affect anything, but it might possibly give G a better snapshot of a site's overall quality. Maybe.
Now if it's block analysis, weighting links and other elements according to which section of a page they reside in (nav, content, footer, etc.), I'd assume that would affect individual pages and not the site across the board.
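Just to show what that block-level idea might look like mechanically, here's a tiny sketch - the section weights and example links are made up, not anything documented:

```python
# Hypothetical block-analysis weighting: a link in the main content block
# counts for more than one in the nav or footer. Weights are invented.

SECTION_WEIGHTS = {"content": 1.0, "nav": 0.3, "footer": 0.1}

def weighted_link_score(page_links):
    """page_links: list of (target_url, section) pairs found on one page."""
    return sum(SECTION_WEIGHTS.get(section, 0.5) for _, section in page_links)

example = [("example.com/blue-widgets", "content"),
           ("example.com/widgets", "nav"),
           ("example.com/about", "footer")]
print(weighted_link_score(example))  # 1.4 - the content link dominates
```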
I've come across a couple of other Stanford papers. Not sure if they've been mentioned before. Haven't had a chance to delve too deeply into them as yet. (All PDFs)
Adaptive Page Rank [stanford.edu]
Anti-Trust Rank [stanford.edu]
Combined Rank [i.stanford.edu]
average page views per visit per keyphrase
average time spent on-site per visit per keyphrase
after-visit query termination
Just thoughts based on data they have access to via toolbar & analytics, and algo behavior.
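If anyone wants to eyeball those three numbers on their own analytics exports, here's a back-of-the-envelope sketch - the visit records and field names are invented, it's just the arithmetic:

```python
# Rough calculation of the three per-keyphrase signals listed above,
# from hypothetical analytics-style visit records (field names invented).
from collections import defaultdict

visits = [
    {"keyphrase": "blue widgets", "pageviews": 5, "seconds": 240, "requeried": False},
    {"keyphrase": "blue widgets", "pageviews": 1, "seconds": 15,  "requeried": True},
    {"keyphrase": "red widgets",  "pageviews": 2, "seconds": 60,  "requeried": False},
]

stats = defaultdict(lambda: {"visits": 0, "pageviews": 0, "seconds": 0, "terminated": 0})
for v in visits:
    s = stats[v["keyphrase"]]
    s["visits"] += 1
    s["pageviews"] += v["pageviews"]
    s["seconds"] += v["seconds"]
    s["terminated"] += 0 if v["requeried"] else 1  # query ended after this visit

for phrase, s in stats.items():
    print(phrase,
          "avg pageviews/visit:", s["pageviews"] / s["visits"],
          "avg seconds/visit:", s["seconds"] / s["visits"],
          "after-visit termination rate:", s["terminated"] / s["visits"])
```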
And that kind of makes sense . . . if you really are the best place for blue widgets wouldn't that phrase occur more often than once across the entire site? Certainly this is a brainstorming notion, but I'm kicking it around a little bit.
My own page titles are interesting and unique, designed to be on topic and to attract visitors browsing through SERPS. But upon considering recent rankings, I don't think the page titles (not page names) were supporting my targeted keywords as well as they could be. Instead they were creative and interesting but not focused for a word crunching machine looking at the total site. Of course, all of the other basics are important for overall balance of a site, but page titles, that one got a little away from me.
This would begin to explain why multi-phrase targeted sites are experiencing a drop: writing page titles like headlines to interest the reader rather than to repeat the underlying targeted phrase.
the phrase blue widgets should also be supported overall by pages within the site that include blue widgets in the page title
Bingo! While having a beer a few months ago with a local WebmasterWorld member we both brought up the point that we've observed this over the past few years and had both been using it for quite some time to very good effect. It works very well.
Now, the WebmasterWorld member is going to beat me about the head and shoulders with a McSorley's beer stein for saying too much, but you might also consider extending this same principle from a different direction.
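For anyone who wants to sanity-check their own site against the title-support idea, a trivial sketch - the page titles and target phrase below are example data only:

```python
# Tiny sketch: how many of a site's page titles "support" a target phrase.
# Titles and phrase are invented examples.

def title_support(titles, phrase):
    """Return (count, fraction) of page titles containing the phrase."""
    phrase = phrase.lower()
    hits = sum(1 for t in titles if phrase in t.lower())
    return hits, hits / len(titles)

site_titles = [
    "Blue Widgets: A Buyer's Guide",
    "How to Clean Blue Widgets",
    "Ten Ways to Save Money This Spring",
    "Widget History and Trivia",
]
print(title_support(site_titles, "blue widgets"))  # (2, 0.5)
```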
As long as the scoring was very strong on just the individual url, these sites were doing great. But just maybe the disconnectedness of their "theme" is now being detected and seen as a negative.
On the contrary, I can't see it in the SERPs.
"Trusted" sites still have a tendency to rank high, far beyond their themes.
For a long time, Google has had a bias towards the "rich & famous" regardless of relevancy.
E.g. Mr. Bush mentioning 'humanity' will very soon occupy the first page for the keyword, even before Gandhi or Mother Teresa, not to mention an ordinary good guy.
Although I am (a kind of) fan of Wikipedia, I still cannot support it being on the top with the snippet: "There is no page with the name 'keyword' here".
"Trusted" sites still have a tendency to rank high, far beyond their themes.
Note about authority scoring: Not referring in the sense of a site with information that's authoritative and valuable - a crawler has no way to know that - except by the link profile. Rather, classic authority definition (as in hubs and authorities per Jon Kleinberg) - having authority status by reason of high profile IBLs and many links from respected hubs.
George Bush is NOT an authority on showing mercy to the poor; the others named were. But George Bush gets mentions and an IBL profile from sites that carry a lot of "authority conferring" weight, far above what any humanitarian would get - even though humanitarians are the authorities on mercy to the poor.
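For anyone who hasn't read the Kleinberg paper, here's the bare mechanics of hubs and authorities in a few lines of Python - the little link graph is invented, just to show how authority accrues purely from being linked by strong hubs:

```python
# Minimal hubs-and-authorities (HITS) iteration, per Kleinberg.
# The toy link graph is invented for illustration.
from math import sqrt

links = {                       # page -> pages it links out to
    "bighub1": ["charity.org", "relief.org"],
    "bighub2": ["charity.org", "gov.example"],
    "charity.org": [],
    "relief.org": [],
    "gov.example": [],
}
pages = list(links)
hub = {p: 1.0 for p in pages}
auth = {p: 1.0 for p in pages}

for _ in range(20):
    # Authority: sum of hub scores of the pages linking to you.
    auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
    # Hub: sum of authority scores of the pages you link to.
    hub = {p: sum(auth[t] for t in links[p]) for p in pages}
    # Normalize so scores stay bounded.
    a_norm = sqrt(sum(v * v for v in auth.values())) or 1.0
    h_norm = sqrt(sum(v * v for v in hub.values())) or 1.0
    auth = {p: v / a_norm for p, v in auth.items()}
    hub = {p: v / h_norm for p, v in hub.items()}

# charity.org comes out on top: it is linked from both strong hubs.
print(sorted(auth.items(), key=lambda kv: -kv[1]))
```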
the phrase blue widgets should also be supported overall by pages within the site that include blue widgets in the page title
That appears to be what I'm seeing on my site, at least to a limited degree.
For instance, a category index page is now ranking higher than the category pages it feeds. That's happened before on my site, but it was usually a short lived situation. This time it seems to be sticking around longer, but I'm not ready to say this is the way it's going to stay.
Yes, there is page worthiness. If a page is about blue widgets, is on a site about widgets, the site itself has incoming links about widgets, and the page has links in about blue widgets... there's a good chance you will rank for widgets.
But if that same site is say, 1000 pages in size, of which 900 are exact duplicate content from manufacturer info, then the site will attract a "just because you've got links, does not mean you are all that... we can see you aren't" penalty from Google.
Google will penalize the WHOLE site, not just the pages that relate to the duplicate content... that is, the duplicate content might be about red whatchamacallits... have nothing to do with blue widgets... but still the blue widgets page is caned.
Really gives a big bonus to the subdomainers, who can still pass PR through links but not attract such sitewide penalties.
I'm glad you brought that up, because I agree that the thread is getting muddied and drifting off topic with posts about spam and penalties, which have nothing to do with this discussion. That kind of drift can make a thread very confusing.
>>it seems like the same argument as always, put differently.
No, actually there's no argument here, there's just a bit of a conflict going on between those who want to keep talking about penalties again, which this thread is NOT about, and those who want to discuss what the thread IS about. There are hundreds of thousands of posts in many dozen threads about penalties - but thankfully, finally, this is not one of them.
This thread is not about penalties (which is nice for a change); it's about:
How might Google measure the site, and not just a page?
So how, and based on what factors, would they be ranking sites as a whole rather than by individual pages?
One thing we tend not to discuss here very often - and that I have heard Google reps mention over the past year or so - is that Google is devising more ways of looking at the entire domain, and not only assessing relevance url by url. This is my conjecture, but what if Google is now working with domain-wide semantic factors a bit more strongly than in the past?
Some are seeing a strange trend toward SERPs that rank higher level pages that are actually one click away from the real "meat" - now what's that all about?
What if they're getting more into second-order co-occurrence and word sense disambiguation, which may be more tamper-proof? Isn't that, to a very large degree, what those phrase-based patents (which is NOT new technology), particularly in the area of clustering, are about?
A quote from this link, [stanford.edu...]
"...Recent approaches to personalized and topic-sensitive PageRank schemes... require computing many PageRank vectors, each biased towards certain types of pages..."
Does that mean PageRank varies based on the topic of the page? So that if a visitor is looking for topic A, your page would rank at one level and if a visitor is looking for topic B, your (same) page would rank at another level?
I'm not sure how or if this ties into the possibility of Google assessing what a site's theme is, other than the thought that it could be combined somehow with the anti-trust factors in the other links jimbeetle posted.
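To make the "many PageRank vectors, each biased towards certain types of pages" idea concrete, here's a stripped-down sketch of a topic-biased (personalized) PageRank - the graph, topic sets, and damping value are invented, and dangling pages aren't handled:

```python
# Simplified topic-sensitive PageRank: the same link graph is ranked once
# per topic, teleporting only to pages labelled with that topic.
# Graph, topic labels and parameters are invented examples.

def topic_pagerank(graph, topic_pages, damping=0.85, iters=50):
    pages = list(graph)
    rank = {p: 1.0 / len(pages) for p in pages}
    teleport = {p: (1.0 / len(topic_pages) if p in topic_pages else 0.0)
                for p in pages}
    for _ in range(iters):
        rank = {p: (1 - damping) * teleport[p]
                   + damping * sum(rank[q] / len(graph[q])
                                   for q in pages if p in graph[q])
                for p in pages}
    return rank

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print(topic_pagerank(graph, {"A", "B"}))  # vector biased to one topic
print(topic_pagerank(graph, {"D"}))       # same pages, different scores
```

So yes, in that scheme the same page can carry a different score depending on which topic's vector is consulted, which is what the quote seems to be describing.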
From this link: [stanford.edu...]
"...We could classify a page as a spam page if it has Anti-Trust Rank value more than a chosen threshold value. Alternatively, we could choose to merely return the top n pages based on Anti-Trust Rank which would be the n pages that are most likely to be spam, as per our algorithm."
When they say "we could choose to merely return the top n pages," do they mean that they will put those pages in the search results and hold back the remainder of the pages, ensuring that the pages most likely to be spam will be calculated first and a minimum number of pages for that site will be included?
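Read literally, those are just two ways of consuming the same score - flag everything above a threshold, or take the n most suspicious pages. A short sketch (the per-page scores are placeholder numbers, not real Anti-Trust Rank output):

```python
# The two options from the quote: threshold the score, or take the top n.
# The per-page scores are invented placeholders.

anti_trust = {"pageA": 0.91, "pageB": 0.12, "pageC": 0.55, "pageD": 0.78}

def flag_by_threshold(scores, threshold):
    return [p for p, s in scores.items() if s > threshold]

def top_n_suspects(scores, n):
    return sorted(scores, key=scores.get, reverse=True)[:n]

print(flag_by_threshold(anti_trust, 0.7))  # ['pageA', 'pageD']
print(top_n_suspects(anti_trust, 2))       # ['pageA', 'pageD']
```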
If a site is primarily about Topic A, then as Google comes across pages about Topic B, the Topic B page might be included but subsequent Topic B pages could be suppressed.
[webmasterworld.com...]
I've noticed that sites with a strong Information Architecture do not seem to suffer from the extreme and inscrutable ranking problems that affect many other sites. I'm now considering the importance of strongly themed silos in the linking structure, as opposed to a more mesh-like interlinking.
Not that some interlinking between silos is wrong - on occasion it's quite valuable for the user. But it should be quite minimal, IMO. Sometimes the SEO thinker is so in love with links that they go into excess and blur the natural semantic theming for various parts of the site.
I'm now considering the importance of strongly themed silos in the linking structure
Can you describe how this would look? I thought that was what I had but I still lost a couple of sections.
I'd been linking the homepage to the contents of each section. Each section had related information within a larger topic.
Everything in each subsection linked back to the subsection's contents page. Also everything in each subsection linked to each other. No more than 15 pages per subsection.
All pages link to the root level homepage.
Is this mesh or Information Architecture style that I've been doing?
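One way to answer that for yourself is to put the link structure into data and count the links that cross section (silo) boundaries - roughly the mesh vs. silo distinction being discussed. The URLs and section names below are invented, not annej's actual site:

```python
# Hypothetical site in the shape annej describes: sections link within
# themselves and back to the root. Cross-silo links are the "mesh" part.
site_links = {  # page -> pages it links to
    "/": ["/knitting/", "/weaving/"],
    "/knitting/": ["/knitting/socks", "/knitting/hats", "/"],
    "/knitting/socks": ["/knitting/", "/knitting/hats", "/"],
    "/knitting/hats": ["/knitting/", "/knitting/socks", "/", "/weaving/looms"],
    "/weaving/": ["/weaving/looms", "/"],
    "/weaving/looms": ["/weaving/", "/"],
}

def silo_of(url):
    # First path segment is the silo; the root page belongs to no silo.
    return url.split("/")[1] if url.count("/") > 1 else ""

cross_silo = [(src, dst) for src, targets in site_links.items() for dst in targets
              if silo_of(src) and silo_of(dst) and silo_of(src) != silo_of(dst)]
print(cross_silo)  # only /knitting/hats -> /weaving/looms crosses a silo
```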
From what you say, annej, you don't fall into that category.