
Home / Forums Index / Google / Google SEO News and Discussion
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Reading Level and Panda
BenFox




msg:4300399
 2:12 pm on Apr 19, 2011 (gmt 0)

I was having a poke around at the sites on the Sistrix winners and losers list for Panda#2 when I stumbled across this.

It seems (and this was only for a very limited number of sites) that sites which got Panda'd are more likely to have more content which is of an advanced reading level.

To see what I mean pick out a site that got hit, turn on the reading level filter in advanced search options to "annotate results with reading levels" and type "site:url.com" into the search box.

I did this for 30 losers and 15 winners from the Sistrix data as well as 15 winners and 15 losers from Greenlight data. In both cases the losers had a higher average % of advanced pages.

For Sistrix data 7.7% of pages from losers were of an advanced reading level compared to 3.4% for winners.

Greenlight was about the same.
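For anyone who wants to repeat the comparison, the arithmetic is just an average of per-site percentages for each group. Here is a minimal sketch in Python. The function name, the example hostnames and the numbers are all placeholders made up for illustration; they are not the Sistrix or Greenlight figures, and it assumes you have typed in each site's "advanced" percentage by hand from the annotated site: search.

```python
from statistics import mean

def average_advanced_share(advanced_pct_by_site):
    """Average the per-site percentage of pages labelled 'advanced'."""
    return mean(advanced_pct_by_site.values())

# Placeholder values only, NOT real measurements: each number stands for
# the "advanced" percentage read off one annotated site: query.
losers = {"loser-a.example": 9.0, "loser-b.example": 6.0}
winners = {"winner-a.example": 4.0, "winner-b.example": 2.0}

loser_avg = average_advanced_share(losers)
winner_avg = average_advanced_share(winners)
print(f"losers: {loser_avg:.1f}%  winners: {winner_avg:.1f}%")
```

Averaging per-site percentages gives every site equal weight regardless of size; pooling page counts across sites would be a different (and also defensible) choice.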


I'm just putting this out there in case anyone wants to look into it in more detail. There are several possible reasons not to draw too many conclusions from my findings -

1- it's a tiny sample.
2- there might be unknown quantities in the reading age algo.
3- it could just be that content farms are more likely to have copied/spun/rewritten advanced content than other sites.
4- it could be the lack of editorial control on content farms means they don't have a house style so are more likely to end up with advanced content.

Basically don't go causing a panic by assuming that this is correct. It's probably just a blip.

 

goodroi




msg:4300620
 7:28 pm on Apr 19, 2011 (gmt 0)

My research over the last few weeks also found a loose correlation between reading level and Panda. I don't think it is a direct cause and effect but it just might help uncover the key parts behind Panda. It is definitely something worthy of more discussion and research.

netmeg




msg:4300643
 7:50 pm on Apr 19, 2011 (gmt 0)

You'd think there'd be some reason why they made it a search tool option - I mean, why bother if they weren't going to use it for something?

Shatner




msg:4300660
 7:58 pm on Apr 19, 2011 (gmt 0)

I'd actually been wondering the same, based purely on the eyetest of looking at these "don't fit the mold" sites over the past couple of weeks. I didn't know there was a tool to actually test it.

But I find it hard to believe that anything on eHow or with Demand Media is written at an advanced level.

I do think a lot of the sites in the "don't fit the mold" thread we started are written at a far more advanced level than their competitors though.

crobb305




msg:4300662
 8:00 pm on Apr 19, 2011 (gmt 0)

For Sistrix data 7.7% of pages from losers were of an advanced reading level compared to 3.4% for winners.


I used a similar tool that scored complexity (reading level) from 1 to 20. My Pandalized site scored higher (more complex) than most of those that were ranking well. The free version I used just looked at about 10 pages.
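To give an idea of how this kind of score can be computed, here is a rough sketch of the Flesch-Kincaid grade level formula in Python. This is an assumption for illustration only: it is not necessarily what that tool uses, it is certainly not known to be what Google uses, and it is not capped to a 1 to 20 scale. The syllable counter is a crude vowel-group heuristic and the sample sentences are made up.

```python
import re

def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level; a higher value means harder to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59

# Made-up sample text, one plain and one dense.
simple = "The cat sat on the mat. The dog ran to the park."
dense = ("Comprehensive algorithmic evaluation necessitates "
         "considerable computational infrastructure.")
print(flesch_kincaid_grade(simple) < flesch_kincaid_grade(dense))
```

Long sentences and long words both push the score up, which is one way a page made up mostly of links or jargon could end up rated as hard to read.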

It's interesting you mention this, again another of many possible factors, but it is something that I noticed a few days ago when I was doing my analyses. It's hard for me to write on a different level. I'm not sure what I can do to improve, but it's something I am looking into.

Leosghost




msg:4300685
 8:23 pm on Apr 19, 2011 (gmt 0)

Levels are "relative" ( yes I know that at first glance that would appear to be obvious, but think about it, they vary with culture and geography )..and also reading levels, and particularly comprehension levels, have been falling in the "West" for years.

Grammatical structure appears to be "optional", with bad grammar and spelling accepted as the "norm",( their, there and they're and also you are, your and you're, appear to be taken as interchangeable on these fora as elsewhere ) ..much to the irritation and chagrin of some of us.

But..in the light of such degradation..

It would not be surprising if Google testers "disliked" what some of them could not understand, due to lack of their own vocabulary ..and if those "dislikes" became folded into Panda in part.

I don't approve in any way of such "dumbing down"..it bodes badly for our societies ..but if "reality" TV can flourish, and semi-literates can run for highest office in western developed countries, and actually attract a mass following ! Well, maybe the search engines are reflecting that Rome is being overrun ..from the inside, by the self created barbarians :((..

As netmeg says it would be strange if it ( reading level ) wasn't there for a reason ..but IMO it is a very blunt tool..and one which should not be used for general assessment of sites when basic serps are created for presentation ..it should be an end user only modifier.

BenFox




msg:4301741
 7:43 am on Apr 21, 2011 (gmt 0)

@shatner
But I find it hard to believe that anything on eHow or with Demand Media is written at an advanced level.


There's a difference between being well written and requiring an advanced reading level. If you do the reading level site: search on a major newspaper you'll probably find that most of the content has an intermediate reading level according to Google.


@Leosghost
It would not be surprising if Google testers "disliked" what some of them could not understand, due to lack of their own vocabulary ..and if those "dislikes" became folded into Panda in part.


I think you've hit on a far larger trend than just Panda. If you study a creative writing course in the UK (and probably in the US as well) one of the first things you're taught is that good writing often comes down to writing on complex subject matters in a way that idiots can understand. That's why so many of the most popular books have an "accessible" reading level (think Da Vinci Code).

brinked




msg:4301750
 7:56 am on Apr 21, 2011 (gmt 0)

This claim is totally bogus. If it was true, half my sites written by non-native English writers would have been pandalized as well. Just because an article was not written by someone with a PhD means it's not useful?

There is tons of USEFUL content that is not textual. Some people are just not advanced; it's sometimes actually better to write on a basic level that everyone can understand. There are too many poorly written sites still ranking to even think Google uses this as a major factor.

Google wants us to believe they are way more advanced than they really are, but the reality is this panda update is not about 1 thing and the writer's ability to write is not a major factor if it is one at all.

BenFox




msg:4301758
 8:27 am on Apr 21, 2011 (gmt 0)

Just because an article was not written by someone with a PhD means it's not useful?


That's not what I said. If anything I said that an increased number of "advanced level" pages could lead to panda-isation. And I'm using "" on that because I'm not convinced that the method they're using to detect reading level is particularly accurate.

it's sometimes actually better to write on a basic level that everyone can understand.


Couldn't agree more - in fact I'd say it's almost always better to write on a basic level that everyone can understand.

the reality is this panda update is not about 1 thing and the writer's ability to write is not a major factor if it is one at all.


Panda is definitely about more than one thing and I wouldn't assert that the writer's ability is a major factor (I don't have any data to back it up for a start) - however if mass media production methods leave a footprint like this then it's probably best not to dismiss it out of hand.

Robert Charlton




msg:4301773
 9:05 am on Apr 21, 2011 (gmt 0)

I'm thinking that appropriateness of reading level relates to the query... and that reading level is one of many factors that the algo is taking into account.

I'm also thinking that the algo may be close to being self-calibrating, with user behavior being an important factor (considerations go way beyond bounce rate here), so ease of reading may simply be a reflection of what some users want... all other things being equal. Remember... results are now personalized for many users.

On some queries, I'm seeing that some losing sites appear to use language that's too simple... and on other queries losers appear to be using language that's too advanced. What I'm seeing fits with my expectations of what kinds of queries correspond to different subject matter and reading levels.

I also think that Google is in its calibration mode, crunching through massive data-sets, correlating lots of information before making further adjustments to the algo, and that's most likely why we're not immediately seeing site changes having an effect. As I remember, something similar happened after MayDay.

brinked




msg:4301779
 9:22 am on Apr 21, 2011 (gmt 0)

Basically, gone are the days where you can just patch together a quick website with some defacto content and expect it to rank.

You have to take all things into consideration when releasing a new website.

Johnmu just posted on his twitter about making sure your contact forms work properly, I think this is very important as well...make sure your website works. I think a broken website in Google's eyes is a big time indicator of poor quality. I think this update was more about the quality of the website itself rather than the actual content.

Maurice




msg:4301806
 11:34 am on Apr 21, 2011 (gmt 0)

Interesting. So did the NYT get hit? They love to use obscure words. I actually saw them use "otiose" in one article and "roiled" in a headline.

And using a sample of US people (I hope the sample they used to check results was properly normalised and not just Google employees) to rank English English for readability is probably not the best idea - it will drive the Daily Mail wild.



jmccormac




msg:4301814
 11:46 am on Apr 21, 2011 (gmt 0)

I'm also thinking that the algo may be close to being self-calibrating,
So in an attempt to capture artificial intelligence in an algorithm, Google has gone one better and embedded natural stupidity? Given their efforts with "interest" based advertising, it figures!

Regards...jmcc

netmeg




msg:4301890
 2:38 pm on Apr 21, 2011 (gmt 0)

When they first released the tool, I ran it against a couple dozen or so of my own sites and a few clients' sites, and all of them were either basic or intermediate reading level (most were basic). At the time I was mildly insulted, though tedster said that was probably a good thing.

As it happens, none of the sites were pandalized. Related? I have no idea. But the measurement is there for a reason.

walkman




msg:4301942
 4:00 pm on Apr 21, 2011 (gmt 0)

I seriously doubt that Goog opened that Pandora's box and penalized people for using what may be considered fancy or simple words. Seriously, seriously doubt it. Just imagine all the types of pages/sites people have and how people choose to express themselves. It's not a term paper :)

The contact form one is interesting Brinked, I could see that but then maybe he gave it as webmaster advice.

crobb305




msg:4301957
 4:11 pm on Apr 21, 2011 (gmt 0)

walkman & brinked... The contact form issue is interesting. The form on my pandalized site works properly, but it is provided by a 3rd party service. When you fill out the form you basically go to their server then bounce back to my site for the confirmation page. I wonder if this poses any problems. It's not something I have ever given thought to.

added: On second thought, I use the same 3rd-party service for all my sites, including unpandalized sites, so I rule that out. I guess the contact form issue really only applies to one of my sites that I just discovered is not functioning because the subscription expired, but even that site is unpandalized.

Ok, enough of my rambling. lol.

Planet13




msg:4302042
 6:15 pm on Apr 21, 2011 (gmt 0)

...that good writing often comes down to writing on complex subject matters in a way that idiots can understand.


Thank You, BenFox.

Many of my pages rank higher than pages on the same subject that have MORE information and MORE original research than mine.

My content, on the other hand, has better structure, less slang, and far less usage of TLAs*

I think that if you have great content but find that your pages are not ranking, a good first step would be to have an editor with experience in print journalism look at those articles and offer suggestions.

A side benefit to having engaging content is that people are much more likely to link to it naturally, hence boosting page rank.

*TLA = "Three-Letter Anagram" I actually know people who use a three-letter anagram for the phrase "Three-Letter Anagram." Don't be one of those people!

vordmeister




msg:4302143
 8:12 pm on Apr 21, 2011 (gmt 0)

Surely TAL would be a three letter anagram :)

Interesting one though - I've noticed that navigation pages with mostly links tend to score advanced on the reading level, and embarrassingly the home page for one of my sites is rated advanced. I had a look at it myself and it's not user friendly so it's on the rewrite list.

Has anyone considered social engineering instead of a Panda algorithm - Google do completely random things to the results and say it's to improve quality on the web. Webmasters respond by improving quality even if they were good already. I know I'm taking a second look.

More likely it's a side effect, but not a bad one.

buckworks




msg:4302194
 9:55 pm on Apr 21, 2011 (gmt 0)

*TLA = "Three-Letter Anagram"


Acronym, not anagram.

dstiles




msg:4302225
 11:02 pm on Apr 21, 2011 (gmt 0)

Does anyone NOT prohibit bot access to contact and other forms? I thought that would be standard practice by competent webmasters. The alternative is a successful scrape of google/etc for forms to spam!

I have all forms prohibited in robots.txt AND a trap for non-browsers on-page that returns a 405. Doesn't kill all form spammers but it certainly reduces the number of attempts!
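For anyone who wants to check what a given set of robots.txt rules actually blocks, here is a small sketch using Python's standard urllib.robotparser. The rules and paths below are hypothetical examples, not my real file, and this only shows what a well-behaved crawler is supposed to do; it says nothing about bots that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; the paths are made up for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /contact/
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask the parser whether a compliant crawler may fetch each URL.
for path in ("/contact/", "/search/?q=panda", "/articles/reading-level"):
    allowed = parser.can_fetch("Googlebot", "http://www.example.com" + path)
    print(path, "allowed" if allowed else "blocked")
```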

Now if google is suggesting that a site is demoted because it has a bad form page, what are they using to determine this? It certainly ain't googlebot.

enigma1




msg:4302394
 9:21 am on Apr 22, 2011 (gmt 0)

Johnmu just posted on his twitter about making sure your contact forms work properly, I think this is very important as well...

Yes it is important, but a search engine cannot tell whether a form works, because it won't even try to process a form via /POST. It's up to the site owners to ensure this. Therefore I don't see how it will affect rankings from that angle.

Restricting bots completely from accessing pages with forms is a different matter. I would think it all comes down to how the navigation is set up and what the site's owner wants to expose.

For example if the form is linked via js then a search engine may not be able to access it but I don't see why it will affect ranking in some way. Looks to me more of a suggestion than anything else.

@dstiles, the problem with sending headers to spiders restricting access is that they may not always access the robots.txt to determine whether to visit a particular page.

If for example I post a hard-coded link to your form page on my domain the spider will follow from what I have seen. It won't check the robots file in advance. I don't know in such cases if and how the page ranking is affected. I just assume it considers it public because others are linking to it. And other search engines do the same.

helpnow




msg:4302405
 10:11 am on Apr 22, 2011 (gmt 0)

@dstiles Yeah, we block bots from all of our forms, even our search. They cause too many problems.

dstiles




msg:4302593
 5:00 pm on Apr 22, 2011 (gmt 0)

enigma1 - agreed about spiders by-passing robots.txt: that's why I add an on-page 405.
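As a rough idea of what an on-page trap like that can look like, here is a minimal WSGI sketch in Python. Everything in it is an assumption for illustration: it guesses "non-browser" from the User-Agent header alone, the marker list is hypothetical, and the form markup is a placeholder. A real trap would likely need more checks than this, since user agents are trivial to fake.

```python
# Hypothetical list of substrings that suggest an automated client.
BOT_MARKERS = ("bot", "crawler", "spider", "curl", "wget")

def looks_like_browser(user_agent):
    """Crude guess: browsers send a Mozilla/ UA without bot markers."""
    ua = (user_agent or "").lower()
    return ua.startswith("mozilla/") and not any(m in ua for m in BOT_MARKERS)

def form_app(environ, start_response):
    """WSGI app for a form page: answer 405 to anything not browser-like."""
    if not looks_like_browser(environ.get("HTTP_USER_AGENT")):
        start_response("405 Method Not Allowed",
                       [("Content-Type", "text/plain")])
        return [b"Method Not Allowed"]
    start_response("200 OK", [("Content-Type", "text/html")])
    # Placeholder markup; the real form fields would go here.
    return [b"<form method='post' action='/contact/'>...</form>"]
```

Because the check runs on the page itself, it still applies when a spider arrives through someone else's link and never looks at robots.txt.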

At least some SEs - google included - submit SOME forms, JS or not. How else could they follow on-page search links?

enigma1




msg:4302599
 5:12 pm on Apr 22, 2011 (gmt 0)

How else could they follow on-page search links?

If it's a /GET form they process it or try to process it, I believe from some old article I remember reading about it. Navigation via <a> links is the main path they follow but of course they do try to process js.

It's bad if they automatically submit forms because even simple search forms waste bandwidth. At least so far from my logs I haven't seen google doing a /POST. That will be real news to me.

With js, to avoid complications and remove any uncertainty about what the bot is going to do when it parses the js part of the page, I deploy a js framework (like jquery) where the js is totally transparent, not mangled in with the html. And I only use /POST forms to be sure they won't try anything fancy.
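If you want to check which of your forms are GET (and so might be crawled) and which are POST, a quick audit can be sketched with Python's standard html.parser. The class name and the sample markup are hypothetical, and this only reads the static html; it won't see forms that are built by js.

```python
from html.parser import HTMLParser

class FormMethodAudit(HTMLParser):
    """Collect (action, method) for every <form> tag in a page."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            attributes = dict(attrs)
            # HTML treats a missing method attribute as GET.
            method = (attributes.get("method") or "get").upper()
            self.forms.append((attributes.get("action") or "", method))

# Hypothetical page markup for illustration.
PAGE = """
<form action="/search"><input name="q"></form>
<form action="/contact" method="post"><textarea name="msg"></textarea></form>
"""

audit = FormMethodAudit()
audit.feed(PAGE)
for action, method in audit.forms:
    print(action, method)
```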

dstiles




msg:4302662
 6:57 pm on Apr 22, 2011 (gmt 0)

You're probably correct about POST but I'm taking no chances. Used to be SEs would not even look at any kind of form: suddenly they were causing all sorts of problems. :(


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved