Reading Level and Panda

     
2:12 pm on Apr 19, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:Aug 3, 2010
posts:88
votes: 0


I was having a poke around at the sites on the Sistrix winners and losers list for Panda#2 when I stumbled across this.

It seems (and this was only for a very limited number of sites) that sites which got Panda'd tend to have a larger share of content at an advanced reading level.

To see what I mean, pick a site that got hit, set the reading level filter in the advanced search options to "annotate results with reading levels" and type "site:url.com" into the search box.

I did this for 30 losers and 15 winners from the Sistrix data as well as 15 winners and 15 losers from Greenlight data. In both cases the losers had a higher average % of advanced pages.

For Sistrix data 7.7% of pages from losers were of an advanced reading level compared to 3.4% for winners.

Greenlight was about the same.
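
For anyone who wants to repeat the comparison, here is a minimal Python sketch of the averaging step only. The per-site numbers are made-up placeholders, not the Sistrix or Greenlight figures, and the function name is just for illustration. It assumes a plain per-site average, so every site counts equally regardless of how many pages it has.

```python
from statistics import mean

def average_advanced_share(per_site_percentages):
    """Return the mean % of pages rated 'advanced' across a group of sites."""
    if not per_site_percentages:
        raise ValueError("need at least one site")
    return mean(per_site_percentages)

# Hypothetical placeholder values (one number per site), NOT real data.
# Replace them with the "advanced" share reported for each site: query.
losers = [12.0, 3.0, 9.0]
winners = [2.0, 5.0, 2.0]

loser_avg = average_advanced_share(losers)
winner_avg = average_advanced_share(winners)
print(f"losers: {loser_avg:.1f}% advanced, winners: {winner_avg:.1f}% advanced")
print(f"difference: {loser_avg - winner_avg:.1f} percentage points")
```

With samples this small, a single outlier site can move the average a lot, so it is worth looking at the individual values as well.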


I'm just putting this out there in case anyone wants to look into it in more detail. There are several possible reasons not to draw too many conclusions from my findings -

1- it's a tiny sample.
2- there might be unknown quantities in the reading age algo.
3- it could just be that content farms are more likely to have copied/spun/rewritten advanced content than other sites.
4- it could be the lack of editorial control on content farms means they don't have a house style so are more likely to end up with advanced content.

Basically don't go causing a panic by assuming that this is correct. It's probably just a blip.
7:28 pm on Apr 19, 2011 (gmt 0)

Administrator from US 

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 21, 2004
posts:3080
votes: 67


My research over the last few weeks also found a loose correlation between reading level and Panda. I don't think it is a direct cause and effect but it just might help uncover the key parts behind Panda. It is definitely something worthy of more discussion and research.
7:50 pm on Apr 19, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 30, 2005
posts:12671
votes: 142


You'd think there'd be some reason why they made it a search tool option - I mean, why bother if they weren't going to use it for something?
7:58 pm on Apr 19, 2011 (gmt 0)

Preferred Member

joined:Mar 20, 2011
posts:544
votes: 0


I'd actually been wondering the same, based purely on the eye test of looking at these "don't fit the mold" sites over the past couple of weeks. I didn't know there was a tool to actually test it.

But I find it hard to believe that anything on eHow or with Demand Media is written at an advanced level.

I do think a lot of the sites in the "don't fit the mold" thread we started are written at a far more advanced level than their competitors though.
8:00 pm on Apr 19, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 3, 2002
posts:2575
votes: 0


For Sistrix data 7.7% of pages from losers were of an advanced reading level compared to 3.4% for winners.


I used a similar tool that scored complexity (reading level) from 1 to 20. My Pandalized site scored higher (more complex) than most of the well-ranking ones. The free version I used only looked at about 10 pages.
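
The tool isn't named here, and Google hasn't said how its reading level filter works, so the following is only an illustration of how this kind of score is commonly computed. It is a rough Python sketch of the Flesch-Kincaid grade level formula with a crude vowel-group syllable counter. The function names are made up for the example, and the syllable heuristic will miscount plenty of words.

```python
import re

def count_syllables(word):
    """Rough heuristic: count runs of vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level: higher means harder to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

# Two made-up sample passages, only to show the score moving.
simple = "The cat sat on the mat. It was a good day."
dense = ("Notwithstanding considerable methodological limitations, "
         "preliminary observations indicate substantial correlation.")
print(round(flesch_kincaid_grade(simple), 1))
print(round(flesch_kincaid_grade(dense), 1))
```

Long sentences and long words both push the score up, which would fit with text-light or jargon-heavy pages landing in a higher band, but that is a guess about this class of formula and not a statement about Google's method.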

It's interesting you mention this, again another of many possible factors, but it is something that I noticed a few days ago when I was doing my analyses. It's hard for me to write on a different level. I'm not sure what I can do to improve, but it's something I am looking into.
8:23 pm on Apr 19, 2011 (gmt 0)

Senior Member from FR 

WebmasterWorld Senior Member leosghost is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Feb 15, 2004
posts:6717
votes: 230


Levels are "relative" ( yes I know that at first glance that would appear to be obvious, but think about it, they vary with culture and geography )..and also reading levels, and particularly comprehension levels, have been falling in the "West" for years.

Grammatical structure appears to be "optional", with bad grammar and spelling accepted as the "norm",( their, there and they're and also you are, your and you're, appear to be taken as interchangeable on these fora as elsewhere ) ..much to the irritation and chagrin of some of us.

But..in the light of such degradation..

It would not be surprising if Google testers "disliked" what some of them could not understand, due to lack of their own vocabulary ..and if those "dislikes" became folded into Panda in part.

I don't approve in any way of such "dumbing down"..it bodes badly for our societies ..but if "reality" TV can flourish, and semi-literates can run for highest office in western developed countries, and actually attract mass following ! Well, maybe the search engines are reflecting that Rome is being overrun ..from the inside, by the self created barbarians :((..

As netmeg says it would be strange if it ( reading level ) wasn't there for a reason ..but IMO it is a very blunt tool..and one which should not be used for general assessment of sites when basic serps are created for presentation ..it should be an end user only modifier.
7:43 am on Apr 21, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:Aug 3, 2010
posts:88
votes: 0


@shatner
But I find it hard to believe that anything on eHow or with Demand Media is written at an advanced level.


There's a difference between being well written and requiring an advanced reading level. If you do the reading level site: search on a major newspaper you'll probably find that most of the content has an intermediate reading level according to Google.


@Leosghost
It would not be surprising if Google testers "disliked" what some of them could not understand, due to lack of their own vocabulary ..and if those "dislikes" became folded into Panda in part.


I think you've hit on a far larger trend than just Panda. If you study a creative writing course in the UK (and probably in the US as well) one of the first things you're taught is that good writing often comes down to writing on complex subject matters in a way that idiots can understand. That's why so many of the most popular books have an "accessible" reading level (think Da Vinci Code).
7:56 am on Apr 21, 2011 (gmt 0)

Preferred Member

5+ Year Member

joined:Nov 13, 2007
posts: 607
votes: 0


This claim is totally bogus. If it were true, half my sites written by non-native English writers would have been pandalized as well. Just because an article was not written by someone with a PhD means it's not useful?

There is tons of USEFUL content that is not textual. Some people are just not advanced; it's sometimes actually better to write on a basic level that everyone can understand. There are too many poorly written sites still ranking to even think Google uses this as a major factor.

Google wants us to believe they are way more advanced than they really are, but the reality is this Panda update is not about 1 thing, and the writer's ability to write is not a major factor, if it is one at all.
8:27 am on Apr 21, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:Aug 3, 2010
posts:88
votes: 0


Just because an article was not written by someone with a PhD means it's not useful?


That's not what I said. If anything I said that an increased number of "advanced level" pages could lead to panda-isation. And I'm using "" on that because I'm not convinced that the method they're using to detect reading level is particularly accurate.

it's sometimes actually better to write on a basic level that everyone can understand.


Couldn't agree more - in fact I'd say it's almost always better to write on a basic level that everyone can understand.

the reality is this Panda update is not about 1 thing and the writer's ability to write is not a major factor if it is one at all.


Panda is definitely about more than one thing and I wouldn't assert that the writer's ability is a major factor (I don't have any data to back it up for a start) - however if mass media production methods leave a footprint like this then it's probably best not to dismiss it out of hand.
9:05 am on Apr 21, 2011 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2000
posts:11311
votes: 163


I'm thinking that appropriateness of reading level relates to the query... and that reading level is one of many factors that the algo is taking into account.

I'm also thinking that the algo may be close to being self-calibrating, with user behavior being an important factor (considerations go way beyond bounce rate here), so ease of reading may simply be a reflection of what some users want... all other things being equal. Remember... results are now personalized for many users.

On some queries, I'm seeing that some losing sites appear to use language that's too simple... and on other queries losers appear to be using language that's too advanced. What I'm seeing fits with my expectations of what kinds of queries correspond to different subject matter and reading levels.

I also think that Google is in its calibration mode, crunching through massive data-sets, correlating lots of information before making further adjustments to the algo, and that's most likely why we're not immediately seeing site changes having an effect. As I remember, something similar happened after MayDay.
9:22 am on Apr 21, 2011 (gmt 0)

Preferred Member

5+ Year Member

joined:Nov 13, 2007
posts: 607
votes: 0


Basically, gone are the days when you could just patch together a quick website with some de facto content and expect it to rank.

You have to take all things into consideration when releasing a new website.

Johnmu just posted on his Twitter about making sure your contact forms work properly. I think this is very important as well... make sure your website works. I think a broken website is, in Google's eyes, a big-time indicator of poor quality. I think this update was more about the quality of the website itself than the actual content.
11:34 am on Apr 21, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:July 22, 2010
posts:132
votes: 0


Interesting. So did the NYT get hit? They love to use obscure words: I actually saw them use "otiose" in one article and "roiled" in a headline.

And using a sample of US people (I hope the sample they used to check results was properly normalised and not just Google employees) to rank English English for readability is probably not the best idea - it will drive the Daily Mail wild.


11:46 am on Apr 21, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 30, 2002
posts:2415
votes: 24


I'm also thinking that the algo may be close to being self-calibrating,
So in an attempt to capture artificial intelligence in an algorithm, Google has gone one better and embedded natural stupidity? Given their efforts with "interest" based advertising, it figures!

Regards...jmcc
2:38 pm on Apr 21, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 30, 2005
posts:12671
votes: 142


When they first released the tool, I ran it against a couple dozen or so of my own sites and a few clients', and all of them were either basic or intermediate reading level (most were basic). At the time I was mildly insulted, though tedster said that was probably a good thing.

As it happens, none of the sites were pandalized. Related? I have no idea. But the measurement is there for a reason.
4:00 pm on Apr 21, 2011 (gmt 0)

Senior Member

joined:Dec 29, 2003
posts:5428
votes: 0


I seriously doubt that Goog opened that Pandora's box and penalized people for using what may be considered fancy or simple words. Seriously, seriously doubt it. Just imagine all the types of pages/sites people have and how people choose to express themselves. It's not a term paper :)

The contact form one is interesting, Brinked. I could see that, but then maybe he just gave it as webmaster advice.
4:11 pm on Apr 21, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 3, 2002
posts:2575
votes: 0


walkman & brinked... The contact form issue is interesting. The form on my pandalized site works properly, but it is provided by a 3rd party service. When you fill out the form you basically go to their server then bounce back to my site for the confirmation page. I wonder if this poses any problems. It's not something I have ever given thought to.

added: On second thought, I use the same 3rd-party service for all my sites, including unpandalized sites, so I rule that out. I guess the contact form issue really only applies to one of my sites that I just discovered is not functioning because the subscription expired, but even that site is unpandalized.

Ok, enough of my rambling. lol.
6:15 pm on Apr 21, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member planet13 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:June 16, 2010
posts:3796
votes: 28


...that good writing often comes down to writing on complex subject matters in a way that idiots can understand.


Thank You, BenFox.

Many of my pages rank higher than pages on the same subject that have MORE information and MORE original research than mine.

My content, on the other hand, has better structure, less slang, and far less usage of TLAs*

I think that if you have great content but find that your pages are not ranking, a good first step would be to have an editor with experience in print journalism look at those articles and offer suggestions.

A side benefit to having engaging content is that people are much more likely to link to it naturally, hence boosting page rank.

*TLA = "Three-Letter Anagram" I actually know people who use a three-letter anagram for the phrase "Three-Letter Anagram." Don't be one of those people!
8:12 pm on Apr 21, 2011 (gmt 0)

Preferred Member

10+ Year Member Top Contributors Of The Month

joined:Mar 12, 2004
posts:481
votes: 12


Surely TAL would be a three letter anagram :)

Interesting one though - I've noticed that navigation pages with mostly links tend to score advanced on the reading level, and embarrassingly the home page for one of my sites is rated advanced. I had a look at it myself and it's not user friendly so it's on the rewrite list.

Has anyone considered social engineering instead of a Panda algorithm - Google do completely random things to the results and say it's to improve quality on the web. Webmasters respond by improving quality even if they were good already. I know I'm taking a second look.

More likely it's a side effect, but not a bad one.
9:55 pm on Apr 21, 2011 (gmt 0)

Moderator

WebmasterWorld Administrator buckworks is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Dec 9, 2001
posts:5609
votes: 19


*TLA = "Three-Letter Anagram"


Acronym, not anagram.
11:02 pm on Apr 21, 2011 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3091
votes: 2


Does anyone NOT prohibit bot access to contact and other forms? I thought that would be standard practice by competent webmasters. The alternative is a successful scrape of google/etc for forms to spam!

I have all forms prohibited in robots.txt AND a trap for non-browsers on-page that returns a 405. Doesn't kill all form spammers but it certainly reduces the number of attempts!
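
For anyone wanting to try something similar, here is a minimal Python sketch of the two layers, under stated assumptions. The details of the actual trap aren't given above, so the paths (/contact, /search), the user-agent heuristic and the function names are all hypothetical. The first part uses the standard library's urllib.robotparser to show what a well-behaved crawler would conclude from the robots.txt rules; the second is a bare WSGI app that answers 405 to anything that doesn't look like a browser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; /contact and /search are made-up paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /contact
Disallow: /search
"""

def allowed_by_robots(user_agent, path):
    """Check what a crawler that honours ROBOTS_TXT would conclude."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)

BOT_MARKERS = ("bot", "spider", "crawl")  # crude, assumed heuristic

def looks_like_browser(user_agent):
    """Guess from the User-Agent header alone; easy to fool."""
    ua = (user_agent or "").lower()
    return ua.startswith("mozilla/") and not any(m in ua for m in BOT_MARKERS)

def form_page(environ, start_response):
    """WSGI app: serve the form to browsers, 405 to everything else."""
    if not looks_like_browser(environ.get("HTTP_USER_AGENT")):
        # HTTP expects an Allow header on a 405 response.
        start_response("405 Method Not Allowed",
                       [("Content-Type", "text/plain"), ("Allow", "GET, POST")])
        return [b"Not allowed\n"]
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<form method='post' action='/contact'>"
            b"<input name='message'></form>"]
```

The robots.txt layer only works for crawlers that choose to read it, which is why the second layer exists. A spammer that sends a browser-like User-Agent gets past both.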

Now if google is suggesting that a site is demoted because it has a bad form page, what are they using to determine this? It certainly ain't googlebot.
9:21 am on Apr 22, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Apr 30, 2007
posts:1394
votes: 0


Johnmu just posted on his twitter about making sure your contact forms work properly, I think this is very important as well...

Yes, it is important, but a search engine cannot tell whether a form works, and won't even try to process a form via /POST. It's up to the site owners to ensure this. Therefore I don't see how it will affect rankings from that angle.

Restricting bots completely from accessing pages with forms is a different matter. I would think it all comes down to how the navigation is set up and what the site's owner wants to expose.

For example if the form is linked via js then a search engine may not be able to access it but I don't see why it will affect ranking in some way. Looks to me more of a suggestion than anything else.

@dstiles, the problem with sending headers to spiders restricting access is that they may not always access the robots.txt to determine whether to visit a particular page.

If for example I post a hard-coded link to your form page on my domain the spider will follow from what I have seen. It won't check the robots file in advance. I don't know in such cases if and how the page ranking is affected. I just assume it considers it public because others are linking to it. And other search engines do the same.
10:11 am on Apr 22, 2011 (gmt 0)

Full Member

5+ Year Member

joined:Sept 30, 2006
posts:332
votes: 0


@dstiles Yeah, we block bots from all of our forms, even our search. They cause too many problems.
5:00 pm on Apr 22, 2011 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3091
votes: 2


enigma1 - agreed about spiders bypassing robots.txt: that's why I add an on-page 405.

At least some SEs - google included - submit SOME forms, JS or not. How else could they follow on-page search links?
5:12 pm on Apr 22, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Apr 30, 2007
posts:1394
votes: 0


How else could they follow on-page search links?

If it's a /GET form they process it, or try to; I remember reading that in some old article. Navigation via <a> links is the main path they follow but of course they do try to process js.

It's bad if they automatically submit forms because even simple search forms waste bandwidth. At least so far from my logs I haven't seen google doing a /POST. That will be real news to me.

So with js, what I do to avoid complications and eliminate uncertainty (i.e. what the bot is going to do when it parses the js part of the page) is deploy a js framework (like jQuery) where the js is totally transparent, not mingled with the html. And I only use /POST forms to be sure they won't try anything fancy.
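
To make the GET/POST difference concrete, here is a small Python sketch using only the standard library. The URL and the field name are hypothetical, and nothing is actually sent: the code just builds the two request objects so you can see where the form data ends up.

```python
from urllib.parse import urlencode
from urllib.request import Request

fields = {"q": "panda reading level"}  # hypothetical form field

# A GET form submission is just a URL with a query string, so anything
# that can follow a link can "submit" it.
get_request = Request("http://example.com/search?" + urlencode(fields))

# A POST submission carries the same fields in the request body instead,
# so there is no plain URL for a crawler to discover and follow.
post_request = Request("http://example.com/search",
                       data=urlencode(fields).encode("ascii"),
                       method="POST")

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

This only shows the mechanics. Whether a given search engine will attempt a POST is a separate question that the logs have to answer.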
6:57 pm on Apr 22, 2011 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3091
votes: 2


You're probably correct about POST but I'm taking no chances. Used to be SEs would not even look at any kind of form: suddenly they were causing all sorts of problems. :(
 
