I think it would be somewhat hard for them, because there are so many ways people spell things: "are" to "r", "you" to "u", color (American) versus colour (British). Still, it could be done to some extent.
I can't see why trying to do so would be a priority for Google, though.
I don't think so; when we search using a sentence, it doesn't match on the whole sentence anyway, only on the ...terms...
From my understanding, I read that Google was trying to cut down on doorway spam by looking for full sentences.
That being said, would it be safe to say that Google would not be able to catch websites whose pages are full of gibberish with their keywords thrown in a few times?
At least they couldn't be caught solely for not using full sentences, though they could maybe be detected as spam by other means.
There's no reason Google can't use the same sorts of grammar checkers that Word provides—presumably not deep-sixing you for a single passive sentence.
The reading-level tests might also come in handy. Certainly I'd love it if they used reading-level tests. My space is plagued by high-ranking pages written by primary school students...
Indeed, for a dedicated searcher, reading-level stats would be a very effective way to narrow one's search (i.e., for the "advanced" page). I mean, if you are a college student learning about paleontology (or mummies, etc.) you are continually being served kid-oriented sites...
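Just to illustrate what a reading-level stat looks like in practice, here's a toy Flesch-Kincaid grade estimate. The formula itself is public; whether Google computes anything like it is pure speculation on my part, and the syllable counter (vowel groups) is deliberately crude.

```python
# Toy Flesch-Kincaid grade-level estimate. Hypothetical sketch, not
# anything a search engine is confirmed to use. Syllables are approximated
# by counting vowel groups, which is rough but fine for illustration.
import re

def syllables(word):
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syl = sum(syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syl / len(words)) - 15.59

kid = "Dinosaurs are big. They lived long ago. Some ate plants."
adult = ("Paleontological stratigraphy demonstrates that non-avian dinosaurs "
         "experienced catastrophic extinction at the Cretaceous-Paleogene boundary.")
print(fk_grade(kid) < fk_grade(adult))  # the adult text scores a higher grade
```

Even this crude version separates the kid-oriented text from the college-level text, which is all a "narrow by reading level" filter would need.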
There are, however, serious processor time issues when you get into serious semantic parsing. What's trivial for a single document is not necessarily trivial for billions.
The problem is whether we can be sure that grammatically correct content is more valuable for search. It may depend on the query. For example, if I search for "yellow widget definition and properties", I expect a serious informational site written in decent English, while if I searched for "cheap yellow widgets for sale online shop", I would expect a site with a lot of commercial rubbish and just hope they give me good prices on yellow widgets.
I develop an informational site where each widget has its official description, written in the best English possible, followed by users' reviews, written in whatever quality of language the reviewers could manage. But these reviews are a very valuable source of information for the people the website is intended for.
If Google can distinguish between randomly generated, keyword-stuffed spammy phrases and normal English written by a human, that's OK, but they had better not penalize sites with user-submitted content, which may include grammatical errors.
For example, WebmasterWorld is a valuable resource of information, while many posters (including myself, I guess) make many language mistakes. It doesn't mean it's spam :))
And spam sites can be detected with so many other methods, especially with careful analysis of their outbound links - I don't know if Google does it, but they definitely should.
>>> I've read rumors that googlebot can now check websites for gramatically correct sentences.
Pure rumors, IMO.
Stop words, word stemming, and the concept of LSI all suggest that googlebot doesn't read sentences grammatically in the first place.
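To see why stop words and stemming work against grammar checking, here's a toy index-normalization pass. The stop list and the suffix stripping are my own stand-ins (a real engine would use something like Porter's algorithm), but the point survives: after normalization, a grammatical sentence and a keyword jumble produce the same terms.

```python
# Sketch of how index-time normalization discards grammar: stop words go,
# suffixes are crudely stripped. A toy illustration, not Google's pipeline.
import re

STOP = {"the", "is", "are", "a", "an", "of", "to", "and", "in", "it"}

def stem(word):
    # Extremely naive suffix stripping (a real engine would use Porter's algorithm)
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    words = re.findall(r"[a-z']+", text.lower())
    return [stem(w) for w in words if w not in STOP]

grammatical = "The widgets are selling in the shops."
scrambled = "shops selling widgets"
print(index_terms(grammatical))  # ['widget', 'sell', 'shop']
print(index_terms(scrambled))    # ['shop', 'sell', 'widget']
```

Once both queries collapse to the same term bag, the grammar of the original sentence is simply gone from the index.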
|For example, WebmasterWorld is a valuable resource of information, while many posters (including myself, I guess) make many language mistakes. It doesn't mean it's spam :)) |
But if a page existed that had the same content in better English, then I would consider the grammatically correct one the better result and rank it first (in my search engine).
What Google ranks where is their business, but I believe the sites they wish to rank highly would generally include properly formed sentences and minimal spelling errors.
Not stand a chance Webmasterworld would.
Lists are ungrammatical, but in some cases they represent very useful content. You may want a list of butterfly species native to your country, or a list of plumbers in a town. So content written in incomplete sentences does not necessarily indicate that it's for children either.
I can see a use for a grammar filter if you're specifically looking for articles and essays, but it shouldn't be applied to the general results.
|What Google ranks where is their business, but I believe the sites they wish to rank highly would generally include properly formed sentences and minimal spelling errors. |
I just spellchecked something I wrote, and Word found about 20 words it didn't recognise. None of these were errors. Unique names, specialist jargon and obscure vocabulary will be more common in documents with a higher reading age. Then there are local variations in language to account for. I don't believe accurate grammar or spellchecking is possible for a search engine, at least not in the near future.
Could someone hurry up and patent this?
Why not!? :)
After what we've seen from Google and Yahoo.
|I don't believe accurate grammar or spellchecking is possible for a search engine, at least not in the near future. |
I'll have to respectfully disagree with you on that. Google has the best data set ever to base spelling and grammar checks on. They're already doing it: drop or add a letter in your favorite word and dump the result into Google's search bar. "Did you mean: ____?"
I'm not suggesting they do check for proper grammar or spelling when it comes to ranking pages. I've seen no evidence to support that theory yet, but I've seen no evidence to disprove it either, so I don't know. But it is another factor. If they're looking at every little piece of information they can to help them rank sites, why shouldn't they consider spelling/grammar? I don't imagine it will ever become a death sentence for your site if you can't spell, but it could/should be a factor. I mean really, if the words in your URL matter, why not this? Neither one carries a lot of weight, but all other things being equal, it's another factor to tip the scales in favour of a better site.
I would not be at all surprised if they implemented a very minimal punctuation and sentence structure test. But I expect that it would be used more to classify the type of information on a page instead of simply using it to rank pages higher.
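For what it's worth, a "very minimal" sentence-structure test could be as dumb as this: split on sentence-final punctuation and score how many chunks look like actual sentences. Pure speculation on my part about what such a classifier would look like; the thresholds are arbitrary.

```python
# Speculative minimal structure test: what fraction of sentence-like chunks
# start with a capital letter and have a sane word count? A page of stuffed
# keywords produces few well-formed chunks and scores low.
import re

def structure_score(text):
    chunks = [c.strip() for c in re.split(r"[.!?]+", text) if c.strip()]
    if not chunks:
        return 0.0
    well_formed = sum(
        1 for c in chunks if c[0].isupper() and 3 <= len(c.split()) <= 40
    )
    return well_formed / len(chunks)

prose = "This is a normal paragraph. It has ordinary sentences. They read fine."
stuffed = "cheap widgets buy widgets yellow widgets widgets online widgets"
print(structure_score(prose) > structure_score(stuffed))  # True
```

A score like this would be much more useful for classifying the type of page (article vs. list vs. keyword dump) than for ranking it, which matches my guess above.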
I asked a Google engineer about this in New Orleans - specifically, whether their semantic algorithms could identify the difference between casually written language and really worked-on, edited and re-edited awesome copy. I know that I can tell the difference, but I wondered if it could be automated.
As I recall (it was early in the morning) he said this is something that is being worked with behind the scenes, but it's experimental and not at all ready for prime time. It needs to be a pretty complex algorithm and while it is naturally of serious interest, it's apparently not zapping our pages up or down at the moment.
This was also one of those topics where words were "very carefully chosen" in the answer, and my paraphrase may be off a bit from the actual intended meaning - so take my report with a grain of salt. It's always fun when you ask a search engineer something and they become very attentive to what they can and can't say ;)
Google = "the Forbin Project"
You didn't know that did ya....?
It is interesting when someone goes into the mode where they choose their words very carefully, but you must also be careful yourself in what you interpret that to mean.
The simplest reason for the care in the response is that such a statement is likely to make it onto boards such as WebmasterWorld, and they understand that the rumors are going to run rampant.
The fact that you got an affirmative answer at all would lead me to take the carefully worded response at face value. It is an area of interest, but it isn't anywhere close to ready for prime time.
One of the issues in this territory is that there are already existing patents for automated language analysis that even precede the web. So I think at least part of the reason for scrupulously chosen words was to avoid even the slightest suggestion of any patent infringement. And I understood that very clearly, no matter how brain dead I may have been at the time - this would be a new venture, and not the application of any existing methods.