Forum Moderators: Robert Charlton & goodroi
So who is Anna Lynn Patterson? She came to Google from her previous job at archive.org, where they reportedly handle 55 billion documents in the index, so she's no stranger to large-scale information retrieval. She's also the author of a short article that many may find interesting: Why Writing Your Own Search Engine is Hard [acmqueue.com].
The abstract of the application gives a bird's-eye view of the patent:
Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. A spam document is identified based on the number of related phrases included in a document.
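To make that concrete, here's a toy sketch of how I read the abstract. Everything specific in it - the phrase lists, the threshold, the function names - is my own invention for illustration, not anything taken from the patent:

# Toy model of the abstract: phrases predict related phrases, and a page
# carrying an implausibly high count of related phrases looks like spam.
# RELATED_PHRASES and SPAM_THRESHOLD are made up for this example.
RELATED_PHRASES = {
    "digital camera": {"megapixel", "optical zoom", "memory card", "battery life"},
    "memory card": {"digital camera", "compact flash", "storage capacity"},
}
SPAM_THRESHOLD = 10  # hypothetical cutoff; the patent publishes no number

def related_phrase_count(text):
    """Count occurrences of related phrases for each phrase found in the text."""
    text = text.lower()
    count = 0
    for phrase, related in RELATED_PHRASES.items():
        if phrase in text:
            count += sum(text.count(r) for r in related)
    return count

def looks_like_spam(text):
    return related_phrase_count(text) > SPAM_THRESHOLD

page = "digital camera " + "megapixel optical zoom memory card " * 5
print(related_phrase_count(page), looks_like_spam(page))  # 16 True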
Now it's time to study before I comment more - but I wanted to post the news so any interested members also get a chance to read up.
Imagine if webmasters started adding:
User-agent: Googlebot
Disallow: /
to their robots.txt files.
Of course it would probably be almost as hard as blackmailing the oil companies. But how sweet it would be to turn the tables.
I understand the theory of over-optimization, and I would also say that's possible for my site. However, we also learned that anchor text is very important if we are to be found in the engines. If I were to list on my page:
Widgets
- Long
- Short
- Tall
- Wide
- Metal
- Wood
My anchor text would not be worth much. So where do you draw the line? Is this script / patent going to know that all of those sizes were widgets?
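For what it's worth, the compromise I keep circling back to is making each anchor carry the category term, so "Long" becomes "Long Widgets". A rough sketch - the helper name and URL scheme are purely illustrative:

def descriptive_anchor(attribute, category="Widgets"):
    # Fold the category term into each anchor so the link text carries a
    # meaningful phrase instead of a bare adjective.
    return f"{attribute} {category}"

for attribute in ["Long", "Short", "Tall", "Wide", "Metal", "Wood"]:
    href = "/widgets/" + attribute.lower()
    print(f'<a href="{href}">{descriptive_anchor(attribute)}</a>')

Of course, that puts "Widgets" into every single anchor on the page, which leads straight into the next worry.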
The problem could be this: if your site is dynamic and you drill down through the products, a given page may only list a few items, so the term appears only a few times - and I would actually think that was good. However, if that category had a large page with 25 items to list, and the word "widgets" appeared each time, then I could see it getting caught up in stuffing filters. But not the whole site or the section.
I guess I could write a script that says: if a term already appears on my page 10 times, replace any further occurrences with another term - but what the heck, what next?
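Just to show what I mean, a bare-bones version of that script might look like the sketch below - the synonym list and the cap of 10 are the arbitrary parts:

import re

SYNONYMS = {"widgets": ["gadgets", "devices"]}  # made-up replacements
MAX_OCCURRENCES = 10  # the arbitrary cap from my example above

def cap_term(text, term):
    # Keep the first MAX_OCCURRENCES uses of `term`; swap later ones for synonyms.
    subs = SYNONYMS.get(term.lower(), [])
    seen = 0
    def repl(match):
        nonlocal seen
        seen += 1
        if seen <= MAX_OCCURRENCES or not subs:
            return match.group(0)
        return subs[(seen - MAX_OCCURRENCES - 1) % len(subs)]
    return re.sub(rf"\b{re.escape(term)}\b", repl, text, flags=re.IGNORECASE)

print(cap_term("widgets " * 12, "widgets"))  # last two become "gadgets", "devices"

And of course it just trades one unnatural pattern for another, which is rather the point of the "what next?"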
It's just some intuitive speculation on my part, but it makes sense, a few of the things mentioned seem to be a tangible reality, and it can't hurt to try a thing or two to overcome what's apparently a penalty - quite possibly a phrase-specific one.
I've done exactly that for a page that has, without a doubt, got that penalty - as white hat as can possibly be, with no tricks or games. I've noted the cache dates and will be watching over the next couple of weeks to see if the applied "remedy" has any effect.
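If anyone wants to run the same experiment, the record-keeping can be as simple as this - the file name, columns, and example URL are just my own convention, and the cache date itself is read off Google's cached copy by hand:

import csv
import datetime

def log_cache_date(url, cache_date, path="cache_dates.csv"):
    # Append one observation per check: when I looked, which page,
    # and what cache date Google showed for it.
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([datetime.date.today().isoformat(), url, cache_date])

log_cache_date("http://example.com/penalized-page", "Dec 27, 2006")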
Why does Google even bother - can they get any bigger? Joe Public has loved them for years, with our spam dominating the SERPs. I think they are taking a big risk trying to find naturally written, unique content, which has an as-yet unproven effect on their popularity. I hope they reconsider...
Ok... well... it looks like I will only be able to surmise about this, as the proof of these postings seems NOT to be in the Google site search for this site.
Now there is a surprise!