
Google SEO News and Discussion Forum

New Patent Application - Spam Detection Based on Phrase Indexing
tedster
msg:3202880
 3:19 pm on Dec 29, 2006 (gmt 0)

Googler Anna Lynn Patterson is credited as the inventor on this new patent application, Detecting spam documents in a phrase based information retrieval system [appft1.uspto.gov], which was filed Jun 28, 2006 and published Dec 28, 2006.

So who is Anna Lynn Patterson? She came to Google from her previous job at archive.org where they reportedly handle 55 billion documents in the index, so she's no stranger to large scale information retrieval. She's also the author of a short article that many may find interesting: Why Writing Your Own Search Engine is Hard [acmqueue.com].

The abstract of the application gives a bird's-eye view of the patent:

Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. A spam document is identified based on the number of related phrases included in a document.

Now it's time to study before I comment more - but I wanted to post the news so any interested members also get a chance to read up.

[edited by: tedster at 3:59 pm (utc) on Dec. 29, 2006]
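
A rough sketch of the idea in that abstract - this is my own illustration, not the patent's actual method, and the related-phrase list, expected count and tolerance knob here are invented for the example:

def looks_like_spam(document_text, related_phrases, expected_count, tolerance=3.0):
    # Count how many of the related phrases actually appear in the document.
    text = document_text.lower()
    present = sum(1 for phrase in related_phrases if phrase in text)
    # The abstract says a document containing far more related phrases than
    # statistically expected gets flagged; "tolerance" is a made-up knob here.
    return present > expected_count * tolerance

# Example: a page stuffed with every phrase in the widget cluster
page = "long widgets short widgets metal widgets wood widgets tall widgets"
print(looks_like_spam(page, ["long widgets", "short widgets", "metal widgets",
                             "wood widgets", "tall widgets"], expected_count=1))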

 

Artstart
msg:3210075
 2:26 am on Jan 6, 2007 (gmt 0)

Too many words in the article.

Can anyone who managed to finish write a logical equation?

ron252
msg:3225523
 11:43 pm on Jan 19, 2007 (gmt 0)

First, if they have gone this route to detect spam, then I guess the only people they will kick out are genuine ones with high rankings and the relevant information users want. I think this whole paper is junk, and if they have followed it to penalize websites or apply the -950 ranking, then they have blundered. So I guess this is the time for Yahoo and MSN to step in and create a better search engine, because this type of paper and its implementation will harm Google's user experience. Hopefully Google will revert its junk implementation of spam filtering.

webslinger
msg:3225566
 12:55 am on Jan 20, 2007 (gmt 0)

I wonder what Google would do if every webmaster and web site out there got so tired of Google's seventy-to-eighty-percent hold on the search market, and its constant gaming of us, that they decided to add

User-agent: Googlebot
Disallow: /

to their robots.txt file.

Of course it would probably be almost as hard as blackmailing the oil companies. But how sweet it would be to turn the tables.

Marcia
msg:3227913
 7:27 pm on Jan 22, 2007 (gmt 0)

There are some parts of this patent that make me suspect that it might have something to do, at least in part, with what's now referred to as the "950 penalty," which seems to be hitting sites that really aren't pulling any fancy tricks at all.

sandyeggo
msg:3228130
 10:45 pm on Jan 22, 2007 (gmt 0)

I have a commerce site. I offer probably 15 product lines, and each one fits a different need. For example,
- Long widgets
- Short widgets
- Tall widgets
- Wide widgets
- Metal widgets
- Wood widgets

Etc.

I understand the theory of over-optimization, and I would also say that's possible for my site. However, we have also learned that anchor text is very important if we are to be found in the engines. If I were to list on my page:

Widgets
- Long
- Short
- Tall
- Wide
- Metal
- Wood

My anchor text would not be worth much. So where do you draw the line? Is this script / patent going to know that all of those sizes are widgets?
The problem could be this: if your site is dynamic and you drill down through the products, and you only happen to show a few items on one page, that's one thing - the term may only appear a few times, and I would actually think that was good. But if that category had a large page with 25 items to list, and the word "widgets" appeared each time, then I could see it getting caught up in stuffing filters - but not the whole site or the whole section.
I guess I could write a script that said that if a term was already included on my page 10 times, it should be replaced with another term - but what the heck, what next?
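
For what it's worth, a script like that could be as simple as the following - just an illustration, using the 10-occurrence cap from the post and a made-up synonym:

import re

def cap_term(page_text, term, synonym, max_uses=10):
    # Leave the first max_uses occurrences of the term alone,
    # then swap any further occurrences for the synonym.
    count = 0
    def swap(match):
        nonlocal count
        count += 1
        return match.group(0) if count <= max_uses else synonym
    return re.sub(re.escape(term), swap, page_text, flags=re.IGNORECASE)

print(cap_term("widgets " * 12, "widgets", "gadgets"))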

Marcia
msg:3228156
 11:04 pm on Jan 22, 2007 (gmt 0)

I'm not sure it's the number of occurrences; it seems to be focusing more on the co-occurrence of related phrases. Of course, if there are a number of products with different modifiers and the same keyword, that might mean more phrases, but it will take dissecting what the patent says about how phrases are identified.

It's just some intuitive speculation on my part, but it makes sense, and a few of the things mentioned seem to be a tangible reality, so it can't hurt to try a thing or two to overcome what is apparently a penalty that is quite possibly phrase-specific.

I've done exactly that on a page that without doubt has that penalty - as white hat as can possibly be, with no tricks or games. I've noted the cache dates and will be watching over the next couple of weeks to see whether the "remedy" applied has any effect.

annej
msg:3230674
 9:50 pm on Jan 24, 2007 (gmt 0)

I don't claim to understand it, but I did read the patent over. It seems to me there is a very fine line between pages that rank well and pages that are penalized. The very involved phrase calculations are made and a line is drawn: above it, the page is fine; below it, the page is penalized.

MHes
msg:3231175
 8:41 am on Jan 25, 2007 (gmt 0)

So we now have to rewrite all our spam... sigh.

Why does Google bother - can they get any bigger? Joe Public has loved them for years with our spam dominating the SERPs. I think they are taking a big risk in trying to find naturally written, unique content, which has an as-yet-unproven effect on their popularity. I hope they reconsider...

stinkfoot
msg:3232180
 11:56 pm on Jan 25, 2007 (gmt 0)

I dispute that these people thought of this first. I posted here about searching for random phrases to suss out scraping and spam many years ago!

OK... well... it looks like I will only be able to surmise about this, as the proof of those postings seems NOT to be in the Google site search for this site.

Now there's a surprise!

zeus
msg:3246369
 2:36 pm on Feb 8, 2007 (gmt 0)

outland88 - "attacking spam by quantity and number of domains" owned by one person: I don't like that. I've got about 20 domains, but that is a must when you have to earn a steady income with everything that's going on on the internet ("Google"); if Google didn't have so much power, maybe we could get by with fewer. I do agree that if a person has 1000 domains live on the net, all with one-year registrations, that could be spam.


msg:3246374
 2:43 pm on Feb 8, 2007 (gmt 0)

If it comes down to it, we are talking about keyword density and nothing else, if you look at it.
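
For reference, keyword density is usually figured as occurrences of the term divided by total words on the page - a trivial sketch of that calculation, nothing taken from the patent:

def keyword_density(page_text, keyword):
    # Fraction of the words on the page that are the keyword.
    words = page_text.lower().split()
    return words.count(keyword.lower()) / len(words) if words else 0.0

print(keyword_density("long widgets and short widgets", "widgets"))  # 0.4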

Marcia
msg:3248812
 9:14 pm on Feb 10, 2007 (gmt 0)

We're talking about keyword co-occurrence, IDF (Inverse Document Frequency) and levels of threshold acceptability in document collections.
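
For anyone who hasn't run into it, IDF is conventionally log(N / df), where N is the size of the document collection and df is how many documents contain the term - a textbook sketch, not the formula from the patent:

import math

def idf(term, documents):
    # Terms that are rare across the collection get a higher IDF weight.
    df = sum(1 for doc in documents if term.lower() in doc.lower())
    return math.log(len(documents) / df) if df else 0.0

docs = ["long widgets", "short widgets", "wooden gadgets"]
print(idf("widgets", docs))  # log(3/2), roughly 0.41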
