One data-mining guy (let's say a Google News coder) can easily identify similar content that has been scraped/crawled and remixed by some other guy.
It's not rocket science for CS engineering graduates.
@surat, to clarify, this is about a lack of quantity of content, rather than a lack of quality/originality.
It requires some big assumptions to equate low quantity of content with "inadequate".
From the FT article:
|Bill Slawski, search engine marketing consultant at SEO by the Sea, who spotted the Google patent filing this month, said the Google technology could undercut offerings by the new digital media companies. |
A hat tip to Bill and his very informative article [seobythesea.com] about it.
So Goog wants to add to the cesspool?
Further indication that Goog is parsing the data to distraction rather than riding its success. Patents do not always mean upward momentum; sometimes they indicate failure.
It was a surprise getting a call from Kenneth Li of the Financial Times, but we spent about 40 minutes on the phone talking about the Google patents. I asked some questions as well as answering some.
He found out about my post from another journalist who tweeted it, and after reading it brought it to his editor. It seems like this is a topic of particular concern to journalists.
Both he and his editor were wondering if the patent(s) meant that Google might get into the content generation business themselves, or provide a way for web publishers to become a little more competitive against some of the digital media sites like Demand Media that use algorithmic approaches to identify content for informational adsense-type pages.
There are two patents involved, an earlier one that has been granted, and a continuation that adds a little to the first and is still pending.
The patents cover looking at queries that have some baseline level of popularity and the quality and relevance scores of the pages that show up in results for those queries.
When Google identifies a query that has unserved or underserved results, it might look at related terms to see if there's a whole topic that doesn't have adequate results.
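The approach described in the two paragraphs above can be sketched roughly like this. It is a minimal illustration under stated assumptions, not the method claimed in the patents: the `QueryStats` record, the topic labels used to group related queries, and the thresholds (`min_volume`, `min_score`, `min_fraction`) are all hypothetical stand-ins for whatever popularity, quality, and relevance signals Google actually uses.

```python
from dataclasses import dataclass

@dataclass
class QueryStats:
    query: str
    topic: str                                # hypothetical label grouping related queries
    volume: int                               # how often the query is searched
    result_scores: list[tuple[float, float]]  # (quality, relevance) of top results, best first

def is_underserved(stats: QueryStats, min_volume: int = 1000,
                   min_score: float = 0.5, top_n: int = 10) -> bool:
    """Popular query whose top results include no page that scores
    well on both quality and relevance."""
    if stats.volume < min_volume:
        return False  # below the baseline popularity, so ignored
    top = stats.result_scores[:top_n]
    return not any(q >= min_score and r >= min_score for q, r in top)

def underserved_topics(all_stats: list[QueryStats], min_fraction: float = 0.5) -> list[str]:
    """Topics where at least min_fraction of the related queries are underserved."""
    flags_by_topic: dict[str, list[bool]] = {}
    for s in all_stats:
        flags_by_topic.setdefault(s.topic, []).append(is_underserved(s))
    return sorted(topic for topic, flags in flags_by_topic.items()
                  if sum(flags) / len(flags) >= min_fraction)

# Hypothetical sample data.
stats = [
    QueryStats("green leather widget", "leather widgets", 5000, [(0.2, 0.9), (0.7, 0.3)]),
    QueryStats("green leather widget price", "leather widgets", 2000, []),
    QueryStats("red leather widget", "leather widgets", 4000, [(0.8, 0.9)]),
    QueryStats("blue widget", "widgets", 3000, [(0.9, 0.9)]),
]
print(underserved_topics(stats))  # ['leather widgets']
```

Rolling individual queries up to a topic is what lets a single gap ("green leather widget") point to a broader hole ("leather widgets") rather than one missing page.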
The new patent adds a little, including looking at queries where there might be adequate content in one language, but not in others.
The patents present a few different ways that they might inform web publishers of inadequate content, including telling searchers at the time of a search, creating a query and topic search engine, as well as providing the information to advertisers.
As synchronicity would have it, Demand Media had a patent application published today titled Method and System for Ranking of Keywords for Profitability (US Patent Application 20100153391) which might provide a window into the kinds of things they look for when attempting to identify keywords to write about.
Unsurprisingly, those include things like search volume, how much it might cost to bid on the terms, and how competitive the terms are.
But there are some issues with a few of the examples they raise, such as how they gauge how competitive a site might be. For instance, they state in the patent description that they might look at sites that rank highly for a query and check the Alexa rank for those sites. I don't think that will help much.
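As a rough illustration of how those three signals might be combined, here is a toy ranking. The scoring formula (monthly searches × cost per click × (1 − competition)) is purely an assumption for illustration; it is not the formula in Demand Media's application, and the `Keyword` fields and sample numbers are hypothetical.

```python
from typing import NamedTuple

class Keyword(NamedTuple):
    term: str
    monthly_searches: int   # search volume
    cost_per_click: float   # what advertisers bid on the term (proxy for ad revenue)
    competition: float      # 0.0 = no competing pages, 1.0 = saturated

def profitability(kw: Keyword) -> float:
    # Hypothetical score: demand x ad value, discounted by how crowded the term is.
    return kw.monthly_searches * kw.cost_per_click * (1.0 - kw.competition)

def rank_keywords(keywords: list[Keyword]) -> list[Keyword]:
    return sorted(keywords, key=profitability, reverse=True)

# Hypothetical candidate keywords.
candidates = [
    Keyword("leather widget care", 2000, 0.50, 0.2),
    Keyword("cheap widgets", 10000, 1.20, 0.95),
    Keyword("green leather widget", 1500, 0.80, 0.1),
]
for kw in rank_keywords(candidates):
    print(f"{kw.term}: {profitability(kw):.0f}")
```

Under this toy formula the highest-volume term ranks last because competition eats most of its value, which is the kind of trade-off a volume/cost/competition ranking has to weigh.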
There could be some potential benefits for ecommerce as well as content creation/advertising from a system like the one described in Google's patents. Suppose you ran an ecommerce site, looked up "leather widgets" in an "inadequate topics and queries" search engine, and found that there were a lot of searches for "green leather widget" without any pages that offered them. If you sold blue leather widgets and red leather widgets, would you consider adding green leather widgets to your site?
I wonder if this will help Q&A websites. I've lost count of the times I've searched for an answer and found only forums asking the same question without any answers, or worse, giving bad advice.
It is amazing what you can get a patent for these days.
Sounds similar to the concept of notability used in Wikipedia. Both can be used to identify potential areas for new content development, and both can be abused very easily.
Wouldn't it make sense for Google to identify areas where there is inadequate content, though? The question is less what they'll do with the data themselves than whether they'll use it to tell Google searchers about the lack of information.
That could lead to them telling the searcher that there isn't enough information about that topic, and suggesting they might want to search for something else.
I don't think Google will get into the content generation business, unless that means them doing something with information, such as generate better Google News search results. (Isn't Google News a type of content generation?)
So... does this mean their index is going to be reduced by 99%? Or will they still show all the MFA crap they've helped create?
I suppose Google has a mass of data about what people appear to be searching for.
Then, if they tweak their program (quite a lot), they may be able to infer from searchers' behaviour whether they found content that satisfied their curiosity.
Then they know what people are seeking and where it is or is not being provided. But what do they do with that information? To whom is it valuable? And why would such a reasonable process be patentable?
They could certainly deduce that there are no satisfactory answers to most of the things I have been searching for over the last week, from the fact that I kept doing lots of variants on the same search repeatedly in a short time.
Correct me if I'm wrong but...
Google is about to announce its entrance into ecommerce, and this could be used to identify what they should feature most prominently. It would be the ultimate product research tool, and Google is about to write content for its ecommerce platform.
If that's the case, the information derived from this patent needs to be made public and free, or not be allowed to be used at all.
I think search engines have no business in ecommerce, since one platform renders the other biased and/or unfair. Google should not be allowed to self-police its non-use of unfair practices. SERPs are already stuffed with Google features, and in many instances the very sites they rank show up below the fold, tucked underneath maps, YouTube videos, paid ads, affiliate offers, etc. It's ugly enough already.
I think it's time to unplug Google from the hundreds of places it has crept into. Once they jump into ecommerce, every other ecommerce store is at a distinct disadvantage, and there are laws against this type of thing. This patent would ensure others can't have the same information without Google's permission.
I also thought it was hypocritical of Google to feature affiliate offers on page one while penalizing affiliate sites.
Google bought YouTube, and YouTube videos appear on page one of most SERPs in a featured format. Google launched Maps, and maps appear on page one of SERPs in a featured format. You know where this is heading, and Google is about to go into ecommerce. This trend is disturbing, and clearly Google DOES rank its own content without applying the usual ranking factors. I had my fingers crossed that they wouldn't replace their affiliate offers with their ecommerce platform like this, but even if they don't, this patent lets them cherry-pick the web in a way I never could (since it's patented by Google).
Grabbing a copy of every webpage online would no longer be a neutral endeavor if Google then competes against a good many of those pages and has unfair knowledge of how to send traffic to its own sites (and that's assuming they don't bypass natural ranking altogether, which they tend to do).
I don't like how far away from search Google is getting. Even if the patent is just meant to inform webmasters of where there is a lack of content, that very act alters the web, which is not something a search engine should be doing if it wants to maintain any sort of neutrality.
Pages that would otherwise have been written will never exist if Google is allowed to influence the web by releasing this data. Yet if they go into ecommerce, they should have to release it in order to remain fair, imo.
I'm getting ahead of myself, but the issues are real, and history suggests I'll see Google ecommerce offers on page one AND Google ecommerce offers on regularly ranked pages that out-SEO all of us. Helpful search could die with this, replaced by search for profit that competes against ecommerce sites instead of helping people find them.
If Google actually issued a report on the topic areas, there could be a veritable gold rush of people moving in to fill the voids.
|while penalizing affiliate sites. |
Sites built from affiliate feeds and duplicate content most definitely get hurt. Pages that are only a laundry list of affiliate links, yes. But in both cases, the "penalty" seems to be more about the low-quality content.
The mere presence of an affiliate link does not penalize a good quality site - that's a bit of mythology, as far as I can see.
I wasn't referring to the myth. Google hasn't told us how much affiliate salt we can sprinkle before killing the taste of our sites, so how much of a myth it is remains undetermined. I was focusing on the history and how it might be repeated.
I'm hoping I don't see Google's own ecommerce links show up on page one of SERPs, with this new patent penalizing others by letting Google cherry-pick key terms. Page one offers + naturally ranked offer pages built with a tool this patent would help create = not much help to existing ecommerce sites. When Google begins competing instead of being helpful, it is no longer neutral; that's my concern.
If Google wants to fight Amazon, fine, as long as they don't feature Google offers on page one unless they are naturally ranked. I was going to wait and see how neutral Google remained with its core search product, but their history makes me uneasy about that.
This new patent could conceivably let Google use search to gain an unfair advantage in rankings WHILE competing; that's not trivial, imo.
Time will tell how far Google leverages search for their ecommerce business, I'm just sharing my concerns and wondering if they are shared.
One of my recent threads was on spam sites beating Google sites in the SERPs, so Google looks like it has been neutral so far.
I have seen maps get a special ranking, but not YouTube videos. What I see is a video results section on the SERPs that can include video from other sites, but which gets dominated by YouTube, presumably because YouTube dominates video hosting.
Google Books already carries Google affiliate links.
[edited by: bill at 4:24 am (utc) on Jun 19, 2010]
[edit reason] fixed spelling [/edit]
|graeme_p: Google patents a system for identifying areas where there is inadequate content. |
1. Remove template HTML.
2. If the length of the remaining content is < N characters, call it "inadequate".
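For what it's worth, the two steps above really do fit in a few lines of Python. This sketch assumes that "template HTML" means anything inside `script`, `style`, `nav`, `header`, `footer`, or `aside` elements (a hypothetical and crude boilerplate filter), and `min_chars` plays the role of N. As a later reply points out, a page-level check like this is not what the patent actually describes.

```python
from html.parser import HTMLParser

# Assumption: treat these elements as "template" (site chrome, scripts, styles).
TEMPLATE_TAGS = {"script", "style", "nav", "header", "footer", "aside"}

class MainTextExtractor(HTMLParser):
    """Collects text that is not nested inside a TEMPLATE_TAGS element."""

    def __init__(self) -> None:
        super().__init__()
        self.template_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in TEMPLATE_TAGS:
            self.template_depth += 1

    def handle_endtag(self, tag):
        if tag in TEMPLATE_TAGS and self.template_depth > 0:
            self.template_depth -= 1

    def handle_data(self, data):
        if self.template_depth == 0:
            self.chunks.append(data)

def main_text(html: str) -> str:
    """Step 1: strip the (assumed) template markup, keep the remaining text."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse all runs of whitespace to single spaces.
    return " ".join(" ".join(parser.chunks).split())

def is_inadequate(html: str, min_chars: int = 500) -> bool:
    """Step 2: call the page inadequate if its remaining text is shorter than N."""
    return len(main_text(html)) < min_chars

short_page = "<html><nav>Home | About</nav><p>Short.</p><footer>(c) 2010</footer></html>"
long_page = "<p>" + "word " * 200 + "</p>"
print(is_inadequate(short_page), is_inadequate(long_page))  # True False
```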
What Mark_A said above...
If you can patent that, you can patent just about anything.
|graeme_p: Google Books already carries Google affiliate links. |
There isn't any conflict of interest here whatsoever (sarcasm).
The patent is not about detecting inadequate content on a particular web page, it's about detecting topical areas with inadequate content anywhere on the entire web. In other words, it's about finding holes in the publicly available information.
Google will misuse this. Guaranteed.
I just get a chuckle out of it... who identifies what content is "inadequate?" That would imply the one checking "knows everything." :)
I don't get what the fuss is about. Google already has data about which queries aren't satisfying searchers; it would be odd for a search engine not to have this. At the least, Google identifies when people keep on searching but stop clicking, as opposed to when the refinements searchers try appear to satisfy them.
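A minimal sketch of that kind of signal, assuming a hypothetical session log in which each session is an ordered list of `(query, clicked)` pairs. A search counts as satisfied if it got a click, or if a later query in the same session did (a crude stand-in for a successful refinement); queries issued often enough but rarely satisfied get flagged. The log format, thresholds, and refinement rule are all assumptions, not Google's actual method.

```python
from collections import defaultdict

def unsatisfied_queries(sessions, min_issues=3, min_satisfied_rate=0.5):
    """Queries searched at least min_issues times where fewer than
    min_satisfied_rate of the searches ended in a click, either directly
    or via a later refinement in the same session."""
    issued = defaultdict(int)
    satisfied = defaultdict(int)
    for session in sessions:
        for i, (query, clicked) in enumerate(session):
            issued[query] += 1
            # Satisfied if this search got a click, or a later search in the
            # same session did (treated as a successful refinement).
            if clicked or any(later_click for _, later_click in session[i + 1:]):
                satisfied[query] += 1
    return sorted(q for q, n in issued.items()
                  if n >= min_issues and satisfied[q] / n < min_satisfied_rate)

# Hypothetical session log: searchers keep rephrasing but stop clicking.
sessions = [
    [("green leather widget", False), ("green leather widgets", False), ("buy green leather widget", False)],
    [("green leather widget", False), ("leather widget green", False)],
    [("green leather widget", False), ("red leather widget", True)],
    [("red leather widget", True)],
    [("red leather widget", True)],
]
print(unsatisfied_queries(sessions))  # ['green leather widget']
```

Crediting any later click to an earlier query is generous: in the third session the click went to a different product, so a real system would need some way to tell whether a refinement kept the same intent.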
I assume the patent is codifying Google's methodology, probably so Google's competitors can't prevent them from using their own data. Google, it appears, is inclined to make the data available free to content providers so they can create content to fill those gaps. Ultimately, more opportunities for content providers... more choices for searchers... a more satisfactory searching experience.
|"Google would provide information on topics or queries for everybody who performs search as opposed to [companies that] hire people to mill out videos for $20 per video," he said. |
I suppose you could argue this both ways... content creators who'd noticed such gaps on their own before this information became freely available might get rewarded less for their initiative. But Google might be heading off a lot of mass produced junk by opening those areas up for competition.
|There could be some potential benefits for ecommerce as well as content creation/advertising from a system like the one described in Google's patents. Suppose you ran an ecommerce site, looked up "leather widgets" in an "inadequate topics and queries" search engine, and found that there were a lot of searches for "green leather widget" without any pages that offered them. If you sold blue leather widgets and red leather widgets, would you consider adding green leather widgets to your site? |
On the other hand, if you were the only guy on the web smart enough to feature green leather widgets, you might be unhappy. I assume, though, that "inadequate" might mean that after visiting the one green leather widget store on the web, searchers kept on looking and didn't go back to it.
I've used search data to suggest product lines to clients, and they were very happy to have the information. I'm not thinking Google's attempting to put me out of business, though.
Ultimately, the whole Google-ized universe might get so reflexive that there are no new ideas out there... that everyone's found everything, and there are no new places to go. At that point, people might start turning off their computers and going for walks, and that might not be such a bad thing. ;)
After some reflection, a PS to argue the other side of what I said above....
Many niche stores (brick and mortar) that I've liked over the years have been hurt by the expansion of big operations (online and off) into their niches. To the extent that small but high quality operations might be threatened by the availability of competitive data... yes, that is a concern.
I think I'd rather have the data given out freely, though, with the intention of raising overall quality, than to have it used to generate lots of cookie cutter junk.
But I realize that it can cut both ways.