It will be interesting to see how this changes SEO. It may mean content is king after all.
For info, this is the link to the original "Combating Web Spam with Trustrank" report
I believe that the significant thing is that Trustrank recognises that manual intervention is necessary. In their case this is to determine the seed sites, but it has been apparent for the last three or four years that manual intervention will eventually be required to clean up the results. It's now a question of when not if.
One thing I am not sure about is ...
|While human experts can identify spam, it is too expensive to manually evaluate a large number of pages. |
I have always believed that a small team from each of the main SEs working full time on this could make a huge impact. If an instant lifetime ban was imposed on sites that were in blatant contravention of their guidelines the spammers would soon realise that this was too risky to be a good business model.
The algorithm that Google has used in recent years is already closer to the TrustRank algorithm than to the original PageRank algorithm.
|If an instant lifetime ban was imposed on sites that were in blatant contravention of their guidelines the spammers would soon realise that this was too risky to be a good business model |
Unfortunately there's no risk in getting banned because they don't lose anything. What is needed is a way of stopping them making money in the first place - that would make the business model more risky. The sandbox effect has gone some way to do this, but it's not like spammers are gonna just turn round and say "gosh darnit" and walk away! ;) They learn and adapt too! :)
I notice one of the authors of the paper is a Yahoo employee, and doesn't this mesh nicely with Yahoo's patent document referring to "concept networks"?
"Unfortunately there's no risk in getting banned because they don't lose anything."
Exactly, domains are cheap, hosting is cheap. I expect spam & spammers to disappear about the same time viruses & virus writers disappear.
|Unfortunately there's no risk in getting banned because they don't lose anything. |
I think you may be missing my point. What the spammers "lose" is not the issue here. Their aim in life is to appear at or near the top of the SERPs. This makes them VERY easy to find and deal with (for humans.)
The more clever they are the easier they are to find.
"For info, this is the link to the original "Combating Web Spam with Trustrank" report"
I guess this is against the TOS of WW: putting up a link and advertising three guys' opinions (guys who probably want to earn some $).
I agree the web has to fight gambling and porn spam, but that is all: just get rid of sex, Viagra, begging and gambling spam from the net. All other topics have every right to use SEO tactics.
Can anybody tell me what the BIIIGEST industry on the NET is? I got the answer a few years ago from my IT tutor (now I know... do you know?)
Waiting for your answers......
|All other topics have every right to use SEO tactics. |
You really only have the right to use SEO tactics within the SE's guidelines. You can spam all you like until you get caught. If you do and get banned then tough cookie! You must accept that it's your own fault.
>Can anybody tell me what the BIIIGEST industry on the NET is
I think it's porn and casino. :>
I haven't yet found a good resource that breaks TrustRank down and explains it. Anyone want to post a link or message it to me?
>Can anybody tell me what the BIIIGEST industry on the NET is
pills, porn, casinos..
The ingenuity of TrustRank is its reliance on human reviews at its core. I can envision that for a small fee (similar to Yahoo’s $299) one would be able to invoke the “Oracle” to re-evaluate one's web site's TrustRank. That may serve as an appeal process for wrongly positioned sites – something I believe we currently lack.
I gather some of the "seed" sites will be human-compiled directories.
This could explain why Google started posting job openings for quality check people, if I remember correctly.
The Google job openings could also have been related to AdWords/AdSense....
|The Google job openings could also have been related to AdWords/AdSense.... |
Google was advertising for search and AdSense quality evaluators a while back on Google.com. (Separate jobs, separate postings.)
TR has already been at least partially implemented for some months. The propaganda "a DMOZ link is a link like any other" is inane and only serves to reduce the spam pressure on the ODP - of course with minimal success.
If they want to combat spam, I hope they will not repeat an old error and integrate a TR bar in future generation toolbars.
|domains are cheap, hosting is cheap. I expect spam & spammers to disappear about the same time viruses & virus writers disappear |
That's why new sites may rank worse than old sites. That's the reason for the sandbox. They prefer to lose some new sites with good content rather than have more throw-away domains in the results.
I expect that TrustRank will also depend on the age of a site, and new sites will have to wait several months before getting it.
It's not nice. I have some sites I launched recently and they have to wait before their time comes. But I launched them some months early to give them more time, and they will be finally designed when they are about six months old; before that, I don't need them to rank well. That's the price of fighting throw-away domains and other forms of spam.
Is there an alternative? Perhaps a law enforcing registration of all domain owners and allowing them to be sued for spamming? But do we want such a law?
In days of Yore when AltaVista was the ne plus ultra and Google was but an excuse for a couple of postgrads to drink coffee rather than finish an overdue PhD write-up, I used to expect to have to wait at least 6 months to reliably see any new page I put up appear in a search engine and used to plan ahead accordingly.
Thus it seems as if a sandbox scheme simply rewards the patient and cools the heels of the heels and the feckless.
"Time wounds all heels" - Jane Sherwood Ace (1905-1974)
If so, good.
Could be a good thing ... and something for people in the sandbox to have hope in.
|In days of Yore when AltaVista was the ne plus ultra and Google was but an excuse for a couple of postgrads to drink coffee rather than finish an overdue PhD write-up, I used to expect to have to wait at least 6 months |
Exactly! I was doing my first SEO work in those times, and it always seemed natural to me that the major SEs don't rank anything too fast. I remember when Altavista Basic Submit was delayed - the crawl happened a few months after doing a submit - to make people pay for Express Inclusion, and I usually submitted sites in advance to make up for this delay.
But I didn't expect sites to rank at the top of the major SEs in their first months - that was possible only in minor, local SEs (which haven't changed since those times, and I always have #1 in them because their algos are archaic - but almost no one uses them ;).
Today, you can add a site to Google almost instantly. I put a link on one of my high-PR pages and get a new site indexed in days. But still, it seems logical to me that if a site is only a few months old, it is not likely to be a serious source of information - what percentage of sites survive after a year? Google gives these sites a chance, because it indexes them after all. But if such a site builds a great number of inbounds in 3 months - it's logical for G to apply a sandbox.
I wonder if a kind of TrustRank is already in use - if a new site acquires many inbounds, but including one link from DMOZ, and several links from top authority sites in certain subject, will it be sandboxed equally to a site that acquired many inbounds from off-topic non-authority sites?
I wouldn't be surprised if Google already assigned a kind of TrustRank to DMOZ - and I hope they won't assign too much of it to A... let's say, to a certain established commercial site ;))
You should read Advogato's [advogato.org] page on the Trust Metric [advogato.org]. Basically, each Advogato user rates some of the other users, but each user's rating is based on how highly he himself is rated.
I'd be interested to know if this could be used as prior art to bust the patent.
Advogato's ranking is how highly esteemed one is considered to be as a free software programmer, but it is meant to serve as a testbed for algorithms that someday might be used to fight spam.
So nobody's addressed what seems to me a major issue with the paper: the authors discuss the idea of "unreferenced pages" and "non-referencing pages." This is a key concept, defined early on in the paper. However, as any frequent user of the Internet will note, very, very few pages fall into either of these camps.
Standard site navigation exists on every page I have ever seen, basically. Every page links to SOMETHING. A link back home, a link to the other sections or whatever. (Fine -- I haven't exhaustively studied this, but my point is clear, I hope?)
However, just before discussing the unreferenced and non-referencing pages, the paper says "we also remove self hyperlinks." Does that mean links to other pages within the same domain? They state clearly that they are not analyzing sites, but individual pages. So the assumption I had made was that "self hyperlinks" are the internal anchor tags that move you around a single page.
So... Does this paper assume the ability to detect and discount 'standard' site navigation? Is it by domain, or is there some other process at work (like MSN's discussion of page zones)?
Anybody else thinking about this?
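To make the two readings of "self hyperlinks" concrete, here is a rough Python sketch. It is only an illustration of the question, not the paper's procedure: the function name `drop_self_links`, the `same_site` switch and the example URLs are all made up, and I don't know which reading the authors intend.

```python
from urllib.parse import urlsplit

def drop_self_links(links, same_site=False):
    """Return the (source, target) pairs that are not self hyperlinks.

    same_site=False: drop only links that point back to the same page
                     (fragment ignored), i.e. in-page anchors.
    same_site=True:  additionally drop links between pages on the same
                     host, i.e. ordinary site navigation.
    Both readings are assumptions about what the paper might mean.
    """
    kept = []
    for src, dst in links:
        s, d = urlsplit(src), urlsplit(dst)
        same_page = (s.scheme, s.netloc, s.path, s.query) == (
            d.scheme, d.netloc, d.path, d.query)
        same_host = s.netloc.lower() == d.netloc.lower()
        if same_page or (same_site and same_host):
            continue
        kept.append((src, dst))
    return kept

links = [
    ("http://a.example/page.html", "http://a.example/page.html#top"),  # in-page anchor
    ("http://a.example/page.html", "http://a.example/index.html"),     # site navigation
    ("http://a.example/page.html", "http://b.example/"),               # external link
]
page_level = drop_self_links(links)                  # keeps navigation + external
site_level = drop_self_links(links, same_site=True)  # keeps external only
```

Under the page-level reading the "link back home" still counts, so almost no page would be non-referencing; under the site-level reading a lot more pages would be.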
And does it seem a little unsophisticated to assign pages a simple "spam" or "not spam" attribute?
I guess the TrustRank, as it spreads out across sites touching the "seed" sites will have the effect of assigning partial 'spam' values to pages.
Still, in the initial seed set, I think a black-or-white call is crude. This is a subjective thing, unless we have clear enough rules that really, we ought to be able to program a computer to make the determination itself anyway. Barring that level of specificity in the determinations the human "experts" make, a binary decision is not as accurate as would be a finer instrument. (A 1-10 or 1-100 or something.)
Even still, we should be asking, "What are the experts looking for?" Are they reading the content? Are they analyzing subdomains, or looking at the use of alt tags? Surely they're not just tuning into astral vibrations and making these decisions.
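Here is a small Python sketch of how a yes/no seed label can still end up as graded scores once it spreads over links. It is a simplified, PageRank-style propagation biased toward the seed pages, written under my own assumptions: the function name `propagate_trust`, the damping value 0.85, the iteration count and the toy graph are all illustrative, not taken from the paper.

```python
def propagate_trust(out_links, seeds, damping=0.85, iterations=20):
    """Spread trust from hand-picked seed pages along outgoing links.

    out_links: dict mapping each page to the list of pages it links to.
    seeds:     set of pages a human reviewer marked as good.
    Each round a page passes `damping` of its trust to its link targets
    in equal shares, and the seeds are topped up with the remainder.
    """
    pages = set(out_links) | {t for ts in out_links.values() for t in ts}
    seed_share = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_share)
    for _ in range(iterations):
        nxt = {p: (1 - damping) * seed_share[p] for p in pages}
        for src, targets in out_links.items():
            if targets:
                share = damping * trust[src] / len(targets)
                for dst in targets:
                    nxt[dst] += share
        trust = nxt
    return trust

# Toy graph: one seed, a chain of pages hanging off it, and two pages
# that only link to each other.
graph = {
    "seed": ["good"],
    "good": ["far"],
    "far": [],
    "spam": ["spam2"],
    "spam2": ["spam"],
}
scores = propagate_trust(graph, seeds={"seed"})
```

In this toy graph the score falls off with distance from the seed, and the two pages no seed reaches stay at zero. A finer seed instrument (1-10 instead of yes/no) would amount to replacing the equal seed shares with weights.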
Wow. One more thought.
If news organization A posts a story about fraudulent, spammy news organization B, including a link, site B gets a boost in their TrustRank.
As more people start talking about how bad site B is, site B keeps getting more trustworthy.
This has always been a potential problem with PageRank -- a link is not necessarily a vote FOR the target site -- but consolidating influence among fewer sites increases the potential for mistakes like this.
Stupid suggestion to alleviate this: apply the "nofollow" attribute in cases like this as well, and teach everyone to use it when they don't want to help the sites they're linking to.
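A toy Python sketch of that suggestion, under loud assumptions: I am assuming a trust calculation that simply ignores links marked nofollow, which is my guess at how it could work and not a documented behaviour of any search engine. The names `trust_scores`, `news_a` and `fraud_b` are made up for the example.

```python
def trust_scores(links, seeds, damping=0.85, iterations=20):
    """links: list of (source, target, nofollow) triples.

    Assumption: links flagged nofollow pass no trust at all.
    Trust otherwise spreads PageRank-style from the seed pages.
    """
    pages = {p for s, t, _ in links for p in (s, t)} | set(seeds)
    out = {p: [] for p in pages}
    for s, t, nofollow in links:
        if not nofollow:
            out[s].append(t)
    base = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(base)
    for _ in range(iterations):
        nxt = {p: (1 - damping) * base[p] for p in pages}
        for s, targets in out.items():
            for t in targets:
                nxt[t] += damping * trust[s] / len(targets)
        trust = nxt
    return trust

# Trusted news site A writes an exposé about fraudulent site B.
plain = trust_scores([("news_a", "fraud_b", False)], seeds={"news_a"})
tagged = trust_scores([("news_a", "fraud_b", True)], seeds={"news_a"})
```

With the plain link, site B picks up trust from A; with the tagged link it picks up none. The catch is the same as in the post: it only helps if people actually remember to tag such links.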
If you really think about PageRank and TrustRank they are based on the same theory of creating value - Pages voting for other pages.
This sounds good on paper, but execution is another thing. There are so many rules that need to be created that the original idea becomes blurred. It is very easy to game once the rules are known.
I've gotten blasted for this before, but probably the best way to get quality results in a search engine is creating a directory. If I were Google, this is what I would do...
Start with the Dmoz directory since there has been a degree of quality assurance already applied to that directory. This is your baseline of TrustRank. Just by being in the directory you get points, let's say TrustRank 3. Certain categories qualify for up to a TrustRank 10 rating - if they gain points via a qualitative test described below. These are the "standard" websites such as Yahoo, MSN, WSJ and the like... They do not have to pay for this evaluation.
Websites in lesser categories can, for a fee, have their sites manually evaluated against publicly known quality standards. This allows webmasters to conform to the quality rules if they choose (no black box). This includes things like:
- Site organization
- Load times..
Whatever is attractive to people when searching. Each quarter a nominal fee is paid (one that just covers the cost of Google to perform the evaluation) in order to qualify for additional points - up to TrustRank 10.
My suggested fee is $25 / quarter for the evaluation for added TrustRank points. The evaluation is performed quarterly on an unknown date so that gaming is less possible.
As is the case with PageRank today, this measure of TrustRank is just one consideration when returning results.
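The arithmetic of the scheme above can be sketched in a few lines of Python. The base of 3, the ceiling of 10 and the $25 quarterly fee come from the proposal; how many points an evaluation awards, and the idea that an unlisted site scores 0, are my own assumptions for illustration, as are the function names.

```python
BASE_POINTS = 3        # from the proposal: listed in the directory
MAX_POINTS = 10        # from the proposal: ceiling after evaluation
FEE_PER_QUARTER = 25   # from the proposal: suggested fee in dollars

def directory_trust(listed, evaluation_points=0):
    """Points for a site under the proposed directory scheme.

    evaluation_points is a hypothetical number awarded by the manual
    quality review; unlisted sites scoring 0 is also an assumption.
    """
    if not listed:
        return 0
    return min(MAX_POINTS, BASE_POINTS + max(0, evaluation_points))

def yearly_fee(quarters_evaluated=4):
    """Total fee for a site that is evaluated every quarter."""
    return FEE_PER_QUARTER * quarters_evaluated

listed_only = directory_trust(True)                    # base listing
reviewed = directory_trust(True, evaluation_points=5)  # after a review
capped = directory_trust(True, evaluation_points=12)   # hits the ceiling
```

So a site that stays in the programme all year pays 4 x $25 = $100, and no amount of review points lifts it past 10.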
My 2 cents
|Each quarter a nominal fee is paid (one that just covers the cost of Google to perform the evaluation) in order to qualify for additional points - up to TrustRank 10. |
PFI may be practical for commercial sites, but what about the vast numbers of .edu, .org, .gov, open-source, and labor-of-love sites that wouldn't shell out for (and probably wouldn't be aware of) a fee-based QC program?