Forum Moderators: open
Even worse: that site has absolutely nothing to do with the search term. They just stuffed the title and body with tons of keywords, used multiple domains, and named their HTML files keyword1_keyword2.html.
If you click the search result, they even redirect you from keyword1_keyword2.html to another page (always the same one).
So they just generated tons of nicely optimized pages (titles, body text, filenames...) and redirect all of them to their single spam site, where none of your searched keywords exist!
Any opinions on the quality of Google's algo, or can someone find even worse search results?
[edited by: heini at 8:33 pm (utc) on April 19, 2003]
All ingredients of the algo are on the table.
Google seems to resort to social engineering mostly instead of algo engineering.
Nevertheless, in the grand scheme of things Google is still able to produce good and relevant results. I don't think their algo is poor. It's just starting to get a bit outdated.
Nobody cares if there are 'a few' spam pages in the top 20 results. But if Google returns 17 out of 20 pages that all point to the same content, I wonder if their algo is more than just a little bit outdated...
An easy way to greatly reduce the spam problem would be to filter out pages with exactly the same content on different domains.
The only thing Google would have to do is cross-compare all of their indexed pages and drop the duplicates.
Maybe comparing all of their >3 billion pages could be a performance problem, though.
Why not create a new spam report page that lets users enter domains pointing to the same content? It would be easy for Google to compare those websites and keep just one of the domains in the index.
No misuse possible...
But what if someone sells content? A glossary, for example. Someone pays big bucks to have a licensed glossary on his page to add content to his web site. And what does he get in return? An exclusion from Google for duplicate content. Not the best idea, right? ;-)
Someone pays big bucks to have a licensed glossary on his page to add content to his web site. And what does he get in return? An exclusion from Google for duplicate content.
One thing: most sites that use 'licensed' data have the right to put their own header/footer on the page, at a minimum.
An even better example is the ODP -- other than the attribution and the ODP 'box' at the bottom of the results, how you present them is entirely your choosing.
I think the 'duplicate content' check can be used to auto-ban sites that have a 'significant percentage' of duplicate content.
For example, if I'm selling 'super low price widgets' and have 50 domains and I make 50 different homepages but copy the 10-15 pages underneath them, exactly the same (maybe except <title>), to all 50 domains, I'm asking to be whacked.
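That 'significant percentage' could be estimated with something like word-shingle overlap (Jaccard similarity) between two pages. A rough sketch of my own, with an arbitrary threshold, not a description of any engine's real check:

```javascript
// Sketch: estimate how much content two pages share via Jaccard
// similarity of k-word shingles. Illustrative only.
function shingles(text, k = 4) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const set = new Set();
  for (let i = 0; i + k <= words.length; i++) {
    set.add(words.slice(i, i + k).join(' '));
  }
  return set;
}

function similarity(textA, textB) {
  const sa = shingles(textA);
  const sb = shingles(textB);
  let intersect = 0;
  for (const s of sa) if (sb.has(s)) intersect++;
  const union = sa.size + sb.size - intersect;
  return union === 0 ? 0 : intersect / union;
}

// A page scoring above some threshold (say 0.9) against a page on a
// different domain would be a candidate for review, while a licensed
// glossary wrapped in its own header/footer would score lower.
```

Shingles rather than whole-page hashes mean that swapping only the <title>, as in the 50-domain example, would still score as a near-duplicate.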
Of course that would be bad for the webmaster that buys content :(
But for Google as a search engine it would be great to filter out those duplicate results - they are irrelevant to its users.
For a keyword search I don't want to find 10 pages with the same content, but 10 different pages that are all about that keyword.
They shouldn't exclude you for 2, 5 or 100 lines of the same content, but for identical HTML code...
But for Google as a search engine it would be great to filter out those duplicate results - they are irrelevant to its users.
So how should Google determine the original author of content? Let's say the original is published on day 1 on a non-searchable web site and the duplicate gets published on a static page a couple of weeks later. Google will regard the duplicate as the original.
And I would rather read a press release on the web site of the IRS than on the page of some tax advisor who is good at SEO. ;-)
doesn't really matter if it's the same...? ;)
Btw, I wasn't complaining about duplicate content like press releases, news etc., but about those "100 domains, same page" spammers.
daroz put it this way:
For example, if I'm selling 'super low price widgets' and have 50 domains and I make 50 different homepages but copy the 10-15 pages underneath them, exactly the same (maybe except <title>), to all 50 domains, I'm asking to be whacked.
This is what I consider spam and what should be kicked from the index by comparing those pages and dropping duplicate ones.
If you're the curious type, check the source code of listed pages using view-source: in IE. I've seen some with redirects that have *nothing* on the pages themselves that you'd see if you weren't being redirected. And the backlinks are usually interesting to check out, too.
Banning duplicate content would be a good way to eliminate competition. Just get an anonymous throwaway domain, copy/paste, and smile.
Yep, I agree with that. Maybe that's the biggest problem for google...
They use this JavaScript code to redirect you from their hundreds of SE-optimized, keyword-stuffed pages:
<script language="JavaScript">
var stra="windo";
var strb="w.loc";
var strc="ation=";
var strd="'htt";
var stre="p://ww";
var strf="w.spammy_domain.de/<word>/'";
eval(stra+strb+strc+strd+stre+strf);
</script>
Very simple, but Google can't do anything about it as long as they don't execute JavaScript...
[edited by: ciml at 12:48 pm (utc) on April 21, 2003]
[edit reason] Anonymised. [/edit]
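In principle, even this trick could be caught without fully executing JavaScript, e.g. by joining a script's string literals and scanning the result for a redirect target. A rough heuristic of my own (real obfuscation varies, and nothing suggests Google does this):

```javascript
// Sketch: statically "deobfuscate" a concatenated-string redirect like
// the one above by collecting all string literals in source order,
// joining them, and looking for a window.location assignment.
// Rough heuristic, my own illustration.
function findHiddenRedirect(scriptSource) {
  // Grab every "..." or '...' literal in order of appearance.
  const literals = scriptSource.match(/"([^"]*)"|'([^']*)'/g) || [];
  // Strip the surrounding quotes and concatenate the fragments.
  const joined = literals.map((s) => s.slice(1, -1)).join('');
  // Look for a redirect target in the reassembled string.
  const m = joined.match(/window\.location=?'?(https?:\/\/[^']+)'?/);
  return m ? m[1] : null;
}
```

Run against the snippet quoted above, this would reassemble the fragments into `window.location='http://www.spammy_domain.de/<word>/'` and recover the redirect target, because the obfuscation only splits the string, it doesn't transform it.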
How do the pages look when js is disabled? Is Googlebot seeing duplicate content?
With JavaScript disabled the user sees what Googlebot sees: hundreds of optimized plain HTML pages, each stuffed with keyword combinations - but without any content.
How long does it take for the site to be penalised?
Reported on 04/20/03... I'll tell you if/when they get penalised.
When I was about to give up I suddenly remembered Brett praising Teoma, and I tried searching for <snip> there and, big surprise, I could not find a single spam site! Wow. Give it a try, see for yourself.
Although Google is still my favourite SE (but now I begin to wonder if that is because I ONLY used them and have no comparison with other SEs - when I didn't find it in Google I just didn't search any further). How many things did I miss with my "blind trust" in Google? :) I have to say that after today's experience my eyes are a lot more open! I am sad to see my favourite SE the victim of such spam tactics.
[edited by: JonB at 12:20 pm (utc) on April 21, 2003]
[edited by: Marcia at 7:43 pm (utc) on April 21, 2003]
matze, that is EXACTLY the same code/site I got! German domains and "<snip>" in the URL, and after reading your first post again I see that the same "guys" spammed "my" keywords too! Someone found a hole in Google's algorithm, I guess. Let's hope they catch them soon.
[edited by: ciml at 2:01 pm (utc) on April 21, 2003]
[edit reason] Anonymised. [/edit]
They are running an "affiliate spam program": lots of domains, hundreds of keywords and probably thousands of visitors :(
Very funny thing:
The guy responsible for those spam sites (name, address and even phone number listed on every single page) even offers a software tool <snip> which generates doorway pages interlinked with each other. Seems he's using his own software to spam Google ;)
[edited by: ciml at 7:08 pm (utc) on April 21, 2003]
[edit reason] Let's keep it general. [/edit]