
Google News Archive Forum

This 33 message thread spans 2 pages: [1] 2
17 out of the top 20 search results are questionable on this search
matze




msg:55407
 7:23 pm on Apr 19, 2003 (gmt 0)

By accident I found a German two-keyword search phrase for which 17 of the top 20 results are the same site.

Even worse: that site has absolutely nothing to do with the search term. They just stuffed the title and body with tons of keywords, used multiple domains, and named their HTML files keyword1_keyword2.html.

If you click a search result, they even redirect you from keyword1_keyword2.html to another page (always the same one).

So they just generated tons of nicely optimized pages (titles, body text, filenames...) and redirect all of them to their single spam site, where none of the searched keywords exist!

Any opinions on the quality of Google's algo? Or can someone find even worse search results?

[edited by: heini at 8:33 pm (utc) on April 19, 2003]

 

heini




msg:55408
 8:44 pm on Apr 19, 2003 (gmt 0)

The problem is by now everybody and their grandmother knows exactly how to spam Google.

All ingredients of the algo are on the table.

Google mostly seems to resort to social engineering instead of algo engineering.

Nevertheless, in the grand scheme of things Google is still able to produce good and relevant results. I don't think their algo is poor. It's just starting to get a bit outdated.

Yidaki




msg:55409
 9:03 pm on Apr 19, 2003 (gmt 0)

>It's just starting to get a bit outdated

Hmm ... yeah, it looks that way. The good and the evil: if you do it the "social way" you'll be the loser. At least that's what everybody is still teaching today.

matze




msg:55410
 9:24 pm on Apr 19, 2003 (gmt 0)

>The problem is by now everybody and their grandmother knows exactly how to spam Google.

Nobody cares if there are 'a few' spam pages in the top 20 results. But if Google returns 17 out of 20 pages that all point to the same content, I wonder whether their algo isn't more than just a little bit outdated...

An easy way to reduce the spam problem considerably would be to filter out pages with exactly the same content on different domains.
All Google would have to do is cross-compare their indexed pages and drop the duplicates.
Maybe comparing all of their >3 billion pages would be a performance problem, though.

Why not create a new spam report page that lets users enter domains pointing to the same content? It would be easy for Google to compare those websites and keep just one of the domains in the index.
No misuse possible...

beachlover




msg:55411
 9:31 pm on Apr 19, 2003 (gmt 0)

>The only thing google would have to do is to cross compare all of their indexed pages and drop duplicate ones.

But what if someone sells content? A glossary, for example. Someone pays big bucks to have a licensed glossary on his page to add content to his web site. And what does he get in return? An exclusion from Google for duplicate content. Not the best idea, right? ;-)

daroz




msg:55412
 9:41 pm on Apr 19, 2003 (gmt 0)

Someone pays big bucks to have a licensed glossary on his page to add content to his web site. And what does he get in return? An exclusion from Google for duplicate content.

One thing: most sites that use 'licensed' data have the right to put their own header/footer on the page, at a minimum.

An even better example is the ODP -- other than the attribution and the ODP 'box' at the bottom of the results, how you present them is entirely your choosing.

I think a 'duplicate content' check could be used to auto-ban sites that have a 'significant percentage' of duplicate content.

For example, if I'm selling 'super low price widgets' and have 50 domains, and I make 50 different homepages but copy the 10-15 pages underneath them, exactly the same (maybe except <title>), to all 50 domains, I'm asking to be whacked.
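The 'significant percentage' of duplicate content daroz mentions could be approximated by comparing overlapping word shingles between two pages. A hedged sketch (not anyone's real algorithm; the shingle size and names are invented for illustration):

```javascript
// Break a page's text into word 3-shingles (overlapping runs of
// three words); shared shingles indicate copied passages.
function shingles(text, n = 3) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const set = new Set();
  for (let i = 0; i + n <= words.length; i++) {
    set.add(words.slice(i, i + n).join(' '));
  }
  return set;
}

// Jaccard overlap of two shingle sets: 1.0 means identical text,
// 0.0 means nothing in common. A site could be flagged when many
// of its pages score above some threshold against each other.
function similarity(a, b) {
  const sa = shingles(a);
  const sb = shingles(b);
  let common = 0;
  for (const s of sa) if (sb.has(s)) common++;
  const union = sa.size + sb.size - common;
  return union === 0 ? 0 : common / union;
}
```

Unlike an exact-match check, a score like this wouldn't punish a licensed glossary wrapped in a different header/footer, since the unique framing pulls the overlap below a ban threshold.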

matze




msg:55413
 9:45 pm on Apr 19, 2003 (gmt 0)

>Someone pays big bucks to have a licensed glossary on his page to add content to his web site. And what does he get in return? An exclusion from Google for duplicate content.

Of course that would be bad for the webmaster that buys content :(

But for Google as a search engine it would be great to filter out those duplicate results - they are irrelevant to its users.

For a keyword search I don't want to find 10 pages with the same content, but 10 different pages that are all about that keyword.

They shouldn't exclude you just for 2, 5 or 100 lines of identical content, but for identical HTML code...

toddb




msg:55414
 10:21 pm on Apr 19, 2003 (gmt 0)

I have seen this on some two-word searches where the top 10, and maybe more, are all the same person's sites, with a redirect off of them. The good news is that the ones he had last month are all gone. I did see he has a whole new set this month for different searches. The style is very distinct, or I would not think it was the same person. He uses 10 different domains, and depending on his success, I would think the work involved would start to outweigh the results. This person is clearly freshbotted in.

beachlover




msg:55415
 10:31 pm on Apr 19, 2003 (gmt 0)

But for Google as a search engine it would be great to filter out those duplicate results - they are irrelevant to its users.

So how should Google determine the original author of content? Let's say the original is published on day 1 on a non-searchable web site, and the duplicate gets published on a static page a couple of weeks later. Google will regard the duplicate as the original.

And I would rather read a press release on the web site of the IRS than on the page of some tax advisor who is good at SEO. ;-)

matze




msg:55416
 11:54 pm on Apr 19, 2003 (gmt 0)

>And I would rather prefer to read some press release on the web site of the IRS rather than on the page of some tax advisor who is good in SEO. ;-)

doesn't really matter if it's the same...? ;)

Btw, I wasn't complaining about duplicate content like press releases, news, etc., but about those "100 domains but same page" spammers.

daroz put it this way:

For example, if I'm selling 'super low price widgets' and have 50 domains, and I make 50 different homepages but copy the 10-15 pages underneath them, exactly the same (maybe except <title>), to all 50 domains, I'm asking to be whacked.

This is what I consider spam, and it should be kicked from the index by comparing those pages and dropping the duplicates.

GoogleGuy




msg:55417
 6:11 am on Apr 20, 2003 (gmt 0)

matze, I'd be curious to know what search phrase you were using. Would you mind dropping a spam report and mention your WebmasterWorld nickname?

matze




msg:55418
 10:54 am on Apr 20, 2003 (gmt 0)

googleguy, I just dropped a spam report mentioning my nick and WW.

I'd be curious what you think about it: wouldn't it be pretty easy to drop at least those domains from the index that return absolutely identical content, like the ones I sent you?

Marcia




msg:55419
 11:28 am on Apr 20, 2003 (gmt 0)

>>same persons site and they do a redirect off of them

If you're the curious type, check the source code of pages listed using view-source: in IE. I've seen some with redirects that actually have *nothing* on the pages themselves that you'd see if you weren't being redirected. And the back-links are usually interesting to check out, too.

nell




msg:55420
 12:10 pm on Apr 20, 2003 (gmt 0)

Banning duplicate content would be a good way to eliminate competition. Just get an anonymous throwaway domain, copy/paste, and smile.

heini




msg:55421
 12:17 pm on Apr 20, 2003 (gmt 0)

>The only thing google would have to do is to cross compare all of their indexed pages
In fact I'm sure Google does just that by default.
Obviously Google has all the data stored and at the ready.

matze




msg:55422
 1:11 pm on Apr 20, 2003 (gmt 0)

Banning duplicate content would be a good way to eliminate competition. Just get an anonymous throwaway domain, copy/paste, and smile.

Yep, I agree with that. Maybe that's the biggest problem for google...

They use this JavaScript code to redirect you from their hundreds of SE-optimized, keyword-stuffed pages:

<script language="JavaScript">
var stra="windo";
var strb="w.loc";
var strc="ation=";
var strd="'htt";
var stre="p://ww";
var strf="w.spammy_domain.de/<word>/'";
eval(stra+strb+strc+strd+stre+strf);
</script>

Very simple, but Google can't do anything as long as they don't follow JavaScript...

[edited by: ciml at 12:48 pm (utc) on April 21, 2003]
[edit reason] Anonymised. [/edit]
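For what it's worth, that obfuscation is shallow: the concatenated string literals can be reassembled statically, without ever executing the JavaScript. A toy heuristic along those lines (purely illustrative; a real crawler-side check would need an actual JS parser, and the function name is made up):

```javascript
// Heuristic: collect string-variable assignments, reassemble the
// concatenation passed to eval(), and check whether it spells out
// a window.location redirect.
function detectEvalRedirect(source) {
  const vars = {};
  const assignRe = /var\s+(\w+)\s*=\s*"([^"]*)"/g;
  let m;
  while ((m = assignRe.exec(source)) !== null) {
    vars[m[1]] = m[2];
  }

  const call = /eval\(([^)]*)\)/.exec(source);
  if (!call) return null;

  // Resolve each operand of the '+' chain against the collected variables.
  const joined = call[1]
    .split('+')
    .map(part => vars[part.trim()] || '')
    .join('');

  return /window\.location/.test(joined) ? joined : null;
}
```

It would only catch this exact pattern, of course; a spammer could switch to any other obfuscation, which is presumably why hand checks after spam reports still matter.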

chiyo




msg:55423
 5:39 pm on Apr 20, 2003 (gmt 0)

"google can't do anything as long as they don't follow javascript... "

It still won't pass a spam report with a follow-up hand check.

Namaste




msg:55424
 6:37 am on Apr 21, 2003 (gmt 0)

The problem is by now everybody and their grandmother knows exactly how to spam Google

but they don't survive very long. The site in question in this thread must be an absolute beginner.

jrobbio




msg:55425
 6:52 am on Apr 21, 2003 (gmt 0)

Banning duplicate content would be a good way to eliminate competition. Just get an anonymous throwaway domain, copy/paste, and smile.

Wouldn't that be pretty hard if the source code was well protected?

Powdork




msg:55426
 7:12 am on Apr 21, 2003 (gmt 0)

How do the pages look when js is disabled? Is Googlebot seeing duplicate content?

globay




msg:55427
 7:39 am on Apr 21, 2003 (gmt 0)

How long does it take for a site to be penalised? I reported a site before the last update that uses crosslinking and other spammy techniques in the worst possible way, and the site is still up.

matze




msg:55428
 10:14 am on Apr 21, 2003 (gmt 0)

How do the pages look when js is disabled? Is Googlebot seeing duplicate content?

With JavaScript disabled the user sees what Googlebot sees: hundreds of optimized plain HTML pages, each stuffed with keyword combinations - but without any content.

How long does it take for a site to be penalised?

Reported on 04/20/03 ... I'll tell you if/when they get penalised.

JonB




msg:55429
 11:36 am on Apr 21, 2003 (gmt 0)

I must say that today, like matze, I was disappointed with Google's results. I wanted to get more articles about singer <snip>, and Google was a TOTAL disappointment - crack and serial sites were dominating the top 50 results - I didn't check much further! A few of the results were the same 2 articles on various sites! All in all, total spam. I wasted 20 minutes clicking on spam sites and closing pop-ups!

When I was about to give up, I suddenly remembered Brett praising Teoma, and I tried to search for <snip> there and, big surprise, I could not find a single spam site! Wow. Give it a try, see for yourself.

Google is still my favourite SE, but now I begin to wonder if that is because I ONLY used them and have no comparison with other SEs - when I didn't find it in Google, I just didn't search anymore. How many things did I miss with my "blind trust" in Google? :) I have to say that after today's experience my eyes are a lot more open! I am sad to see my favourite SE the victim of such spam tactics.

[edited by: JonB at 12:20 pm (utc) on April 21, 2003]

[edited by: Marcia at 7:43 pm (utc) on April 21, 2003]

JonB




msg:55430
 11:42 am on Apr 21, 2003 (gmt 0)

var strf="w.spammy_domain.de/<word>/'";

matze, that is EXACTLY the same code/site I got! German domains and "<snip>" in the URL, and after reading your first post again I see that the same "guys" spammed "my" keywords too! Someone found a hole in Google's algorithm, I guess. Let's hope they catch them soon.

[edited by: ciml at 2:01 pm (utc) on April 21, 2003]
[edit reason] Anonymised. [/edit]

vincevincevince




msg:55431
 12:25 pm on Apr 21, 2003 (gmt 0)

Those results are absolutely awful, I agree... even with " " around it, they are a mess. <snip> needs to take legal action against these sites abusing his name.

[edited by: Marcia at 7:44 pm (utc) on April 21, 2003]
[edit reason] Specific removed. [/edit]

JonB




msg:55432
 12:35 pm on Apr 21, 2003 (gmt 0)

vince, is this even possible (legal action)? I would say that the majority of celebrities are abused in this way. Also, how do you prove that it is "YOU" who is abused and not someone else with the same name? :)

heini




msg:55433
 12:38 pm on Apr 21, 2003 (gmt 0)

>legal action
Hmm, obviously it's possible for search engines to deliver relevant results, see JonB's experience with Teoma.

vincevincevince




msg:55434
 1:06 pm on Apr 21, 2003 (gmt 0)

and not someone else with the same name

Good point, but the sites which came up in that search look like the kind to get jittery about a threatening letter from a lawyer, without actually having to go to court...

matze




msg:55435
 6:48 pm on Apr 21, 2003 (gmt 0)

JonB, yes they are the same pages!

They are running an "affiliate spam program": lots of domains, hundreds of keywords, and probably thousands of visitors :(

Very funny thing:
The guy responsible for those spam sites (name, address, and even phone number listed on every single page) even offers software <snip> that generates doorway pages interlinked with each other. Seems he's using his own software to spam Google ;)

[edited by: ciml at 7:08 pm (utc) on April 21, 2003]
[edit reason] Let's keep it general. [/edit]

markusf




msg:55436
 7:22 pm on Apr 21, 2003 (gmt 0)

I've seen affiliate programs in AdWords... and some even say "affiliate" in the description of the ad. If affiliates in the SERPs are spam, wouldn't this also be spam?

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About
© Webmaster World 1996-2014 all rights reserved