Yidaki

msg:225464 | 7:05 pm on Aug 14, 2003 (gmt 0) |
Marcia, I tried to replicate what you noticed using the allinurl search. It turns out that these are not links provided by alltheweb but by a bunch of different domains (17,000+ overall results at Google at the moment). All domains are registered by the same person/company. Obviously another method of spamming search engine results with search engine results. Quite bad, since it's caching redirects. Uhhh, weird ... Is this what you're talking about? Even better: every result loads as a frameset, using one frame to show ads coming from the hijacking domain while another frame loads the actual redirect's content. Looks like FAST already did something against it - the redirect frame loads a FAST error page "Invalid redirect URL" ... <added> It's all coming from one "meta search engine". The search results' listings (in this case FAST's redirect URLs) are obviously cached, mirrored to tons of domains and made available to Google and other crawlers. This is obviously becoming a new sport. </added>
|
mcavic

msg:225465 | 2:14 am on Aug 15, 2003 (gmt 0) |
Marcia, I don't know if this answers your whole question, but the robots.txt on alltheweb is missing a line. It disallows /search, but doesn't disallow /urlinfo. Compare these [google.com] two [google.com] Google serps.
|
mcavic

msg:225466 | 2:18 am on Aug 15, 2003 (gmt 0) |
Oh - this [google.com] is what you're talking about, right? Yep, ATW should add /urlinfo to the robots.txt.
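
A minimal sketch of the robots.txt gap described above, using Python's standard `urllib.robotparser`. The rule lists are assumptions reconstructed from the posts (only "disallows /search, but doesn't disallow /urlinfo" is stated in the thread), and the query strings and host prefix are purely illustrative:

```python
from urllib.robotparser import RobotFileParser


def build_parser(lines):
    """Parse robots.txt rules given as a list of lines (no network access)."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


# Assumed rules: what the thread says the file contains today ...
current_rules = ["User-agent: *", "Disallow: /search"]
# ... and the same file with the suggested extra line.
fixed_rules = current_rules + ["Disallow: /urlinfo"]

current = build_parser(current_rules)
fixed = build_parser(fixed_rules)


def allowed(parser, path):
    """Return True if a crawler may fetch the given path under these rules."""
    # Host and crawler name are illustrative; only the path matters here.
    return parser.can_fetch("Googlebot", "http://www.alltheweb.com" + path)


for path in ("/search?q=test", "/urlinfo?q=test"):
    print(path, "current:", allowed(current, path), "fixed:", allowed(fixed, path))
```

Under these assumed rules, /urlinfo pages stay crawlable until the extra Disallow line is added, which would be consistent with them showing up in Google.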
|
Yidaki

msg:225467 | 7:27 am on Aug 15, 2003 (gmt 0) |
mcavic, I think the FAST search result pages are not what Marcia described. They are not producing Google duplicates. Search for allinurl:click.alltheweb.com instead and you'll get the picture ...
|
Marcia

msg:225468 | 9:27 am on Aug 15, 2003 (gmt 0) |
Exactly, Yidaki. I just got an email originating out of Yahoo corporate, so I did the same type of search on that domain and came up with a few similar results, including the allinurl: search result with that Yahoo domain inserted into the resulting page. The kicker on this one is that they're running AdSense on the page.
|
Yidaki

msg:225469 | 6:22 am on Aug 22, 2003 (gmt 0) |
>The kicker on this one is that they're running AdSense on the page
That's really a kicker. Did you email the AdSense team about that?
|
Marcia

msg:225470 | 6:32 am on Aug 22, 2003 (gmt 0) |
>>Did you email the AdSense team about that?
I didn't even think of it, Yidaki. In fact I should have filled in the form, but I got distracted and forgot. I did send a long email with details to search-quality, though, so they can check out what's going on. It was far from a normal thing to be happening.
|
MarkHutch

msg:225471 | 6:57 am on Aug 22, 2003 (gmt 0) |
Marcia, maybe it's my IE setup, but all the links that come up under that type of search crash my IE browser every time I click on them. Very strange.
|
Yidaki

msg:225472 | 6:57 am on Aug 22, 2003 (gmt 0) |
>I did send a long email with details to search-quality
That's probably enough to make them aware of the problem. However, it might speed things up to contact the sales people at Google - there's money involved ... I received good feedback in the past when I used my AdWords account to send quality complaints to Google.
|
Yidaki

msg:225473 | 12:02 pm on Aug 24, 2003 (gmt 0) |
Although it's somewhat disturbing, I doubt that Google will penalize either the original or the mirrored pages. However, it could cause trouble if Google tries to merge the results and keeps the wrong (mirrored) page in its index. If Google one day starts to unintentionally penalize or drop the original version of such mirrored pages instead of the redundant mirror, they might end up penalizing a lot of the best known sites - including WebmasterWorld and Google itself. If you search Google for exact page titles or URLs from WebmasterWorld and/or Google, you'll find a lot of highly ranked anonymizing proxies that cache every requested page and make it available to robots. A bad side effect of Google's improved spidering of dynamic content, imho. They are presumably already working on a solution at the plex.
|
|