
Google SEO News and Discussion Forum

    
Amazon S3 Forum on Google - Appspot Clone Has the Top 2 Results
sdani




msg:4266387
 12:22 pm on Feb 13, 2011 (gmt 0)

I searched for "Amazon S3 Forum", and the top two results are a CLONE of Amazon's S3 forum. The clone is running on an appspot server (appspot.com is Google App Engine's hosting domain, so it's a Google service).

The URL starts like example.appspot.com/[actual Amazon S3 forum URL]...


tedster




msg:4266434
 2:55 pm on Feb 13, 2011 (gmt 0)

Wow. We normally wouldn't run a post about a specific Google ranking, but for this case we'll make an exception, given who operates the sites.

sdani




msg:4266510
 5:42 pm on Feb 13, 2011 (gmt 0)

Thanks for accepting the discussion.
It looks to me like the clone is being treated as the authoritative site, with a proper title/summary and the "more links from this site" link.

tedster




msg:4266560
 7:45 pm on Feb 13, 2011 (gmt 0)

Yep, this is clearly a bogus result to me, and a pretty bad one at that. The new "duplicate algo" or "original attribution" algo, whatever we call it, is still misfiring badly.

aristotle




msg:4266642
 1:45 am on Feb 14, 2011 (gmt 0)

I don't understand how it works. Can new threads be started on both forums? Also, is there a time delay between when a post is made on one forum and when it appears on the other?

sdani




msg:4266644
 1:51 am on Feb 14, 2011 (gmt 0)


Quoting aristotle: "I don't understand how it works. Can new threads be started on both forums? Also, is there a time delay between when a post is made on one forum and when it appears on the other?"


I think it's a read-only clone. If I click the "logon" button on the clone, I get a 404, so new posts cannot be made on the clone.

aristotle




msg:4266650
 2:05 am on Feb 14, 2011 (gmt 0)

So all the real users and all the activity are on the original Amazon forum. Then it's really puzzling how the clone could get the number 1 ranking, especially since the logon link gives a 404.

Robert Charlton




msg:4266659
 3:30 am on Feb 14, 2011 (gmt 0)

Just did a Google search for [google appspot], and the #2 result is a large cluster of Google support forum posts, the first of which is...

My appspot application is forward to Google site, why ?
[google.com...]

Questions are all more or less asking about variations of the symptom described.

The poster on this thread is treating it as if it's a competitor hijacking...

user.appspot.com/mydomain.com ! Some google app outranking/replacing my site!
[google.com...]

Workaround was posted 12/27/09 by Google Employee "Advisor Angela" on this thread...

www mapping stopped working for App Engine
[google.com...]
...Just letting you know that we're aware of an issue where the www mapping suddenly stopped working for App Engine. We're still trying to determine the root cause of this, but we have a workaround in the meantime:

1) Access the settings for the 'Web Pages' service and add a custom URL (anything besides www)

2) Go into the App Engine settings and delete the www.domain.com entry and then add it again. This should allow the App Engine service to use www again.

If you have this issue and you have not been using App Engine (or if the instructions above don't work), please post your domain name as well as the target URL and we can investigate further.


I haven't explored all the threads in depth, but the issue appears to be unresolved.

sdani




msg:4266740
 10:27 am on Feb 14, 2011 (gmt 0)

Thanks Robert. If that is the case, then Amazon Cloud's support forum is running on Google Cloud :) ... probably outsourced.

sdani




msg:4266741
 10:33 am on Feb 14, 2011 (gmt 0)


Quoting my earlier post: "Thanks Robert. If that is the case, then Amazon Cloud's support forum is running on Google Cloud :) ... probably outsourced."


This is not true. PANIC :( When I clicked "more results from this domain" after the second search, it showed results from MY domain, which I know for sure is not running on App Engine. It's a simple WHM/cPanel phpBB forum, and it is also listed among the App Engine results.

Robert Charlton




msg:4267015
 9:02 pm on Feb 14, 2011 (gmt 0)

Quoting sdani: "If that is the case, then..."

sdani - I wasn't suggesting that there was any kind of simple answer on this. If you read the Google Support Forum threads, you'll see that there appear to be a variety of issues, apparently DNS related. I can't comment, because I'm completely unfamiliar with the setup on Google appspot.

I think I can say that the issue goes beyond simply an algorithm that can't handle dupe content. I don't think that sufficiently describes what's happening here. Some webmasters affected apparently can't (or couldn't at the time of posting) access their own sites... which suggests that Googlebot can't either.

tedster




msg:4267050
 10:23 pm on Feb 14, 2011 (gmt 0)

Here are some more pieces of the puzzle. I noticed that the Amazon URL is a URL-only listing. That clued me in to check the robots.txt file at Amazon.

In fact, the robots.txt file specifically excludes Googlebot (and only Googlebot) from crawling the forum.

User-agent: *
Crawl-delay: 10
Disallow: /click.jspa
Disallow: /search.jspa

User-agent: Googlebot
Disallow: /

https://forums.aws.amazon.com/robots.txt
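To illustrate why the Amazon URL shows up URL-only, here is a simplified PHP sketch of how a crawler picks its group from this file: a group that names the crawler (here Googlebot) is used instead of the `*` group, so Googlebot is shut out of everything while other bots only skip the two .jspa paths. `robots_groups()` and `is_allowed()` are hypothetical helpers written for this post; they handle only `User-agent` and `Disallow` lines with plain prefix matching, not wildcards, `Allow` lines, or the full matching rules Google documents.

```php
<?php
// Simplified sketch: parse robots.txt into groups of user agents plus Disallow prefixes.
// Only User-agent and Disallow are handled; everything else (e.g. Crawl-delay) is skipped.
function robots_groups(string $txt): array
{
    $groups = [];
    $current = null;      // index of the group currently being filled
    $in_agents = false;   // true while reading consecutive User-agent lines
    foreach (preg_split('/\r?\n/', $txt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line));   // drop comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);
        if ($field === 'user-agent') {
            if (!$in_agents) {            // a User-agent after rules starts a new group
                $groups[] = ['agents' => [], 'disallow' => []];
                $current = count($groups) - 1;
                $in_agents = true;
            }
            $groups[$current]['agents'][] = strtolower($value);
        } else {
            $in_agents = false;
            // An empty "Disallow:" means allow everything, so it adds no prefix
            if ($field === 'disallow' && $current !== null && $value !== '') {
                $groups[$current]['disallow'][] = $value;
            }
        }
    }
    return $groups;
}

// A group naming the crawler wins; only if none exists does "*" apply.
function is_allowed(string $txt, string $agent, string $path): bool
{
    $groups = robots_groups($txt);
    $agent = strtolower($agent);
    $chosen = null;
    foreach ($groups as $g) {
        if (in_array($agent, $g['agents'], true)) { $chosen = $g; break; }
    }
    if ($chosen === null) {
        foreach ($groups as $g) {
            if (in_array('*', $g['agents'], true)) { $chosen = $g; break; }
        }
    }
    if ($chosen === null) {
        return true;                      // no applicable group: everything allowed
    }
    foreach ($chosen['disallow'] as $prefix) {
        if (strpos($path, $prefix) === 0) {
            return false;
        }
    }
    return true;
}

$robots = "User-agent: *\nCrawl-delay: 10\nDisallow: /click.jspa\nDisallow: /search.jspa\n\n"
        . "User-agent: Googlebot\nDisallow: /\n";

var_dump(is_allowed($robots, 'Googlebot', '/index.jspa'));  // bool(false)
var_dump(is_allowed($robots, 'bingbot', '/index.jspa'));    // bool(true)
var_dump(is_allowed($robots, 'bingbot', '/search.jspa'));   // bool(false)
```

The point of the sketch is the fallback order: because a Googlebot group exists, the `*` rules (including the Crawl-delay) simply don't apply to Googlebot, and its own group blocks `/`.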

gshannon




msg:4267094
 12:21 am on Feb 15, 2011 (gmt 0)

This issue can be addressed. All site owners should be using similar code to protect themselves from these sorts of attacks; they are not new and have been going on for years.

The way to fix this is to serve noindex when a request claiming to be Googlebot comes from a proxy IP that does not reverse-resolve to Google's DNS.

Example: when Google crawls through a proxy, your website sees the proxy's IP rather than Google's, so you perform a reverse DNS lookup (confirmed by a forward lookup) to check whether the request really comes from Googlebot.

Here is code for your header:

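<?php
// This code should go somewhere at the top of your header, before any other output
$do_noindex = 0;
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (preg_match('/googlebot/i', $ua)) {
    $ip = $_SERVER['REMOTE_ADDR'];

    // Reverse lookup (IP -> hostname), then forward lookup (hostname -> IP),
    // so a spoofed PTR record cannot pass on its own
    $name = gethostbyaddr($ip);
    $host = gethostbyname($name);
    // Genuine Googlebot hostnames end in googlebot.com or google.com
    if ($host === $ip && preg_match('/\.(googlebot|google)\.com$/i', $name)) {
        // valid Googlebot
        echo "<!-- Googlebot OK -->\n";
    } else {
        // not actually Googlebot (e.g. a proxy passing along Googlebot's user agent)
        echo "<!-- Googlebot NOK -->\n";
        $do_noindex = 1;
    }
    flush();
}
?>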


Here is code for in between your <head></head> tags:

<?php
// This should go in between your head tags

if ($do_noindex === 1) {
print "<meta name='robots' content='noindex'>\n";
}
?>


Optionally add a canonical back to your site in the event it does get indexed somehow:

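<?php
if ($do_noindex === 1) {
    // Rebuild this page's URL on your own host; IIS sets HTTPS to 'off' over plain HTTP
    $https = !empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off';
    $_curl = 'http' . ($https ? 's' : '') . '://' . $_SERVER['SERVER_NAME'] . $_SERVER['REQUEST_URI'];
    print "<link rel='canonical' href='" . htmlspecialchars($_curl, ENT_QUOTES) . "'>\n";
}
?>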


Once it is installed, use Fetch as Googlebot in Webmaster Tools to validate that it is working correctly. You should see "Googlebot OK" in the fetched page source.
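Fetch as Googlebot only exercises the genuine-Googlebot path. As a hedged sketch (not gshannon's code), the same decision rule can be pulled into a pure function with the two DNS lookups passed in, so the proxy case can be checked offline too. `is_verified_googlebot()` and the fake DNS tables are hypothetical; the IPs come from the reserved documentation ranges and are not real Google addresses. The hostname check requires the name to end in googlebot.com or google.com, as Google's verification guidance says, which is stricter than only checking that it contains "googlebot".

```php
<?php
// Hypothetical helper: forward-confirmed reverse DNS, with the lookups injected
// so the decision can be tested without touching real DNS.
function is_verified_googlebot(string $ip, callable $reverse, callable $forward): bool
{
    $name = $reverse($ip);                 // in production: gethostbyaddr($ip)
    if (!is_string($name) || $name === $ip) {
        return false;                      // lookup failed or no PTR record
    }
    if (!preg_match('/\.(googlebot|google)\.com$/i', $name)) {
        return false;                      // PTR is not in Google's domains
    }
    return $forward($name) === $ip;        // in production: gethostbyname($name)
}

// Made-up DNS data using documentation IP ranges (not real Google addresses).
$ptr = [
    '192.0.2.10'   => 'crawl-192-0-2-10.googlebot.com', // crawler-style record
    '203.0.113.5'  => 'proxy.example.net',              // a proxy like the appspot clone
    '198.51.100.7' => 'googlebot.example.com',          // PTR that merely contains "googlebot"
    '198.51.100.8' => 'fake.googlebot.com',             // PTR with no matching forward record
];
$a = [
    'crawl-192-0-2-10.googlebot.com' => '192.0.2.10',
    'googlebot.example.com'          => '198.51.100.7',
];

// Mimic gethostbyaddr()/gethostbyname(), which return their input on failure.
$reverse = function (string $ip) use ($ptr) { return $ptr[$ip] ?? $ip; };
$forward = function (string $name) use ($a) { return $a[$name] ?? $name; };

foreach (array_keys($ptr) as $ip) {
    echo $ip, ' => ', is_verified_googlebot($ip, $reverse, $forward) ? 'Googlebot OK' : 'Googlebot NOK', "\n";
}
```

Only 192.0.2.10 comes out as "Googlebot OK". The 198.51.100.7 entry is the case a plain "contains googlebot" check would let through: whoever controls example.com can make googlebot.example.com resolve back to their own IP.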

If anyone would like to know more about these issues or needs help addressing them, contact me.

tedster




msg:4267098
 12:37 am on Feb 15, 2011 (gmt 0)

Welcome to the forums, gshannon, and thanks for the protection code. You're exactly right about how to identify a direct googlebot request compared to one through a proxy.

This case is still rather different, don't you think? The original Amazon forum DISALLOWS googlebot in robots.txt. So they're not getting crawled by Google at all - by intent.

Given that it is the actual support forum for Amazon's cloud service (their Simple Storage Service itself), it looks like they don't want Google to rank them. Or maybe the robots.txt was hacked.

gshannon




msg:4267111
 12:53 am on Feb 15, 2011 (gmt 0)

It is a somewhat different case with respect to the robots.txt being ignored.

However, appspot is subject to quotas and rate limits. In this case my best guess is that the robots.txt request failed and the spider went ahead anyway.

Try the search "inurl:forums.aws.amazon.com inurl:appspot.com" and you can see some other attempts.

The other attempt ('http://itechgiz-proxy-server.appspot.com/forums.aws.amazon.com/index.jspa') fails with a 503 (over quota limit), which could be why we do not see it ranking.

I have become somewhat of an expert on debugging this, as three major sites I worked on for previous clients became subject to this attack, which is why I developed the above code.
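The clone URLs in this thread all follow the same pattern: an appspot host, then the original hostname and path. Inferring that pattern only from the URLs quoted here (not from any documentation of these proxies), a small hypothetical helper, `embeds_domain()`, can flag URLs that embed your own domain, for example when going through a list of URLs collected by hand from an inurl: search.

```php
<?php
// Hypothetical helper based only on the URL pattern seen in this thread:
// http://<name>.appspot.com/<original host>/<original path>
function embeds_domain(string $url, string $domain): bool
{
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    $path = (string) parse_url($url, PHP_URL_PATH);
    $suffix = '.appspot.com';
    if (substr($host, -strlen($suffix)) !== $suffix) {
        return false;                     // not served from an appspot host
    }
    // The first path segment is the hostname of the site being proxied.
    $first = strtolower(explode('/', ltrim($path, '/'))[0]);
    $domain = strtolower($domain);
    return $first === $domain || $first === 'www.' . $domain;
}

$urls = [
    'http://itechgiz-proxy-server.appspot.com/forums.aws.amazon.com/index.jspa',
    'https://forums.aws.amazon.com/index.jspa',
];
foreach ($urls as $u) {
    echo $u, ' => ', embeds_domain($u, 'forums.aws.amazon.com') ? 'proxied copy' : 'not proxied', "\n";
}
```

Other proxies may use different URL layouts, so treat a "not proxied" answer as "doesn't match this one pattern", nothing more.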

sdani




msg:4267115
 12:58 am on Feb 15, 2011 (gmt 0)

I understand your point about the S3 forum's robots.txt, tedster, but should Google even be indexing this appspot URL? If you search for site:example.appspot.com (where EXAMPLE is the specific appspot subdomain showing up here), you will find many sites being cloned AND INDEXED by Google.
It even indexes Twitter pages, including the Twitter page of "the superbowl proposal SEO guru".

Try the URL example.appspot.com/twitter.com/TWITTERHANDLE, replacing example and TWITTERHANDLE.

Google should be able to identify content cloned from Twitter, especially when even the original URL is kept as-is in the path.

leadegroot




msg:4267240
 9:23 am on Feb 15, 2011 (gmt 0)

If they claim to be Googlebot but the reverse lookup fails, I just send them a 503 with a one-line body, rather than mucking around with meta robots. I have backed this up with a little IP whitelisting, because the lookup has failed a couple of times and I locked out Googlebot for a while! Ahh! :(
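A minimal sketch of that 503 approach, under stated assumptions: `$verified` is whatever your reverse/forward DNS check returns (such as the lookup in gshannon's header code), and the whitelist holds IP prefixes you trust even when that lookup fails. `gate_status()`, `enforce_gate()`, the placeholder prefix, and the `Retry-After` value are all hypothetical choices for illustration, not leadegroot's actual setup; the prefix is from a documentation range, not a Google range.

```php
<?php
// Decide the response status: 503 only for requests that claim to be Googlebot,
// fail verification, and are not from a whitelisted IP prefix.
function gate_status(string $ua, string $ip, bool $verified, array $whitelist): int
{
    if (stripos($ua, 'googlebot') === false) {
        return 200;                    // not claiming to be Googlebot: serve normally
    }
    foreach ($whitelist as $prefix) {
        // End prefixes with '.' so '192.0.2.' cannot match '192.0.20.x'
        if (strpos($ip, $prefix) === 0) {
            return 200;                // trusted range: don't lock out the real bot
        }
    }
    return $verified ? 200 : 503;      // claims Googlebot but failed verification
}

// Hypothetical glue for a real request; call it before any output.
function enforce_gate(bool $verified, array $whitelist): void
{
    $ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
    $ip = $_SERVER['REMOTE_ADDR'] ?? '';
    if (gate_status($ua, $ip, $verified, $whitelist) === 503) {
        http_response_code(503);
        header('Retry-After: 3600');   // assumption: ask the client to retry in an hour
        echo "Service temporarily unavailable.\n";   // the one-line body
        exit;
    }
}

$whitelist = ['192.0.2.'];             // placeholder prefix, not a Google range
echo gate_status('Mozilla/5.0 (compatible; Googlebot/2.1)', '203.0.113.5', false, $whitelist), "\n";
echo gate_status('Mozilla/5.0 (compatible; Googlebot/2.1)', '192.0.2.10', false, $whitelist), "\n";
echo gate_status('Mozilla/5.0 (Windows NT 10.0)', '203.0.113.5', false, $whitelist), "\n";
```

Keeping the decision separate from the `header()`/`exit` glue makes it easy to check that a whitelisted IP still gets a 200 when DNS verification fails, which is exactly the lockout leadegroot ran into.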

indyank




msg:4267279
 11:45 am on Feb 15, 2011 (gmt 0)

Hey people, there seems to be a major proxy scam on the web now, and I have been seeing tweets complaining about proxy hijacks using Google appspot!

This has even been reported to Matt Cutts.

indyank




msg:4267280
 11:49 am on Feb 15, 2011 (gmt 0)

People are also reporting iframed pages competing with the original. I haven't analyzed this in detail, but it does look like Google is having issues with 302s again!

sdani




msg:4267284
 11:59 am on Feb 15, 2011 (gmt 0)


Quoting indyank: "Hey people, there seems to be a major proxy scam on the web now, and I have been seeing tweets complaining about proxy hijacks using Google appspot! This has even been reported to Matt Cutts."


This appspot site even keeps the same Google Analytics code. So all the days when I thought, based on Google Analytics, that I was getting traffic, that traffic may actually have been going to the proxy.

indyank




msg:4267285
 12:03 pm on Feb 15, 2011 (gmt 0)

Ok, look at this hijack - [suzetteklierocks.appspot.com...]

What is more interesting is the social gamble that Google, Bing and the like seem to encourage and trust these days. Look at the FB likes on that page!

200958 likes! I am sure these are hacked likes :0((

indyank




msg:4267312
 1:19 pm on Feb 15, 2011 (gmt 0)

Looking at the site that has been hijacked, they seem to have used "lengthy keyword stuffed titles" (in the title tag) on all their blog posts! Looks to me like a keyword polluter has been hacked by another spammer in this case!

indyank




msg:4267404
 4:21 pm on Feb 15, 2011 (gmt 0)

The appspot link is now returning a 503 over-quota error, but 200958 Facebook likes on a hacked page is astounding! "Social gaming" seems to have already started on a massive scale!

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved