Doing a search for site:example.com returns our normal result along with this result in the 26th place.
400 Bad Request Bad Request. Your browser sent a request that this server could not understand. Reason: You're speaking plain HTTP to an SSL-enabled server port. ... www.example.com:443/ - 1k - Cached - Similar pages
I guess from the result one can surmise that our site is returning the correct header for this request, and therefore this won't lead to any duplicate-content penalty (am I right?). But the more intriguing part is: how did Googlebot find this page?
And what does it mean? Do you think this could be an opening salvo in a bid to hijack rankings? Are there any more steps I need to take to make sure I'm not adversely affected by it?
A little background:
1) The site is in a very competitive sector.
2) One of our biggest competitors has suddenly woken up to the fact that SEO exists and has gone full-fledged grey hat (as usual with big companies, they're using an external agency), creating large numbers of spam sites that all point back to them. Looking at these sites, you can see they are grey hat at best.
3) We have a GWT account and a sitemap. All URLs have been crawled.
4) We recently did sitewide 301 redirects for all pages (except the homepage) from a lengthy, non-optimal URL to another lengthy but optimal URL.
5) All the new URLs have displaced the old ones and are starting to rank for the pages they replaced.
Should I be worried? Should I be boosting my site's immunity?
The visible error message suggests the port-number request was handled properly, but if the HTTP header your server sent was really a 400, it's very unlikely that Google would show the result.
So I'd say there may well be a problem with your server - and yes, someone may be intentionally exploiting it. Either way, you should make sure the HTTP server header really returns the correct status code and not just the correct visible error message.
Any server header checker tool will do - I prefer Firefox with the Live HTTP Headers add-on.
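To see why the visible text isn't enough, here's a minimal, self-contained Python sketch (the local server is a stand-in for illustration, not anyone's real setup): it serves a page whose body says "400 Bad Request" while the status line actually says 200, and the client-side check reads the status line - the same thing a header checker tool shows you.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for a misconfigured server: the page *says* "400 Bad Request"
# but the status line actually says 200 -- the situation in this thread.
class MisconfiguredHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"400 Bad Request. Your browser sent a request that this server could not understand."
        self.send_response(200)  # wrong: should be self.send_response(400)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence request logging for the demo
        pass

server = HTTPServer(("127.0.0.1", 0), MisconfiguredHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The check: request the page and inspect the status line, not the body text.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
conn.request("GET", "/")
resp = conn.getresponse()
print(resp.status, resp.reason)  # prints: 200 OK -- despite the "400" in the body
conn.close()
server.shutdown()
```

Against your real site you'd point http.client (or any header checker tool) at your own host and port instead of the local stand-in.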
I checked with Live HTTP Headers, and it shows that my server is returning a 200 OK response. How do I change this?
That's specific to each server's technology and best discussed in the appropriate forum: Apache [webmasterworld.com] or Microsoft IIS [webmasterworld.com] in most cases.
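For Apache specifically, one common cause (an assumption here - your setup may differ) is an ErrorDocument directive configured with a full URL: Apache then answers with a redirect, and the error page ultimately arrives with a 200 status instead of the original 400. Pointing ErrorDocument at a local path preserves the real status (the /errors/ path below is purely illustrative):

```apache
# Bad: a full URL makes Apache issue a redirect, so the client sees
# 302 and then 200 -- never the real 400 status:
#   ErrorDocument 400 http://www.example.com/errors/400.html
# Good: a local path keeps the original 400 status on the response:
ErrorDocument 400 /errors/400.html
```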
And is there any way I can find out how Googlebot came across that page?
Finding out where Google acquires the various URLs it asks for is usually a fruitless task - assuming it's not one of your own internal links, which you should be able to check easily. Some Googlebot URLs come from internal or external links, some may come from toolbar type-ins, some may come from Google's ancient history of your site, some may be a form of server testing that Googlebot does for various servers... and there may be other sources of URL discovery as well.
The key is that it doesn't matter. You've learned that your server's error handling is not set up properly, so you can fix it now.