I've run into a problem with my search engine rankings. They have dropped from the first page to nowhere. The ranking was stable for http://www.example.com, and then suddenly, after a fresh crawl, it went down and the site now comes up with https:// in Google. I don't know why Google indexed the https:// version, but the whole ranking went down and down... :(
My question is: what can I do to get Google to crawl the old http:// URLs again, and to stop it visiting and indexing the site under https://?
One option is a robots.txt file. I'm not keen on using robots.txt, but if that is the way to deal with it, which user-agent should I use?
Another option is to wait for a sunny day ;)
Looking forward to your replies, guys. I'm very worried.
[edited by: engine at 4:20 pm (utc) on Nov. 23, 2009]
[edit reason] examplified [/edit]
# Disallow all robots from fetching all resources
User-agent: *
Disallow: /
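Keep in mind that robots.txt is fetched separately for each protocol and host, so a blanket disallow only helps if it is served for https requests and not for http ones. Here is a minimal sketch of that idea in Python, assuming your application can see the request scheme. The names `HTTPS_ROBOTS`, `HTTP_ROBOTS` and `robots_txt_for` are made up for illustration and are not from any particular server or framework.

```python
# Hypothetical sketch: choose the robots.txt body by request scheme.

# Served on https://: keep every robot out of the SSL copy of the site.
HTTPS_ROBOTS = "User-agent: *\nDisallow: /\n"

# Served on http://: an empty Disallow allows everything.
HTTP_ROBOTS = "User-agent: *\nDisallow:\n"


def robots_txt_for(scheme: str) -> str:
    """Return the robots.txt body to serve for the given URL scheme."""
    if scheme.lower() == "https":
        return HTTPS_ROBOTS
    return HTTP_ROBOTS
```

How the scheme reaches your code depends on the server setup, so treat the function argument as a placeholder for whatever your stack provides.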
Another option is to detect unwanted requests for https and redirect them with a 301 Moved Permanently response to the canonical URL, which uses the http protocol instead.
How you'd do that depends on your server type and version, your coding skills, and your preferences.
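As one possible illustration of the 301 approach, here is a rough sketch written as a plain Python WSGI app. It assumes the canonical host is www.example.com and that the server reports the real scheme in `wsgi.url_scheme` (behind a proxy or load balancer that may not hold). `CANONICAL_HOST`, `redirect_target` and `app` are names invented for this example.

```python
from typing import Optional

# Assumption: the canonical (non-SSL) host name of the site.
CANONICAL_HOST = "www.example.com"


def redirect_target(scheme: str, path: str, query: str = "") -> Optional[str]:
    """Return the http URL to redirect to, or None if no redirect is needed."""
    if scheme.lower() != "https":
        return None
    url = f"http://{CANONICAL_HOST}{path}"
    if query:
        url += "?" + query
    return url


def app(environ, start_response):
    """Minimal WSGI app: 301 every https request to its http equivalent."""
    target = redirect_target(
        environ.get("wsgi.url_scheme", "http"),
        environ.get("PATH_INFO", "/"),
        environ.get("QUERY_STRING", ""),
    )
    if target is not None:
        start_response(
            "301 Moved Permanently",
            [("Location", target), ("Content-Length", "0")],
        )
        return [b""]
    body = b"ok"  # placeholder for the real page
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

A real store would exempt its checkout and payment URLs from this redirect. The same logic is often written as a rewrite rule in the web server configuration rather than in application code.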
Why did Google index those URLs? Because you allowed it to do so, and someone, somewhere linked to the https 'version' of your site -- either accidentally or maliciously.
I also detect the user-agent and if it's not obviously a browser I kill SSL links. As far as I'm concerned my sites' SSL pages should not be visible until the ordering process begins, so they are of no concern to SEs.
In other words, if I run a store, I don't want a customer to reach the create-account page over http. So one way around it is to put noindex/nofollow meta tags on the SSL pages and add rel=nofollow to the links that point to the various SSL pages I don't want SEs to index. Or use plain text instead of links so spiders cannot follow.
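To make the meta tag and rel=nofollow idea concrete, here is a small Python sketch of a template helper. It assumes links are rendered through a helper function; `NOINDEX_META` and `render_link` are hypothetical names, and the rule "every https link gets nofollow" is a simplification for the example.

```python
from html import escape

# Meta tag to place in the <head> of SSL pages that should stay out of the index.
NOINDEX_META = '<meta name="robots" content="noindex, nofollow">'


def render_link(href: str, text: str) -> str:
    """Render an anchor tag, adding rel="nofollow" when the target is https."""
    rel = ' rel="nofollow"' if href.lower().startswith("https://") else ""
    return f'<a href="{escape(href, quote=True)}"{rel}>{escape(text)}</a>'
```

For example, `render_link("https://www.example.com/account/create", "Create account")` yields an anchor carrying `rel="nofollow"`, while an http link is rendered without it.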
Basically, on my sites only sensitive pages (eg payment) are fed via SSL. Any other page (eg product, "about") is publicly visible and fed via non-SSL and available to visitors and SE bots.
Exceptions are T&C, AUP and suchlike, which I feel are seldom any business of SEs. Hence I remove links to those pages entirely, and to (eg) carts and payment pages, if I sniff an SE bot IP - in fact anything that does not seem to be a browser.
Customers are switched between SSL/non-SSL according to page sensitivity so should never see (eg) a standard product page via SSL and vice versa.
The canonical is to ensure that if some toolbar (whatever) feeds an SSL page to an SE and it tries to follow it with a bot then the page is correctly reassigned to non-SSL. If the bot tries to read a cart or payment page it gets fed a 405 or similar. Ditto for most contact forms.
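The canonical-plus-405 behaviour described above could look roughly like this in Python. This is only a sketch under assumptions: the path prefixes, the base URL and the user-agent heuristic are invented for illustration (the real setup also checks bot IPs, which is not shown), and `respond` returns a status code plus an optional canonical tag rather than a full page.

```python
import re
from typing import Optional, Tuple

# Assumptions for illustration only.
CANONICAL_BASE = "http://www.example.com"
SENSITIVE_PREFIXES = ("/cart", "/payment", "/contact")
BOT_TOKENS = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def looks_like_browser(user_agent: str) -> bool:
    """Crude heuristic: browser-style UA string with no crawler keywords."""
    return user_agent.startswith("Mozilla/") and not BOT_TOKENS.search(user_agent)


def respond(path: str, user_agent: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, canonical <link> tag or None) for a request."""
    if path.startswith(SENSITIVE_PREFIXES):
        # Carts, payment pages and forms: anything that is not a browser gets a 405.
        if not looks_like_browser(user_agent):
            return 405, None
        return 200, None
    # Public pages always advertise the non-SSL URL as canonical,
    # whichever scheme the request arrived on.
    return 200, f'<link rel="canonical" href="{CANONICAL_BASE}{path}">'
```

User-agent strings are easy to fake, so a check like this is best treated as a first filter rather than a guarantee.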
You cannot bet on SEs getting it right. Some of them (alright, ALL of them) are sometimes very invasive and one has to resort to "firewall" and other methods of rebuttal.