


http:// and https:// - Robots.txt File

Need help

     
8:32 am on Nov 21, 2009 (gmt 0)

Full Member

5+ Year Member

joined:Sept 11, 2007
posts: 303
votes: 0


Dear Readers and Webmasters

I'm struggling with a search engine ranking problem. My rankings have dropped from the first page to nowhere. They were stable for http://www.example.com, but after a fresh crawl the site started showing up in Google under https:// and the rankings kept falling. I don't know why Google indexed the https:// version. :(

My question is: how do I get Google to crawl the original http:// URLs again, and stop it from visiting and indexing the https:// version of the domain?

One option is a robots.txt file. I'm not keen on using robots.txt, but if that is the way to handle it, which user-agent should the rule target?

Another option is to wait for a sunny day ;)

Looking forward to your replies... guys, I'm very disturbed by this.

Thanks,

Bilal

[edited by: engine at 4:20 pm (utc) on Nov. 23, 2009]
[edit reason] examplified [/edit]

1:32 pm on Nov 21, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Usually, http and https are set up as two different 'servers' or 'accounts' in your web hosting. If this is the case, then put a robots.txt file into your https/SSL server with
# Disallow all robots from fetching all resources
User-Agent: *
Disallow: /

Another option is to detect unwanted requests on https and respond with a 301 Moved Permanently redirect to the canonical URL on the http protocol instead.

How you'd do that depends on your server type and version, your coding skills, and your preferences.
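
For example, on an Apache server with mod_rewrite enabled (an assumption; the thread doesn't say which server is in use), a .htaccess rule along these lines would do it. www.example.com is a placeholder, and older Apache versions may need RewriteCond %{SERVER_PORT} ^443$ in place of the HTTPS variable:

# Redirect every https request to the equivalent http URL
RewriteEngine On
RewriteCond %{HTTPS} on
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]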

Why did Google index those URLs? Because you allowed it to do so, and someone, somewhere linked to the https 'version' of your site -- either accidentally or maliciously.

Jim

9:40 pm on Nov 21, 2009 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts: 3091
votes: 2


I force all my mixed SSL/non-SSL sites to non-SSL with the Canonical tag. Seems to work.
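
For anyone unfamiliar with it, a minimal sketch of that tag: it goes in the <head> of every page reachable over SSL and points at the http URL (example.com and the page name are placeholders):

<link rel="canonical" href="http://www.example.com/some-page.html" />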

I also detect the user-agent and if it's not obviously a browser I kill SSL links. As far as I'm concerned my sites' SSL pages should not be visible until the ordering process begins, so they are of no concern to SEs.

4:14 pm on Nov 23, 2009 (gmt 0)

Full Member

5+ Year Member

joined:Sept 11, 2007
posts:303
votes: 0


Thanks, guys, for the great and useful replies. :) Thanks a lot.

Bilal

12:43 pm on Nov 30, 2009 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Apr 30, 2007
posts:1394
votes: 0


I don't like the idea of forcing redirects from SSL to non-SSL, even for spiders, because they will then index the page under http. Secure scripts have to run over https and should include code to verify that they do.

In other words, if I run a store, I don't want a customer landing on the create-account page over http. One way around it is to use the noindex/nofollow meta tag on those pages and the rel="nofollow" attribute on links that point to the SSL pages I don't want SEs to index, or to use a form instead of plain links so spiders cannot follow them.
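
As a rough sketch (the URLs are placeholders), that amounts to something like this on the SSL pages and on the links pointing to them:

<!-- in the <head> of SSL pages that should stay out of the index -->
<meta name="robots" content="noindex, nofollow">

<!-- on links from public pages to those SSL pages -->
<a href="https://www.example.com/create-account" rel="nofollow">Create an account</a>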

6:20 pm on Nov 30, 2009 (gmt 0)

Full Member

5+ Year Member

joined:Sept 11, 2007
posts:303
votes: 0


Thanks, guys. The problem is fixed by adding two different robots.txt files: one robots.txt allows crawling as normal, and the other tells spiders not to index the pages served over https://. It could probably have been solved with a single robots.txt file, but to save myself the bother I just used two separate files (a sketch of one way to do that is below). :)
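
For reference, if the http and https sites share a single document root (the thread doesn't say whether they do), one way to serve two different files is an Apache mod_rewrite rule that hands SSL requests a second file; the name robots_ssl.txt is made up for this sketch:

# Serve a separate, disallow-everything robots file over https
RewriteEngine On
RewriteCond %{HTTPS} on
RewriteRule ^robots\.txt$ /robots_ssl.txt [L]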

Thanks Guys

Bilal

10:01 pm on Nov 30, 2009 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts: 3091
votes: 2


Enigma1: Note my second paragraph.

Basically, on my sites only sensitive pages (eg payment) are fed via SSL. Any other page (eg product, "about") is publicly visible and fed via non-SSL and available to visitors and SE bots.

Exceptions are T&C, AUP and suchlike, which I feel are seldom any business of SEs. Hence I remove links to those pages entirely, and to (eg) carts and payment pages, if I sniff an SE bot IP - in fact anything that does not seem to be a browser.

Customers are switched between SSL/non-SSL according to page sensitivity so should never see (eg) a standard product page via SSL and vice versa.

The canonical tag is to ensure that if some toolbar (or whatever) feeds an SSL page to an SE and it then tries to follow it with a bot, the page is correctly reassigned to non-SSL. If the bot tries to read a cart or payment page it gets fed a 405 or similar. Ditto for most contact forms.
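
One simplified way to do that refusal on Apache (an assumption; dstiles checks IP addresses rather than user-agent strings and doesn't say what server he runs) is to block known crawler user-agents from the cart and payment paths. This sketch returns 403 Forbidden, which is in the spirit of his "405 or similar", and the path names are placeholders:

# Refuse cart/payment URLs to anything announcing itself as a search engine bot
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (googlebot|msnbot|slurp) [NC]
RewriteRule ^(cart|checkout|payment)/ - [F,L]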

You cannot bet on SEs getting it right. Some of them (alright, ALL of them) are sometimes very invasive and one has to resort to "firewall" and other methods of rebuttal.