Forum Moderators: Robert Charlton & goodroi
AddType application/x-httpd-php5 .htm .html
RewriteEngine On
RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html?$ http://www.mysite.com/$1 [R=301,L]
RewriteCond %{SERVER_PORT} ^443$
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html?$ https://www.mysite.com/$1 [R=301,L]
# Redirect non-canonical to www
RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{HTTP_HOST} !^(www\.mysite.com\.com)?$
RewriteRule (.*) http://www.mysite.com/$1 [R=301,L]
RewriteCond %{SERVER_PORT} ^443$
RewriteCond %{HTTP_HOST} !^(www\.mysite.com\.com)?$
RewriteRule (.*) https://www.mysite.com/$1 [R=301,L]
# Don't permit paths like mysite.com/pagina1.htm/maps/ etc.
RewriteRule ^((?:[^./]+/)*[^./]+\.(?:html?|php))/ http://www.mysite.com/$1 [R=301,L]
All this is assuming that any given page can be http or https but not both. If a page can be accessed either way, you're pretty well stuck.
if ( !empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off' )
{
$host = $_SERVER['HTTP_HOST'];
$request_uri = $_SERVER['REQUEST_URI'];
$good_url = "http://" . $host . $request_uri;
header( "HTTP/1.1 301 Moved Permanently" );
header( "Location: $good_url" );
exit;
}
Use that block on a page that must be plain http. On a page that must be https, use this one instead (never put both on the same page, or the two redirects will loop). Note that a bare $_SERVER['HTTPS'] test is unreliable: the variable may be unset, and some servers set it to the string 'off'.
if ( empty($_SERVER['HTTPS']) || $_SERVER['HTTPS'] === 'off' )
{
$host = $_SERVER['HTTP_HOST'];
$request_uri = $_SERVER['REQUEST_URI'];
$good_url = "https://" . $host . $request_uri;
header( "HTTP/1.1 301 Moved Permanently" );
header( "Location: $good_url" );
exit;
}
Remember that robots.txt doesn't prevent indexing; it only prevents crawling. So make sure all links are consistent: either https or http, but never both.
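One way to keep links consistent is to generate every internal link from a single helper, so the scheme is decided in one place instead of being hard-coded page by page. A minimal PHP sketch, assuming www.example.com as the canonical host and a hypothetical https-only /store/ folder (both are placeholders, not anything from this thread):

```php
<?php
// Build an absolute internal link whose scheme matches the target page,
// so the site never links to both the http and the https version.
function site_link(string $path): string
{
    $host   = 'www.example.com';                 // assumed canonical host
    $secure = (strpos($path, '/store/') === 0);  // hypothetical https-only folder
    return ($secure ? 'https' : 'http') . '://' . $host . $path;
}
```

Templates then call site_link('/contact.html') or site_link('/store/cart.php') and can never emit the wrong scheme for a page.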
Great solution if you don't use a secure server for payment.
The directives listed in the robots.txt file apply only to the host, protocol and port number where the file is hosted.
This could work OK as long as all non-canonical requests get redirected to the canonical URL in one hop.
If you do one redirect in your .htaccess file and then a second redirect in your script, this solution is not so good.
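To guarantee a single hop, compute the fully canonical URL (scheme, host and path together) before issuing any redirect, and only redirect if the result differs from what was requested. A rough PHP sketch, with www.example.com as an assumed canonical host and /store/ as a hypothetical https-only area:

```php
<?php
// Compute the canonical URL for a request in one pass, so scheme and
// host are both fixed by a single 301 instead of a redirect chain.
// Returns null when the request is already canonical.
function canonical_url(string $host, string $uri, bool $is_https): ?string
{
    $canonical_host = 'www.example.com';                // assumed canonical host
    $wants_https    = (strpos($uri, '/store/') === 0);  // hypothetical https area

    $scheme  = $wants_https ? 'https' : 'http';
    $target  = $scheme . '://' . $canonical_host . $uri;
    $current = ($is_https ? 'https' : 'http') . '://' . $host . $uri;

    return ($target === $current) ? null : $target;
}

// In the page itself, exactly one redirect would then be issued:
// $https = !empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off';
// if ($url = canonical_url($_SERVER['HTTP_HOST'], $_SERVER['REQUEST_URI'], $https)) {
//     header('HTTP/1.1 301 Moved Permanently');
//     header('Location: ' . $url);
//     exit;
// }
```

A request for http://example.com/store/cart.php goes straight to https://www.example.com/store/cart.php in one hop, rather than bouncing through a host fix first and a scheme fix second.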
# REDIRECT htm INDEX PAGES to index/
RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html?$ http://www.mysite.com/$1 [R=301,L]
RewriteCond %{SERVER_PORT} ^443$
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html?$ https://www.mysite.com/$1 [R=301,L]
# Redirect non-canonical to www
RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{HTTP_HOST} !^(www\.mysite.com\.com)?$
RewriteRule (.*) http://www.mysite.com/$1 [R=301,L]
RewriteCond %{SERVER_PORT} ^443$
RewriteCond %{HTTP_HOST} !^(www\.mysite.com\.com)?$
RewriteRule (.*) https://www.mysite.com/$1 [R=301,L]
# Don't permit paths like mysite.com/pagina1.htm/maps/ etc.
RewriteRule ^((?:[^./]+/)*[^./]+\.(?:html?|php))/ http://www.mysite.com/$1 [R=301,L]
--use links with a leading / and let users stay on https even when it isn't needed. This means Googlebot will eventually have duplicate versions of all your http pages. (Not of the https pages, because I assume you don't let http reach those.)
The directives listed in the robots.txt file apply only to the host, protocol and port number where the file is hosted.
That makes it sound as if they expect separate robots.txt files for http and https, even if they belong to the same domain. But what are you supposed to do, rewrite to different robots.txt files depending on the robot's protocol? You might be able to do this on a brand-new site: Here are the rules for http, here are the ones for https. You can't really change at this point, though.
So doesn't Google see the same robots.txt file for both http and https?
Not sure I understood you, but I suppose you mean that I can't redirect [page1.php...] to [page1.php...] in .htaccess and then on the page redirect the same page, this time to [page1.php...]
No, he means that, for example, you could have .htaccess redirecting
[example.com...]
to
[example.com...]
and then separately you will have the page php redirecting to
http://www.example.com/directory/
But since we are talking about internal links, this is not likely to happen anyway. The form of the page name, including the domain, will already be correct, so there would never be more than one redirect.
The question is whether they know or care that it's the same. Their own statement makes it sound as if robots.txt only "counts" for https pages if they get it with https, and it only counts for http if they get it with http. With some robots it would be easy to tell because maybe they will make a fresh request for robots.txt before starting in on the https pages. But the googlebot doesn't work that way.
You can write a different robots.txt rule for a mirrored https... something like this :
RewriteCond %{HTTPS} =on
RewriteRule ^robots\.txt$ robots-ssl.txt [L]
In robots-ssl.txt
User-agent: *
Disallow: /
[webmasterworld.com...]
But will Google look for a separate robots.txt over SSL when changing from http to https?
When indexing, it might recognize them as duplicate content and treat them as one.
This code is good, essential even; however, there is nothing in it to force any particular page to be http or https.
The usual method is to have www.example.com as http and store.example.com as https, or to define certain folders as https and the rest of the site as http. In your case, it is probably better to do the checking and redirecting from within your PHP script.
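A sketch of that folder-based approach in PHP, where the folder names are purely illustrative; each page decides its required scheme from its path and issues at most one redirect:

```php
<?php
// Decide the required scheme per folder; everything outside the listed
// folders is plain http. The folder list is a hypothetical example.
function required_scheme(string $uri): string
{
    $https_folders = ['/checkout/', '/account/'];  // assumed https-only areas
    foreach ($https_folders as $folder) {
        if (strpos($uri, $folder) === 0) {
            return 'https';
        }
    }
    return 'http';
}

// Usage at the top of each page:
// $scheme = (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off') ? 'https' : 'http';
// $wanted = required_scheme($_SERVER['REQUEST_URI']);
// if ($scheme !== $wanted) {
//     header('HTTP/1.1 301 Moved Permanently');
//     header('Location: ' . $wanted . '://' . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI']);
//     exit;
// }
```

Keeping the folder list in one include file means the whole site shares a single definition of which areas are secure.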
# Redirect index pages to the directory (http)
RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html?$ http://www.mysite.com/$1 [R=301,L]
# Redirect index pages to the directory (https)
RewriteCond %{SERVER_PORT} ^443$
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html?$ https://www.mysite.com/$1 [R=301,L]
# Redirect non-canonical hostnames (http)
RewriteCond %{SERVER_PORT} !^443$
RewriteCond %{HTTP_HOST} !^(www\.mysite\.com)?$
RewriteRule (.*) http://www.mysite.com/$1 [R=301,L]
# Redirect non-canonical hostnames (https)
RewriteCond %{SERVER_PORT} ^443$
RewriteCond %{HTTP_HOST} !^(www\.mysite\.com)?$
RewriteRule (.*) https://www.mysite.com/$1 [R=301,L]
# Get rid of extra path info
RewriteRule ^((?:[^./]+/)*[^./]+\.(?:html?|php))/ http://www.mysite.com/$1 [R=301,L]
I think you said in another post that your site parses .html as PHP, so you don't actually have any .php pages except in the https area?