Forum Moderators: phranque
Here’s my .htaccess:
Options +FollowSymLinks
RewriteEngine On
# If you use the RealUrl extension, then you'll have to enable the next line.
RewriteBase /
RewriteRule ^typo3.*$ - [L]
RewriteRule ^typo3$ typo3/index.php [L]
#script non-www to www redirect
RewriteCond %{HTTP_HOST} !^www\.mysite\.com [NC]
RewriteRule (.*) fileadmin/scripts/do-redirect.php
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-l
RewriteRule .* index.php [L]
RewriteRule ^typo3 - [L]
Take the [NC] flag off the RewriteCond in rule #3.
You forgot the [L] flag on the RewriteRule for rule #3.
The code would be clearer without the blank line between the RewriteConds and the RewriteRule in rule #4.
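Putting those three suggestions together (and using example.com, per the forum convention), the revised .htaccess might read — a sketch, not a drop-in file:

```apache
Options +FollowSymLinks
RewriteEngine On
# If you use the RealUrl extension, enable the next line.
RewriteBase /

# Rules #1 and #2 collapsed: pass all /typo3 requests through untouched
RewriteRule ^typo3 - [L]

# Rule #3: non-www to www redirect -- no [NC], and [L] added
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) fileadmin/scripts/do-redirect.php [L]
# Rule #4: route everything that isn't a real file or symlink to the front controller
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-l
RewriteRule .* index.php [L]
```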
Jim
Errors for URLs in Sitemap
http://www.example.com/tipstravelkids/ontheroad/travelgames/ Redirect error
http://www.example.com/destinationguides/grandcanyon/plan/ Redirect error
http://www.example.com/vacationimages/photogallery/ireland/ Redirect error
/* Prevent the redirect script from looping on itself */
if ($_SERVER["REQUEST_URI"] == '/fileadmin/scripts/do-redirect.php')
exit;
$site_prefix = 'http://www.example.com';
$uri = $_SERVER["REQUEST_URI"];
$urlPart = explode("/", $uri);
/* if URL like example.com//jdk: 404 */
if ($urlPart[1]== '' && $urlPart[2] != '' ) {
pagenotfound();
exit;
} else
/* homepage redirect: non-www to www */
if ($urlPart[1]== '' ) {
$url= $site_prefix . "/";
page301($url);
exit;
} else
/* if the file exists, redirect to www */
if (is_file("../.." . $uri)){
page301($site_prefix . $uri);
exit;
} else
/* Tips URLs redirect. Correct URL structure: www.example.com/tipstravelkids/<category>/<title>/index.html */
/* Check if URL is malformed */
/* if URL is truncated return 404 */
if ( $urlPart[1]== 'tipstravelkids' && $urlPart[3] == '' ) {
pagenotfound();
exit;
} else
/* if URL is longer return 404 */
if ( ($urlPart[1]== 'tipstravelkids') && $urlPart[5] != '' ) {
pagenotfound();
exit;
} else
/* if URL urlPart[4] is different from "index.html" or "" return 404 */
if ( ($urlPart[1]== 'tipstravelkids') && ($urlPart[4] != 'index.html' && $urlPart[4] != '') ) {
pagenotfound();
exit;
} else
/* if well-formed, check the db to be sure the subpaths are correct: if so, redirect to www-index; otherwise 404 */
if ($urlPart[1]== 'tipstravelkids' && ($urlPart[4] == 'index.html' || $urlPart[4] == '')) {
/* if <category> and <article> don't exist, return 404 */
$category = rawurldecode($urlPart[2]);
$title = rawurldecode($urlPart[3]);
$res = mysql_query("SELECT uid FROM tt_news WHERE deleted = 0 AND tx_m2ettnews_urlCategory = '" . addslashes($category) . "' AND tx_m2ettnews_urlTitle = '" . addslashes($title)."'");
if ($res !== FALSE) {
$results = mysql_fetch_row($res);
/* mysql_fetch_row() returns a numeric array (or FALSE), so the uid is $results[0] */
if ($results === FALSE || $results[0] == null ) {
pagenotfound();
exit;
}
} else {
pagenotfound();
exit;
}
/* else redirect to www - index */
$url= $site_prefix . "/" . $urlPart[1] . "/" . $urlPart[2] . "/" . $urlPart[3] . "/index.html";
page301($url);
exit;
} else
.... check other kind of URL ...
function page301 ($url) {
@header( "HTTP/1.1 301 Moved Permanently", true, 301);
@header( "Location: " . $url);
exit;
}
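The pagenotfound() helper isn't shown in the excerpt; a minimal sketch consistent with page301() above would be something like this (the included error-page path is an assumption, not from the post):

```php
<?php
/* Sketch only: companion to page301(). Sends a 404 status and stops;
   the custom error body below is a hypothetical path. */
function pagenotfound() {
    @header("HTTP/1.1 404 Not Found", true, 404);
    @include $_SERVER['DOCUMENT_ROOT'] . "/404.html";
    exit;
}
```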
[edited by: eelixduppy at 5:38 pm (utc) on Feb. 19, 2009]
[edit reason] please use example.com in code [/edit]
"Recommended" would be more accurate.
Note that the RewriteCond pattern is negated with "!". Include the [NC] only if you wish to allow your site to be indexed under www.example.com, www.Example.com, wWw.ExAmPlE.cOm, and all other possible case variants.
While it is true that domain names are supposed to be handled in a case-insensitive way, allowing case variations with [NC] introduces a dependency: with [NC], your site now depends on search engines to "get it right." If they introduce a bug that affects domain casing, your site suffers, and it will likely take you some time to discover why. And you will find that the "cure" is to remove the [NC] from that line.
You should actually use
RewriteCond %{HTTP_HOST} !^www\.example\.com$
With the end-anchor and without [NC], this RewriteCond means, "Redirect unless the requested hostname is exactly 'www.example.com' -- no FQDN, no port numbers, no case variation."
If your server is directly-accessible by IP address, you should also provide for HTTP/1.0 client access by allowing a blank hostname, in order to avoid a possible "infinite" redirection loop:
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
Subtle code changes -- big effects.
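Combined, the two refinements above would make the hostname check read as follows — a sketch, using the redirect script path from the original post:

```apache
# Redirect unless the Host header is exactly "www.example.com" or blank
# (HTTP/1.0 clients may send no Host header at all)
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) fileadmin/scripts/do-redirect.php [L]
```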
Be very careful interpreting GWT's use of the word "error" -- It is up to you to decide if a 301 "error" is really an error or not. They may be saying it's an error because they found a link to a URL that gets redirected, and they "want" to find links only to the final (new) URL. Of course, you can control that on your own site, but you have no control of old or incorrect links to your site found by Google on other Web sites.
Jim
thanks for the clarification and recommendations. Both are extremely helpful.
With regards to the 'errors' reported in GWT, if it was only a url or two I wouldn't be concerned. Unfortunately 4 different urls from random sections of my site are reported each day. Some have external links but not all do.
Mariella
It appears that your code says, "If URL is bad, then return 404. If URL is good, then 301."
Therefore it appears that any "good" URL requested by Google will get redirected, and Google will call that an error. They want you to link to the final URL, and not to a URL that will return a redirect.
In simple and general terms, the solution is to change your code so that it says, "If URL is bad, then return 404. If URL is good, include and send page content." So, if the URL is good, you want PHP to simply "include" the content associated with the originally-requested URL, instead of redirecting to a different URL.
This is conceptually similar to the difference in mod_rewrite between an external redirect and an internal rewrite. Right now, your PHP code is doing an external redirect, while an internal "rewrite" would be more appropriate.
You will find that your server load will be much-reduced by using the "include" method, and that users will see the "page" that they requested loading twice as fast. This is because the 301 response no longer needs to be sent to the client, and the client does not have to issue a second HTTP request using the URL supplied by your redirect. They will never see any URL in their browser address bar except for the one in the link that they clicked.
You also won't have to deal with the problem of preventing direct access to the content file.
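As a sketch of that "include" approach, assuming a hypothetical on-disk layout where each article lives under a content/ directory (none of these paths or names are from the original posts):

```php
<?php
/* Sketch only: serve the content for a "good" URL directly, instead of
   calling page301(). The content/ path scheme here is hypothetical. */
function serve_content($uri) {
    $base = realpath(__DIR__ . "/../../content");
    $file = realpath($base . $uri . "index.html");

    /* Refuse anything missing, or escaping the content directory */
    if ($file === false || strpos($file, $base) !== 0) {
        pagenotfound();
        exit;
    }

    /* One request, one 200 response -- no second client round-trip */
    include $file;
    exit;
}
```

The browser then keeps the originally-requested URL in the address bar, and no 301 is ever sent for a good URL.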
Jim
[edited by: jdMorgan at 11:39 pm (utc) on Feb. 17, 2009]
It appears that your code says, "If URL is bad, then return 404. If URL is good, then 301."
The code posted refers to non-www and thus all 'good' urls will return a 301 rather than a 200. It is my understanding that this is the best practice for canonical redirects.
Therefore it appears that any "good" URL requested by Google will get redirected, and Google will call that an error. They want you to link to the final URL, and not to a URL that will return a redirect.
None of our internal links and none of the urls in our sitemap return a 301. They are all final URLs (200), not 301 redirects. The only backlinks that return a 301 are 'good' urls that are non-www or non-index. All other backlinks either return a 200 if good or a 404 if bad.
This is conceptually similar to the difference in mod_rewrite between an external redirect and an internal rewrite. Right now, your PHP code is doing an external redirect, while an internal "rewrite" would be more appropriate.
If your server is directly-accessible by IP address, you should also provide for HTTP/1.0 client access by allowing a blank hostname, in order to avoid a possible "infinite" redirection loop:
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
Our server isn't directly-accessible by IP address. Should we include the recommended code in any case?
Mariella
[edited by: Marfola at 10:00 am (utc) on Feb. 18, 2009]
If your server isn't accessible by IP address, and there is no chance that it ever will be, then you don't need to allow for blank HTTP Host headers. But in most cases, it's pretty hard to say "I'll never, ever change hosts." And since the result of not including that simple modification is that your server may temporarily lock up if you ever do move to an IP-accessible hosting account and you do get an HTTP/1.0 request, the modification is cheap insurance against a bug that might be hard to find if you don't remember this thread...
I believe that writing robust and well-documented code saves time and money, and this dictates my coding style.
Jim