This has happened on three different hosts, so it is not a host-specific issue.
For example, this is my current setup:
/index.php?cPath=21
Setup with mod_rewrite:
/category/Category.html
Stylesheet:
/style.css (yes, it's in the root)
Example of an Image:
/images/pixel_trans.gif
For MOST users, it works fine! I cannot reproduce this error on any test machine or test website I tried it on.
But nevertheless, some users get:
/category/style.css
/category/images/pixel_trans.gif
....for EVERYTHING (every image, every stylesheet, etc. that's in the source).
Anyone know what could be causing this?
Here are the .htaccess rules:
RewriteEngine on
RewriteBase /
RewriteRule ^([^/]*)\.html$ $1.php?%{QUERY_STRING} [NC]
RewriteRule ^/?(manufacturers)/([^/]*)\.html$ index.php?manufacturers_id=$2&%{QUERY_STRING} [NC]
RewriteRule ^/?(product)/([^/]*)\.html$ product_info.php?products_id=$2&%{QUERY_STRING} [NC]
RewriteRule ^/?(category)/([^/]*)\.html$ index.php?cPath=$2&%{QUERY_STRING} [NC]
The pages this occurs on are all rewritten themselves (such as /category/), but the actual files are located in the root ( / ).
RewriteEngine on
RewriteBase /
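# Exclude "/" as well as "." so this generic rule cannot catch /category/..., /product/... or /manufacturers/... URLs first
RewriteRule ^([^/.]+)\.html$ $1.php [NC,L]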
RewriteRule ^manufacturers/([^.]+)\.html$ index.php?manufacturers_id=$1&%{QUERY_STRING} [NC,L]
RewriteRule ^product/([^.]+)\.html$ product_info.php?products_id=$1&%{QUERY_STRING} [NC,L]
RewriteRule ^category/([^.]+)\.html$ index.php?cPath=$1&%{QUERY_STRING} [NC,L]
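To see which requests these rules actually touch, here is a rough Python sketch that imitates the first-match-wins behaviour of the [L] flag. It is only an approximation of mod_rewrite, not a replacement for testing on the server: the rewrite() helper is a made-up name, the query-string appending is left out, and the generic rule is assumed to exclude both "/" and "." so that it cannot catch the subdirectory-style URLs. Paths are written without the leading slash, the way a root-level .htaccess sees them.

```python
import re

# (pattern, substitution) pairs, checked in order; the first match wins,
# which imitates the [L] flag. Query-string handling is deliberately omitted.
RULES = [
    (r"^([^/.]+)\.html$", r"\1.php"),  # assumption: generic rule excludes "/" and "."
    (r"^manufacturers/([^.]+)\.html$", r"index.php?manufacturers_id=\1"),
    (r"^product/([^.]+)\.html$", r"product_info.php?products_id=\1"),
    (r"^category/([^.]+)\.html$", r"index.php?cPath=\1"),
]


def rewrite(path):
    """Return the rewritten target for path, or None if no rule matches."""
    for pattern, target in RULES:
        match = re.match(pattern, path, re.IGNORECASE)  # [NC] = case-insensitive
        if match:
            return match.expand(target)
    return None


if __name__ == "__main__":
    for path in ("category/Category.html",
                 "category/style.css",
                 "category/images/pixel_trans.gif"):
        print(path, "->", rewrite(path))
```

The point of the exercise: only the .html URLs are rewritten. A request for category/style.css or category/images/pixel_trans.gif matches none of the rules, so the server looks for a real file at that path and logs "File does not exist".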
Jim
Here is a snippet from my error log:
[Sat May 12 15:59:59 2007] [error] [client 24.84.X.X] File does not exist: /home/**/public_html/category/images/pixel_trans.gif
url rewrite:
<?php
function callback($pagecontent) {
$pagecontent = preg_replace_callback("/(<[Aa][ \r\n\t]{1}[^>]*href[^=]*=[ '\"\n\r\t]*)([^ \"'>\r\n\t#]+)([^>]*>)/",'wrap_href',$pagecontent);
return $pagecontent;
}
function transform_uri($param) {
$uriparts = parse_url($param[2]);
$newquery='';
$scheme = $uriparts['scheme'].'://';
if (($scheme!= 'http://') && ($scheme!= 'https://')) return $param[1].$param[2].$param[3];
$host = $uriparts['host'];
if ($host!= $_SERVER['SERVER_NAME'] && $host!= $_SERVER['SERVER_ADDR']) return $param[1].$param[2].$param[3];
$path = $uriparts['path'];
list($file,$extension) = explode('.', basename($path));
if($extension!= 'php') return $param[1].$param[2].$param[3];
$extension = ".html";
$path = rtrim(dirname($path),'/');
$query = $uriparts['query'];
$anchor = $uriparts['anchor'];
if ($a = explode('&',$query)){
foreach ($a as $b) {
list($key,$val) = split('=',$b);
switch ($key) {
case 'cPath':
if(eregi('[_0-9]', $val)){
if($cat_arr = explode('_', $val)){
$count = false;
foreach($cat_arr as $value){
$cat_Q = tep_db_query("select c.categories_id, cd.categories_name from " . TABLE_CATEGORIES . " c, " . TABLE_CATEGORIES_DESCRIPTION . " cd where c.categories_id = '" . $value . "' and c.categories_id = cd.categories_id");
$cat_name = tep_db_fetch_array($cat_Q);
if(!$count){
$result .= $cat_name['categories_name'];
$count = true;
}
else{
$result .= '_' . $cat_name['categories_name'];
}
}
$cat = '/category/'. str_replace(' ' , '+' , $result);
}
else{
$cat = '/category/'.$val;
}
}
else{
$cat = '/category/'.$val;
}
break;
case 'language':
$lan = $val.'/'.$path;
break;
case 'products_id':
$name_Q = tep_db_query("select products_name from " . TABLE_PRODUCTS_DESCRIPTION . " where products_id = '" . $val . "'");
$pro = ($t = tep_db_fetch_array($name_Q))? '/product/' . str_replace(" ", "_" , $t['products_name']) : '/product/'.$val;
break;
case 'manufacturers_id':
$manufacturers_Q = tep_db_query("select manufacturers_name from " . TABLE_MANUFACTURERS . " where manufacturers_id = '" . $val . "'");
$man = ($t = tep_db_fetch_array($manufacturers_Q))? '/manufacturers/'.str_replace(" ", "_" , $t['manufacturers_name']) : $man = '/manufacturers/'.$val;
break;
case 'catid':
if(strstr($_SERVER["HTTP_USER_AGENT"],'Mozilla')) $newquery .= $key.'='.$val.'&';
break;
default:
if($newquery || $key) $newquery .= $key.'='.$val.'&';
}
}
}
if ($newquery) $newquery = '?'.rtrim($newquery,'&');
$path = '';
if(isset($man)) $path .= $man;
if(isset($cat)) $path .= $cat;
if(isset($pro)) $path .= $pro;
((isset($man) || isset($cat) || isset($pro)))? $host .= '' :$host .= '/';
if($file == 'index' || $file == 'product_info'){
if((isset($man) || isset($cat) || isset($pro))) $file= '';
}
if(eregi('reviews',$file)) $file = '/' . $file;
return $param[1].$scheme.$host.$file.$path.$extension.$newquery.$anchor.$param[3];
}
function wrap_href($param) {
return transform_uri($param);
}
ob_start("callback");
?>
sef:
<?php //products_id
if(isset($HTTP_GET_VARS['products_id']) &&!eregi('^[0-9]*$',$HTTP_GET_VARS['products_id'])){
$name_Q= tep_db_query("select products_id from " . TABLE_PRODUCTS_DESCRIPTION . " where products_name = '" . str_replace("_"," ", $HTTP_GET_VARS['products_id']) . "'");
if(tep_db_num_rows($name_Q)){
$t = tep_db_fetch_array($name_Q);
$HTTP_GET_VARS['products_id'] = $t['products_id'];
}
}
// manufactures_id
if(isset($HTTP_GET_VARS['manufacturers_id']) &&!eregi('^[0-9]*$',$HTTP_GET_VARS['manufacturers_id'])){
$band_Q = tep_db_query("select manufacturers_id from " . TABLE_MANUFACTURERS . " where manufacturers_name = '" . str_replace("_"," ", $HTTP_GET_VARS['manufacturers_id']) . "'");
if(tep_db_num_rows($band_Q)){
$t = tep_db_fetch_array($band_Q);
$HTTP_GET_VARS['manufacturers_id'] = $t['manufacturers_id'];
} else {
require('includes/unknown.php');
exit();
}
}
if(isset($HTTP_GET_VARS['cPath']) &&!eregi('^[_0-9]$', $HTTP_GET_VARS['cPath'])){
$cPath = $HTTP_GET_VARS['cPath'];
if(!eregi('^[_0-9]*$',$cPath)){
$cat_arr = explode('_' , $cPath);
$parent = 0;
$count = false;
foreach($cat_arr as $value){
/*echo($value . '<br>');*/
if(!$count ) {
$cat_Q = tep_db_query("select c.categories_id, c.parent_id from " . TABLE_CATEGORIES . " c left join " . TABLE_CATEGORIES_DESCRIPTION . " cd on (c.categories_id=cd.categories_id) where cd.categories_name = '" . str_replace('+' , ' ' , $value) . "'");
} else {
$cat_Q = tep_db_query("select c.categories_id, c.parent_id from " . TABLE_CATEGORIES . " c left join " . TABLE_CATEGORIES_DESCRIPTION . " cd on (c.categories_id=cd.categories_id) where c.parent_id = '" . (int)$parent . "' and cd.categories_name = '" . str_replace('+' , ' ' , $value) . "'");
}
if( $cat_name = tep_db_fetch_array($cat_Q) ) {
if(!$count) {
$result .= $cat_name['categories_id'];
$count = true;
} else {
$result .= '_' . $cat_name['categories_id'];
}
$parent = $cat_name['categories_id'];
} else {
require('includes/unknown.php');
exit();
/*
tep_redirect(tep_href_link('unknown.php', '', 'NONSSL', false));
echo ('error with category ' . $value . '<br>');
*/
}
}
$HTTP_GET_VARS['cPath'] = $result;
}
}
?>
Either modify the php script to drop the "fake" subdirectory path from image and css references, or add more mod_rewrite code to correct the bad URLs that php is generating.
It may be a simple case of linking to these files using page-relative links. Since these relative links are resolved by the browser, and the browser thinks the page is located at /category/<something>.html, any image referenced using a page-relative link like <img src="images/pic.gif"> will be requested from /category/images/pic.gif. A simple solution is to use a server-relative link -- in this case, <img src="/images/pic.gif"> -- so the browser builds the image link starting with the root directory instead of the page directory.
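The way a browser resolves the two kinds of link can be reproduced with Python's standard urllib.parse.urljoin, which follows the same relative-reference rules. This is just an illustration: the www.example.com host and the resolve() helper name are placeholders, not anything taken from the actual site.

```python
from urllib.parse import urljoin

# The URL the browser believes it is viewing (the host is a placeholder).
PAGE = "http://www.example.com/category/Category.html"


def resolve(page_url, link):
    """Resolve a link found in a page the way a browser would."""
    return urljoin(page_url, link)


if __name__ == "__main__":
    # Page-relative links are resolved against the page's apparent directory.
    print(resolve(PAGE, "images/pixel_trans.gif"))
    print(resolve(PAGE, "style.css"))
    # Server-relative links (leading slash) are resolved against the site root.
    print(resolve(PAGE, "/images/pixel_trans.gif"))
    print(resolve(PAGE, "/style.css"))
```

The first two calls give URLs under /category/, which is exactly the pattern showing up in the 404s; the last two give the real root-level locations.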
Jim
Since these relative links are resolved by the browser, and the browser thinks the page is located at /category/<something>.html, any image referenced using a page-relative link like <img src="images/pic.gif"> will be requested from /category/images/pic.gif.
PS The cleanup of my htaccess you did, can you explain the benefits of changing it? I realize it's old code, but were there any security holes or memory leaks? Just want to know for future reference. :)
For a given URL, mod_rewrite will either work every time or it won't work at all, unless your Apache installation is corrupt in an extremely unlikely way. I assume that you're sure that your script and back-end are working, and/or that you don't use page-relative addressing on pages which have been "rewritten out" of their URL-implied subdirectories.
If that is the case, then my next question would be, "Are you sure that these are actually humans using browsers, and not just several instances of a badly-written scraper 'bot?" I'd be checking their user-agent parameters to be sure they're all valid, chasing their IP addresses to see where they're from, looking at the raw log file to see if they take a 'human' click-path through your site, etc. Have any of these "visitors" ever contacted you to report a problem with your site?
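As a starting point for that kind of log-chasing, something like the following Python sketch could tally the affected clients from the error log. It assumes the error-log line format shown earlier in this thread; the tally_misresolved() name and the list of "fake" directories are illustrative choices. User-agents and click-paths live in the access log, so this only tells you which IP addresses to go and look up there.

```python
import re
from collections import Counter

# Matches the "[client ...] File does not exist: ..." part of an error-log line.
LOG_LINE = re.compile(
    r"\[client (?P<client>[^\]]+)\] File does not exist: (?P<path>\S+)"
)

# Directories that exist only in rewritten URLs, never on disk (assumed list).
FAKE_DIRS = ("/category/", "/product/", "/manufacturers/")


def tally_misresolved(lines):
    """Count missing-file errors per client whose path contains a fake directory."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and any(d in match.group("path") for d in FAKE_DIRS):
            counts[match.group("client")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[Sat May 12 15:59:59 2007] [error] [client 24.84.X.X] File does not exist: "
        "/home/**/public_html/category/images/pixel_trans.gif",
    ]
    for client, hits in tally_misresolved(sample).most_common():
        print(client, hits)
```

A handful of clients each producing a burst of these errors would point towards a 'bot or one odd browser setup; errors spread thinly over many clients would point back at the page markup.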
Jim
No, nobody has ever emailed me informing me of this problem - I only know it exists because it shows up in the 404 error logs.
I've seen a few of them show up in my "who's online" page, some of them have valid referral strings and (seemingly) valid user-agents.
But consistently, seemingly valid users are also always trying to look up some MSOFFICE file with several query strings. These I initially assumed to be a zombie computer or hacker bot probing the website. I guess the same could be said for the people that are getting these mysterious rewrites.
Also, if you have a moment: I am interested in your cleanup of my htaccess script. Are there any immediate dangers in using my old version, or did you simply modernize it?
For more information, see the regular-expressions tutorial cited in our forum charter [webmasterworld.com].
[added] The MSOffice requests may be a result of the visitor browsing with the "Discussion/collaboration" options turned on in MSIE, or in some cases, people actually using MS Word as a browser. The requested filenames can be used to determine which is the case. [/added]
Jim
[edited by: jdMorgan at 1:13 pm (utc) on May 14, 2007]
The things that make a forum like this one work are that *all* members can contribute to a thread, and that the thread is then available to many members who may come along later with the same or a similar problem.
That's why every one of us here at WebmasterWorld owns "example.com" and sells widgets... although color, size, and texture variations do occur :)
Jim
URL structure as follows (one example):
[widgetwebsite.com...]