
Apache Web Server Forum

    
Coding for generating cleanly escaped .htaccess files
incrediBILL




msg:4607381
 10:54 pm on Sep 4, 2013 (gmt 0)

I don't know whether any of you write code to generate your .htaccess files, but I do, and it saves a ton of time when converting massive user agent lists and the like: PHP's preg_quote() escapes the strings automatically.

The only gotcha I've found so far is that it doesn't escape spaces automatically, so I pass " " as the second (delimiter) argument to add the space to the escaped characters. The special characters it escapes by default are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -

If that's not a complete list for Apache, including the space, let me know!

This sample code:
$s="Bad Bot 1.0";
echo "RewriteCond %{HTTP_USER_AGENT} " . preg_quote($s," ") . " [NC,OR]";


Outputs this .htaccess line:
RewriteCond %{HTTP_USER_AGENT} Bad\ Bot\ 1\.0 [NC,OR]
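
One way to sanity-check the escaping before deploying a generated file is a round-trip test: quote the string, then confirm the quoted pattern still matches the raw string. The sketch below is only an illustration; the function name ua_pattern_round_trips is made up for this example, and it checks the regex level only (PHP and Apache both use PCRE), not how mod_rewrite parses the rest of the line.

```php
<?php
// Hypothetical helper (not from the original post): quote a user agent the
// same way as the sample above, then confirm the pattern matches the raw string.
function ua_pattern_round_trips(string $ua): bool
{
    $quoted = preg_quote($ua, " ");
    // "!" works as the delimiter here because preg_quote() already escapes it.
    return preg_match('!^' . $quoted . '$!', $ua) === 1;
}

var_dump(ua_pattern_round_trips("Bad Bot 1.0"));
var_dump(ua_pattern_round_trips("Mozilla/5.0 (compatible; Bad-Bot: v1.0 +http://example.com/)"));
```

Both calls are expected to report true. Characters that are escaped without strictly needing it, such as the colon, stay harmless at the regex level because PCRE treats a backslash followed by a non-alphanumeric character as that literal character.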


It's easy to write a quick routine to process an array, posted form, or file full of user agents, and with the escaping handled automatically there are no more 500 errors from unescaped characters.

Sample code to process an array of user agents:


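$arr = array("bad bot 1.0","googlebot","bingbot");

$ht_output = "RewriteEngine on\n";
$conds = array();
foreach ($arr as $ua)
{
    $ua = trim($ua);
    if ($ua !== "")
    {
        // preg_quote() escapes the regex metacharacters; the " " argument adds the space
        $conds[] = "RewriteCond %{HTTP_USER_AGENT} " . preg_quote($ua, " ");
    }
}
// every condition except the last gets [NC,OR]; the last one gets [NC]
if ($conds)
{
    $ht_output .= implode(" [NC,OR]\n", $conds) . " [NC]\n";
}
echo $ht_output;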


The output should be
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} bad\ bot\ 1\.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} googlebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} bingbot [NC]
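
The same routine extends to a file full of user agents. The following is a minimal sketch, not the code from the post: the function name build_ua_conditions, the file name in the comment, and the convention of skipping lines that start with "#" are all assumptions made for the example.

```php
<?php
// Hypothetical helper: turn a list of raw user-agent lines into RewriteCond lines.
function build_ua_conditions(array $lines): string
{
    $conds = array();
    foreach ($lines as $ua) {
        $ua = trim($ua);
        // Skip blank lines and "#" comment lines (the comment convention is an assumption).
        if ($ua === "" || $ua[0] === "#") {
            continue;
        }
        $conds[] = "RewriteCond %{HTTP_USER_AGENT} " . preg_quote($ua, " ");
    }
    if (count($conds) === 0) {
        return "";
    }
    // Every condition except the last gets [NC,OR]; the last one gets [NC].
    return implode(" [NC,OR]\n", $conds) . " [NC]\n";
}

// With a real file, something like file("bad-agents.txt", FILE_IGNORE_NEW_LINES)
// would supply the lines; that file name is made up for this example.
$lines = array("bad bot 1.0", "", "# scrapers", "  evil-crawler/2  ");
echo "RewriteEngine on\n" . build_ua_conditions($lines);
```

Returning an empty string for an empty list avoids emitting a stray flag line with no condition in front of it, which is the kind of thing that produces a 500 error.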

Hope that kick-starts some automation for the more novice coders and generates a lot more clean .htaccess files :)

 

lucy24




msg:4607400
 11:28 pm on Sep 4, 2013 (gmt 0)

If that's not a complete list for Apache

It depends on the module. I'm not sure the colon : needs to be escaped at all; I can only think of one place it's got syntactic meaning, and that's in a rewrite flag. In vanilla Regular Expressions it isn't escaped. Conversely there are a handful of mods that require / escaping. You said .htaccess but did you really mean specifically mod_rewrite?

I think you may be too generous with [NC]. A BadBot is a badbot no matter how it's cased, but there's only one Googlebot. If it calls itself "googlebot" or "GoogleBot" it's fake.
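
If some entries should match in any casing and others only in one exact casing, the generator can carry a flag per entry. This is a rough sketch under that assumption; the function name build_mixed_case_conditions and the array layout (user agent => whether to add NC) are invented for illustration.

```php
<?php
// Hypothetical helper: each entry maps a user-agent string to true (add NC)
// or false (match case-sensitively).
function build_mixed_case_conditions(array $agents): string
{
    $out = "";
    $remaining = count($agents);
    foreach ($agents as $ua => $ignore_case) {
        $flags = array();
        if ($ignore_case) {
            $flags[] = "NC";
        }
        if (--$remaining > 0) {
            $flags[] = "OR"; // every condition except the last is OR-ed with the next
        }
        $suffix = $flags ? " [" . implode(",", $flags) . "]" : "";
        $out .= "RewriteCond %{HTTP_USER_AGENT} " . preg_quote((string) $ua, " ") . $suffix . "\n";
    }
    return $out;
}

echo build_mixed_case_conditions(array(
    "bad bot"   => true,  // unwanted in any casing
    "googlebot" => false, // all-lowercase spelling, treated as fake
    "GoogleBot" => false, // wrong capitalisation, treated as fake
));
```

Because the last two conditions are case-sensitive, a visitor calling itself "Googlebot" does not match either of them.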

incrediBILL




msg:4607406
 11:45 pm on Sep 4, 2013 (gmt 0)

I think you may be too generous with [NC]. A BadBot is a badbot no matter how it's cased, but there's only one Googlebot. If it calls itself "googlebot" or "GoogleBot" it's fake.


I didn't say it was one size fits all :)

It's true that any variation of Googlebot other than "Googlebot" is fake, but remember that I give my known bots a pass up front, so the real Googlebot would already be allowed. Catching all the fake variations then requires the [NC].

The problem with not using [NC] is that someone comes along as "bad bot 1.0" on Monday, by Tuesday it's "Bad bot 1.1", and on Wednesday it's "Bad Bot 1.2". That's why I would typically just put in "bad bot" with [NC] and catch them all, if I were doing user agent blocking the old-fashioned way.
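
A generator can apply the same idea by trimming the version off each sighting before quoting it, so that one pattern covers every release. The sketch below is only one possible approach: the helper name strip_ua_version and its regex for what counts as a trailing version are assumptions, and real user agent strings may need a different rule.

```php
<?php
// Hypothetical helper: drop a trailing version such as " 1.0", "/2.3" or " v4".
// A separator (space or slash) is required, so a name like "bot2" is left alone.
function strip_ua_version(string $ua): string
{
    return preg_replace('~[\s/]+v?\d+(\.\d+)*$~i', '', trim($ua));
}

$seen = array("bad bot 1.0", "Bad bot 1.1", "Bad Bot 1.2");

// Lowercase before de-duplicating so differently cased sightings collapse
// into one entry; the [NC] flag makes the final match case-insensitive anyway.
$patterns = array_unique(array_map(function ($ua) {
    return strtolower(strip_ua_version($ua));
}, $seen));

foreach ($patterns as $p) {
    // Prints the escaped pattern, ready to drop into a RewriteCond with [NC].
    echo preg_quote($p, " ") . "\n";
}
```

All three sightings reduce to the single pattern bad\ bot, which can then be fed to the array routine earlier in the thread.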
