Forum Moderators: phranque
I'm trying to ban sites by domain name, since there have recently been a lot of referrer spammers.
I have, for example, the rule:
RewriteCond %{HTTP_REFERER} ^http://(www\.)?.*stuff.*\.com/.*$ [NC]
RewriteRule ^.*$ - [F,L]
which should ban any sites containing the word "stuff"
www.stuff.com
www.whatkindofstuff.com
www.some-other-stuff.com
and so on.
However, it is not working, so I am sure I did not set up a proper pattern-match rule. Anyone care to advise?
[edited by: jatar_k at 5:06 am (utc) on May 20, 2003]
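(For reference, a simplified form of the rule, in line with the anchoring advice given later in this thread, might look like the sketch below; "stuff" stands in for the real keyword, and note that an unanchored pattern like this will match "stuff" anywhere in the Referer, not only in the domain name.)

```apache
# Sketch: block any request whose Referer contains "stuff".
# [NC] makes the match case-insensitive; no ^...$ anchors or ".*"
# are needed, because an unanchored pattern matches anywhere.
RewriteCond %{HTTP_REFERER} stuff [NC]
RewriteRule .* - [F]
```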
Wizcrafts,
> Why is the L (Last command) redundant here? Please clarify for us, as we have seen it used so many times.
Take a look at the following documents, the descriptions of the [F] and [G] flags, and the examples provided in Rewriting guide:
Apache Module mod_rewrite - URL Rewriting Engine [httpd.apache.org]
Apache URL Rewriting Guide [httpd.apache.org]
Look for the word "immediately" in the descriptions of [F] and [G], and compare to its use in the description of [L].
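(In short: [F] and [G] end rule processing immediately on their own, so a trailing [L] is redundant. A minimal illustration, using a made-up ^private/ path:)

```apache
# These two rules behave identically: [F] already returns
# 403 Forbidden and stops rule processing at once.
RewriteRule ^private/ - [F]
# RewriteRule ^private/ - [F,L]   <- the [L] here adds nothing
```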
I too have seen many more "incorrect" and/or inefficient rewrites than I have "perfect" ones. I have also used and posted [F,L]-terminated rules myself, both out of early unfamiliarity with mod_rewrite and later, out of old (bad) habit. Then there's the issue with ".*" at the beginning or end of an unanchored pattern. In both cases, these "mistakes" won't stop the rule from working; they just slow it down. I'd rather have a fast web than a slow web, so I point them out. They are not even real mistakes; rather, they're more like "bad style" (no offense intended here, just trying to make a point).
One secret to success with mod_rewrite is simply this: Print out the cited documents, and put them somewhere where you are likely to read them (three guesses where I keep a copy!). Then, whenever you're.... erm, sitting there, pick 'em up and read 'em. Do this until you can find the page you need by feel, or until you have to print out a second copy because the first one falls apart from wear. :)
I also like the concise regular-expressions tutorial cited in DaveAtIFG's bookmark-worthy Introduction to mod_rewrite [webmasterworld.com] post.
HTH,
Jim
I too have seen many more "incorrect" and/or inefficient rewrites than I have "perfect" ones. I have also used and posted [F,L]-terminated rules myself, both out of early unfamiliarity with mod_rewrite and later, out of old (bad) habit. Then there's the issue with ".*" at the beginning or end of an unanchored pattern.
If I understand you correctly, then the final rewrite expression should read thusly: ^.*$ - [F]
Is this correct?
No, just
RewriteRule .* - [F]
will do - there is no need to start- or end-anchor a pattern which is completely wild-carded.
What I was talking about above is this:
^somepattern.*$
can just as easily be written
^somepattern
-and-
^.*somepattern$
can be shortened to
somepattern$
There is no need to anchor a pattern if the characters adjacent to that anchor are wild-cards. Fewer unneeded characters means smaller files and faster regex processing.
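(Applied to a made-up user-agent pattern, the trimming looks like this:)

```apache
# "badbot" is a hypothetical user-agent string, for illustration only.
# ^badbot.*$  matches exactly the same strings as  ^badbot
# ^.*badbot$  matches exactly the same strings as  badbot$
RewriteCond %{HTTP_USER_AGENT} ^badbot [NC]
RewriteRule .* - [F]
```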
Ref: A concise Regular Expressions Tutorial [etext.lib.virginia.edu]
HTH,
Jim
Thanks for that explanation. It is close to what I had, just minus the ",L" in the brackets. I misunderstood your reference to unanchored wildcards.
I just received my copy of Mastering Regular Expressions 2nd Edition, and Writing Apache Modules with Perl.
I have a better understanding of mod_rewrite now. Is this the best sub-forum to make inquiries about other .htaccess commands, or should I post them elsewhere on the boards?
I think most .htaccess discussions take place here in Website Technology Issues or over in the Perl and PHP CGI Scripting forum (e.g., for SE-friendly-URL rewrites), and there's also some action in Search Engine Spider Identification (bad-bot and spider-blocking/redirecting) and Tracking and Logging (general UA or IP blocking/redirecting). It's application- and poster-focus-dependent.
It seems like discussions of basic URL redirection for renamed files - 301 redirects, for example - are all over the place, including the (usually-inappropriate) Google forum, just depending on who's panicking and why. ;)
Heck, I can't figure out which forum is "just right" for a subject half the time, and I've actually read quite a few of the forum charters! If something is WAAAY out of line the mods will move it, though I try not to make work for them (Thanks, mods!). Starting with a site-search, you can usually figure out which forum contains the most on-topic discussion of a particular subject area or application.
Jim
There are only two bots (at this point) I'm trying to block. I also have a lot of 301 redirects. So, this is what I have that doesn't generate ANY errors and _seems_ to work.
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} "Indy Library"
RewriteCond %{HTTP_USER_AGENT} "IUPUI Research Bot"
RewriteRule .* - [F,L]
RewriteRule ^links/partners\.html$ [widgets.com...] [R=301]
...
More rules and stuff that work as they should follow, of course. My concern is the first five lines. First, it appears VALID, but will it _WORK_? Second, is this the most efficient way of doing this? Last, do I need to add anything?
Thanks in advance for any and all help.
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} "Indy Library"
RewriteCond %{HTTP_USER_AGENT} "IUPUI Research Bot"
RewriteRule .* - [F,L]
RewriteRule ^links/partners\.html$ [widgets.com...] [R=301]
Oaf357:
First of all, you have two RewriteCond lines that need to match as alternatives (one OR the other), so you need to add [OR] after the Indy Library condition. Second, spaces in names must be escaped with a backslash, thusly:
Indy\ Library [OR]
^IUPUI\ Research\ Bot
Third, drop the L in the RewriteRule: .* - [F]
Fourth, in the last rule you don't need the $ delimiter. It can be retyped as:
RewriteRule ^links/partners\.html [widgets.com...] [R=301,L]
It may not require the leading ^ either, but I'm not certain. Try it without the ^ and see if it works using [wannabrowser.com ]. You are really redirecting here, so a "redirect" rule might prove more correct, but I'm not advanced enough to say for sure.
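(Putting those four points together, the whole block might read as follows; this is only a sketch, and the [widgets.com...] target is the forum-shortened URL from the original post, which should be replaced with the full address:)

```apache
Options +FollowSymLinks
RewriteEngine on
# [OR] makes the two conditions alternatives; spaces are escaped.
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [OR]
RewriteCond %{HTTP_USER_AGENT} ^IUPUI\ Research\ Bot
RewriteRule .* - [F]
# Substitute the full URL for the forum-shortened target below.
RewriteRule ^links/partners\.html [widgets.com...] [R=301,L]
```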
I hope this is helpful
Wiz
I've been reading these threads, the SE Spider Ident and the Perl forums, for about a month, on and off, whenever I can find the time, and it seems the more I learn the more confuseder I get :)
I want to set up a spider trap but thought it better to get .htaccess working right first since the trap will be another new learning phase.
This is a small sample of my .htaccess. Most of what I have does the trick, but not for the six below. I searched for "Indy" and "Microsoft URL Control" and have tried every variation I found in the 'close to perfect' threads, but probably not sequentially, which may be the reason they don't work. Would these be correct without the preceding "^"? And should Indy Library be changed to "Indy Library" inside quotes and without escaping the space?
RewriteEngine On
RewriteCond %{REMOTE_ADDR} "^63\.148\.99\.2(2[4-9]|[3-4][0-9]|5[0-5])$" [OR]
#lots of UA's......
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL\ Control [OR]
RewriteCond %{HTTP_USER_AGENT} ^webcollage [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus [OR]
RewriteRule !^403\.htm$ - [F]
Deny from 128.242.197.101
RedirectPermanent /filename.htm [mydomain.com...]
One more question: is 'Deny from' the correct syntax to deny an IP? My hosting service said I didn't need anything above or below it.
Thanks
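(For reference on that last question: a bare "Deny from" line does work on its own, because mod_access defaults to "Order Deny,Allow", under which everyone is allowed except hosts matched by a Deny. The fully spelled-out equivalent is:)

```apache
# Explicit equivalent of a lone "Deny from" line (Apache 1.3/2.0 mod_access).
# With "Order Allow,Deny", Allow lines are evaluated first, then Deny,
# and a matching Deny wins.
Order Allow,Deny
Allow from all
Deny from 128.242.197.101
```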