.htaccess BadBot Blocker
JD_Toims - posted 11:41 pm on Jul 21, 2013 (gmt 0)

I put this together years ago and have it running on quite a few live sites. I've seen other versions with a few UAs, but this takes care of quite a few more of the "known offenders" via .htaccess Forbidden and uses a very efficient set of blocks, so I figured I'd post it.

Feel free to use and update to your liking...

RewriteEngine on
# BLOCK KNOWN BAD BOTS
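# Each condition below groups agents that share a first letter into one compact alternation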
RewriteCond %{HTTP_USER_AGENT} a((ip)?bot|lexfDownload|mzn_assoc|SPSeek) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} blekko [NC,OR]
RewriteCond %{HTTP_USER_AGENT} c(herry|on(tentSmartz|veras)|rescent|rawl) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} d(um|II|ataCha) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} e(asyDL|-?mail|x(abot|tractorPro)) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} foobot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} g(i(gabaz|joel)|rub) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} h(atena|tt(pdown|rack)|tmlparser) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} i(EAuto|ndy.?Library) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} l(arbin|exiBot|ink(.?walker)?|mcrawler|ocator|wp(-request|::simple)) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} m(-crawl|j12bot|i(crosoft(\.URL|-ATL-Native|-CryptoAPI|-WebDAV-MiniRedir|URL\ Control)|ssigua)|o(gren|rpheus)|SProxy|echanize) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} n(etMechanic|ICErsPRO|utch) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} o(penfind|ffline|omni[-]?Explorer) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} p(hpcrawl|ingALink|sbot|ycurl|OE-Component-Client-HTTP) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} r(obot|ufus) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} s(chmozilla|coutjet|earchIt|eek(bot|er)|ogou|proose|imple|l(eipnir|ySearch)|weeper|zukacz) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} t(eleport|ScholarsBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} url(SpiderPro|lib) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} v(oyager|b.?project) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} w(eb(Account|Capt|Copier|rank|Whack|Strip|Zip|ster|bandit|\ services\ client\ proto)|get) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^User-Agent [NC]
RewriteCond %{HTTP_USER_AGENT} !(Giga(blast|bot)|Walhello|inktomi|teoma) [NC]
RewriteRule .? - [F]

 

JD_Toims - posted 10:25 pm on Jul 23, 2013 (gmt 0)

EDIT TO THE ABOVE

This line:
RewriteCond %{HTTP_USER_AGENT} o(penfind|ffline|omni[-]?Explorer) [NC,OR]

Should be edited to:
RewriteCond %{HTTP_USER_AGENT} o(penfind|ffline|mni[-]?Explorer) [NC,OR]

Edit Reason: There should not be an o on omni, since the pattern already begins with o; as originally posted, that alternative would only match "oomniExplorer".

incrediBILL - posted 8:09 pm on Aug 1, 2013 (gmt 0)

Some of the bots in your list, like Blekko, are hardly bad bots; Blekko honors robots.txt and hardly belongs in a blacklist. Disallowing it there would at least let it know it isn't allowed, so it wouldn't keep trying as hard.

If you did a catch-all at the end of your robots.txt that tells all bots not listed that they are denied access, the good bots like MJ12, Blekko, nutch, etc. would all just go away unless they were spoofed UAs.
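As an illustration of that catch-all (a hypothetical robots.txt, not incrediBILL's): bots listed by name are allowed, everything else is disallowed site-wide.

# Bots listed by name are allowed everywhere
User-agent: MJ12bot
Disallow:

# Catch-all: every bot not listed above is denied site-wide
User-agent: *
Disallow: /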

Of course, if they want past your blacklist, just using a random user agent string generator and sending something like "Blasjas 2.0" would slip right past it. Most don't even bother with that; they use MSIE 9.0's UA and get full access unless you check the headers being sent.

Headers are the real way to spot most of it.
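As a hedged sketch of that idea (not a rule from this thread): many scrapers that borrow an MSIE UA omit headers a real desktop browser sends, such as Accept-Language.

# Sketch only: flag requests claiming MSIE but sent without an
# Accept-Language header, which real desktop IE normally includes
RewriteCond %{HTTP_USER_AGENT} MSIE [NC]
RewriteCond %{HTTP:Accept-Language} ^$
RewriteRule .? - [F]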

keyplyr - posted 8:49 pm on Aug 1, 2013 (gmt 0)

These block lists have been posted here dozens of times over the last 17 years. The problem with lists like this is... what's a bad agent for one site may be beneficial for another site, so it's pointless.

It would be a huge mistake to cut'n paste and use a 3rd party block list on your own site without knowledge/experience of each agent's behavior/purpose and how blocking that agent might affect the performance of your site in the big picture.

incrediBILL - posted 9:44 pm on Aug 1, 2013 (gmt 0)

@keyplyr,

While I agree with you in principle, we should never stop people from sharing or trying to learn; the risks are just as valuable to learn as the rewards of sharing such lists. Now that you've pointed out the risks, which would probably be best covered in the FAQ I've been working on for this forum, let's examine the rewards.

While I find that block lists are problematic, I think the OP's block list has some interesting techniques that should be explored in detail IMO as most don't fully understand what's going on with the Apache code being used.

For example:
w(eb(Account|Capt|Copier|rank|Whack|Strip|Zip|ster|bandit|\ services\ client\ proto)|get)


This is quite clever and potentially speeds up the scanning vs. big long monolithic lists.

This line is only parsed further when a "w" is found; then the subset of user agents starting with "web" is processed, matching agents such as webaccount, webcopier, web services client, etc. It's a little harder for the Apache novice to read, but a cool way to compact those rules.
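For comparison (an illustrative sketch, not part of the original rule set), the same coverage written the long way would take one condition per agent; only a few of the agents are shown here:

# Uncompacted form: each agent gets its own condition and its own scan of the UA string
RewriteCond %{HTTP_USER_AGENT} webAccount [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webCopier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webZip [NC,OR]
RewriteCond %{HTTP_USER_AGENT} wget [NC]
RewriteRule .? - [F]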

Somewhat of a crossover with the Apache forum, but I'd like to see people explore these techniques and learn how to use them.

JD_Toims - posted 11:57 pm on Aug 1, 2013 (gmt 0)

I agree with blekko not necessarily being "bad" as a bot, but I don't like their practices and the way they ignore / misinterpret some onpage directives, so they've got a good bot, but their practices get them blocked from my sites.

what's a bad agent for one site may be beneficial for another site, so it's pointless.

It's easy to edit, and the one I posted was not only built right here in this forum, it's more robust than most. So rather than people needing to "find and code a list" as a "first line of defense," I posted one they can easily remove entries from to tailor to their specific needs.

This is quite clever and potentially speeds up the scanning vs. big long monolithic lists.

Thanks!

.htaccess speed is one of my "things" and you're correct: those rules "break" much more quickly than many of the lists I've seen here, because as soon as a letter does not match, processing moves on. Some of the lists here have 4 or 5 lines (or more) all beginning with W, even though the "W agents" could all be checked on a single line instead.

It not only makes for lighter code (the .htaccess is loaded/processed for every URL request, including images, javascript, css, etc.), it also allows for faster "breaking and moving on" for non-matches, which is important since the file will be processed for so many requests.

Also, .htaccess processing of mod_rewrite restarts from the top every time a rule is matched, so I try to "find as much speed" as I can to keep the sites I work on fast for real visitors.

keyplyr - posted 1:26 am on Aug 2, 2013 (gmt 0)

While I find that block lists are problematic, I think the OP's block list has some interesting techniques that should be explored in detail IMO as most don't fully understand what's going on with the Apache code being used.

A good place for this to be *explored* IMO is the Apache Web Server forum or possibly even the Webmaster General forum. If I was looking for htaccess code syntax tips I certainly wouldn't look in the Search Engine Spider and User Agent Identification forum.

yaimapitu - posted 6:06 am on Aug 2, 2013 (gmt 0)

In view of the OP, let me mention the set of rules you can find in my htaccess files:


RewriteCond %{HTTP_USER_AGENT} a(ccess|ds|pp(engine|id)) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} c(a(che|pture)|heckp|law|o(llect|pi|py)|url) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} d(ata|evs|ns|o(main|wn)) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} e(ngine|ezooms) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} f(etch|i(lter|nd)|tp) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} g(enieo|grab) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} harvest [NC,OR]
RewriteCond %{HTTP_USER_AGENT} i(mage|ps) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} j(a(karta|va)) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} l(arbin|i((b(rary|www))|nk)|oad) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} m(icro|j12bot|mcrawl) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} nutch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} openany [NC,OR]
RewriteCond %{HTTP_USER_AGENT} p(age_test|erl|hpcrawl|ic|pid|review|ython) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} r(everse|g(ana|et)) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} s(bider|c(an|rape|reen)|iph|noop|trip|u(ck|rvey)|ymantec) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} trend [NC,OR]
RewriteCond %{HTTP_USER_AGENT} video [NC,OR]
RewriteCond %{HTTP_USER_AGENT} w(eb-sniffer|get|in(32|http)|otbox) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} yandexmedia [NC,OR]
RewriteCond %{HTTP_USER_AGENT} zoom [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^-?$ [NC]
RewriteCond %{REQUEST_URI} !410.shtml$ [NC]
RewriteCond %{REQUEST_URI} !robots.txt$ [NC]
RewriteRule .? - [G,L]

RewriteCond %{HTTP_USER_AGENT} bot [NC]
RewriteCond %{REMOTE_HOST} !google [NC]
RewriteCond %{HTTP_USER_AGENT} !(bing|msn|yandex) [NC]
RewriteCond %{REQUEST_URI} !(403|410).shtml$ [NC]
RewriteCond %{REQUEST_URI} !robots.txt$ [NC]
RewriteRule .? - [G,L]

RewriteCond %{HTTP_USER_AGENT} crawl [NC]
RewriteCond %{HTTP_USER_AGENT} !sistrix [NC]
RewriteCond %{REQUEST_URI} !(403|410).shtml$ [NC]
RewriteCond %{REQUEST_URI} !robots.txt$ [NC]
RewriteRule .? - [G,L]

RewriteCond %{HTTP_USER_AGENT} spider [NC]
RewriteCond %{REQUEST_URI} !(403|410).shtml$ [NC]
RewriteCond %{REQUEST_URI} !robots.txt$ [NC]
RewriteRule .? - [G,L]


As regards controlling the visiting entities identifying themselves as bot, crawler or spider, the entries in this list will vary depending on the market(s) a given site serves; shown above is a minimalist set for NA... unwanted visitors get to see an empty 410 file.

The robots.txt file is exempted from the block, because all bots should be able to see it.

lucy24 - posted 8:03 am on Aug 2, 2013 (gmt 0)

RewriteCond %{REMOTE_HOST} !google [NC]
RewriteCond %{HTTP_USER_AGENT} !(bing|msn|yandex) [NC]

Any given robot has one correctly cased form of its name. Only that form should be given a pass.

The [G] flag carries an implied [L].

I prefer to constrain my access-control RewriteRules to requests in the form
(\.html|/|^)$

Cases of robots walking in off the street and making "cold" requests for non-page files when they haven't already got the page are so rare that it isn't worth making the server stop and evaluate every single request.
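A minimal sketch of what that constraint looks like in practice (hypothetical agent name, not lucy24's actual rules):

# The condition is only evaluated for page-type requests:
# URLs ending in .html, ending in /, or the empty (root) request
RewriteCond %{HTTP_USER_AGENT} Yukkybot
RewriteRule (\.html|/|^)$ - [F]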

RewriteCond %{REQUEST_URI} !(403|410).shtml$ [NC]
RewriteCond %{REQUEST_URI} !robots.txt$ [NC]

You don't need to say this over and over again. Just start your RewriteRules with an all-encompassing

RewriteRule ^(403|410)\.shtml$ - [L]
and
RewriteRule ^robots\.txt$ - [L]

But in practice I hardly ever use mod_rewrite for access control. Flies-with-an-elephant-rifle sort of thing. Instead it's mod_authz-thingummy alone for IP-based blocks; mod_setenvif leading to "Deny from" for simple UA checks.
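For anyone unfamiliar with that approach, a minimal sketch (hypothetical UA substrings, Apache 2.2-era syntax like the rest of the thread):

# Tag unwanted agents with an environment variable...
SetEnvIfNoCase User-Agent "Yukkybot" keep_out
SetEnvIfNoCase User-Agent "ZeroSum" keep_out

# ...then deny anything carrying that variable
Order Allow,Deny
Allow from all
Deny from env=keep_out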

wilderness - posted 11:31 pm on Aug 2, 2013 (gmt 0)

A good place for this to be *explored* IMO is the Apache Web Server forum or possibly even the Webmaster General forum. If I was looking for htaccess code syntax tips I certainly wouldn't look in the Search Engine Spider and User Agent Identification forum.


keyplyr,
You've a short memory!
This forum was using and discussing htaccess (i.e., Apache) syntax long before the Apache forum came into existence.

SSID topics are introduced into the Apache forum regularly (and remain there without being moved by moderators).

Now here you are encouraging that topics easily searchable in the forum archives under the SSID banner be removed from this forum and discussed in Apache.

Makes perfect sense to me.
BTW, when is the last time you participated in the Apache forum?

Don

keyplyr - posted 12:07 am on Aug 3, 2013 (gmt 0)

Never do. I teach all day at work. Not much interest in doing it again when I get home.

wilderness - posted 12:22 am on Aug 3, 2013 (gmt 0)

You and Bill make a fine pair ;)

One suggests we limit forum topics to white-listing and the other suggests we move any topic with a hint of Apache (i.e., htaccess) to the Apache Forum.

Once again, and per the other thread, the simplest solution would be to shut this forum down.

yaimapitu - posted 1:43 am on Aug 4, 2013 (gmt 0)

This looks like a good opportunity to review a few things and revise the ".htaccess" files, based on a better understanding. :)

Quoting lucy24:
Any given robot has one correctly cased form of its name. Only that form should be given a pass.

Makes sense... this is on the "to do" list now...

The [G] flag carries an implied [L].

Good to know. Putting an [L] in strategic places is one of the habits I picked up by looking at how others did it (I've learned everything that way, never read the documentation, sorry) - just yesterday I learned that [F] also implies [L]. This is now on the "to do" list for the next update.


I prefer to constrain my access-control RewriteRules to requests in the form
(\.html|/|^)$

Isn't .? the most efficient form?

Cases of robots walking in off the street and making "cold" requests for non-page files when they haven't already got the page are so rare that it isn't worth making the server stop and evaluate every single request.

Hm, I get tons of requests from shady bots for files associated with assumed blog installations. But any changes I might make depend on the answer to the last question.

Just start your RewriteRules with an all-encompassing

RewriteRule[snip]

Sounds like a good idea - there's much patchwork in these ".htaccess" files that has accumulated over the years - maybe doing a complete rewrite from scratch is not a bad idea...

But in practice I hardly ever use mod_rewrite for access control. Flies-with-an-elephant-rifle sort of thing. Instead it's mod_authz-thingummy alone for IP-based blocks; mod_setenvif leading to "Deny from" for simple UA checks.

Which access control method is the most efficient (in terms of server resources and time)? Knowing this would help me determine in which order to apply certain rules and which ".htaccess" file to put on which level in the subdirectory structure.

I need to mention here that my sites use a subdirectory structure with several ".htaccess" files, each on a different level. Access control via "Allow/Deny" (domains and ip blocks) happens on one level and "RewriteRule" on another level.

Thanks in advance for additional hints!

lucy24 - posted 3:06 am on Aug 4, 2013 (gmt 0)

I've learned everything that way, never read the documentation

There is some useful information in the docs. But almost everything at Apache is targeted toward the person who owns the server. So things that happen purely in htaccess tend to be given short shrift.

How many htaccess files you use depends largely on your hosting setup, apart from trivia like maybe an Options +Indexes in some individual directory (since you can't use <Directory> sections in htaccess). My host uses userspace/domain as opposed to the also common primary/addon. So no one domain has to go through another domain's htaccess.

I've got one shared htaccess that's primarily for access control: things that are the same for any domain, like IP blocks. Also most <Files>-based access control, because my filenames tend to be parallel across domains (the obvious example is robots.txt) so I can say something once and then forget about it.

I've always assumed that the single most resource-efficient rule is Allow/Deny using an IP address in CIDR form. Can't remember whether anyone ever found documentation on it. But the IP address is the very first thing in a request, which ought to count for something.
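For reference, a CIDR-based Deny looks like this (hypothetical range, Apache 2.2-style mod_authz_host syntax):

# Block one /24 range, allow everyone else
Order Allow,Deny
Allow from all
Deny from 192.0.2.0/24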

I don't know in an absolute sense whether there's a significant difference in resource use between

BrowserMatch Yukkybot keep_out
BrowserMatch ZeroSum keep_out

Deny from env=keep_out

and

RewriteCond %{HTTP_USER_AGENT} Yukkybot [OR]
RewriteCond %{HTTP_USER_AGENT} ZeroSum
RewriteRule . - [F]

But the first form (BrowserMatch or SetEnvIf) seems intuitively easier to work with. And-- most important when you've got multiple htaccess files-- then you don't have to mess with mod_rewrite's wonky inheritance system.

In practice, [F] and [G] are the most important flags that carry an implied [L]. In particular, [R] (of any kind) does not. For added confusion: the explicit flag [PT] implies [L]-- but any RewriteRule in an htaccess file carries an implied [PT] without [L].
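For example (a hypothetical redirect, not from the thread), an external redirect still needs its own [L]:

# [R] alone lets later rules keep processing the rewritten URL;
# adding [L] stops this pass once the redirect is issued
RewriteRule ^old-page\.html$ http://www.example.com/new-page.html [R=301,L]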


Edit:
Isn't .? the most efficient form?

Here we're talking strictly about access-control rules, meaning that (a) nothing gets captured and (b) if it's happening in mod_rewrite, conditions have to be evaluated. My assumption is: the extra work of looking at the specific content of a requested URI is outweighed by the savings in not having to look at conditions at all if it turns out to be a non-page request.

Access control rules that use mod_authz-whatever-- either by itself or in conjunction with mod_setenvif-- don't involve conditions. And they don't require keeping the results of a rewrite in memory* until you've arrived at the target directory and verified that there will be no further RewriteRules to interfere with the original result. The [L] notation only means "We're done for now". It doesn't mean "There will be no RewriteRules in any deeper directory, or later in the current file".

mod_rewrite is weird.


* Further edit: This behavior isn't unique to mod_rewrite. Notably, anything in a <Files> envelope will override anything that was done elsewhere in the same module. So

<Files "robots.txt">
Order Allow,Deny
Allow from all
</Files>

does the same job in mod_authz-thingie as

RewriteRule ^robots\.txt - [L]

does in mod_rewrite. Or, more to the point, the same job as

RewriteEngine On
RewriteRule . - [L]

in the directory containing the requested file.

But "throw 'em out" is less to remember than "send them to the 13th street entrance and tell them to ask for Joe".

yaimapitu - posted 6:38 am on Aug 4, 2013 (gmt 0)

I've got one shared htaccess that's primarily for access control [...] say something once and then forget about it.

Seems we are basically using the same approach: the first ".htaccess" file that a request has to go through is shared by all domains on a server, the second file is domain (or domain group) specific, depending on the targeted countries (this is realized via parallel subdirectories), and the last file is domain specific (where it exists) and is only used for hotlinking protection.

I've always assumed that the single most resource-efficient rule is Allow/Deny using an IP address in CIDR form [...] the IP address is the very first thing in a request, which ought to count for something.

Sounds reasonable, but I'd like to know for sure and will keep looking for related information...

Edit:
Isn't .? the most efficient form?

Here we're talking strictly about access-control rules, meaning that (a) nothing gets captured and (b) if it's happening in mod_rewrite, conditions have to be evaluated. My assumption is: the extra work of looking at the specific content of a requested URI is outweighed by the savings in not having to look at conditions at all if it turns out to be a non-page request.

As with your explanation in the previous post, this, too, sounded to me like there was a "backward lookup sequence" involved, and so I searched around and found these sites that say exactly that:
[helicontech.blogspot.jp...]
[littlevale.com...]
What I learned is that a given request is matched against a RewriteRule pattern before the associated RewriteCond conditions are checked.

Now it makes perfect sense to me that you would want to avoid filters of the kind
RewriteCond %{HTTP_HOST} example.com
RewriteCond %{REQUEST_URI} xxyyzz1 [OR]
RewriteCond %{REQUEST_URI} xxyyzz2
RewriteRule .? - [G]

and, wherever possible, instead use something like
RewriteCond %{HTTP_HOST} example.com
RewriteRule ^xxyyzz(1|2)$ - [G]

:)

lucy24 - posted 10:15 am on Aug 4, 2013 (gmt 0)

What I learned is that a given request is matched against a RewriteRule before the associated RewriteCond conditions are checked.

Oh, oops, I think everyone assumed you already knew that, even though you explicitly said you haven't read the docs. Yes, mod_rewrite works on a system of two steps forward, one step back. If a requested URL doesn't fit the pattern of the rule, then the conditions aren't evaluated at all.

Horse's Mouth source:
[httpd.apache.org...]
Replace 2.2 with 2.4 in the URL if necessary. (Yes, you are allowed to use literal periods in URLs. But unless your name is Apache dot org, it is better not to.)

yaimapitu - posted 12:58 am on Aug 5, 2013 (gmt 0)

Thanks for the additional info, lucy24

One more question about blocking bots:
Considering the rule set

RewriteCond %{HTTP_USER_AGENT} crawl [NC]
RewriteCond %{HTTP_USER_AGENT} !sistrix [NC]
RewriteRule .? - [G]

would there in this case (using !) also be a way to replace .? with something requiring less work on the part of the server, or is .? the best we have?

lucy24 - posted 1:59 am on Aug 5, 2013 (gmt 0)

If you want it to evaluate the conditions on every request, then no. The pattern can't get much more all-inclusive than
.?
If you only care about page requests, then you can put that into the pattern.

!sistrix [NC]

This is a textbook case of an inappropriate [NC] flag. Find out what the sistrix crawler calls itself, and give that exact casing, without the flag. Admittedly you will not meet a whole lot of sistrix spoofers compared to, say, fake googlebots. But stay in the habit.

When there's more than one condition, list them in order of most-likely-to-fail unless there's a strong reason for using a different order (for example, if you're listing a whole bunch of robots, keep them in alphabetical order for your own sanity). Unless it's an [OR] delimited group; then default to most-likely-to-succeed.

Here, the intuitive ordering agrees with the likely-to-fail ordering:

intuitive: "UA contains 'crawl' but is not sistrix crawler"
likely to fail: "UA contains 'crawl'" (chance of failure: let's say 90% total) vs. "UA is not 'sistrix'" (chance of failure: let's say 1% total).
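Putting the exact-casing and ordering advice together, a sketch (assuming the crawler identifies itself with the literal string "SISTRIX"; check the access logs for the true casing):

# Broad, most-likely-to-fail test first; exact-cased exclusion second, no [NC]
RewriteCond %{HTTP_USER_AGENT} crawl [NC]
RewriteCond %{HTTP_USER_AGENT} !SISTRIX
RewriteRule .? - [G]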

yaimapitu - posted 4:48 am on Aug 7, 2013 (gmt 0)

Thanks again...

This is a textbook case of an inappropriate [NC] flag.

It's "legacy stuff" and on the "to do" list. ;)

When there's more than one condition, list them in order of most-likely-to-fail

OK, that's what I do - to the extent I know the order (if I don't know enough, I try to come up with reasonable assumptions) and to the extent the list is manageable (some lists I do alphabetically, because any other method makes them unwieldy). But basically it's the same approach as with programming: we want the result of a given test as soon as possible, using the fewest resources possible...
