Here is a reason NOT to block blank U-As

It just might be this website checking for valid code


Wizcrafts

7:26 pm on May 26, 2003 (gmt 0)

10+ Year Member



In my previous inquiries I asked how to block spiders that had a blank User-agent, or just a hyphen for a name. I got the answer and applied the rules to my .htaccess file.
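For reference, the rules were along these lines (a minimal mod_rewrite sketch of the idea; the exact lines I applied may have differed slightly):

RewriteEngine On
# Return 403 Forbidden when the User-Agent header is empty or just a hyphen
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^-$
RewriteRule .* - [F]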

The block worked as designed, so no more bots without any ID could look at my website. Then I read a post from one of the more knowledgeable members of this forum, who suggested that blocking blank U-As might also exclude friendly check bots, sent in stealth mode to verify links or compliance with terms of service.

I took this reasoning to heart and removed the two lines that blocked blank U-As, and lo and behold, this arrived yesterday:


216.71.84.187 [Sun May 25 22:19:44 2003] "<undefined>"
216.71.84.187 [Sun May 25 22:27:25 2003] "(HTML Validator [searchengineworld.com...]

If I hadn't removed the blank-UA block, SearchEngineWorld could not have validated my code! Their first entry was in stealth mode, with no UA. Only the subsequent visits reported the UA in my logs.

My advice is: do NOT block blank U-As. If a bot with a blank UA visits your website, do a DNS lookup in Sam Spade. If it comes from China, or Russia, or another country that is home to spammers, and if it also indexes only your home and guestbook pages, it is probably unfriendly. Block its IP with a "deny from" rule rather than blocking it for its blank ID alone. That is what I now do.
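By way of example, a "deny from" rule for a single unfriendly IP looks roughly like this (the address shown is just a documentation placeholder, not a real offender):

# Deny one unfriendly IP address; substitute the address from your own logs
<Limit GET POST HEAD>
  Order Allow,Deny
  Allow from all
  Deny from 192.0.2.1
</Limit>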

Wiz

wilderness

8:50 pm on May 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Wiz
To use your "own request" for validation as the criterion for determining your deny/allow rules is hardly methodical.
Does it make sense to you to allow anybody across the internet to validate your pages, and to configure your .htaccess with that goal in mind?

As I've mentioned previously, denying blank UAs is not a sound policy. However, that doesn't prevent me from using it to suit my purposes and traffic trends.

In the end, each webmaster must make their own decision as to what is crucial to their individual sites.

I use Xenu when verifying broken links on my websites; however, I turn off the deny for that software during that session and turn it back on afterwards.
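The toggle is nothing fancier than commenting the deny lines in and out, roughly like this (a sketch only; the match on "Xenu" in the UA string is my assumption, and it presumes RewriteEngine On is already set earlier in the file):

# Deny Xenu by User-Agent; comment these two lines out while running
# a link check, then restore them afterwards
RewriteCond %{HTTP_USER_AGENT} Xenu [NC]
RewriteRule .* - [F]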

There are exceptions to many things; however, IMO, I wouldn't base my daily methods on those exceptions. Should a webmaster do that, more often than not we will end up on the wrong end of the proverbial "shaft."

bird

10:14 pm on May 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'd rather say that the other site should fix its validator, instead of webmasters everywhere having to allow arbitrary stealth spiders.

jdMorgan

10:34 pm on May 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Maybe someone should ask Brett to change it, then. This is the sister site of WebmasterWorld we're talking about here, right? The validator at [searchengineworld.com...]

I suppose you could just use the "Report Problem" link at the bottom of this page to report it as an "enhancement request."

Wizcrafts,
The bottom line is that you or someone else used the HTML validator and one of the other tools in the Search Engine World toolbox to check your site. One provided a User-Agent string and the other did not. So, as Don says, you could always keep the block in place until you need to let the tool in, or make an exception for that IP address.
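Something along these lines would do it, as a rough sketch only (the IP shown is a placeholder for whatever address the tool actually visits from):

RewriteEngine On
# Let one trusted IP through, then forbid any other request with a blank
# or hyphen-only User-Agent
RewriteCond %{REMOTE_ADDR} !^192\.0\.2\.1$
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^-$
RewriteRule .* - [F]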

HTH,
Jim

Wizcrafts

11:12 pm on May 26, 2003 (gmt 0)

10+ Year Member



I did request searchengineworld to validate my code. I was surprised that they use a blank UA, but I had already made the decision to leave that rule commented out unless it is needed again. I would rather err on the conservative side here, and perhaps let in a bot that is fishing for addresses (it will no longer find any in readable format to use to spam me) than exclude a good guy wearing a mask (the Lone Ranger). I have been converting all my contact info into scripts that even I can't decipher without using the utility that created them. All of my forms have aliased recipients (the NMS FormMail replacement), not actual accounts, in the hidden recipient input. I have poison traps everywhere and other countermeasures in place, and the list keeps growing as new threats are discovered and old solutions become useless.

I see that I have opened a can of worms with my report. It sure would be a boring world if everybody had the same opinion on security issues. ;-)

wilderness

11:59 pm on May 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It sure would be a boring world if everybody had the same opinion on security issues. ;-)

Wiz
That's part of what makes this forum and the others at WebmasterWorld successful.
For the most part, we have folks discussing key issues and sharing similar ideas and goals with diverse objectives, CALMLY. And yet there is rarely a participant who feels the need to be so overbearing as to insist that "their way is the only way."

Who would have thought a few years back that the goals of individual web sites could be so directed to cater to specific clientele?
Or even that very regionalized websites could have a share of a global audience?

Wizcrafts

4:05 am on May 27, 2003 (gmt 0)

10+ Year Member



I just submitted the bug report, using the link at the bottom of this page, with a good explanation (I hope).

What I reported may not be a bug, but rather a feature of the validator. I will post the reply, if and when I receive it.

Wiz

jdMorgan

4:16 am on May 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> I will post the reply, if and when I receive it.

Don't post anything without the permission of the other party, though...

It's not a bug, but I believe that it is "good form" for all user-agents to identify themselves and provide a link to a page that indicates their purpose.

Getting back to your "can of worms" post, I'm far more tolerant of user-agents that identify themselves than I am of those that don't - or worse yet, that try to disguise themselves. My rule-of-thumb is that if the purpose of a user-agent helps my visitors, my sites, or people interested in the subjects of my sites, I'll let it in. Otherwise, I suppose it depends on my mood and recent bandwidth overages. :)

Jim

Wizcrafts

6:21 pm on May 27, 2003 (gmt 0)

10+ Year Member



After thinking about what has been said here, I have come to a temporary decision that a blank UA is indeed bad etiquette on the spider's part, and I shouldn't condone it without good cause. Since I am able to read my raw web access logs, I can still see any 403 messages and what IP they came from, then trace it. If CJ decides to cloak a spider when they run a link-quality check on my website (they haven't cloaked yet), I will copy the IP address and allow it to bypass the blank-blocker rule the next time it comes a-callin'.

From what I have seen in my logs, there are more blank agents that are unidentifiable or come from RIPE Network IPs than otherwise. Therefore, I have uncommented the blockers to make them active again. I will keep a watch on my logs to see if I am blocking any traceable good guys, and try to allow them to bypass the gate.

As it stands right now, all spam is being sent to previously harvested addresses on my website, most of which were in plain-text mailto links, before I knew better. Some I redirect to a catchall account to view and report to SpamCop, while others are sent directly to a dead-drop account that I never see at all. I have implemented a policy whereby all email links are JavaScript includes, with a noscript tag telling non-scripting visitors to use my contact form instead of email. My form keeps all of the addresses in a Perl script that is not world-readable (711), and the form recipient fields use numeric aliases only. This is working fine for now, but I am ready to change again if the bots become smart enough to decode the scripted links, or to infiltrate my cgi-bin files.