
Forum Moderators: Ocean10000 & incrediBILL


Fake user agent strings

How to recognize...

   
10:21 pm on Jun 7, 2012 (gmt 0)



Hello,

Is it important for a webmaster to be able to recognize a fake user agent? And if it is, how does one do it? User agents come in all shapes and sizes. Some, like the fake Googlebots, are easy to recognize, but what about the really long ones? What do you think of this one?

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; FunWebProducts; GTB7.3; .NET CLR 1.1.4322; FunWebProducts)

Only the duplicate FunWebProducts token was odd; the visitor's behavior was normal. Here is the IP: 79.74.80.nn

Here's a long one: Mozilla/4.0 (compatible; MSIE 8.0; AOL 9.6; AOLBuild 4340.5002; Windows NT 6.1; WOW64; Trident/4.0; GTB7.3; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; BRI/2; MAGW; InfoPath.3; .NET4.0C)

What is this?
SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)

What types of things should we be looking for that would stand out as a potential threat? Should all the components of a UA be in a certain order, or can they be in any order? What difference does it make?

--grandma
8:12 pm on Jun 13, 2012 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



The lack of a space between "compatible;"


Yup, improper spacing is a standard reason for blocking a user agent on all my sites.

However, IMO headers are more important than user agents for blocking bots, because most bots don't send a few simple things that all browsers send. So I test headers first and user agents second, and as a result I boot things off the site while executing a lot less code.
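A minimal Python sketch of that header-first ordering. Which headers to require is an assumption here: Accept, Accept-Language and Accept-Encoding are ones mainstream browsers normally send, but the post doesn't say which ones it checks. The "improper spacing" rule is modeled as "compatible;" not followed by a space. The helper name `classify_request` is hypothetical.

```python
import re

# Assumption: headers that mainstream browsers normally send with a page
# request. Illustrative only; not the poster's actual rule set.
EXPECTED_HEADERS = ("accept", "accept-language", "accept-encoding")

# "Improper spacing": "compatible;" not followed by a space.
BAD_SPACING = re.compile(r"compatible;(?! )")


def classify_request(headers: dict) -> str:
    """Return "block" or "allow" for one request (hypothetical helper)."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}

    # Step 1: cheap header checks. Most bots fail here, so the
    # user-agent tests below never run for them.
    for name in EXPECTED_HEADERS:
        if not h.get(name, "").strip():
            return "block"

    # Step 2: user-agent checks, only for requests that passed step 1.
    ua = h.get("user-agent", "")
    if not ua or BAD_SPACING.search(ua):
        return "block"
    return "allow"


browser_like = {
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1)",
    "Accept": "*/*",
    "Accept-Language": "en-gb",
    "Accept-Encoding": "gzip, deflate",
}
print(classify_request(browser_like))                             # allow
print(classify_request({"User-Agent": "Ruby", "Accept": "*/*"}))  # block
```

Putting the header loop first means a bot that sends only a User-Agent is rejected after a few dictionary lookups, before any regular expression runs.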
2:53 am on Jun 14, 2012 (gmt 0)



I have to run this by you. Have you ever seen anything like this before? Note the trailing IP addresses after the UA:

209.131.39.nn - - "GET /example.jpg HTTP/1.1" 200 72343 www.example.com "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_1_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3" "173.227.72.nn, 66.94.233.nnn"
8:14 pm on Jun 14, 2012 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



The 173.227.72.nn IP belongs to a server farm at TW Telecom.

66.94.233.nn is Yahoo (a range new to me).

209.131.39.nn is also Yahoo.

If the context were different, I would say the trailing IPs were actually proxy forwarded-for IPs. I see quite a lot of servers trying to bypass blocks using proxies, usually broadband botnet ones but also Google and Yahoo. Given the mobile connotation, that is a feasible scenario, but I haven't seen proxy forwarded-for IPs in logs before.
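If the trailing quoted field is a logged X-Forwarded-For header (an assumption; the log format definition isn't shown), its IPs are listed client-first: the leftmost is the claimed original client and each later entry is a proxy that relayed the request, while the connecting IP at the start of the line is the final hop. A minimal Python sketch that rebuilds that chain; `forwarding_chain` is a hypothetical name.

```python
import re

# Every double-quoted field in the log line.
QUOTED = re.compile(r'"([^"]*)"')


def forwarding_chain(log_line: str) -> list:
    """Return hops from the claimed original client to the connecting IP."""
    connecting_ip = log_line.split()[0]
    fields = QUOTED.findall(log_line)
    # Assumption: the last quoted field is the X-Forwarded-For value.
    xff = fields[-1] if fields else "-"
    if xff in ("", "-"):
        return [connecting_ip]
    hops = [part.strip() for part in xff.split(",") if part.strip()]
    return hops + [connecting_ip]


line = ('209.131.39.nn - - "GET /example.jpg HTTP/1.1" 200 72343 www.example.com "-" '
        '"Mozilla/5.0 (iPhone; CPU iPhone OS 5_1_1 like Mac OS X) AppleWebKit/534.46 '
        '(KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3" '
        '"173.227.72.nn, 66.94.233.nnn"')
print(forwarding_chain(line))  # ['173.227.72.nn', '66.94.233.nnn', '209.131.39.nn']
```

Only the connecting IP is observed directly by your server. A client can put anything it likes in X-Forwarded-For, so the leftmost entry is a claim, not proof.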
3:47 am on Jun 15, 2012 (gmt 0)



Would the community find these types of entries helpful, or should we just keep them to ourselves?

5.9.2.nnn - - "GET / HTTP/1.1" 302 - "-" "Ruby"

And in these cases, should we include the whole IP address, or block out that last octet?
4:21 am on Jun 15, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Would the community find these types of entries helpful

The community might go off on a tangent and ask why you're serving up 302s instead of 301s ;)

"Ruby" alone is obviously a bogus UA-- or a very simple-minded robot-- but then you can look up the IP, and maybe you'll discover a hitherto unsuspected server farm in Belarus.

The chance that the offender comes from a block smaller than a /24 (meaning you'd need the 4th part) is too small to be worth bothering about. More likely it will turn out to be splat in the middle of a /13 from some country you never liked anyway.
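Python's standard `ipaddress` module can do this arithmetic. The sketch below shows why the 4th octet rarely matters: masking any address to its /24 gives the same network whatever the last octet was, and a /13 holds 2,048 such /24 blocks. The helper `block_for` and the last octets used are made up for illustration, and the /13 is pure arithmetic, not a claim about who owns that range.

```python
import ipaddress


def block_for(ip: str, prefix: int = 24) -> ipaddress.IPv4Network:
    """Return the network of the given prefix length that contains ip."""
    # strict=False accepts a host address and zeroes out the host bits.
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)


# Hypothetical last octets: any value lands in the same /24.
print(block_for("5.9.2.17"))    # 5.9.2.0/24
print(block_for("5.9.2.250"))   # 5.9.2.0/24

# A /13 holds 2,048 /24 blocks.
wide = block_for("5.9.2.17", 13)
print(wide)                                                       # 5.8.0.0/13
print(wide.num_addresses // block_for("5.9.2.17").num_addresses)  # 2048
```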
5:22 am on Jun 15, 2012 (gmt 0)



Well, now they will be getting a 403.

Only suspicious visitors end up getting that 302 server response. Most real visitors come to the site via a Google search and don't hit the index.php file in the root folder. Even the normal search bots like Google, Yahoo, and Bing don't request that page. When I go through the logs looking for those GET / HTTP/1.1 or GET / HTTP/1.0 entries, they always come from suspicious IPs, and most, if not all, get blocked.
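That log review can be scripted. A minimal Python sketch, assuming log lines shaped like the excerpts in this thread (client IP first, request line in double quotes); `root_hitters` is a hypothetical name.

```python
import re
from collections import Counter

# Matches a request for the bare root: "GET / HTTP/1.1" or "GET / HTTP/1.0".
ROOT_REQUEST = re.compile(r'"GET / HTTP/1\.[01]"')


def root_hitters(lines) -> Counter:
    """Count requests for "/" per client IP (first field of each line)."""
    counts = Counter()
    for line in lines:
        if ROOT_REQUEST.search(line):
            counts[line.split()[0]] += 1
    return counts


sample = [
    '5.9.2.nnn - - "GET / HTTP/1.1" 302 - "-" "Ruby"',
    '199.168.138.nn - - "GET / HTTP/1.0" 302 - "-" "-"',
    '209.131.39.nn - - "GET /example.jpg HTTP/1.1" 200 72343 www.example.com "-" "-"',
]
for ip, hits in root_hitters(sample).items():
    print(ip, hits)
```

The output is a candidate list for a manual IP lookup, not a block list in itself.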

5.9.2.nn belongs to Hetzner Online AG.

Here is another one:
199.168.138.nn - - "GET / HTTP/1.0" 302 - "-" "-"
This is a mail server from VolumeDrive. I'll block them, too.

I could change the code on the index page to return a 301, but I've just been too lazy to do it. Since it only happens to the bad guys and never to Google, Yahoo, or Bing, I don't think it is an issue for now.
6:21 am on Jun 15, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



By returning a 302 or 301, you're telling the bot to make a new request for a different URL. If the bot comes back and requests that other URL, that more than doubles the work your server is doing.
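A minimal WSGI sketch in Python of the cheaper alternative: refuse a flagged request for "/" with a single 403 instead of redirecting it, and send a 301 (not a 302) to everyone else. The `looks_suspicious` rule and the "/site/" target are hypothetical placeholders, not anyone's actual setup.

```python
def looks_suspicious(environ: dict) -> bool:
    """Hypothetical rule: no Accept header, or a missing or bare "Ruby" UA."""
    ua = environ.get("HTTP_USER_AGENT", "")
    return not environ.get("HTTP_ACCEPT") or ua in ("", "Ruby")


def app(environ, start_response):
    """Minimal WSGI app: refuse suspicious root requests outright."""
    if environ.get("PATH_INFO", "/") == "/":
        if looks_suspicious(environ):
            # A single 403 ends the exchange; there is no second URL to fetch.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        # Everyone else gets a permanent redirect to the real index.
        # "/site/" is a hypothetical path.
        start_response("301 Moved Permanently", [("Location", "/site/")])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library will serve it.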

Where do you redirect these bots to?
6:54 am on Jun 15, 2012 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month




"Ruby" alone is obviously a bogus UA

Actually, it's a valid UA. Ruby is a programming language (influenced by Perl), and it's used much the same way as when you see "Java/1.nn" as a GET tool. I see it all the time.
9:36 am on Jun 15, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



I see it all the time.

From humans?

The name "Ruby" is very familiar to me, because my text editor defaults to Ruby syntax for Regular Expressions, so it's staring me in the face every day ;) But it sure isn't a browser.
11:19 am on Jun 15, 2012 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month




No, not from humans. As I said, it's a program to GET files.
9:22 pm on Jun 15, 2012 (gmt 0)



All my files are in a folder in the root directory. My index.php file is in one folder, and it points to a different index file in another folder. That is what is causing the 302 code. I think I need to add this line of code:
header ('HTTP/1.1 301 Moved Permanently');
I just haven't done it yet.

As for the Ruby UA: whenever anyone comes to the site with just one hit, I always check their IP. I'd never seen the Ruby UA before. I assume it's a bot from Hetzner, so I blocked it. Why wait for trouble?
9:33 pm on Jun 15, 2012 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



@ GG

Better yet, remove the index.php and move all the files directly into the root, keeping only images, etc. in folders.

You can still use the 301 redirect; just edit it correctly.

Read here: [webmasterworld.com...]
10:00 pm on Jun 15, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Don't copy the code in that thread; it has a lot of errors, as listed in the next post.

There are dozens of errors to fix, and most of them are mentioned in that thread.
9:25 pm on Jun 19, 2012 (gmt 0)



OK, I haven't seen this one before.

209.85.224.nnn - - "GET /example/ HTTP/1.1" 200 26391 "-" "Mozilla/5.0 (compatible; GoogleDocs; legacyeditor; +http://docs.google.com)"

Will be blocking 209.85.224 unless someone has something good to say about them. Don't like proxies. Project Honey Pot said it was acting like a comment spammer.
9:34 pm on Jun 19, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Truncated UA:

157.55.17.nnn - - /cat/subcat/product "-" Mozilla/4.0 (compatible
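One hypothetical way to catch truncated strings like this is to check that the UA's parentheses balance: the string above opens a comment with "(" and never closes it. A minimal Python sketch (`parens_balanced` is a made-up name). Some legitimate clients send sloppy UAs, so treat a failure as one signal rather than proof.

```python
def parens_balanced(ua: str) -> bool:
    """True if every "(" in the UA is closed by a later ")"."""
    depth = 0
    for ch in ua:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" with no matching "("
                return False
    return depth == 0


print(parens_balanced("Mozilla/4.0 (compatible"))  # False
print(parens_balanced("Ruby"))                     # True
```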
12:34 am on Jun 20, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



209.85.224.nnn - - "GET /example/ HTTP/1.1" 200 26391 "-" "Mozilla/5.0 (compatible; GoogleDocs; legacyeditor; +http://docs.google.com)"

Will be blocking 209.85.224 unless someone has something good to say about them. Don't like proxies. Project Honey Pot said it was acting like a comment spammer.

###! I've never broken it down beyond a generic "Preview and Translate". Is there a list somewhere that sorts the range into smaller pieces?

:: huge detour here as I discovered I'd got it misflagged as 209.84.0.0/15 when it should only be 209.85.128.0/17, but luckily nothing undesirable ever came from the mislabeled parts ::

I'll be ###. It's all 209.85.224 except a couple of .85.238s for Site Verification and a scattered handful of others. And the 224s in turn are all in the narrower range .224.80-.99 (not .95).

So what do they do with the rest of 85.224, let alone the rest of 85.128-255?
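The range arithmetic in this post can be checked with Python's standard `ipaddress` module. The sketch confirms that 209.84.0.0/15 runs from 209.84.0.0 to 209.85.255.255 and so swallows the whole /17, and that a span from .224.80 to .224.99 cannot be written as a single CIDR block (a /28 starting at .80 stops at .95). The ranges come from the post; the code only does arithmetic and says nothing about who uses them.

```python
import ipaddress

old_flag = ipaddress.ip_network("209.84.0.0/15")    # the mislabeled range
corrected = ipaddress.ip_network("209.85.128.0/17")

print(old_flag[0], old_flag[-1])      # 209.84.0.0 209.85.255.255
print(corrected[0], corrected[-1])    # 209.85.128.0 209.85.255.255
print(corrected.subnet_of(old_flag))  # True (Python 3.7+)

# .224.80 through .224.99 does not fit one CIDR block; it needs two.
blocks = list(ipaddress.summarize_address_range(
    ipaddress.ip_address("209.85.224.80"),
    ipaddress.ip_address("209.85.224.99"),
))
print([str(b) for b in blocks])  # ['209.85.224.80/28', '209.85.224.96/30']
```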
3:38 pm on Jun 25, 2012 (gmt 0)



"However, headers are more important in blocking bots than user agents"

Please excuse a newbie question, but how do you use headers to block bots?
3:42 pm on Jun 25, 2012 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Detect which ones are present or missing, and send a 403 response in return. Some bots omit certain headers or send faked ones.
9:06 pm on Jun 25, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



Inescapable newbie followup: Can you do that in htaccess or is it a PHP Script Thing?

:: fleeing in terror from all those brackets and parentheses ::
5:29 pm on Jun 26, 2012 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Please excuse a newbie question, but how do you use headers to block bots?


Browsers send certain things in headers that most bots do not.

For instance, I can send you a 100% exact user agent string that matches Firefox 13 and you'll happily let it access your site.

However, had you bothered to also examine the headers being sent, you would have noticed that one or two things Firefox 13 always sends are missing or incorrectly presented when a bot faking Firefox 13 makes the request.

Simple to check, simple to block.

Nice cup of 403 forbidden served steaming hot.
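A minimal Python sketch of this idea: tie the headers you expect to the browser the UA claims to be, and flag requests that claim Firefox but don't send what Firefox sends. The profile below is a placeholder, not a verified list of Firefox 13's headers; you would build real profiles from your own logs of genuine browser traffic. `PROFILES`, `claimed_family` and `is_consistent` are hypothetical names.

```python
import re
from typing import Optional

# Hypothetical profile: header names a genuine browser of this family is
# expected to send. Placeholder values; build real ones from your own logs.
PROFILES = {
    "firefox": {"accept", "accept-language", "accept-encoding"},
}

# Patterns that identify which browser a user agent claims to be.
FAMILIES = (("firefox", re.compile(r"Firefox/\d+")),)


def claimed_family(ua: str) -> Optional[str]:
    for family, pattern in FAMILIES:
        if pattern.search(ua):
            return family
    return None


def is_consistent(headers: dict) -> bool:
    """True if the request sends every header its claimed browser should."""
    present = {name.lower() for name, value in headers.items() if value.strip()}
    ua = next((v for k, v in headers.items() if k.lower() == "user-agent"), "")
    family = claimed_family(ua)
    if family is None:
        return True  # Not claiming a profiled browser; other rules apply.
    return PROFILES[family] <= present


ff_ua = "Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0"
print(is_consistent({"User-Agent": ff_ua, "Accept": "*/*"}))  # False
```

This only checks presence; checking that values are "correctly presented" would mean comparing them against what your logs show real browsers sending.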