
Search Engine Spider and User Agent Identification Forum

    
Is this really a PlayStation 3
How can you tell if the browser is really a PlayStation 3?
Ocean10000
msg:3619659 · 2:03 am on Apr 5, 2008 (gmt 0)

I am in charge of monitoring a few websites. One of them gets a small number of visits from mobile and alternative browsers, so I am often trying to figure out whether a visitor is actually a human using some unknown browser or a new, unknown bot trying to sneak by.

One of these visitors uses a PlayStation 3 to browse with, so my job was to work out what criteria I could validate against to rule out a bot spoofing the PlayStation 3 to get in. The following is what I came up with.

The User-Agent in question is "Mozilla/5.0 (PLAYSTATION 3; 1.00)".

  1. The first check is to make sure the User-Agent matches "Mozilla/5.0 (PLAYSTATION 3; 1.00)", which activates the validation checks for this specific browser.

  2. Once it is determined that the UA matches a known custom test, start processing the custom test items, which for the PlayStation 3 are the following.
    • Must not include an "Accept" header. If an "Accept" header is found, then the browser is not a valid PlayStation 3.

      This is unusual: most major browsers and bots will always supply this header. One of my standard tests for spotting a spoofer is to check whether the "Accept" header is missing, which usually means it is a bot trying to hide behind a well-known User-Agent.

    • Must include an "x-ps3-browser" header. If the "x-ps3-browser" header is not present, then the browser is not a valid PlayStation 3.

    • The "x-ps3-browser" header must be in the format "#.## (WP; system=#.##)", where the # signs are numeric digits. If it does not match this mask, then it is not a valid PlayStation 3.

      Here are a few examples taken from my library to date.
      "1.30 (WP; system=1.32)"
      "1.70 (WP; system=1.70)"
      "1.80 (WP; system=1.81)"
      "1.90 (WP; system=1.90)"
      "2.10 (WP; system=2.10)"

    • Must include an "Accept-Encoding" header. If the "Accept-Encoding" header is not present, then the browser is not a valid PlayStation 3.
      The "Accept-Encoding" header must be equal to "identity", and yes, case does matter; it is always lower case. If it doesn't equal "identity", then it is not a valid PlayStation 3.

    • Must include an "Accept-Language" header. If the "Accept-Language" header is not present, then the browser is not a valid PlayStation 3.

    • Must include a "Connection" header. If the "Connection" header is not present, then the browser is not a valid PlayStation 3.

  3. If a request has made it this far, there is no further test that can be used to exclude it from being a "PlayStation 3" based on the supplied headers alone.
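Translated into code, the whole checklist might look something like the PHP sketch below. This is an illustration only, not Ocean10000's actual implementation: the function name and exact regex are my own, and it assumes the request headers are already in an associative array keyed by their canonical names (e.g. from PHP's apache_request_headers()).

<?php
// Illustrative sketch of the PS3 header checks described above.
// $headers is an associative array of request headers, e.g. from
// apache_request_headers(), keyed by canonical header names.
function looks_like_real_ps3($headers)
{
    // 1. Only run these tests for the exact PS3 User-Agent string.
    if (!isset($headers['User-Agent'])
        || $headers['User-Agent'] !== 'Mozilla/5.0 (PLAYSTATION 3; 1.00)') {
        return false; // not claiming to be a PS3; these tests do not apply
    }

    // 2. A real PS3 sends no Accept header at all.
    if (isset($headers['Accept'])) {
        return false;
    }

    // "x-ps3-browser" must be present and match "#.## (WP; system=#.##)".
    if (!isset($headers['x-ps3-browser'])
        || !preg_match('/^\d\.\d\d \(WP; system=\d\.\d\d\)$/',
                       $headers['x-ps3-browser'])) {
        return false;
    }

    // "Accept-Encoding" must be exactly "identity", all lower case.
    if (!isset($headers['Accept-Encoding'])
        || $headers['Accept-Encoding'] !== 'identity') {
        return false;
    }

    // "Accept-Language" and "Connection" must both be present.
    if (!isset($headers['Accept-Language']) || !isset($headers['Connection'])) {
        return false;
    }

    // 3. Nothing in the supplied headers alone rules it out.
    return true;
}
?>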

 

thetrasher
msg:3619774 · 10:54 am on Apr 5, 2008 (gmt 0)

Thank you for another "how to spoof". But where is a comprehensive guide for bot programmers?

lammert
msg:3619779 · 11:19 am on Apr 5, 2008 (gmt 0)

<irony on>
Thank you for posting, I have forwarded the information to my bot programmer.
</irony off>

You have missed the newest firmware versions in the x-ps3-browser string. The newest is 2.20, with the string "2.20 (WP; system=2.20)", and I remember there was a 2.16 or 2.17 for a short time a few weeks ago. The PlayStation 3 updates its firmware rapidly because some features in games stop working if the installed version is older than the one available online.

wilderness
msg:3619844 · 2:17 pm on Apr 5, 2008 (gmt 0)

Could anybody provide URLs which explain the procedures and/or requirements for "validation of accept headers"?

Thanks in advance.

Don

incrediBILL
msg:3620422 · 7:00 pm on Apr 6, 2008 (gmt 0)

Don,

Here are the HTTP/1.1 header definitions:
[w3.org...]

You won't find much about what constitutes valid HTTP headers out there, if anything at all, but you can draw some conclusions from that spec about which conflicting directives shouldn't show up in the same HTTP field at the same time.

FWIW, Ocean is basically teaching "BOTS ADVANCED 302", material which doesn't show up in any log files or discussion forums anywhere. The only way to get this information is to compare the HTTP headers sent by the actual tools and browsers against the spoofs, and collect a database full of HTTP header information, which Ocean does, to find out what's valid and invalid on the web.

That makes Ocean basically the HTTP header guru when it comes to analyzing HTTP header fields like HTTP_ACCEPT, HTTP_CONNECTION, and knowing how they are set for legitimate tools vs. the quick and dirty scripts that spoof the UA but don't set these fields properly.

Ocean does more advanced stuff with headers than even I do, but the basics are that HTTP_ACCEPT should exist and not be blank (unless you're a PS3, in which case the header existing at all is invalid), and MSIE, FIREFOX and OPERA are invalid if HTTP_ACCEPT is set to something like "text/html, text/plain" or just "text/html".

The HTTP_CONNECTION field shouldn't have conflicting directives such as both "close" and "keep-alive" at the same time.
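A minimal PHP sketch of those two basic tests (illustrative only: the exact Accept strings treated as suspicious and the 403 response are my assumptions, and in practice the Accept test should only fire for UAs claiming to be one of those browsers):

<?php
// Illustrative sketch: flag the two basic header problems described above.
$accept     = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';
$connection = isset($_SERVER['HTTP_CONNECTION'])
    ? strtolower($_SERVER['HTTP_CONNECTION']) : '';

// Missing or suspiciously bare Accept header for a mainstream browser UA.
$badAccept = ($accept === ''
    || $accept === 'text/html'
    || $accept === 'text/html, text/plain');

// Conflicting Connection directives: "close" and "keep-alive" together.
$badConnection = (strpos($connection, 'close') !== false
    && strpos($connection, 'keep-alive') !== false);

if ($badAccept || $badConnection) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}
?>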

Then we get to PROXY detection which is fun.

If you want to track secondary IPs behind a proxy server, so you can allow individuals to access your site (via Google's translator, for instance) without lumping them all into a single IP that quickly gets blocked for abusive-looking behavior, you have to look at things like HTTP_VIA, HTTP_X_FORWARDED_FOR, and HTTP_PROXY_CONNECTION.

I process the IP specified in HTTP_X_FORWARDED_FOR as the actual IP I'm tracking, which allows me to stop a scraper on a proxy while allowing others to continue using that proxy at the same time I'm blocking the specific abusive activity.
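As a rough PHP sketch of that idea (my own illustration, not incrediBILL's code; note that X-Forwarded-For is client-supplied and spoofable, hence the validation before trusting it):

<?php
// Illustrative: choose the IP to track when the request came via a proxy.
function tracking_ip()
{
    // Proxies commonly pass the original client IP in X-Forwarded-For,
    // sometimes as a comma-separated chain "client, proxy1, proxy2".
    if (!empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {
        $chain = explode(',', $_SERVER['HTTP_X_FORWARDED_FOR']);
        $candidate = trim($chain[0]); // first entry is the original client

        // Only trust it if it parses as a real IPv4 address.
        if (ip2long($candidate) !== false) {
            return $candidate;
        }
    }

    // Fall back to the connecting (proxy) IP.
    return $_SERVER['REMOTE_ADDR'];
}
?>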

There's lots more stuff in HTTP headers you can use, but it's typically beyond the scope of the majority of webmasters, which is why I rarely discuss anything except blocking user agents and IPs: the log files don't show this info, and most people can't program to address it anyway.

If you want to know why we bother with this stuff: I've been having a rash of hundreds of hits per day on my site from random IPs around the world, requesting single pages with no images, no CSS, no JS, with the UA "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)", which appears to be a botnet. The only thing currently standing between my site and this huge distributed network of most likely hacked machines is this bad-header checking, which is currently keeping them out.

Once they read this post...

[edited by: incrediBILL at 7:10 pm (utc) on April 6, 2008]

wilderness
msg:3620435 · 7:41 pm on Apr 6, 2008 (gmt 0)

Many thanks Bill and Ocean.

I was interested in a method/explanation which would allow me to check headers on my hosted sites.

Perhaps it's not possible?

Don

volatilegx
msg:3620442 · 8:06 pm on Apr 6, 2008 (gmt 0)

Don, checking headers is fairly easy using a bit of PHP.

You can use something like the apache_request_headers [php.net] function to get an associative array of the headers that have been sent with the HTTP request for the page. After you have the headers in your array, you can perform whatever tests you like on them.
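For example, a minimal version of that might look like the sketch below. Note that apache_request_headers() only exists when PHP runs as an Apache module, and header-name capitalization follows whatever the client sent, so a production check should normalize the names first.

<?php
// Dump the request headers and flag a missing Accept header.
$headers = apache_request_headers();

foreach ($headers as $name => $value) {
    echo htmlspecialchars("$name: $value") . "<br>\n";
}

if (!isset($headers['Accept'])) {
    echo "No Accept header was sent.<br>\n";
}
?>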

By the way, this is really a great thread. Thanks, Ocean!

wilderness
msg:3620445 · 8:17 pm on Apr 6, 2008 (gmt 0)

Hey Dan,
;) ;)
In order to use PHP-anything, I'd need to begin using it (DUH! Earth to Don), which I've avoided like hot potatoes.

incrediBILL
msg:3620453 · 8:33 pm on Apr 6, 2008 (gmt 0)

I'm no htaccess guru, but you should be able to attempt some header validation like this, for example:

RewriteCond %{HTTP_ACCEPT} ^text/html$ [OR]
RewriteCond %{HTTP_ACCEPT} ^text/html, text/plain$
RewriteRule !^403.*\.html$ - [F]

Samizdata
msg:3620464 · 9:18 pm on Apr 6, 2008 (gmt 0)

I am probably out of my depth in this discussion, but for those with PHP installed: you can log anything that doesn't send an HTTP_ACCEPT header by enabling full error reporting in php.ini, since a script that reads the then-missing $_SERVER['HTTP_ACCEPT'] entry will write a notice to the error log - though you will probably want an exclusion for Google Sitemaps in any action you take.
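If you'd rather not rely on error-reporting notices, a more explicit sketch (the log path and format are my own illustration) would be:

<?php
// Illustrative: explicitly log requests that arrive without an Accept header.
if (!isset($_SERVER['HTTP_ACCEPT'])) {
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '-';
    error_log(date('c') . ' no Accept header from ' . $_SERVER['REMOTE_ADDR']
        . ' UA=' . $ua . "\n", 3, '/var/log/missing_accept.log');
}
?>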

jdMorgan
msg:3620564 · 1:38 am on Apr 7, 2008 (gmt 0)

If you're looking for an Accept header that contains "text/html", then it might be better to leave the pattern unanchored, and to make sure that any characters that precede or follow the MIME type are the proper delimiters and not additional characters:

RewriteCond %{HTTP_ACCEPT} [,\ ]?text/html[;,]? [NC]

The semi-colon is included in case there's a "preference weighting" value present, as in "text/html;q=0.8".

I also threw in an [NC] flag in case some user-agents use uppercase characters.

Jim

incrediBILL
msg:3620591 · 3:25 am on Apr 7, 2008 (gmt 0)

OK, then the preferred way to do what I suggested is this:

(NC = No Case, i.e. ignore case, so any variety of upper/lower case will be caught.)

RewriteCond %{HTTP_ACCEPT} ^text/html$ [NC,OR]
RewriteCond %{HTTP_ACCEPT} ^text/html,\ text/plain$ [NC]
RewriteRule !^403.*\.html$ - [F]

Jim pointed out that I needed to escape the space as "\ ", which I had missed.

However, you want to leave the pattern anchored, as in my original example and this one, because a floating "text/html" could incorrectly match many things and generate lots of false positives.

Example of where a false positive would occur:

"MJ12bot/v1.0.8 (http://majestic12.co.uk/bot.php?+)"
HTTP_ACCEPT=text/html,text/plain,text/xml,text/*,application/xml,application/xhtml+xml

Thanks again to Jim for the proper syntax updates.

[edited by: incrediBILL at 3:29 am (utc) on April 7, 2008]
