
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 40 message thread spans 2 pages (page 2 shown).
the latest impenetrable disguise
lucy24




msg:4538509
 2:37 am on Jan 23, 2013 (gmt 0)

Some of youse may have seen this before. It's a new one on me.

83.23.202.99 - - [22/Jan/2013:12:30:44 -0800] "GET / HTTP/1.1" 200 2526 "-" "Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0 2013-01-22 21:30:39"
83.23.202.99 - - [22/Jan/2013:12:30:44 -0800] "GET /wp-login.php?action=register HTTP/1.1" 403 928 "-" "Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0 2013-01-22 21:30:40"
83.23.202.99 - - [22/Jan/2013:12:30:45 -0800] "GET /register.php HTTP/1.1" 403 928 "-" "Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0 2013-01-22 21:30:41"
83.23.202.99 - - [22/Jan/2013:12:30:45 -0800] "GET /admin.php HTTP/1.1" 403 928 "-" "Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0 2013-01-22 21:30:41"
<snip, snip for a total of 15 requests>
83.23.202.99 - - [22/Jan/2013:12:47:57 -0800] "GET /add HTTP/1.1" 404 912 "-" "Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0 2013-01-22 21:47:52"
83.23.202.99 - - [22/Jan/2013:12:47:57 -0800] "GET /otwarty_admin/ HTTP/1.1" 404 912 "-" "Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0 2013-01-22 21:47:53"


Nifty huh? Just tack your current server time onto the end of your UA string and nobody will ever be able to block you.

Unless, of course, they've already got an IP block on eastern European robots whose clock is four seconds slow. Or a UA block on anything ending in \d\d:\d\d:\d\d. Or a block on external requests for php files. Or...
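
For anyone who wants to try the UA rule, here is a minimal sketch in Python, assuming the check runs in a script rather than in server config. The function name `has_trailing_timestamp` is made up for illustration; the pattern is the same \d\d:\d\d:\d\d anchored to the end of the string.

```python
import re

# A user-agent whose last characters look like hh:mm:ss.
TRAILING_TIME = re.compile(r"\d\d:\d\d:\d\d$")

def has_trailing_timestamp(user_agent: str) -> bool:
    """Return True if the UA string ends in something shaped like a clock time."""
    return TRAILING_TIME.search(user_agent.strip()) is not None

# First string is copied from the log lines above; second is the same UA without the suffix.
robot = "Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0 2013-01-22 21:30:39"
plain = "Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0"

print(has_trailing_timestamp(robot))  # True
print(has_trailing_timestamp(plain))  # False
```

The rule only looks at the shape of the last eight characters, so it keeps working no matter what time the robot's clock reports.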

83.0.0.0/11 is Poland. Unless someone has evidence to the contrary, I'm going to assume it's all servers. I've previously met 83.9.something with the same owner, though each piece will only admit to /13 of the full range.
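
The range arithmetic can be checked with Python's standard `ipaddress` module. This is only a sketch of the CIDR math; the variable names are illustrative, and it says nothing about who actually owns each piece.

```python
import ipaddress

# The /11 from the post, split into its four /13 pieces.
full_range = ipaddress.ip_network("83.0.0.0/11")
pieces = list(full_range.subnets(new_prefix=13))

# The address from the log lines above.
visitor = ipaddress.ip_address("83.23.202.99")

print(visitor in full_range)  # True
for piece in pieces:
    # Show each /13 and whether the visitor falls inside it.
    print(piece, visitor in piece)
```

The four pieces are 83.0.0.0/13, 83.8.0.0/13, 83.16.0.0/13 and 83.24.0.0/13. The visitor lands in 83.16.0.0/13, and anything starting 83.9 lands in 83.8.0.0/13.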

 

blend27




msg:4539550
 8:45 pm on Jan 26, 2013 (gmt 0)

@incrediBILL

You promised not to....

lucy24




msg:4539573
 10:37 pm on Jan 26, 2013 (gmt 0)

And speaking of impenetrable disguises... Any way to tell the difference between a text-only browser and a robot?

Here's me visiting the art studio's site with Lynx. Second page visited, in logs and in iB's logheaders function:

{my IP} - - [26/Jan/2013:14:12:39 -0800] "GET /info/background.html HTTP/1.0" 200 1567 "http://example.org/info/events.html" "Lynx/2.8.6rel.5 libwww-FM/2.14"

IP: {my IP}
Host: example.org
Accept: text/html, text/plain, text/css, text/sgml, */*;q=0.01
Accept-Encoding: gzip, compress, bzip2
Accept-Language: en
Accept-Charset: utf-8, iso-8859-1;q=0.01, us-ascii;q=0.01
User-Agent: Lynx/2.8.6rel.5 libwww-FM/2.14
Referer: http://example.org/info/events.html

Hm. If it had said "libwww-perl" I would have locked myself out.
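
To illustrate how close that call is, here is a minimal sketch of a substring-style UA block in Python. The helper `blocked_by_substring` is hypothetical and stands in for whatever rule the server really uses.

```python
def blocked_by_substring(user_agent: str, needle: str) -> bool:
    """Case-insensitive substring rule, the simplest kind of UA block."""
    return needle.lower() in user_agent.lower()

# UA string copied from the Lynx request above.
lynx = "Lynx/2.8.6rel.5 libwww-FM/2.14"

print(blocked_by_substring(lynx, "libwww-perl"))  # False
print(blocked_by_substring(lynx, "libwww"))       # True
```

A rule written for "libwww-perl" leaves Lynx alone, but shortening it to "libwww" would catch Lynx too.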

blend27




msg:4539575
 10:52 pm on Jan 26, 2013 (gmt 0)

Why in the world would someone use Lynx as a browser in this day and age (unless thou art from Minnesota :)...)?

The request with those headers would get nuked on my sites before it gets to UA or IP check.

keyplyr




msg:4539578
 12:02 am on Jan 27, 2013 (gmt 0)





The request with those headers would get nuked on my sites before it gets to UA or IP check.

ditto

lucy24




msg:4539580
 12:23 am on Jan 27, 2013 (gmt 0)

Why in the world would someone use Lynx as a browser in this day and age

I used Lynx in order to test Lynx :P It's also a useful backup check every now and then. This time around, it told me that my text-as-image links work perfectly well in a text-only browser (meaning that they would also work if the server happened to eat the relevant image file); it simply shows the alt text.

before it gets to UA or IP check

Gosh. What has it already said that a human would never say? Or is it the bare fact that the user-agent comes at the end of the headers instead of immediately before or after the Host line? (I checked three other browsers to make sure the php function wouldn't make the site explode. That's after I worked out how to get all the header logs in one place instead of a separate one in each directory. Ouch.)
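
If header order is the tell, a check along these lines could measure it. This is a guess at the kind of test involved, not anyone's actual rule: the function name is made up, and it assumes the logged order matches the order the headers arrived in.

```python
def user_agent_offset_from_host(header_names):
    """Positions between Host and User-Agent (negative if UA comes first).

    Returns None if either header is missing.
    """
    lowered = [name.lower() for name in header_names]
    if "host" not in lowered or "user-agent" not in lowered:
        return None
    return lowered.index("user-agent") - lowered.index("host")

# Header order as logged for the Lynx request above.
lynx_order = ["Host", "Accept", "Accept-Encoding", "Accept-Language",
              "Accept-Charset", "User-Agent", "Referer"]

print(user_agent_offset_from_host(lynx_order))  # 5
```

An offset of 1 or -1 would mean User-Agent sits right next to Host. What counts as suspicious would have to come from comparing real browsers.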

Lynx doesn't seem to be very popular with robots though. The only one that crops up in current logs is a Lithuanian at 78.158 that I finally blocked for appearances' sake.

dstiles




msg:4539669
 7:21 pm on Jan 27, 2013 (gmt 0)

I get a few Lynx visits per month. They all get banned.

There are a couple of tools I use to check my sites from time to time. They are included in my security system as "block this unless it's me" where "me" means my fixed IP which only I use.
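
A "block this unless it's me" rule might look roughly like the sketch below. Everything named here is a placeholder: the IP is from a documentation-only range, the tool list is invented, and the real security system is not shown anywhere in this thread.

```python
import ipaddress

# Hypothetical values: a documentation-range address stands in for the fixed IP,
# and the tool names are placeholders.
MY_FIXED_IP = ipaddress.ip_address("203.0.113.10")
MY_TOOLS = ("lynx", "linkchecker")

def allow_request(remote_ip: str, user_agent: str) -> bool:
    """Allow a testing-tool UA only when it comes from the owner's fixed IP."""
    ua = user_agent.lower()
    uses_tool = any(tool in ua for tool in MY_TOOLS)
    if not uses_tool:
        return True  # not covered by this rule; other checks decide
    return ipaddress.ip_address(remote_ip) == MY_FIXED_IP

print(allow_request("203.0.113.10", "Lynx/2.8.6rel.5 libwww-FM/2.14"))  # True
print(allow_request("198.51.100.7", "Lynx/2.8.6rel.5 libwww-FM/2.14"))  # False
```

The rule only works if the exempt address really is used by one person, which is why a fixed IP matters.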

lucy24




msg:4539701
 1:57 am on Jan 28, 2013 (gmt 0)

Bottom line: there aren't only false negatives (robots that successfully masquerade as humans). There are also false positives (humans that are mistaken for robots). This obviously takes us into ymmv territory. Personally I would rather let in a few unwanted robots than lock out a few honest humans. Unless they're from China or something, duh ;)

I've got the header logs running at the art studio's site. Haven't collected enough information to do anything about it* but do note with interest that the Yandexbot sends an accept-language header. Wonder why?

What does Wsip mean? It showed up at the end of a DomainTools query with an attached IP that was different from the first one. It may even be the site's own IP-- it belongs to my host-- but the site doesn't have a fixed IP so I can't be sure. I got as far as "Web service initiation protocol" and then the explanations lapsed into Hungarian.**


* As noted elsewhere, this site is smaller than mine by a number of orders of magnitude that would seem to be mathematically impossible.
** As in "It's all Hungarian to me".

keyplyr




msg:4539740
 5:32 am on Jan 28, 2013 (gmt 0)

WSIP stands for Web Services Integration Platform (not protocol); see Oracle's Primavera white paper.

I disagree that header checking categorically creates "false positives." It just takes time to fine-tune your defenses, just as it does with any other whitelist.

lucy24




msg:4539802
 10:14 am on Jan 28, 2013 (gmt 0)

The request with those headers would get nuked on my sites before it gets to UA or IP check.

lucy24




msg:4540414
 3:47 am on Jan 30, 2013 (gmt 0)

Oh, and don't think I didn't see you sneaking around in that ten-foot-tall, bright-neon-orange mask, you anonymouse you ;)

Matter of fact, I first saw them in the other site's logged headers. So somebody was looking at pictures-- sans javascript, which kinda takes the fun out of it-- while I was doing battle with FontForge. (Hint: I couldn't make head or tail of MacPorts, so had to follow the raw installation instructions. Due to some weird oversight on the developer's part, this involved only three steps, worked on the first try, and continues to work after I moved the installed files.)


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved