
bot; http://

from 216.158.1.nnn


Hobbs

10:42 am on Mar 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Asked for Robots: Y

Came as: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0/1.0 (bot; [...])

Came from: 216.158.1.nnn
Consult Dynamics, Inc

wilderness

2:20 pm on Mar 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Many thanks Hobbs.

There must be some old threads on this?

Although I don't care for these types of 3rd party services accessing my pages, one at least needs to consider that their services operate in close proximity to many of the K-12 networks (choke-choke, gag-gag).

2004, 2007, 2007, 2009

#On topic request
216.158.61.zz - - [22/Apr/2004:10:47:07 -0700] "GET /MyFolder/MyPage.html HTTP/1.1" 200 29784 "[google.com...]lr=&ie=ISO-8859-1&oe=ISO-8859-1" "Mozilla/4.0 (compatible; MSIE 5.16; Mac_PowerPC)"

#Duplicated requests from Consult and visitor IP
58.227.159.zzz - - [18/Jan/2007:10:07:33 -0800] "GET /MyFolder/MyPage.html HTTP/1.0" 403 - "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
207.245.84.zz - - [18/Jan/2007:10:07:48 -0800] "GET /SameFolder/SamePage.html HTTP/1.1" 200 39006 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

#On topic request, utilized dual IP's
199.95.171.z - - [26/Oct/2007:09:52:51 -0500] "GET /MyImage.gif HTTP/1.1" 200 1925 "RequestedPage.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 1.1.4322)"
216.158.5.z - - [26/Oct/2007:09:52:51 -0500] "GET /RequestedPage.html HTTP/1.1" 200 49699 "http://www.google.com/search?q=On+topic+" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 1.1.4322)"

#Requested multiple pages

216.158.1.zzz - - [05/Feb/2009:22:15:58 -0600] "GET /MyFolder/MyPage.html HTTP/1.1" 200 11217 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"
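The dual-IP pattern in the 2007 excerpt above (page and image fetched in the same second from two different address blocks) can be spotted mechanically once the log lines are parsed into fields. A minimal sketch, assuming the standard Apache "combined" format; the sample line is adapted from the excerpt above, with an invented final octet in place of the masked one:

```python
import re

# Minimal parser for Apache "combined" log lines. Field names are the
# usual combined-format ones; nothing here is specific to this bot.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields, or None if the line doesn't match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

# Sample adapted from the 26/Oct/2007 excerpt; the .1 octet is a placeholder.
sample = ('216.158.5.1 - - [26/Oct/2007:09:52:51 -0500] '
          '"GET /RequestedPage.html HTTP/1.1" 200 49699 '
          '"http://www.google.com/search?q=On+topic+" '
          '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"')

hit = parse_line(sample)
print(hit['ip'], hit['time'], hit['path'])
```

Grouping parsed hits by the `time` field (same second, different `ip`) would surface the duplicated-request pattern shown above.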

GaryK

3:43 pm on Mar 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



many of the K-12 networks

We need an LOL smiley here!

I've got a list of 53 user agents with bot; http:// in the string. All of them banned. If you want to see any/all of them let me know.
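A check along those lines is a one-liner, since "bot; http://" is a literal substring of the UA. A sketch (not GaryK's actual code); the sample UA is the full string quoted elsewhere in this thread:

```python
def is_suspect_ua(ua: str) -> bool:
    """Flag any user agent carrying 'bot; http://' in the string."""
    return "bot; http://" in ua.lower()

# Full UA as reported later in the thread.
ua = ('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) '
      'Gecko/2008052906 Firefox/3.0/1.0 (bot; http://; bot@bot.com)')
print(is_suspect_ua(ua))  # True
```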

wilderness

4:22 pm on Mar 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Gary,
The four instances that I provided were all Consult Dynamics, none of which previously used the "bot" UA.

Perhaps I should expand on the "K-12" mention?
There are various 3rd party services which provide networks to K-12 locales.

It was my intention to point out that the many new types of limited networks we are seeing are a near parallel to the "K-12" networks.

Although I don't personally like the idea of throwing an "umbrella" over the terms "educational" or "3rd party", there are some legitimate instances of these services that provide benefit to both the user and the webmaster.
I simply feel that each instance must be reviewed.

I looked at one similar network service today that proclaimed its services as global. Thus the possibility exists that such services could (at least in effect) act as a proxy when sending their customers to websites, hiding the customers' actual identity.

Don

GaryK

4:35 pm on Mar 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oh, sorry about that, Don. I thought you were trying to be humorous by implying something about the behavior of the bot. I just Googled the term and found all those 3rd party services.

I'm quicker to open that umbrella over those kinds of bots cause all my experience with educational bots has been negative. They're usually really badly behaved bots that are part of some student's class project. Proxy or whatever, they've all been tarnished in my mind.

Do you store all the IP Addresses these bots use for future reference? Cause I tried that for awhile and wound up with a database table that was in the tens of GB.

wilderness

5:41 pm on Mar 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm quicker to open that umbrella over those kinds of bots cause all my experience with educational bots has been negative. They're usually really badly behaved bots that are part of some student's class project. Proxy or whatever, they've all been tarnished in my mind.

Gary,
"My widgets" and the pages/articles within my sites provide some historical references. (Somewhere there's an old thread about a General from the early revolution who was an annual research topic for a specific school; my page provided a reference to a "widget" of the same name, which was not related to their quest, yet the hits continued. Eventually I purposely misspelled the name in the page contents to avoid the hits.)
Much of my content actually provides excellent source leads for these types of learning, however I'm still required to make an individual determination on the activity of each network and whether the visitor is crawling, caching, or simply utilizing the references I've made available.

Do you store all the IP Addresses these bots use for future reference? Cause I tried that for awhile and wound up with a database table that was in the tens of GB.

Yes and no.
Primarily for North America; for the other registries (RIPE and APNIC) I merely make notations of the ranges outside of the NA ranges, with NO INFO on the bots themselves because they don't get in my sites.

For some time, I would make the notations and additions to myself in emails and then periodically export the emails to a local folder.
Simultaneously, I built a directory structure, organized by category and name, in which text files of bot logs and the references were contained.

For a few years now, I've been using the Copernic Desktop Tool for my widget data.
It works on my IP, registrar, and crawler data as well.
The tool builds a database index, however not in a format usable by actual database software.

My created folders (including the aforementioned exported emails) are just under 150 meg.

Perhaps the most frustrating recent addition to my IP probes is running tracerts for IPs that do not have subnets defined (it should be a crime).

Don

thetrasher

2:51 pm on Mar 22, 2009 (gmt 0)

10+ Year Member



In 2008-07 this bot came as
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0/1.0 (bot; http://; bot@bot.com)

216.158.1.192/28 -> [webmasterworld.com...]
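Python's standard `ipaddress` module can expand that /28 for anyone who wants to confirm which addresses it covers (assuming the range is as posted):

```python
import ipaddress

# thetrasher's block: 216.158.1.192/28 spans 216.158.1.192 - 216.158.1.207.
block = ipaddress.ip_network('216.158.1.192/28')
print(ipaddress.ip_address('216.158.1.200') in block)  # True
print(ipaddress.ip_address('216.158.1.210') in block)  # False
print(block.num_addresses)  # 16
```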

GaryK

3:58 pm on Mar 22, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Don, I understand your rationale better now. Thanks for explaining it in terms I can understand. Sometimes I feel so stupid in the forum. Thanks for the link you sent too.

I think part of the problem with the IP Address data I was storing was that I stored it for all user agents, even known browsers. I need to work on that and try again.

wilderness

4:32 pm on Mar 22, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Gary,
I seem to recall, perhaps 8-9 years ago, I began a file for ALL UA strings and abandoned the task very promptly.

There are many people with active pages working in this regard (although they may be inaccurate); why replicate something so many others are doing was/is my logic.

My method of simply documenting offenders (for lack of a better term) requires much less data storage.

Don

GaryK

9:21 pm on Mar 22, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I still store all UA strings. I have 118,050 of them to date. Still, it's a relatively small database table. I'm gonna try coding for IP Address storage again this week. Since I check browscap.ini as I'm doing the analysis it should be easy enough to weed out known browsers.
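The weeding step might look something like the sketch below. This is not GaryK's actual code, and a real browscap.ini lookup does full wildcard pattern matching; the token list here is purely illustrative:

```python
# Illustrative known-browser tokens -- a stand-in for a browscap lookup,
# NOT the browscap.ini format itself.
KNOWN_BROWSER_TOKENS = ('MSIE', 'Firefox/', 'Chrome/', 'Safari/', 'Opera')

def worth_storing(ua: str) -> bool:
    """Store the IP only when the UA doesn't look like a known browser.

    Caveat: a spoofed UA like the Firefox-claiming bot in this thread
    would slip past this check, so a suspect-token check (e.g. for
    'bot; http://') should run before this filter.
    """
    return not any(tok in ua for tok in KNOWN_BROWSER_TOKENS)

print(worth_storing('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'))  # False
print(worth_storing('SomeCrawler/1.0 (+http://example.com/bot)'))           # True
```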