
Search Engine Spider and User Agent Identification Forum

    
obot
IBM iss.net IPs
Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4376847 posted 12:18 am on Oct 20, 2011 (gmt 0)

Old bot [google.com...] with new name. Or new bot with old name...

This month, from mothership IBM's Internet Security Systems (iss.net) c/o Germany:

206.253.224.18 [projecthoneypot.org...]
Mozilla/5.0 (compatible; oBot/2.3.1; +http://filterdb.iss.net/crawler/)

14:12:26 / GET
14:12:28 / HEAD
14:12:29 / HEAD
14:12:30 / GET
14:12:34 / HEAD

robots.txt? NO

Last month and prior, the same five-hit, GET-HEAD, no-robots pattern from sibling IBM Deutschland IPs on two different sites using two variations:

194.153.113.7 [projecthoneypot.org...]
Mozilla/5.0 (compatible; oBot/2.3.1; +http://filterdb.iss.net/crawler/)

194.153.113.8 [projecthoneypot.org...]
Mozilla/5.0 (compatible; oBot/2.3.1; +http://www-935.ibm.com/services/us/index.wss/detail/iss/a1029077?cntxt=a1027244)

12:50:16 / GET
12:50:20 / HEAD
12:50:22 / HEAD
12:50:23 / GET
12:50:27 / HEAD

robots.txt? NO
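For anyone who wants to spot the same five-hit signature in their own logs, here is a minimal sketch. It assumes you have already grouped one IP's requests into time-ordered (method, path) pairs; the function name and the input shape are made up for illustration and are not anything oBot or IBM documents.

```python
from typing import Iterable, Tuple

# Method sequence reported in the logs above: GET, HEAD, HEAD, GET, HEAD.
OBOT_PATTERN = ("GET", "HEAD", "HEAD", "GET", "HEAD")


def matches_obot_pattern(hits: Iterable[Tuple[str, str]]) -> bool:
    """Return True if one visit shows the five-hit pattern with no robots.txt fetch.

    hits: (method, path) pairs for a single IP, in time order (assumed input shape).
    """
    hits = list(hits)
    methods = tuple(method.upper() for method, _ in hits)
    # Ignore any query string when checking for a robots.txt request.
    asked_robots = any(path.split("?")[0] == "/robots.txt" for _, path in hits)
    return methods == OBOT_PATTERN and not asked_robots


visit = [("GET", "/"), ("HEAD", "/"), ("HEAD", "/"), ("GET", "/"), ("HEAD", "/")]
print(matches_obot_pattern(visit))
```

An exact-sequence match is deliberately strict. If the bot changes its habits, loosen the comparison rather than trusting this to keep working.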

 

Mokita

5+ Year Member



 
Msg#: 4376847 posted 4:59 am on Oct 20, 2011 (gmt 0)

It's been in all my sites within the last 24 hours. Very busy bot!

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4376847 posted 5:12 am on Oct 20, 2011 (gmt 0)

194.153.113.7 "Mozilla/5.0 (compatible; oBot/2.3.1; +http://filterdb.iss.net/crawler/)"

robots.txt: no

Got 403s

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributor of the Month



 
Msg#: 4376847 posted 2:46 am on Jan 16, 2012 (gmt 0)

Whew. I was all set to start a new "oBot where art thou?" thread, but looks like I squeaked under by a few days. Do they give you three months to wake slumbering threads?

Did any of you try the UA links? The long version

www-935.ibm.com/services/us/index.wss/detail/iss/a1029077?cntxt=a1027244

leads to IBM's Search page, with parameters filled in:

www-01.ibm.com/common/ssi/apilite?infotype=PM&infosubt=AB&doctype=XSO_* or XMS_* or XMT_* or XSE_* or XIC_* or XIA_* or XSS_* or XSF_* or XSD_* or XEN_* or XBU_*&lastdays=1825&ctvwcode=US&appname=GTSE_GT_GT_USEN_CS&additional=summary&contents=C0_CST and keeponlit

Goodness me. The only thing missing from the resultant search results is any information about oBot. A polite inquiry to IBM asking whether this is their robot has so far met with, hm, polite silence.

The short version ...

filterdb.iss.net/crawler/

... is utterly fascinating, because I tried it just hours ago and got the Microsoft server's "ain't no such page" screen. It must be in a better mood now, because there's a bona fide "What is oBot?" page. Says they:

- user-agent: Mozilla/5.0 (compatible; oBot/2.3.1; +http://filterdb.iss.net/crawler/)
- our IP ranges 206.253.224.x or 194.153.113.x

Hm, don't see the Long Version in there anywhere, do you? But they came visiting from the very same IP. And thanks, IBM, for the heads-up about 194.153. Wouldn't have known that.
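A rough sketch of how one might check a request against what that page states. Reading "206.253.224.x" and "194.153.113.x" as /24 networks is my assumption, as are the function name and the return labels. The regex matches both the short and the long UA variants because both contain "oBot/2.3.1".

```python
import ipaddress
import re

# Assumption: the ".x" notation on the crawler page means a full /24.
OBOT_NETWORKS = [
    ipaddress.ip_network("206.253.224.0/24"),
    ipaddress.ip_network("194.153.113.0/24"),
]

# Matches "oBot/<digit>" as a separate token, so "RoBot/2" does not match.
OBOT_UA = re.compile(r"\boBot/\d")


def classify(ip: str, user_agent: str) -> str:
    """Label a request by whether its UA and its IP agree with the published values."""
    claims_obot = bool(OBOT_UA.search(user_agent))
    address = ipaddress.ip_address(ip)
    in_range = any(address in network for network in OBOT_NETWORKS)
    if claims_obot and in_range:
        return "obot"
    if claims_obot:
        return "spoofed-ua"  # says oBot, but comes from somewhere else
    if in_range:
        return "obot-range-other-ua"  # published range, different UA
    return "other"


ua = "Mozilla/5.0 (compatible; oBot/2.3.1; +http://filterdb.iss.net/crawler/)"
print(classify("206.253.224.18", ua))
```

Whether you then block, allow, or just log each label is a policy call; the sketch only sorts the requests.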

I have never met this robot before in my life. But they must know me, because their first visit consisted entirely of requests for HEADs of the image files that go with my front page. Not the current files that go with the current front page; datestamps tell me their previous visit can't have been later than December 2010 (not a typo). Some of them happen to still exist, though no longer linked to the front page. For the rest, they came back a few hours later and went away still unsatisfied.

A week later they were back with a fresh shopping list. This was an educational visit for me. First I learned that I must have a Unix server, because they asked for a couple of lower-case files whose real names are Title Case, so they got nothing but 404s. No more futzing about with HEAD; this time they asked for the whole thing.
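If you want to find this kind of case-only miss in your own 404s, something like the following would do it. This is a sketch: the helper name and the example filenames are hypothetical, and it assumes you can produce a list of the paths that really exist on a case-sensitive server.

```python
from typing import Iterable, Optional


def case_mismatch(requested: str, real_paths: Iterable[str]) -> Optional[str]:
    """Return the real path that differs from the request only by letter case, if any."""
    by_lower = {path.lower(): path for path in real_paths}
    match = by_lower.get(requested.lower())
    if match is not None and match != requested:
        return match
    return None


# Hypothetical filenames, for illustration only.
real = ["/paintings/Blue_Door.html", "/index.html"]
print(case_mismatch("/paintings/blue_door.html", real))
```

A request that matches exactly, or that matches nothing at all, returns None, so only the case-only misses are reported.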

After that they swung by my front page-- the current one-- and got up to speed on the images. (Which, incidentally, they always requested with the correct casing.)

A minute later* they came dashing back from the parking lot, apparently having overlooked the last two items on the list. First stop: an utterly random painting that I've never even bothered to make a page for. It was moved from its original location months ago, so the 404 would seem reasonable... except that the said original location was also roboted-out** months ago. A fact they must surely have noticed, since by this time they'd read robots.txt three separate times.

Second stop: an html file that they would have gotten handily if only they'd put it in Title Case. It happens to be the parent file of the two 404s they got earlier-- meaning that they must have picked it up, too, on their previous visit.


* Exactly a minute, as a matter of fact. Well, maybe it's coincidence.
** Someone hereabouts brilliantly suggested that as an alternative to redirecting forever, I could simply robot-out the nonexistent directories.

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributor of the Month



 
Msg#: 4376847 posted 4:55 am on Jan 16, 2012 (gmt 0)

Well, dang and blast. Here I'd already gone and blocked them on robots.txt grounds, before ever consulting the horse's mouth [cobion.com] via their own links.

There we learn among other things:
IBM Proventia Web Filter database categories
<snip, snip>
Religion: Includes Web sites with religious content, information about the five main religions, and religious communities that have emerged out of these religions.
Sects: This category contains sites about sects, cults, occultism, Satanism etc.

Tough luck, all you Sikhs, Parsees and Baha'is. Guess you're just cults.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved