SiteBot

Should SiteBot be blocked?

         

grandma genie

3:53 pm on Aug 28, 2010 (gmt 0)

10+ Year Member



Hello -

Another new bot is indexing my site. It is called SiteBot. Here is one of the many entries from my server logs:

212.113.xx.nn - - [28/Aug/2010:05:09:31 -0400] "GET / HTTP/1.1" 200 30601 "-" "Mozilla/5.0 (compatible; SiteBot/0.1; +http://www.sitebot.org/robot/)"
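For anyone who wants to pull the bot name out of entries like the one above, here is a minimal Python sketch. It assumes the server writes Apache's "combined" log format (the sample line matches it); the names COMBINED, parse_line and bot_token are made up for illustration.

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the log fields as a dict of strings, or None if the line does not match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None


def bot_token(agent):
    """Return (name, version) from a '(compatible; Name/x.y; ...)' user agent, or None."""
    m = re.search(r'compatible;\s*([^;/\s]+)/([\d.]+)', agent)
    return (m.group(1), m.group(2)) if m else None
```

Feeding the log entry above to parse_line and passing its "agent" field to bot_token gives the bot's name and version as a pair, which is handy for counting hits per bot.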

Anyone have any comments about this little arachnid? I have blocked it.

Grandma_genie

dstiles

9:14 pm on Aug 28, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If it's the same as the one I see, it's from a server block in Ukraine: kill it for no other reason than that?

One thing against it: the bot URL shows the home page. Contrariwise, it's in English. But then, the site is hosted in the USA.

Can't find anything adverse about it in StartPage or Google - in fact very little at all (this posting is top!). Domain is obscured by the domain protection system, which is also against it but not excessively so (that service goes too far!).

Worst point: it was registered 12 August 2010 so I'd say too new to have the content it appears to have. Until I discovered that I almost let it through.

grandma genie

9:41 pm on Aug 28, 2010 (gmt 0)

10+ Year Member



Ewwww. A real sneaky little guy. Well, it's blocked until I get some customers from Ukraine... when h*ll freezes over ;)

jmccormac

11:38 pm on Oct 22, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just showed up on some sites here over the past few hours. Fast and mangled links. Also on Ukraine ISP/data centre ranges.

Regards...jmcc

tangor

11:55 pm on Oct 22, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



grandma... make life a bit easier: Decide which bots you let in via robots.txt and nuke the rest which don't honor robots.txt. Life gets so much easier when that decision is made!

The savings in bandwidth are ka-ching in the pocket.

I don't care about how many bots or humans access my robots.txt... or how often, as long as the directives are honored (i.e. no intrusion into the site). Those that do NOT (and that number is FAR SMALLER than those who do) offer enough info in their perfidy to kill by either UA or IP.

Don't mind serving 403s... once again the bandwidth saving is immense. Particularly if you offer the stock 403 instead of a custom one. Just make sure .htaccess allows ALL to get robots.txt AND the custom403 (if you have one)... otherwise the stock 403 will be served.
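The rule order described above can be sketched in Python. This is only the decision logic, not an .htaccess file: the path /custom403.html, the blocked-token list and the function name status_for are all hypothetical, picked for illustration.

```python
# Paths every visitor may fetch, blocked or not (assumed names, adjust to your site).
ALWAYS_ALLOWED = {"/robots.txt", "/custom403.html"}

# Lower-case user-agent substrings to refuse; "sitebot" is just an example entry.
BLOCKED_UA_TOKENS = ("sitebot",)


def status_for(path, user_agent):
    """Return the HTTP status this policy would give for one request."""
    # Exemptions come first, so a blocked bot can still read robots.txt
    # and the custom 403 page itself can be delivered.
    if path in ALWAYS_ALLOWED:
        return 200
    ua = user_agent.lower()
    if any(token in ua for token in BLOCKED_UA_TOKENS):
        return 403
    return 200
```

The point of checking the exemptions first is the one made above: if the custom 403 page is itself blocked, the server has nothing custom to hand out and falls back to the stock 403.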

Users/Visitors are customers off the street during business hours (in web speak that means 24/7/365) but bots are either "friends" or "vacuum salespersons" knocking on your door. Some you invite in, some you don't. Avoid hard work, just as you avoid unwanted visitors to your house.

BUT DO REPORT IN THIS FORUM any bot that fails the sniff test and is otherwise rude and unruly.

Above advice is intended only to reduce stress... I chased the "black list" side of site management for three years and ended up with an .htaccess near 30,000 lines. These days it is less than 200.

Pfui

12:20 am on Oct 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Since August, the bot's asked for robots.txt, and respected it BUT for near-simultaneous hits to root:

nano2.dc.ukrtelecom.ua
Mozilla/5.0 (compatible; SiteBot/0.1; +http://www.sitebot.org/robot/)

08/27 14:47:40 /robots.txt 200
08/27 14:47:41 / 403
08/27 14:48:02 /robots.txt 200
08/27 14:48:04 / 403

Of course, since / is Disallowed in robots.txt, the bot's kicked.
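A rough way to spot this kind of hit automatically is to replay logged requests against your own robots.txt with Python's standard urllib.robotparser. The robots.txt content below is an assumption (a blanket Disallow, consistent with / being off limits as described above), and the function name violations is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: everything disallowed for every agent.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def violations(robots_txt, hits):
    """Return the (user_agent, path) hits that robots.txt does not permit.

    hits is an iterable of (user_agent, path) pairs taken from the logs.
    Requests for /robots.txt itself are never counted as violations.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (ua, path)
        for ua, path in hits
        if path != "/robots.txt" and not parser.can_fetch(ua, path)
    ]
```

Run against the four hits listed above, this would flag the two requests for / and ignore the two robots.txt fetches.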

In the case of its Host, additional limits are extremely tight: ALL visitors from .ua require white-listing, except for robots.txt hits. Ditto for ukrtelecom.ua's corporate compatriot ukrtel.net, for years a reliable source of spambots.

jmccormac

12:44 am on Oct 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What was interesting was the way it was mangling links to external sites. It was treating them as local pages and requesting the pages from the site. Same pattern, with nano2.dc.ukrtelecom.ua and ukrtel.net.ua hosts showing up.
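The thread doesn't show the actual mangled requests, so the following is only a guess at one way a crawler could end up asking a site for pages that live elsewhere: gluing every href onto the current host instead of resolving it properly. All function names and the example.* hosts are hypothetical.

```python
import re
from urllib.parse import urljoin, urlsplit


def resolve_correctly(page_url, href):
    """Standard resolution: an absolute href replaces the base URL entirely."""
    return urljoin(page_url, href)


def resolve_naively(page_url, href):
    """Buggy resolution (assumed): treat every href as a path on the current host."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/{href.lstrip('/')}"


def looks_like_mangled_external(path):
    """Heuristic for log review: does a requested path start with an embedded URL or www. host?"""
    return re.match(r"^/+(https?:/+|www\.)", path, re.IGNORECASE) is not None
```

If your logs show requests that looks_like_mangled_external flags, that would be consistent with this sort of bug, though other causes are possible.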

Regards...jmcc

Pfui

9:20 am on Oct 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In the last hour, hitting in pairs 15 minutes apart on two different sites:

nano2.dc.ukrtelecom.ua
Mozilla/5.0 (compatible; SiteBot/0.1; +http://www.sitebot.org/robot/)
robots.txt? Yes (read & left)
More: [projecthoneypot.org...]

213.186.120.19*.utel.net.ua
Mozilla/5.0 (compatible; SiteBot/0.1; +http://www.sitebot.org/robot/)
robots.txt? Yes (read & left)
More: [projecthoneypot.org...]

For those keeping score, the bot-runner's using at least three Ukrainian ISPs within the Ukrtelecom [en.wikipedia.org...] monopoly constellation/coloc/server farm:

.ukrtelecom.ua
.ukrtel.net.ua (ditto: ukrtel.net)
.utel.net.ua (ditto: .utel.ua)
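If you match visitors by reverse-DNS hostname, the suffixes listed above can be checked with a few lines of Python. This is a sketch only: the function name is hypothetical, and it assumes you already have the resolved hostname (reverse DNS can be missing or forged, so treat a match as a hint).

```python
# Domain suffixes taken from the list above, written without the leading dot.
UKRTELECOM_SUFFIXES = (
    "ukrtelecom.ua",
    "ukrtel.net.ua",
    "ukrtel.net",
    "utel.net.ua",
    "utel.ua",
)


def matches_suffix(hostname, suffixes=UKRTELECOM_SUFFIXES):
    """True if hostname equals a suffix or ends with it on a label boundary."""
    host = hostname.lower().rstrip(".")
    # Requiring "." + suffix stops look-alikes such as "notutel.ua" from matching.
    return any(host == s or host.endswith("." + s) for s in suffixes)
```

Matching on a label boundary rather than with a plain endswith keeps unrelated domains that merely end in the same letters out of the net.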

(sitebot.org is hosted by the (in)famous SoftLayer.)

FWIW, SiteBot's conduct is darn near perfection compared to MSN's. [webmasterworld.com...]