

SputnikBot

could be worse...

         

trintragula

4:06 pm on May 17, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



I had a visit recently from:
Mozilla/5.0 (compatible; SputnikBot/2.3; +http://corp.sputnik.ru/webmaster)

Following the link leads to a Russian-language robots page.
Translated into English, it says that they obey robots.txt.
They show how to block them.
They promise to obey your crawl delay, and by default they use a minimum crawl delay of 2 seconds.
They list their User agent string.
They list the IP ranges they currently use:

109.207.13.0/24
5.143.224.0/21
95.167.189.0/25
And sure enough my visit was from one of those ranges.
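For anyone who wants to run the same check against their own logs, here is a minimal Python sketch using the standard `ipaddress` module. The three ranges are the ones quoted above from Sputnik's page as of May 2015 and may well have changed since. The helper name `is_sputnik_ip` and the test addresses are made up for illustration and do not come from any log.

```python
import ipaddress

# Ranges quoted from Sputnik's robots page (May 2015); may be out of date.
SPUTNIK_RANGES = [
    ipaddress.ip_network("109.207.13.0/24"),
    ipaddress.ip_network("5.143.224.0/21"),
    ipaddress.ip_network("95.167.189.0/25"),
]


def is_sputnik_ip(addr: str) -> bool:
    """Hypothetical helper: True if addr falls inside one of the listed ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in SPUTNIK_RANGES)


# Illustrative addresses only.
print(is_sputnik_ip("5.143.231.10"))    # True: the /21 covers 5.143.224.0 to 5.143.231.255
print(is_sputnik_ip("95.167.189.200"))  # False: the /25 stops at 95.167.189.127
```

Letting `ipaddress` do the CIDR arithmetic avoids hand-written prefix matching, which is easy to get wrong for ranges like the /21 that span several third-octet values.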

Their search engine apparently has filter settings for family use, and does appear to be a public search engine.

So far they have not abused my site.

lucy24

5:37 pm on May 17, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There's also a
Mozilla/5.0 (compatible; SputnikImageBot/2.2)
but this one does not appear to ask for robots.txt. Oddly, it has been requesting one specific image of mine (from one of the IPs you listed) every few months for what looks like several years.

They must have done something in the past to offend me, because my UA list includes the line
BrowserMatch \bsputnik keep_out

(Note that the word "sputnik" occurs twice in the UA string, once lower-case and once Title Case.)

:: detour to raw logs to see if they shed any light on the \b anchor ::

Oh, I see. There's some Russian browser whose UA contains the element "MRSPUTNIK". But in recent years this UA appears to have been used almost entirely by referer-spam robots (and of course it's ALL CAPS), so I don't know what I was getting at.

Mozilla/5.0 (Windows; U; Windows NT 6.1; ru; rv:1.9.2.28) Gecko/20120306 Firefox/3.6.28 sputnik 2.5.2.8
Mozilla/5.0 (Windows; U; Windows NT 6.1; ru; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 sputnik 2.1.0.18 YB/4.3.0

et cetera, each with minor variations.

Corollary discovery: the rule with the word anchor doesn't seem to have been recognized. I'll have to delete the \b (keeping the case-sensitive match) and check back in a few months.
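A quick way to see offline what a pattern like `\bsputnik` would catch is to run it against the UA strings quoted in this thread. The sketch below uses Python's `re` module rather than Apache's regex engine, on the assumption that `\b` behaves the same way for plain ASCII text like this. It shows pattern semantics only and says nothing about why the live rule seemed not to fire. `BrowserMatch` is case-sensitive (`BrowserMatchNoCase` is the case-insensitive form), so the pattern is compiled without `re.IGNORECASE`. The dictionary labels are my own, and the "MRSPUTNIK" test uses the bare token, not a full UA.

```python
import re

# Case-sensitive, mirroring BrowserMatch (not BrowserMatchNoCase).
PATTERN = re.compile(r"\bsputnik")

# UA strings as quoted in this thread; the keys are just labels.
user_agents = {
    "SputnikBot": "Mozilla/5.0 (compatible; SputnikBot/2.3; +http://corp.sputnik.ru/webmaster)",
    "SputnikImageBot": "Mozilla/5.0 (compatible; SputnikImageBot/2.2)",
    "Firefox + sputnik": (
        "Mozilla/5.0 (Windows; U; Windows NT 6.1; ru; rv:1.9.2.28) "
        "Gecko/20120306 Firefox/3.6.28 sputnik 2.5.2.8"
    ),
}

results = {name: bool(PATTERN.search(ua)) for name, ua in user_agents.items()}
for name, matched in results.items():
    print(name, matched)
# SputnikBot True          (matches the lower-case "sputnik" in corp.sputnik.ru)
# SputnikImageBot False    (only Title Case "Sputnik" appears)
# Firefox + sputnik True   (" sputnik 2.5.2.8")

# Bare token only: even ignoring case, \b blocks a match inside "MRSPUTNIK",
# because there is no word boundary between "R" and "S".
mr_ci = bool(re.search(r"\bsputnik", "MRSPUTNIK", re.IGNORECASE))
print(mr_ci)  # False
```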

trintragula

7:09 pm on May 17, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



I've not seen the SputnikImageBot yet, though there are plenty of images on my site.
Presumably someone attempted to hotlink one of your images, which would draw the image bot's attention.

I have seen those other variations on sputnik, but not from the given ranges, so I think those are probably unrelated.

I suppose that with robots.txt, what matters is not so much whether they visibly ask for it as whether they obey it. A given visitor may have obtained it by some indirect means... though not asking for it is not a good sign. Greenhorn robot builders may get better at complying with robots.txt over time (e.g. after they get their first negative feedback).
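One way to test "whether they obey it" is to parse your own robots.txt and check a bot's logged requests against it. The sketch below uses Python's standard `urllib.robotparser` as a stand-in for whatever parser the bot itself uses. The robots.txt contents, request paths, and timestamps are all hypothetical, not taken from any real log.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow SputnikBot down and keep it out of /private/.
ROBOTS_TXT = """\
User-agent: SputnikBot
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow:
"""

# Hypothetical paths and request times (seconds) for one bot, e.g. pulled from an access log.
requested_paths = ["/robots.txt", "/index.html", "/private/notes.html"]
request_times = [0, 12, 15, 40]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# robotparser compares the name before the first "/", so pass the product
# token "SputnikBot" rather than the full "Mozilla/5.0 (compatible; ...)" string.
violations = [p for p in requested_paths if not parser.can_fetch("SputnikBot", p)]
print(violations)  # ['/private/notes.html']

# Gaps shorter than the declared crawl delay suggest the delay is being ignored.
delay = parser.crawl_delay("SputnikBot")
too_fast = [b - a for a, b in zip(request_times, request_times[1:]) if b - a < delay]
print(delay, too_fast)  # 10 [3]
```

This only flags requests that break the rules as written. It cannot tell whether a bot that never fetched robots.txt got a copy some other way.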