SurdotlyBot

         

keyplyr

8:49 am on Jun 21, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




Their info page talks about robots.txt, yet the bot never requested robots.txt: not with this UA, not from this IP range, and not in the last 30 days.

Host: AWS
52.0.0.0/11
52.0.0.0 - 52.31.255.255

52.4.188.30 - - [20/Jun/2015:17:40:59 -0700] "GET / HTTP/1.1" 403 968 "-" "Mozilla/5.0 (compatible; SurdotlyBot/1.0; +http://sur.ly/bot.html)"
52.4.188.30 - - [20/Jun/2015:17:40:59 -0700] "GET /favicon.ico HTTP/1.1" 403 968 "-" "Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/532.9 (KHTML, like Gecko)

Odd that a bot would request favicon. From their info page I assume they build security report pages about websites, where showing the favicon may lend the report some credibility.
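
For anyone who wants to double-check the range arithmetic above (52.0.0.0/11 versus 52.0.0.0 - 52.31.255.255), here is a minimal sketch using Python's standard ipaddress module. The function names cidr_bounds and in_cidr are made up for illustration; they are not from any tool mentioned in this thread.

```python
import ipaddress
from typing import Tuple


def cidr_bounds(cidr: str) -> Tuple[str, str]:
    """Return the first and last address covered by a CIDR block."""
    net = ipaddress.ip_network(cidr)
    return str(net.network_address), str(net.broadcast_address)


def in_cidr(ip: str, cidr: str) -> bool:
    """True if the address falls inside the CIDR block."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


if __name__ == "__main__":
    print(cidr_bounds("52.0.0.0/11"))
    print(in_cidr("52.4.188.30", "52.0.0.0/11"))
```

The same two helpers work for any other block you are thinking of denying, which is handy before committing a wide range to a block list.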

aristotle

9:04 pm on Jun 21, 2015 (gmt 0)

So was that supposed to be a test of the security of your website? If so, it was a rather feeble effort.

keyplyr

11:36 pm on Jun 21, 2015 (gmt 0)

So was that supposed to be a test of the security of your website?

Dunno, but I get a little tired of all these parasites using my properties as their product. In addition, some of these "website security" services get a little sanctimonious about it, implying that if I don't adhere to their standards it will hurt users' trust in my site. I block them all, and with the amount of traffic I continue to get, it doesn't seem they matter much.

lucy24

12:35 am on Jun 22, 2015 (gmt 0)

Odd that a bot would request favicon.

Oh, gosh, I've met them. I notice because they come from a blocked range-- somewhere in 52-- but I've got a favicon exemption for other reasons. They generally come in pairs for some reason.

:: detour to raw logs because I can't remember which site was involved ::

52.6.54.54 - - [23/Apr/2015:03:45:16 -0700] "GET /favicon.ico HTTP/1.1" 301 599 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36" 
52.5.80.170 - - [23/Apr/2015:03:45:17 -0700] "GET /robots.txt HTTP/1.1" 301 597 "-" "<same>"
52.5.80.170 - - [23/Apr/2015:03:45:17 -0700] "GET / HTTP/1.1" 403 2915 "-" "<same>"
<snip>
52.6.119.245 - - [23/Apr/2015:03:58:26 -0700] "GET /robots.txt HTTP/1.1" 301 597 "-" "<same>"
52.6.116.132 - - [23/Apr/2015:03:58:27 -0700] "GET / HTTP/1.1" 403 2915 "-" "<same>"
52.5.110.94 - - [23/Apr/2015:03:58:28 -0700] "GET /favicon.ico HTTP/1.1" 301 599 "-" "<same>"
That's just the first one I landed on. (The <snip> is unrelated stuff between the two pieces.) Various parts of 52.0.0.0/13, various Linux UAs. Wonder what they'd do if they got a non-403 on the front page? I think the 301 is because they use the wrong form of the domain name-- and hm, come to think of it, they never actually see robots.txt (or for that matter the favicon) do they, because they don't follow the redirect.
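
A rough way to confirm the "they never actually see robots.txt" hunch from raw logs is to collect every status code returned for each path and check whether any of them was a 2xx. This is only a sketch: it assumes the Apache combined log format shown above, the names statuses_by_path and never_served are invented for the example, and it only sees one log file, so a redirect target served by a differently named host with its own log would be missed.

```python
import re
from collections import defaultdict

# Assumed layout: Apache "combined" log format, as in the lines quoted above.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def statuses_by_path(lines):
    """Map each requested path to the set of status codes it received."""
    seen = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m:  # lines that do not fit the assumed format are skipped
            seen[m.group("path")].add(int(m.group("status")))
    return dict(seen)


def never_served(lines, path):
    """True if the path was requested but never answered with a 2xx."""
    statuses = statuses_by_path(lines).get(path, set())
    return bool(statuses) and not any(200 <= s < 300 for s in statuses)
```

Feed it only the lines from the suspect range; if never_served comes back True for /robots.txt, the visitor got nothing but redirects or denials for that file.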

keyplyr

1:15 am on Jun 22, 2015 (gmt 0)

I think the 301 is because they use the wrong form of the domain name-- and hm, come to think of it, they never actually see robots.txt (or for that matter the favicon) do they, because they don't follow the redirect.

That's my assumption as well.

keyplyr

2:35 am on Jun 22, 2015 (gmt 0)

For a long time I freely allowed favicon to everyone (along with several other files) but no longer. A couple years back, all these cookie cutter blogs and WP installs started using favicons as bullets when they mentioned other sites. This resulted in hundreds of daily hits to my server that were not actual visits, just remote favicon requests from every Tom, Dick & Harry (LOL) that loaded those blog/WP pages.

I now require several conditions on favicon requests.

lucy24

3:58 am on Jun 22, 2015 (gmt 0)

The funny thing from my side is that there's one favicon request I'd be perfectly willing to grant: the DuckDuckGo favicon fetcher. (I assume it's part of the SERP, and hey, anything to make your site look good.*) But they happen to crawl from a blocked range-- can't remember if they're distributed or just very small-- and once they meet the 403 they never even try asking for the favicon. Unlike, say, Google's faviconbot which until pretty recently crawled with no UA at all,** meaning that its pickings were pretty well restricted to the favicon and robots.txt.


* For a while I saw quite a few favicon requests from Firefox thanks to the Favicon Reloader-- or Preloader, or whatever it's called-- extension that works with bookmarks. Look at your own bookmarks menu and you see how much more appealing things are when they have their own favicon. Don't know if it didn't get updated or if current Firefox users just don't like me as much ;)
** It now calls itself Firefox 6. I suppose there's some arcane reasoning behind this.

keyplyr

5:48 am on Jun 22, 2015 (gmt 0)

Odd that Duck SE has a registered crawl range, but the favicon bot comes from AWS. That's probably why I don't have a favicon next to my Duck listing. Thanks for the heads-up.

I just poked a hole for it in 107.20.0.0/14. Hopefully that's the only block it's coming from. These AWS rewrites could go on & on. AWS is a real PITA.

I'm not blocking all favicon requests, just from blogs, forums, WP, and a dozen specific favicon scrapers that (at some point) I decided weren't useful to me. Admittedly I am behind with this.
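
The "poke a hole" logic can be sketched roughly as follows. This is not the actual server config, which isn't shown in this thread; it is just the decision order written out in Python. The BLOCKED and HOLES lists and the status_for name are assumptions for illustration, and a real rule would likely add further conditions such as the UA.

```python
import ipaddress

# Hypothetical block list: ranges mentioned in this thread, denied by default.
BLOCKED = [ipaddress.ip_network(n) for n in ("52.0.0.0/11", "107.20.0.0/14")]

# Hypothetical exceptions: (range, path) pairs allowed despite the block.
HOLES = [(ipaddress.ip_network("107.20.0.0/14"), "/favicon.ico")]


def status_for(ip, path):
    """Return the status a request would get: exceptions win over blocks."""
    addr = ipaddress.ip_address(ip)
    for net, allowed_path in HOLES:
        if addr in net and path == allowed_path:
            return 200
    if any(addr in net for net in BLOCKED):
        return 403
    return 200
```

Checking the exceptions first keeps the hole as narrow as possible: one path from one range, with everything else from that range still getting the 403.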

keyplyr

8:36 am on Jun 23, 2015 (gmt 0)

• Got the heads-up from lucy24
• Poked the hole in AWS range to allow Duck favicon bot
• They came next day and got the favicon
• Today it's displayed in Duck SERP for my site's listings.

I love it when things work the way I want 'em to :)

keyplyr

5:42 am on Sep 20, 2015 (gmt 0)

SurdotlyBot is now coming from several Webzilla blocks (guess they were too slimy even for AWS), still ignoring robots.txt. This is a very stupid bot, eating endless 403s, bloating my logs.

These guys are a real piece of work. They bill themselves as an internet security company that will benefit webmasters, but then trample the rights of webmasters!

tangor

7:04 am on Sep 21, 2015 (gmt 0)

Sad thing is that for some of these irritations the nuclear option of 403 is insufficient. Any step taken after that might lead to a criminal conviction, as real violence and bodily harm would likely be involved. (winkers)

Way back when, when favicon first came out, it was a nice feature. Today it has become a PITA.

keyplyr

10:59 am on Oct 11, 2015 (gmt 0)

Not only is SurdotlyBot now coming from:
webzilla.com
208.88.224.0 - 208.88.227.255
208.88.224.0/22

208.88.224.217 - - [11/Oct/2015:01:25:39 -0700] "GET / HTTP/1.1" 403 984 "-" "Mozilla/5.0 (Linux; Android 4.3; Galaxy Nexus Build/JWR67B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.117 Mobile Safari/537.36"
208.88.224.197 - - [11/Oct/2015:01:26:09 -0700] "GET / HTTP/1.1" 403 984 "-" "Mozilla/5.0 (compatible; SurdotlyBot/1.0; +http://sur.ly/bot.html)"

...but they pose as an Android phone too.
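
One rough way to spot this kind of UA switching in bulk is to group log lines by network and flag any network that sent both a self-declared bot UA and some other UA. A minimal sketch, assuming combined-format lines where the UA is the last quoted field; mixed_ua_networks and its parameters are hypothetical names, and a mixed network is only a hint, since shared hosting ranges can legitimately carry both kinds of traffic.

```python
import ipaddress
import re
from collections import defaultdict

# Assumption: IP is the first field and the UA is the last quoted field.
LINE_RE = re.compile(r'^(?P<ip>\S+) .*"(?P<ua>[^"]*)"\s*$')


def mixed_ua_networks(lines, bot_token, prefix_len=24):
    """List networks that sent both the bot's own UA and a different UA."""
    seen = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not fit the assumed layout
        # strict=False lets a host address stand in for its enclosing network
        net = ipaddress.ip_network(f"{m.group('ip')}/{prefix_len}", strict=False)
        seen[net].add(bot_token in m.group("ua"))
    return sorted(str(n) for n, flags in seen.items() if flags == {True, False})
```

Run against the two lines above with bot_token="SurdotlyBot" and prefix_len=22, both requests fall into the same /22 and it gets flagged.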