Forum Moderators: Ocean10000 & incrediBILL

bebopbot

   
8:52 pm on Jun 10, 2014 (gmt 0)

lucy24 (WebmasterWorld Senior Member, Top Contributor of All Time)



Anyone met this guy?

Mozilla/5.0 (compatible; Linux x86_64; BebopBot/2.5.1; +http://www.apassion4jazz.net/bebopbot.html)


According to their www page, possible IPs are
50.63.211.1, 70.179.4.113, 97.74.140.17, 97.74.144.120, 173.636.184.241 [sic]
Currently it's the 70.179 one.
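
For anyone who wants to turn that published list into a block list, here is a minimal sketch of why the last entry earns its [sic]: 636 is not a legal octet, so the address can't be parsed. It uses only Python's standard ipaddress module; the function name split_valid is made up for this example.

```python
import ipaddress

# Addresses copied from the list above, including the malformed last one.
published = [
    "50.63.211.1",
    "70.179.4.113",
    "97.74.140.17",
    "97.74.144.120",
    "173.636.184.241",
]

def split_valid(addresses):
    """Return (valid, invalid) lists of dotted-quad strings."""
    valid, invalid = [], []
    for text in addresses:
        try:
            ipaddress.IPv4Address(text)  # raises if any octet is out of range
            valid.append(text)
        except ipaddress.AddressValueError:
            invalid.append(text)
    return valid, invalid

valid, invalid = split_valid(published)
print("usable:", valid)
print("malformed:", invalid)
```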

Why they are now blocked:
70.179.4.113 - - [07/Jun/2014:19:09:45 -0700] "GET /dirname/pagename.html HTTP/1.1" 200 8695 "http://www.webmasterworld.com/profilev4.cgi?action=view&member=lucy24" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0" 
70.179.4.113 - - [07/Jun/2014:19:09:46 -0700] "GET /sharedstyles.css HTTP/1.1" 200 6346 "http://example.com/dirname/pagename.html" "{ same }"
70.179.4.113 - - [07/Jun/2014:19:09:46 -0700] "GET /fun/miststyles.css HTTP/1.1" 200 2785 "{ same }" "{ same }"
70.179.4.113 - - [07/Jun/2014:19:09:46 -0700] "GET /fun/images/fun-icon.png HTTP/1.1" 200 859 "{ same }" "{ same }"
70.179.4.113 - - [07/Jun/2014:19:09:46 -0700] "GET /fun/images/penguin.png HTTP/1.1" 200 1512 "{ same }" "{ same }"
70.179.4.113 - - [07/Jun/2014:19:09:46 -0700] "GET /fun/headers/header_beenthere.png HTTP/1.1" 200 1444 "{ same }" "{ same }"
70.179.4.113 - - [07/Jun/2014:19:09:46 -0700] "GET /fun/images/robot.png HTTP/1.1" 200 4506 "{ same }" "{ same }"
70.179.4.113 - - [07/Jun/2014:19:09:46 -0700] "GET /fun/images/panda.png HTTP/1.1" 200 2256 "{ same }" "{ same }"
70.179.4.113 - - [07/Jun/2014:19:09:46 -0700] "GET /fun/images/hummingbird.png HTTP/1.1" 200 4570 "{ same }" "{ same }"
70.179.4.113 - - [07/Jun/2014:19:09:46 -0700] "GET /fun/images/collage_robot.png HTTP/1.1" 200 149084 "{ same }" "{ same }"
70.179.4.113 - - [07/Jun/2014:19:09:48 -0700] "GET /favicon.ico HTTP/1.1" 200 661 "-" "{ same }"

Quoted in full to illustrate perfect humanoid behavior, with all supporting files except js (on this page used only by piwik). Next comes:

70.179.4.113 - - [07/Jun/2014:19:10:24 -0700] "GET /dirname/pagename.html HTTP/1.1" 200 8695 "-" "Mozilla/5.0 (compatible; Linux x86_64; BebopBot/2.5.1; +http://www.example.net/bebopbot.html)" 
70.179.4.113 - - [07/Jun/2014:19:10:25 -0700] "GET /sharedstyles.css HTTP/1.1" 304 237 "-" "{ bebopbot }"
70.179.4.113 - - [07/Jun/2014:19:10:25 -0700] "GET /fun/miststyles.css HTTP/1.1" 304 237 "-" "{ bebopbot }"
70.179.4.113 - - [07/Jun/2014:19:10:25 -0700] "GET /fun/images/fun-icon.png HTTP/1.1" 304 237 "-" "{ bebopbot }"

(et cetera, as above, each with 304 and no referer) followed shortly afterward by
70.179.4.113 - - [07/Jun/2014:19:10:47 -0700] "GET /robots.txt HTTP/1.1" 200 635 "-" "{ bebopbot }"

There were a few subsequent page requests, each with accompanying css and images.

Now, my host is occasionally a bit hiccupy in logs-- but twenty-three seconds (time from first page request to first robots.txt request with this UA)? Nuh-uh.
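
The twenty-three seconds can be pulled out of the logs mechanically. What follows is only a sketch, assuming Apache combined log format as in the excerpts above; the names parse and robots_delay are invented for illustration, and the two sample lines are the first page request and the robots.txt request from this thread with the UA written out in full.

```python
import re
from datetime import datetime

# Apache "combined" log format, as in the excerpts quoted in this thread.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def parse(line):
    """Parse one log line into a dict, or return None if it doesn't match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    return rec

def robots_delay(lines, ua_token):
    """Per IP: seconds from a UA's first request to its first robots.txt request."""
    first_seen, delays = {}, {}
    for line in lines:
        rec = parse(line)
        if rec is None or ua_token.lower() not in rec["ua"].lower():
            continue
        ip = rec["ip"]
        first_seen.setdefault(ip, rec["time"])
        if rec["path"] == "/robots.txt" and ip not in delays:
            delays[ip] = (rec["time"] - first_seen[ip]).total_seconds()
    return delays

BOT_UA = ("Mozilla/5.0 (compatible; Linux x86_64; BebopBot/2.5.1; "
          "+http://www.example.net/bebopbot.html)")
sample = [
    '70.179.4.113 - - [07/Jun/2014:19:10:24 -0700] '
    '"GET /dirname/pagename.html HTTP/1.1" 200 8695 "-" "%s"' % BOT_UA,
    '70.179.4.113 - - [07/Jun/2014:19:10:47 -0700] '
    '"GET /robots.txt HTTP/1.1" 200 635 "-" "%s"' % BOT_UA,
]
delays = robots_delay(sample, "BebopBot")
print(delays)
```

A delay of zero would mean robots.txt was the first thing the UA asked for; anything positive means pages were fetched before the rules were read, at least under that UA and from that IP.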

Notice all those 304s? The robot's first page request-- the page that had previously been seen by the human(oid) UA-- came with
Cache-Control: max-age=0

The later page requests left it out. (Weird choice, btw. Search engines usually say no-cache; in fact the most common "max-age=0" is from Camino when I've explicitly refreshed a page.) I don't log headers for non-page requests, but apparently the robot wasn't as concerned with verisimilitude for those.

This annoys me.
4:53 pm on Jun 12, 2014 (gmt 0)

keyplyr (WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member)





Did it crawl files disallowed by robots.txt? Did it request files too rapidly? Cause any server issues? Sorry, I don't see the problem here.

Bots often cache robots.txt for varying lengths of time. They may also fetch it on a different day, from a different IP, or even with a different UA or GET tool than the one doing the crawling. I see it all the time.
6:43 pm on Jun 12, 2014 (gmt 0)

lucy24 (WebmasterWorld Senior Member, Top Contributor of All Time)



"Sorry, I don't see the problem here."

This does not surprise me. But if we're going to play games, here are the numbers.

Requests for robots.txt from all sources in the week preceding the robot's first visit: 149, including 6 redirects
From googlebot: 12
From bingbot: 12
From msnbot-media: 50 (this explains the unnaturally low bing number, heh heh)
From Mail.RU_bot: 38
From Yandexbot: 11
From MJ12bot: 10 (poor thing! If only it would stop crawling from blocked IP ranges, it would see a lot more pages)
From Seznambot: 5
From Exabot: 3
From assorted other named robots (one or two requests each): 8
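
A tally like the one above can be produced with a few lines of Python. This is a rough sketch, not the method actually used for these numbers: it assumes the log has already been reduced to (path, user-agent) pairs, the function name tally_robots_txt is hypothetical, and the sample records are invented stand-ins rather than real log data.

```python
from collections import Counter

# Substrings to look for in the UA, checked in order; first match wins.
ROBOT_TOKENS = [
    "Googlebot", "bingbot", "msnbot-media", "Mail.RU_Bot",
    "YandexBot", "MJ12bot", "SeznamBot", "Exabot",
]

def tally_robots_txt(records, tokens=ROBOT_TOKENS):
    """Count robots.txt requests per robot name; unmatched UAs go under 'other'."""
    counts = Counter()
    for path, ua in records:
        if path != "/robots.txt":
            continue
        lowered = ua.lower()
        name = next((t for t in tokens if t.lower() in lowered), "other")
        counts[name] += 1
    return counts

# Invented sample records, for illustration only.
records = [
    ("/robots.txt", "Mozilla/5.0 (compatible; Googlebot/2.1)"),
    ("/robots.txt", "Mozilla/5.0 (compatible; Googlebot/2.1)"),
    ("/robots.txt", "msnbot-media/1.1"),
    ("/dirname/pagename.html", "Mozilla/5.0 (compatible; bingbot/2.0)"),
    ("/robots.txt", "SomeUnknownBot/0.1"),
]
counts = tally_robots_txt(records)
for name, n in counts.most_common():
    print("From %s: %d" % (name, n))
```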
 
