wonderbot/JS


keyplyr

9:02 am on Dec 17, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



UA: wonderbot/JS 1.0
Protocol: HTTP/1.1
Robots.txt: Yes
Host: upc.ro ISP

Robots.txt was not the first file requested. 2 hits per second. HTML only.
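Both observations (request order and request rate) can be checked from a log. Here is a minimal sketch. It assumes the hits for a single IP have already been parsed into time-sorted (timestamp, path) pairs. The function name `summarize_visit` and the sample paths are hypothetical.

```python
from datetime import datetime, timedelta

def summarize_visit(hits):
    """hits: list of (datetime, path) tuples for one IP, sorted by time.
    Returns (robots_first, hits_per_second)."""
    if not hits:
        return False, 0.0
    # Did the visitor ask for robots.txt before anything else?
    robots_first = hits[0][1] == "/robots.txt"
    span = (hits[-1][0] - hits[0][0]).total_seconds()
    if span <= 0:
        rate = float(len(hits))  # every hit landed on the same timestamp
    else:
        rate = (len(hits) - 1) / span  # average rate between first and last hit
    return robots_first, rate

# Hypothetical visit: front page first, robots.txt second, one hit every 0.5 s.
start = datetime(2015, 12, 17, 9, 2, 0)
paths = ["/", "/robots.txt", "/a.html", "/b.html", "/c.html"]
hits = [(start + timedelta(seconds=0.5 * i), p) for i, p in enumerate(paths)]
print(summarize_visit(hits))  # (False, 2.0)
```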

A similarly named bot has been mentioned before, but it appears to be a different bot from a different IP range: [webmasterworld.com...]

lucy24

9:35 pm on Dec 17, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Robots.txt was not the first file requested.

Did they start by requesting one or both forms of your front page? "Let's find out whether the site exists at all before embarking on the arduous venture of reading robots.txt".

:: detour to raw logs* to confirm that this is very common behavior ::

Oh, right: among other things, it's part of the hilarious wp-login robot pattern. (Are they looking for the names of specific roboted-out directories like /admin/?) It must be a standard robot script. Not to be confused with legitimate robots, which often follow the pattern: robots.txt, front page, robots.txt again, further requests.


* In case I forget:
^([\d.]+) - - \S+ -0[78]00\] "GET / .+\n\1 - - \S+ -0[78]00\] "GET /robots
Server uses Pacific time, hence the -0700/-0800 offsets in the pattern.
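For anyone who wants to run that search outside a text editor, here is a rough sketch in Python. It assumes Apache combined-format lines and applies the same pattern with `re.MULTILINE`, so `^` anchors at each line. Because of the `\n\1`, it only matches when the robots.txt request is the very next line in the log, so traffic from other IPs interleaved between the two requests will hide a match. The function name and the sample lines (documentation-range IPs, made-up bot names) are hypothetical.

```python
import re

# lucy24's pattern: an IP requests the front page ("GET / ") and the next
# logged line, from the same IP, requests /robots...
# -0[78]00 matches Pacific daylight (-0700) or standard (-0800) offsets.
FRONT_THEN_ROBOTS = re.compile(
    r'^([\d.]+) - - \S+ -0[78]00\] "GET / .+\n\1 - - \S+ -0[78]00\] "GET /robots',
    re.MULTILINE,
)

def front_page_first_ips(log_text):
    """Return the IPs whose front-page hit is immediately followed by robots.txt."""
    return [m.group(1) for m in FRONT_THEN_ROBOTS.finditer(log_text)]

# Hypothetical log excerpt: the first IP fits the pattern, the second does not.
sample = (
    '203.0.113.5 - - [17/Dec/2015:01:02:03 -0800] "GET / HTTP/1.1" 200 5120 "-" "ExampleBot"\n'
    '203.0.113.5 - - [17/Dec/2015:01:02:04 -0800] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleBot"\n'
    '198.51.100.7 - - [17/Dec/2015:01:03:00 -0800] "GET /robots.txt HTTP/1.1" 200 120 "-" "OtherBot"\n'
    '198.51.100.7 - - [17/Dec/2015:01:03:01 -0800] "GET / HTTP/1.1" 200 5120 "-" "OtherBot"\n'
)
print(front_page_first_ips(sample))  # ['203.0.113.5']
```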

keyplyr

10:12 pm on Dec 17, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



wonderbot took a few files linked in the page's head section, then robots.txt, then a couple more lightweight files, then HTML, then just HTML for a dozen pages. It's much like what we often see with browsers, which fetch files over concurrent connections so the lighter files get served first.

This wasn't a WP vulnerability check. It looked more like a linear crawler. It might even be the authentic crawler of the upc.ro ISP, but since they don't include an info page, I automatically block them until I know otherwise.
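One way to express that "no info page, no access" default is sketched below. It is only a heuristic, not a description of anyone's actual setup: a UA that calls itself a bot, crawler or spider but carries no http(s) URL is blocked unless it has been reviewed and allowlisted. The function `should_block`, the keyword list and the allowlist are all hypothetical.

```python
import re

# Hypothetical heuristic: self-identified robots must advertise an info URL.
BOT_WORDS = re.compile(r"bot|crawler|spider", re.IGNORECASE)
INFO_URL = re.compile(r"https?://", re.IGNORECASE)

def should_block(user_agent, allowlist=frozenset()):
    """Block bot-like UAs that carry no info URL, unless explicitly allowlisted."""
    if user_agent in allowlist:
        return False
    looks_like_bot = bool(BOT_WORDS.search(user_agent))
    has_info_page = bool(INFO_URL.search(user_agent))
    return looks_like_bot and not has_info_page

print(should_block("wonderbot/JS 1.0"))  # True
print(should_block("Mozilla/5.0 (compatible; ExampleBot/2.1; +http://www.example.com/bot.html)"))  # False
```

Once a bot's operator is known and vetted, adding its exact UA string to the allowlist lifts the block without loosening the default.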