The visits sometimes start with a grab of robots.txt by Slurp, followed by a request for a page by the Firefox UA. Stylesheets are also requested (although occasionally the @imported stylesheet isn't, and at other times there are no stylesheet requests at all), but no other external files are. Here is an Apache log sample of a visit:
74.6.8.105 [19/Nov/2007:03:28:26 +0000] "GET /robots.txt HTTP/1.0" 200 3988 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
74.6.8.105 [19/Nov/2007:03:28:27 +0000] "GET /page.html HTTP/1.0" 200 13795 "-" "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20071102 BonEcho/2.0.0.4"
74.6.8.105 [19/Nov/2007:03:28:44 +0000] "GET /style.css HTTP/1.0" 200 1570 "http://www.example.com/page.html" "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20071102 BonEcho/2.0.0.4"
74.6.8.105 [19/Nov/2007:03:28:57 +0000] "GET /imported.css HTTP/1.0" 200 14088 "http://www.example.com/style.css" "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20071102 BonEcho/2.0.0.4"
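A rough way to spot this pattern in your own logs is sketched below in Python. It assumes log lines shaped like the sample above (with or without the `- -` ident/user fields) and flags any IP that fetched robots.txt with a Slurp UA and afterwards sent requests with a different UA. The `disguised_crawlers` function and its regex are illustrative only, not a standard tool, and the regex is not a full Apache log parser.

```python
import re
from collections import defaultdict

# Loose pattern for log lines like the ones quoted in this thread.
# The "- -" ident/user fields are optional, and the UA's closing quote is
# not required because some quoted lines are truncated.
LOG_RE = re.compile(
    r'^(?P<ip>\S+)(?: \S+ \S+)? \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)'
)


def disguised_crawlers(lines):
    """Map each IP that fetched /robots.txt with a Slurp UA to the set of
    non-Slurp UAs it used on later lines (lines are read in order)."""
    slurp_ips = set()
    other_uas = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines the loose pattern can't parse
        ip, path, ua = m["ip"], m["path"], m["ua"]
        if "Slurp" in ua and path == "/robots.txt":
            slurp_ips.add(ip)
        elif "Slurp" not in ua and ip in slurp_ips:
            other_uas[ip].add(ua)
    return dict(other_uas)
```

For example, running `disguised_crawlers(open("access.log"))` (the filename is hypothetical) over the four sample lines above would report 74.6.8.105 with the BonEcho UA.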
In the first 24 hours since the first sighting, only about 35 pages have been grabbed. Yahoo also seems to be working from old data: a couple of the requests were for pages whose URLs changed six months ago. (Yahoo quickly saw the redirects and updated accordingly, but this Firefox visitor of theirs doesn't know about the new URLs and has to be redirected.)
Visits have also been coming from 74.6.8.75 (although, in the early UTC morning of the 20th, this IP switched to a "normal" Slurp UA), and from 74.6.8.102.
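To watch for these addresses, one option is Python's standard `ipaddress` module. The /24 below is an assumption inferred from the three addresses mentioned so far, not a published Yahoo allocation, and `WATCHED_NETWORKS` and `in_watched_range` are hypothetical names.

```python
import ipaddress

# Hypothetical watch list: a /24 inferred from 74.6.8.75, .102 and .105.
# This is NOT an official Yahoo allocation; widen or extend it as needed.
WATCHED_NETWORKS = [ipaddress.ip_network("74.6.8.0/24")]


def in_watched_range(ip: str) -> bool:
    """True if the address falls inside any watched network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in WATCHED_NETWORKS)


for ip in ("74.6.8.75", "74.6.8.102", "74.6.8.105", "74.6.22.161"):
    print(ip, in_watched_range(ip))
```

Note that 74.6.22.161, which turns up later in this thread, falls outside that /24, so a real list would need a wider prefix.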
Verisign skulking about, Microsoft [webmasterworld.com] too, and now Yahoo. Guesses on who's next?
I've had Linux denied for some time.
Fortunately, it hasn't affected the other SE bots or their listing of my pages.
There's an old thread (somewhere) in which I explained my reasons for denying Linux.
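The actual rule isn't shown here. Assuming it is a plain substring match on "Linux" in the User-Agent header, this Python sketch shows why such a rule catches the BonEcho UA from the log sample while letting the regular Slurp UA through. `denied_by_linux_rule` is a hypothetical helper, not the poster's configuration.

```python
def denied_by_linux_rule(user_agent: str) -> bool:
    """Hypothetical rule: refuse any request whose UA mentions "Linux"."""
    return "Linux" in user_agent


FIREFOX_UA = ("Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) "
              "Gecko/20071102 BonEcho/2.0.0.4")
SLURP_UA = ("Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; "
            "http://help.yahoo.com/help/us/ysearch/slurp)")

for ua in (FIREFOX_UA, SLURP_UA):
    # 403 stands in for the server refusing the request; 200 means served.
    status = 403 if denied_by_linux_rule(ua) else 200
    print(status, ua)
```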
I've grown terribly tired of the many variations of Yahoo and MSN bots and their numerous IP ranges.
Many crawls take place simultaneously, and I've even had instances of different bots from the same SE requesting the same pages at the same time, which is absurd.
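Duplicate fetches like these can be picked out of a log once requests are reduced to (IP, timestamp, path) tuples. A minimal Python sketch follows. The two-second window and the `near_simultaneous` name are arbitrary assumptions, and only consecutive hits on the same path (in time order) are compared.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Apache timestamp format, e.g. "19/Nov/2007:03:28:26 +0000".
APACHE_TIME = "%d/%b/%Y:%H:%M:%S %z"


def near_simultaneous(requests, window=timedelta(seconds=2)):
    """Yield (path, ip_a, ip_b) when two different IPs fetch the same path
    within `window` of each other. `requests` holds (ip, time, path) tuples."""
    by_path = defaultdict(list)
    for ip, stamp, path in requests:
        by_path[path].append((datetime.strptime(stamp, APACHE_TIME), ip))
    for path, hits in by_path.items():
        hits.sort()  # time order; ties fall back to comparing IP strings
        for (t1, ip1), (t2, ip2) in zip(hits, hits[1:]):
            if ip1 != ip2 and t2 - t1 <= window:
                yield path, ip1, ip2
```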
Don
Then, from the same IP range, comes the following:
74.6.22.161 - - [20/Dec/2007:07:22:47 -0600] "GET /MyFolder/MyPage.html HTTP/1.0" 403 - "-" "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20071214 BonEcho/2.0.0.4"
Just a heads-up.
74.6.8.126 - - [23/Dec/2007:15:51:09 -0500] "GET /example1.html HTTP/1.0" 404 1545 "-" "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20071214 BonEcho/2.0.0.4"
Also seeing odd requests for 2 files at once:
74.6.8.126 - - [23/Dec/2007:01:14:22 -0500] "GET /example1.html/example2.html HTTP/1.0" 404 1545 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)"
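One possible explanation for a path like `/example1.html/example2.html` (unconfirmed; the logs don't establish the cause) is a relative link resolved against a base URL that wrongly ends in a slash. Python's `urllib.parse.urljoin` shows the difference, and `looks_glued` is a hypothetical heuristic for finding such requests in a log.

```python
import re
from urllib.parse import urljoin

# Correct resolution: the relative link replaces the last path segment.
print(urljoin("http://www.example.com/example1.html", "example2.html"))
# -> http://www.example.com/example2.html

# If the page URL is mistakenly treated as a directory (trailing slash),
# the two file names end up glued into one path.
print(urljoin("http://www.example.com/example1.html/", "example2.html"))
# -> http://www.example.com/example1.html/example2.html

# Hypothetical heuristic: an .htm/.html segment followed by more path.
GLUED_PATH = re.compile(r"\.html?/", re.IGNORECASE)


def looks_glued(path: str) -> bool:
    return GLUED_PATH.search(path) is not None
```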
Also attempted to access a denied directory:
74.6.8.126 - - [23/Dec/2007:14:43:19 -0500] "GET /php/ HTTP/1.0" 403 366 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)"
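Whether /php/ is blocked in robots.txt or only by the server (the 403 above) isn't stated. If it is also disallowed in robots.txt, Python's standard `urllib.robotparser` can confirm that the request ignored it. The robots.txt content below is hypothetical, and `disallowed_by_robots` is an illustrative helper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the real file isn't shown in the logs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /php/
"""

PARSER = RobotFileParser()
PARSER.parse(ROBOTS_TXT.splitlines())


def disallowed_by_robots(path: str, user_agent: str = "Slurp") -> bool:
    """True if robots.txt forbids this user agent from fetching the path."""
    return not PARSER.can_fetch(user_agent, "http://www.example.com" + path)
```

With this file, `disallowed_by_robots("/php/")` is true, so a crawler requesting /php/ would be ignoring robots.txt rather than merely hitting a server-side 403.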