covert Microsoft crawl

         

keyplyr

7:52 pm on Jun 1, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




(Don't know if this exact occurrence has been reported before.)

Full Crawl, no robots.txt

UA: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1)

Microsoft
207.46.0.0 - 207.46.255.255
207.46.0.0/16


************ also **********

Full Crawl, no robots.txt

UA: Mozilla/5.0 (compatible; Yahoo! Nano; [help.yahoo.com...]

Microsoft
94.245.64.0 - 94.245.127.255
94.245.64.0/18
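
For anyone wanting to double-check that the two dotted ranges above really collapse to those CIDR blocks, here is a minimal sketch using Python's standard `ipaddress` module. The helper names `range_to_cidrs` and `in_any` and the `MS_RANGES` list are made up for illustration; they are not from any tool mentioned in this thread.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Return the CIDR blocks that exactly cover first..last (inclusive)."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]

def in_any(ip, cidrs):
    """True if ip falls inside at least one of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(c) for c in cidrs)

# The two ranges reported in the post (hypothetical variable name).
MS_RANGES = ["207.46.0.0/16", "94.245.64.0/18"]

print(range_to_cidrs("207.46.0.0", "207.46.255.255"))   # ['207.46.0.0/16']
print(range_to_cidrs("94.245.64.0", "94.245.127.255"))  # ['94.245.64.0/18']
```

`in_any(some_ip, MS_RANGES)` can then be used to test a single address from a log line against both blocks.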

dstiles

9:10 pm on Jun 1, 2012 (gmt 0)

I wouldn't expect either to look at robots.txt - they are not (officially) bots.

You do not give details of the 207.46 IPs involved: some within that range have bot rDNS and others do not. I do see hits from MS on bot IPs that are not bots (i.e. the UAs are wrong), and I just 403 them. The UA looks very bare. I would expect lots of .NET stuff and so forth if it were a real person, but then you would not expect a real person to crawl all pages.

None of the 94.245 IPs is designated as a bot as far as I know. Were the UAs on that range really Yahoo? Seems odd. The range itself is listed in DNS as UK rather than US. I don't have anything bad listed for the IP range.
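
The "does this IP have bot rDNS" check can be sketched roughly as a forward-confirmed reverse DNS lookup: take the PTR name, see whether it ends in a known crawler suffix, then resolve that name forward and confirm it maps back to the same IP. The suffix `.search.msn.com` below is an assumption about which hostnames count as MS bot rDNS, and `is_confirmed_bot` with its injectable `reverse` and `forward` resolvers is a hypothetical helper, not anything used on the sites discussed here.

```python
import socket

# Assumption: hostnames under this suffix are treated as official MS crawlers.
BOT_SUFFIXES = (".search.msn.com",)

def _reverse(ip):
    """PTR lookup: return the primary hostname for ip."""
    return socket.gethostbyaddr(ip)[0]

def _forward(host):
    """A lookup: return the list of IPv4 addresses for host."""
    return socket.gethostbyname_ex(host)[2]

def is_confirmed_bot(ip, reverse=_reverse, forward=_forward,
                     suffixes=BOT_SUFFIXES):
    """Forward-confirmed reverse DNS.

    The PTR name must end in one of `suffixes`, and resolving that
    name forward must return the original ip.
    """
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # covers socket.herror / socket.gaierror
        return False
    if not host.endswith(suffixes):
        return False
    try:
        return ip in forward(host)
    except OSError:
        return False
```

The resolvers are passed in as parameters so the logic can be exercised with canned data, without touching live DNS. An IP whose PTR name merely claims a crawler suffix but does not resolve back is rejected.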

A full crawl with either UA does sound odd. Have you annoyed anyone at MS recently? :)

keyplyr

12:50 am on Jun 2, 2012 (gmt 0)

Both hit a site with about 200 HTML pages. Although both received 403s from comparative filters I have in place, each kept requesting several dozen files (HTML only). I count that as a full crawl, as opposed to a hit on just the index page.

207.46.92.19 - - [31/May/2012:00:49:18 -0700] "GET www.example.com/page.html HTTP/1.1" 403 17449 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1)"

94.245.127.65 - - [31/May/2012:03:17:29 -0700] "GET www.example.com/page.html HTTP/1.1" 403 1061 "-" "Mozilla/5.0 (compatible; Yahoo! Nano; http://help.yahoo.com/help/us/ysearch/slurp)"
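
A rough way to spot this "many pages, never robots.txt" pattern in a log is to parse each line and tally hits per IP and UA. The sketch below assumes lines shaped like the two above (combined-log style); `LOG_RE` and `crawl_summary` are hypothetical names, and the sample data is just those two lines, not a real log.

```python
import re
from collections import Counter, defaultdict

# Combined-log style line, as in the two samples from this thread.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def crawl_summary(lines):
    """Tally hits, status codes and robots.txt requests per (ip, ua)."""
    stats = defaultdict(
        lambda: {"hits": 0, "robots": False, "statuses": Counter()}
    )
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that do not fit the assumed format
        entry = stats[(m["ip"], m["ua"])]
        entry["hits"] += 1
        entry["statuses"][m["status"]] += 1
        if m["path"].endswith("/robots.txt"):
            entry["robots"] = True
    return dict(stats)

SAMPLE = [
    '207.46.92.19 - - [31/May/2012:00:49:18 -0700] '
    '"GET www.example.com/page.html HTTP/1.1" 403 17449 "-" '
    '"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1)"',
    '94.245.127.65 - - [31/May/2012:03:17:29 -0700] '
    '"GET www.example.com/page.html HTTP/1.1" 403 1061 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Nano; '
    'http://help.yahoo.com/help/us/ysearch/slurp)"',
]

for (ip, ua), info in crawl_summary(SAMPLE).items():
    print(ip, info["hits"], "robots.txt" if info["robots"] else "no robots.txt")
```

Run over a full day's log, any (IP, UA) pair with dozens of hits and `robots` still false would be a candidate for the kind of crawl described here.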

dstiles

6:31 pm on Jun 2, 2012 (gmt 0)

According to DNS, neither of those IPs is a bot, so is it either someone inside MS or someone using MS as a proxy to run a pseudo-bot? Sounds odd.

Haven't specifically come across Nano before. A quick search suggests either a financial connection with Yahoo or a war-game (same thing?).

keyplyr

9:07 pm on Jun 2, 2012 (gmt 0)

My generic block on UAs ending in "Windows NT 6.1)" catches many covert hits, and M$ has always used questionable UAs to hit and run, so nothing new here.
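
The exact rule is not shown in this thread, so the following is only a guess at what an "ends in Windows NT 6.1)" test might look like. `GENERIC_UA` and `is_bare_ua` are hypothetical names. The idea is that a genuine browser UA normally carries extra tokens after the OS version, while the bare string stops right at the closing parenthesis.

```python
import re

# Matches UAs whose very last characters are "Windows NT 6.1)".
GENERIC_UA = re.compile(r"Windows NT 6\.1\)$")

def is_bare_ua(ua):
    """True if the UA string ends with the bare 'Windows NT 6.1)' token."""
    return bool(GENERIC_UA.search(ua.strip()))

# The UA from the reported hit ends right after the OS token.
print(is_bare_ua("Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1)"))
# A UA with further tokens after the OS version is not caught.
print(is_bare_ua("Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"))
```

Anchoring on the end of the string is what keeps the test narrow: UAs that continue with a rendering-engine or .NET token after the OS version pass through.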

But what drew my attention was the Nano UA with the Slurp info page. If I see it again, I'll do some more digging.

wilderness

12:46 am on Jun 3, 2012 (gmt 0)

I've seen a reference to "Nano" and MS and thought it was here, but I'm unable to locate anything.

Perhaps it was in my logs. Not sure.