
Forum Moderators: Ocean10000 & incrediBILL

msnbot/2 snapshots?

   
10:16 am on Sep 19, 2009 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

verified rDNS, normal crawl range.

It hit each page on a 150-page site twice in succession, response code 200 for all. The first time the file size is normal; the second time it looks like the size of the HTML plus the images. So I'm assuming msnbot/2 is taking snapshots. Can anyone else verify this?

2:54 pm on Sep 19, 2009 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

I see it crawling but didn't see any images being downloaded.

Probably because I block all images from being downloaded ;)

Actually, they could be combining the actions of the normal crawler and the image crawler, but if you're seeing back-to-back requests (download a page, then download the page again with all the images) it sounds like possible screen shots.

7:27 pm on Sep 19, 2009 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

I see it crawling but didn't see any images being downloaded.

I didn't see images downloaded in this crawl either. But the total size of the second request would equal the page plus its images, so I'm assuming these are snapshots.

...I block all images from being downloaded

LOL, really? Then what's the point of having images?

I block image downloads from remote servers and other off-site referrers (w/ some exceptions)

7:34 pm on Sep 19, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member

Spot the difference :)

www.example.com 65.55.106.nnn - - [19/Sep/2009:14:14:37 -0500] "GET /widget HTTP/1.1" 200 16511 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
www.example.com 65.55.106.nnn - - [19/Sep/2009:14:14:46 -0500] "GET /widget HTTP/1.0" 200 85960 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"

HTTP/1.1 vs HTTP/1.0: the former uses gzip
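The size gap between those two log lines is what gzip typically buys on HTML. A quick sketch of the effect (hypothetical, repetitive page markup, not the actual /widget page):

```python
import gzip

# Hypothetical, repetitive HTML standing in for a real page body.
html = ("<div class='widget'><p>Example widget description text.</p></div>\n" * 1200).encode("utf-8")

compressed = gzip.compress(html)

print(len(html))        # size served to an HTTP/1.0 client (no compression)
print(len(compressed))  # size served with Content-Encoding: gzip
```

Repetitive markup compresses dramatically, which is the same direction as the 16511-byte vs 85960-byte difference in the log lines above.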

11:24 pm on Sep 19, 2009 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

caribguy - that's it. I do gzip HTML files. Care to suggest a reason why msnbot would do this?

12:48 am on Sep 20, 2009 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

LOL, really? Then what's the point of having images?

I let users download them, but not the SEs.

The SE image index is one huge image theft ring that people use without asking.

Worst case, I found a bunch of unscrupulous sites using Google Images to locate my thousands of images and hotlink them into their pages.

We had a massive assault on that nonsense, all hotlinks blocked, all images blocked from SEs downloading.

FYI, the images I'm talking about in this case were my library of 40K+ site screen shots, so you can see why some wise guys thought I'd be a good source for a free ride.
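Blocking along these lines is typically done with mod_rewrite. A minimal .htaccess sketch (the domain, bot names, and extensions are hypothetical illustrations, not the actual rules in use here):

```apache
RewriteEngine On

# Block image hotlinking: allow empty referers and our own site only.
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]

# Block search-engine crawlers from fetching images at all.
RewriteCond %{HTTP_USER_AGENT} (msnbot|Googlebot|Slurp) [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
```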

HTTP/1.1 vs HTTP/1.0: the former uses gzip

Isn't that backwards? Shouldn't it be the HTTP/1.1 using .gz?

1:04 am on Sep 20, 2009 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member

Yep, 1.1 uses gzip. "Former" as in the one mentioned first...

I wouldn't dare even wager a guess on why M$ is doing this. To me it falls in the same category as the referrer spam discussed here before, or their attempts to grab images and truncated URLs with the WinHTTP user agent...

Very tempting to add yet another directive to my rewrite rules...

3:28 am on Sep 20, 2009 (gmt 0)

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

Made me look back through my database of this year's logs... The critter is there, but not in great numbers. But thanks for the heads up... I will watch this for a month or so and see if any changes need to be made in .htaccess.

6:44 am on Sep 20, 2009 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

I let users download them, but not the SEs.

I know what you meant Bill, I was only joking.

I take a slightly different tack. I block all image requests from off-site origins, but I do allow the Big 3 (4?) SEs to put most image files in their image search libraries.

When SE users click on these thumbnailed images, instead of the SE's page hot-linking to my image file, my scripting displays the page of origin (my site). Thus my 10k images serve another function: increasing traffic.
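That click-through behavior can be approximated with mod_rewrite. A sketch (the paths, domain, and referrer patterns are hypothetical, not keyplyr's actual scripting):

```apache
RewriteEngine On

# If a visitor arrives at a raw image file from an image-search results page...
RewriteCond %{HTTP_REFERER} images\.google\.|bing\.com/images|images\.search\.yahoo\. [NC]
# ...redirect them to the HTML page that hosts the image instead.
RewriteRule ^images/(.+)\.(gif|jpe?g|png)$ /gallery/$1.html [R=302,L,NC]
```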
