FAO Microsoft re: MSNbot

MSNbot Bandwidth

computerology

8:22 pm on Jul 4, 2004 (gmt 0)

10+ Year Member



Hey guys, I was trawling through my logs after noticing a little bit of abuse, and I see your new spider has been aggressively crawling my pages.

Didn't know what MSNbot was until I looked it up. Sounds like a great idea, but could you keep one thing in mind?

I have several large UHA files on the server (UHARC archives), and I have noticed that the bot is spidering and downloading them. These are 20-40 MB files, and with your bot crawling around it doesn't take long to hit a GB of transfer.

Could you mod your bot to omit *.ZIP, *.UHA, *.SFT and other archive files? There's no need for your servers to cache specific downloads from other websites, and on most sites you'd probably be entering copyright infringement territory by cross-hosting the data anyway, even if you are helping operators reduce their bandwidth.

Go ahead, rip the pages all you want; my dynamic pages are all timestamped. Just leave alone anything that isn't an htm, php, asp, txt or j2ee page, because your software likely can't index the contents of an archive file (like UHA, ZIP or SFT), and with audio or video data, regardless of encoding, it probably won't ever be able to categorize it by type (i.e. if your engine could find hard breaks audio files by analyzing the sounds itself, I'd probably have a cardiac arrest and die of shock).

Hope you read this, good luck with the spider.

rfgdxm1

8:43 pm on Jul 4, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Block those large files with robots.txt.
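
For anyone wanting to try this, here is a rough sketch of what such a robots.txt might look like, plus a quick check using Python's standard urllib.robotparser. The /downloads/ directory is purely an assumption for illustration (nothing in this thread says where the archives live), and is_allowed is just a helper name made up for the example. The original robots.txt convention only matches path prefixes, so patterns like *.uha are not guaranteed to work everywhere; keeping the big files in one directory and disallowing that directory is the safer bet.

```python
from urllib.robotparser import RobotFileParser

# Assumed layout: all large archives (UHA, ZIP, SFT) sit under /downloads/.
# That directory name is hypothetical; adjust it to the real site structure.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /downloads/

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt lets user_agent fetch path."""
    parser = RobotFileParser()
    # parse() takes the file contents as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent, path in [
        ("msnbot", "/downloads/archive.uha"),
        ("msnbot", "/index.php"),
        ("Googlebot", "/downloads/archive.uha"),
    ]:
        print(agent, path, is_allowed(agent, path))
```

Note this only helps with spiders that actually honor robots.txt. To keep every well-behaved crawler away from the archives, the Disallow line would go under the User-agent: * record instead.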