
Home / Forums Index / Microsoft / Bing Search Engine News
Forum Library, Charter, Moderators: mack

Bing Search Engine News Forum

    
To Ban or Not to Ban MSNBOT?
MSN bot misbehaving on my site
Constantin




msg:1534262
 4:18 am on Oct 1, 2003 (gmt 0)

MSNBOT (0.11) decided to pay me a visit today and promptly went for a file in a directory that was specifically disallowed by my robots.txt file for the last 3 months. Subsequently, it got booted off the site.

Since writing to MSN seems to have no effect (they fail to respond, even if they claim they will in less than 24 hrs.) I wonder what other folks have done to get the Msnbot to behave better? Or is simply banning it the right response?

I suppose being served a bunch of 403's should make someone scratch their head out there. Cheers!

 

jeremy goodrich




msg:1534263
 5:05 am on Oct 1, 2003 (gmt 0)

I would ban it for the time being :) at least, until there is a bit more "meat" to the rumors of their building an engine...

Constantin




msg:1534264
 12:24 pm on Oct 1, 2003 (gmt 0)

Ironically, I got a reply to my first inquiry at MSN search later last night. Specifically, I was unhappy that sections of the web-site that had been disallowed by robots.txt were represented in their search engine (I could see the referrer strings in my logs).

Perhaps the folks at MSN will also refine their robot to behave itself better. In the meantime, I'm not going to ban it by UA, but will simply sit back and relax while my spider trap goes SNAAAP! every time a bad bot decides to spider every link on my site instead of the ones allowed by robots.txt.

It's pretty safe to assume that they're developing a bot. The investment in human capital is negligible, the potential returns great, and it's another way to cement their current hegemony. They have vast, fast farms of CPUs and good internet connections. Can you think of better starting conditions than these? Then, continue your thoughts along the lines of how MS will integrate their version of Search into the OS.

History is on their side: Folks have had better alternatives but chose MS products because, though inferior, they were more convenient to use (see Netscape, etc.) MS doesn't have to be better than google... they simply have to be good enough so that users won't have the urge to open a web-browser with google rather than simply enter a search phrase into MS' version of Sherlock.

It pains me but lots of people don't know and don't care to know how to use their computers to their fullest capabilities. Hence the Windows hegemony and the huge IT staffs required to support a badly-written OS that is held together with chewing-gum. No one should accept an OS where enough critical flaws show up in a given year to rival the national debt meter in NYC. Yet people accept the MS-way for some reason.

Fizzy




msg:1534265
 9:18 pm on Oct 7, 2003 (gmt 0)

Hmmm, I noticed that msnbot paid a visit to me today and left its calling card.

Constantin seems to be on the ball here

[search.msn.com...]

Fizz

Constantin




msg:1534266
 3:35 pm on Oct 8, 2003 (gmt 0)

Our friend, the MSN bot was back.

Time: 2003-10-08 07:39:32 (EST)
IP: 131.107.137.166
UA: msnbot/0.11 (+http://search.msn.com/msnbot.htm)

Offense: Once again, it disregarded the robots.txt and was subsequently banned from the site. What a bunch of clowns. Read their info page (see link above) and you'd get the notion that it actually follows the robots.txt directives.

Considering that my robots.txt file is 7 lines long and applies to all robots, I don't think that complexity is the issue. Somehow, Google has no problems, so why should MSN?

Conclusion: Anyone out there who has freely-available web-content that they feel strongly about should ban the MSNBot UA until the MSN Bot

1) Respects the robots.txt directives
2) Respects inline Meta tags for Robots.

Perhaps a big bunch of 403's will convince M$ to write a better bot.

mat_bastian




msg:1534267
 3:47 pm on Oct 8, 2003 (gmt 0)

"Before crawling a site/directory, MSNBOT will look at the first line of the robots.txt file. If it finds the User-Agent “MSNBOT”, MSNBOT will honor the request and will not crawl the site/directory. If it does not find the User-Agent "MSNBOT", then it will try to find the User-Agent "*" and follow its rules." - from the above link
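The lookup order described in that quote (the bot's own User-Agent group first, then the "*" group) can be sketched roughly as below. This is only an illustration of the fallback rule, not MSNBOT's actual code; the function name `select_rules` and the sample file in the usage note are made up for the example.

```python
def select_rules(robots_txt: str, bot_name: str) -> list[str]:
    """Return the Disallow paths from the bot's own group, else from the '*' group.

    Hypothetical helper: it illustrates the fallback order only and is not
    a full robots.txt parser.
    """
    groups: dict[str, list[str]] = {}
    current: list[str] = []   # user-agent names of the group being read
    in_rules = False          # True once the current group has rule lines
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:      # a User-agent line after rules starts a new group
                current, in_rules = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "disallow":
            in_rules = True
            if value:         # an empty Disallow means "nothing is disallowed"
                for agent in current:
                    groups[agent].append(value)
    # Prefer the bot's own group; fall back to the wildcard group.
    return groups.get(bot_name.lower(), groups.get("*", []))
```

With a file that only has a `User-agent: *` group, `select_rules(text, "msnbot")` returns the wildcard rules, which is what the quoted page says should happen.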

onfire




msg:1534268
 4:06 pm on Oct 8, 2003 (gmt 0)

" Folks have had better alternatives but chose MS products because, though inferior, they were more convenient "

Constantin

I take it you don't use any MS products at all then? Even though they are so convenient to use?

MS may be played as the big bad monster, but it's amazing how many people slag MS while sat at their own PCs running MS WINDOWS and other MS Products.......!

I do agree with you that their BOT should be obeying your robots.txt and i am sure it does or can, and i am sure your answer will be found here.

Question is how long before we will be begging (in some cases praying) for the MS Bot to come crawling our sites like we do now for the Google Bot?

Constantin




msg:1534269
 4:07 pm on Oct 8, 2003 (gmt 0)

Mat, I guess I'm now a bit confused. Could you be so kind and tell me what is there not to understand about the following robots.txt file?


User-agent: *
Disallow: /cgi-bin/
Disallow: /Family/
Disallow: /guardian/
Disallow: /INSEAD/
Disallow: /Pictures/
Disallow: /Wedding/Brunch/
Disallow: /Wedding/Dennis/
Disallow: /Wedding/FirstSet/
Disallow: /Wedding/Jeffs/
Disallow: /Wedding/Rahmens/
Disallow: /Wedding/Week08/
Disallow: /Wedding/Week09/
Disallow: /Wedding/Week10/
Disallow: /Wedding/Week11/
Disallow: /Wedding/Week12/
Disallow: /zzz/

As I mentioned before, the file validates, and the only robot being trapped by my honeypot is MSN. Considering that other spiders are all over my site every day, I cannot help but conclude that the folks at MSNBot are disregarding the robots.txt file. Awaiting your wisdom, Constantin
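For anyone who wants to check how a standards-following parser reads rules like these, here is a minimal sketch using Python's standard `urllib.robotparser`. The host `example.com`, the helper name `is_allowed`, and the shortened rule list are assumptions for illustration; it shows what a compliant crawler should conclude, not what MSNBOT actually does.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the rules posted above (illustration only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /Family/
Disallow: /Wedding/Brunch/
Disallow: /zzz/
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())   # parse from text, no network access
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    for path in ("/index.html", "/Family/photo.html", "/zzz/"):
        url = "http://example.com" + path   # example.com is a placeholder host
        print(path, is_allowed(ROBOTS_TXT, "msnbot/0.11", url))
```

Note that path matching is case-sensitive, so `/family/` would not be covered by `Disallow: /Family/`.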

TomWaits




msg:1534270
 4:15 pm on Oct 8, 2003 (gmt 0)

Dude, I got news for you: other SEs have those pages in their SERPS too.

Constantin




msg:1534271
 4:21 pm on Oct 8, 2003 (gmt 0)

Onfire,

Yes, I use some MS products, but simply because I have no viable alternative. While I am an expert Windows user due to work requirements (where MS rules), my home network is exclusively Mac. OSX is fantastic, Camino/Safari are better browsers, etc.

I only use MS where I have to (for example, coding 8.5MB Excel spreadsheets that use Crystal Ball for multiple scenario analysis)... and that's what I have Virtual PC for.

Answer me this though: How much has MS improved the Office line ever since they became the de facto standard? I'm not talking about crappy assistants and other fluff, but real features.

  • Has Word become easier to use, has the file format improved so that it's not the biggest crap of bloat ever conceived? No. In fact, it's only gotten worse. At least the Apple unit at MS was allowed to have the same file format instead of suffering from the intentional differences foisted upon them in the past.

  • On the Excel side, has MS ever gotten around to offering more graphing solutions, improved performance, added more functions, etc? No. Try running thousands of iterations between linked spreadsheets. Every value passed between them is written to disk first. Brilliant performance improvement!

  • Don't even get me started on Outlook, considering that it's likely the biggest security hole ever conceived. Yes, MS Windows and its products are the most likely targets because they have the widest distribution. However, the inherent insecurity of the MS operating system CANNOT be overlooked. That is why I have a Linux server for my site.

    As I see it, MS is resting on its laurels, but the market is too afraid to re-enter the fray because the VC's know that the minute MS trains its guns back on them, they'll be frog-meat in the next X-Box shoot-em-up. And here Ladies and Gentlemen is the biggest problem of them all: Stagnation.

    It's not just that the OS and its applications are being dominated by one company with one vision (however fragmented, hard to use, etc.) but that the entire industry is being held back and hostage. That'll cost us far more than most people can imagine.

    Last of all, considering how much money Google is making these days with its unobtrusive ads, I think folks will be doing more than begging and praying for the MSN Bot to show up. There is no other reason for MS to invest the cash now and replicate the Google model. They intend to make money and Google has shown them the way.

    [edited by: Constantin at 4:36 pm (utc) on Oct. 8, 2003]

    creative craig




    msg:1534272
     4:31 pm on Oct 8, 2003 (gmt 0)

    Question is how long before we will be begging (in some cases praying) for the MS Bot to come crawling our sites like we do now for the Google Bot?

    When we can confirm what they are intending to do with the data that they collect. They have not said for sure that they will be adding the content to any search engine let alone MSN.

    Craig

    onfire




    msg:1534273
     5:00 pm on Oct 8, 2003 (gmt 0)

    Constantin

    We all know MS are far from perfect, but with money to burn on marketing, and by doing so getting their product into nearly every home, they ain't done so bad for a "bunch of clowns". Whatever is thrown at them is just shrugged off, where most would have folded.

    But the point here is the MSN Bot, and I stand by what I said: in 12, max 24, months' time we will all be wanting the MSN Bot to come crawling our sites (not asking how to stop it), especially if they are gonna go for it big time on their own SE and give Google a run for its money, & money is what MS has lots of!

    Josk




    msg:1534274
     5:00 pm on Oct 8, 2003 (gmt 0)

    One thing to remember is that Microsoft, however much they say otherwise, have *NEVER* provided any innovation... Virtually every feature they have added to Windows or to any of their products has been either copied or bought off others.

    And now they are going to do the same to search...

    deft_spyder




    msg:1534275
     5:24 pm on Oct 8, 2003 (gmt 0)

    For more information on the digression of this topic, go to slashdot, where you can hear about hate for microsoft on a massive scale that will dwarf this little jaunt and make all the haters happy.

    And now (hopefully) back to the topic.... disallowing bots.

    Constantin




    msg:1534276
     5:28 pm on Oct 8, 2003 (gmt 0)

    Onfire,

    The need for the MSNBot or lack thereof is probably dependent on how much money you're trying to make via your site. Seeing that my site never has nor ever intends to become a profit center, I can do without MSN if need be. Others might not be in the same fortunate position.

    I also have to post a correction: Using the local robots.txt checker, my robots.txt file did not validate (it had validated in the past). My guess is that it has to do with subtle differences regarding line breaks.

    Pasting the exact same syntax straight into a robots.txt file via emacs did the trick. I guess I'll now have to dig out how to make BBEdit UNIX line-break compliant.
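If line breaks really are the culprit (that is only a guess, as stated above), the fix amounts to converting every line ending to a UNIX LF before uploading. A minimal sketch, assuming the editor saved the file with classic Mac CR or Windows CRLF endings; both function names are made up for the example:

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF (Windows) and bare CR (classic Mac) line endings to LF."""
    # Replace CRLF first so the CR in each pair is not converted twice.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

def describe_newlines(data: bytes) -> str:
    """Report the first non-LF line-ending style found, or 'LF' if none."""
    if b"\r\n" in data:
        return "CRLF"
    if b"\r" in data:
        return "CR"
    return "LF"

if __name__ == "__main__":
    mac_style = b"User-agent: *\rDisallow: /zzz/\r"
    print(describe_newlines(mac_style))    # style before conversion
    print(normalize_newlines(mac_style))   # same rules with LF endings
```

A parser that only splits on LF would see a CR-only file as one long line, which would explain a file that looks fine in the editor but fails a validator.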

    OK, who wants to throw the first egg? ;)

    mvl22




    msg:1534277
     5:37 pm on Oct 8, 2003 (gmt 0)

    Ban.

    MSNbot was misbehaving on my sites by requesting junk URLs that I do not believe exist as links elsewhere. Given that it's not generating any useful traffic, I see no reason to be having to filter out the 404 errors it generates.

    DrDoc




    msg:1534278
     6:15 pm on Oct 8, 2003 (gmt 0)

    Then, continue your thoughts along the lines of how MS will integrate their version of Search into the OS.

    ...which is already a reality in Longhorn

    World Wide Wibble




    msg:1534279
     6:26 pm on Oct 8, 2003 (gmt 0)

    I am quite amused that anyone expects the MSNBOT to work properly. Did you not notice the first two letters of its name?

    Constantin




    msg:1534280
     6:37 pm on Oct 8, 2003 (gmt 0)

    Then, continue your thoughts along the lines of how MS will integrate their version of Search into the OS.

    ...which is already a reality in Longhorn

    And there you have it. Two years should be enough time for the MSNBot team to come up with a reasonably good solution. And, unlike Google vs. Altavista (remember them?), MS has much better control over the desktop AND browser than a non-spyware-enhanced site could ever have over your browser.

    Yeah, search bars and all that increase switching costs somewhat. However, there is nothing like being able to code the browser to point where you want it to. The masses will, most likely, never know the difference. That is, unless the DoJ is reinvigorated regarding anti-trust and MS is finally split into three groups as it should be.

    mcavic




    msg:1534281
     5:37 am on Oct 9, 2003 (gmt 0)

    I am quite amused that anyone expects the MSNBOT to work properly. Did you not notice the first two letters of its name?

    Robots.txt compliance will be included with MSNBOT Service Pack 3.

    Kerrin




    msg:1534282
     11:30 am on Oct 9, 2003 (gmt 0)

    Perhaps Microsoft have developed their "own version" of the robots exclusion standard. That way we'll have to write two different robots.txt files: one for Microsoft and one for Netscape... urm I mean other crawlers ;)

    World Wide Wibble




    msg:1534283
     5:11 pm on Oct 9, 2003 (gmt 0)

    Robots.txt compliance will be included with MSNBOT Service Pack 3.

    :) No, the MSNBOT is programmed to read the robots.txt file... it just usually crashes before it gets that far.

    Sorry about the multiple boxes, I just thought it looked kinda pretty :) </silliness>

    mcavic




    msg:1534284
     11:44 pm on Oct 9, 2003 (gmt 0)

    It's not a bug, it's a feature that just doesn't work very well.

    redzone




    msg:1534285
     2:43 pm on Oct 10, 2003 (gmt 0)

    I always get a laugh, when I read the posts from those that like to give MS the swift boot in the behind.. It's as if MS was always the king of the hill..

    For those of you that have been using PC's for the past 21 years, you'd remember that MS did "nothing" but supply the operating system for the early IBM PC's and clones.

    They did not have the market cornered for word processing and spreadsheet applications, in fact some say it's a miracle that they knocked Word Perfect and Wordstar off the heap in the WP vertical.

    And no they didn't gain market share by giving away the product... But I get tired of hearing about how big, bad, and ugly MS is.. They got to the top, and no one has been able to knock them off. Call it monopoly, call it what you want.. This is America, and Free Enterprise rules!

    As for MSNbot, I think MS is 18 months minimum, from getting any type of search index built that is usable. Their corporate bloat mentality just won't allow them to get any product to audience in a short time frame..

    As for MS products, I too have problems with MS, that they don't get new features into Excel. Things like wildcard string functions, so I don't have to build macros for everything...

    twebdonny




    msg:1534286
     6:44 pm on Oct 10, 2003 (gmt 0)

    I would be happy to get MSNBot to come a looking at our site. Any ideas on how to put out the bait?

    Constantin




    msg:1534287
     7:17 pm on Oct 10, 2003 (gmt 0)

    MSN currently is not taking submissions. For some reason they've been here three times, only to get their fingers slammed in the door. Persistent they are... Now that my robots.txt file has been validated, perhaps my site and the MSNBot can be friends after all.

    I was finally able to incorporate multiple <Files> statements in the .htaccess file, so now the site has the protection I want for certain files (like the .htaccess file itself) while allowing unfettered access to the robots.txt and banned.html files.

    It's nice to have a fully functional bot trap. The formmail attacks have dropped appreciably in the last three weeks. Maybe it's related, maybe not. But it sure beats looking up culprits and complaining to their ISPs all the time.
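The logic of a trap like this can be sketched in a few lines. This is not the actual .htaccess setup described above, just a rough Python model of the same idea: any client that requests a trapped path gets banned and sees 403s from then on, while robots.txt and banned.html stay reachable for everyone. The class name `BotTrap` and the choice of `/zzz/` as the trap directory are assumptions for illustration.

```python
TRAP_PREFIXES = ("/zzz/",)                       # hypothetical trap directory
ALWAYS_ALLOWED = ("/robots.txt", "/banned.html")  # reachable even when banned

class BotTrap:
    """Toy model of a spider trap: ban any IP that requests a trapped path."""

    def __init__(self, trap_prefixes=TRAP_PREFIXES, always_allowed=ALWAYS_ALLOWED):
        self.trap_prefixes = tuple(trap_prefixes)
        self.always_allowed = frozenset(always_allowed)
        self.banned: set[str] = set()

    def status_for(self, ip: str, path: str) -> int:
        """Return the HTTP status this request would get (200 or 403)."""
        if path in self.always_allowed:
            return 200                 # banned clients can still read these
        if path.startswith(self.trap_prefixes):
            self.banned.add(ip)        # stepping on the trap bans the client
        return 403 if ip in self.banned else 200

if __name__ == "__main__":
    trap = BotTrap()
    ip = "131.107.137.166"
    for path in ("/index.html", "/zzz/trap.html", "/index.html", "/robots.txt"):
        print(path, trap.status_for(ip, path))
```

Keeping robots.txt reachable matters: a banned bot that cannot read the rules has no way to start obeying them.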

    volatilegx




    msg:1534288
     2:16 pm on Oct 16, 2003 (gmt 0)

    I'd love to see a rep from the MSNBOT team posting here to let us know when the "prototype" period is over and the bot/index is ready to go live. I don't suppose we have any MS programmers here?

    jeremy goodrich




    msg:1534289
     9:14 pm on Oct 16, 2003 (gmt 0)

    volatilegx, I'd bet lunch that they've read this thread ;)

    However, few companies are willing to jump in a public forum they don't control making off the cuff comments that are unapproved by their legal team...

    I'm willing to bet if you contacted them, you could get some more details & then share with the rest of us / as I am sure that MSN wants to encourage user (and webmaster) interest in their new product for when it launches.

    Constantin




    msg:1534290
     1:54 am on Oct 17, 2003 (gmt 0)

    I would bet lunch too that MSN is lurking around here. After all, what better place can you think of to see if folks are noticing your bot probing their sites?

    So far, MSNBot hasn't tripped the tripwires again. It hasn't shown up in the AWstats summary either.

    (twiddling thumbs)

    Maybe my triple salvo of 403's put it off? Oh well.


    All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
    © Webmaster World 1996-2014 all rights reserved