Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 31 message thread spans 2 pages.
Now seeing Bingbot
Testing the waters a few days early

 3:12 am on Sep 29, 2010 (gmt 0)

Spotted in the logs today. Tue Sep 28 20:24:15 2010 "GET /widgets.html HTTP/1.1" 200 4321 "-" "Mozilla/5.0 (compatible; bingbot/2.0 +http://www.bing.com/bingbot.htm)" Connection="Keep-Alive" Accept="*/*" Accept-Encoding="gzip, deflate"




 3:39 pm on Sep 29, 2010 (gmt 0)

About 90 minutes ago:

Mozilla/5.0 (compatible; bingbot/2.0 +http://www.bing.com/bingbot.htm)

robots.txt? NO

That was four minutes after:

msnbot/2.0b (+http://search.msn.com/msnbot.htm)

robots.txt? NO


 6:46 pm on Sep 29, 2010 (gmt 0)

How do you keep track of what and when the bots are visiting? Which software?


 6:56 pm on Sep 29, 2010 (gmt 0)

You don't need any software; the Linux grep command is your friend.
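For anyone who would rather script it than grep by hand, here is a minimal Python sketch of the same idea: count log lines per bot name. The function names and the idea of passing a log path are assumptions for illustration, not anything posted in this thread. It also assumes the log records bare IPs rather than resolved hostnames, since a hostname containing "msnbot" would be matched before the user-agent.

```python
import re
from collections import Counter

# Bot tokens to look for, matched case-insensitively anywhere in the line.
# The first token found on a line decides which bot the hit is counted for.
BOT_PATTERN = re.compile(r"(bingbot|msnbot)", re.IGNORECASE)


def count_bot_hits(lines):
    """Count log lines per bot token, roughly like running `grep -c` per bot."""
    counts = Counter()
    for line in lines:
        match = BOT_PATTERN.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


def count_bot_hits_in_file(path):
    """Convenience wrapper; `path` is whatever access log you want to scan."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return count_bot_hits(log)
```

Calling `count_bot_hits_in_file` on an access log returns a Counter keyed by "bingbot" and "msnbot", which is enough to compare daily totals for the two crawlers.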


 7:00 pm on Sep 29, 2010 (gmt 0)

I'm seeing bingbot today as well, but only 13 visits as opposed to msnbot's 3757 visits by noon.


 7:11 pm on Sep 29, 2010 (gmt 0)

First hit around midnight GMT. Wasn't sure of the full UA so only trapped part of it. More difficult to spot in logs now: used to be easy when the bot name was at the start of the UA.


 7:12 pm on Sep 29, 2010 (gmt 0)

I'm seeing a few visits today from bingbot as well, yet MSNbot hit me over 2322 times. Still using both I assume...or maybe bingbot is still in beta?


 7:34 pm on Sep 29, 2010 (gmt 0)

Aside to Globetrotter: If you use Unix and can install/execute CGI scripts...

I tried paid-for, gazillion-stats scripts like Summary [summary.net...] and found they provided literally too much data. (Plus server-based updating was a pain.)

So instead I use little Perl scripts that incorporate the 'tail' command and quickly show the last 500 or so lines of my access, error, and mod_rewrite logfiles as web pages I then easily read in any browser. I also use another script that formats the same data by Host/IP, UA, files hit, referrer, etc.
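A rough Python equivalent of that 'tail' approach, as a sketch only: keep the last N lines of a log and wrap them in a bare HTML page. The function names are made up for illustration and this is not the Perl script described above.

```python
import html
from collections import deque


def last_lines(lines, count=500):
    """Return the final `count` items of any iterable of lines."""
    # A bounded deque discards older lines as newer ones arrive.
    return [line.rstrip("\n") for line in deque(lines, maxlen=count)]


def tail_file(path, count=500):
    """Read a log file and return its last `count` lines, like `tail -n`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return last_lines(handle, count)


def render_as_html(lines, title="Log tail"):
    """Wrap lines in a minimal page; escaping keeps odd UAs from being read as markup."""
    body = "\n".join(html.escape(line) for line in lines)
    return (
        "<html><head><title>" + html.escape(title) + "</title></head>"
        "<body><pre>" + body + "</pre></body></html>"
    )
```

Escaping matters here because user-agent and referrer strings are supplied by the visitor and can contain markup.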

The small scripts began with a nondescript error log 'tail' script which I then customized and modified to tail other logs and also match site layouts. The original log tail script I use is no longer available but here's an example of another free one: [perlscriptsjavascripts.com...]

The more complicated script is "Web Activity" by Matt Kruse. (I use an older version.) It's free and also customizable (by hand; do not mess around with scripts if you're new to Perl.) [mattkruse.com...] I depend on that script. I'd feel blind without it.

(Last but not least, I use Google Webmaster Tools, and Google Analytics.)

For additional ideas, check the other forums here. Lots of people will have lots of info about this, that and the other stats programs.


 7:39 pm on Sep 29, 2010 (gmt 0)

How do you keep track of what and when the bots are visiting?

Raw logs are the best, however many find them cumbersome.


 11:30 pm on Sep 29, 2010 (gmt 0)

Bingbot/2.0 seems to be following msnbot/2.0b around, just warming up to take over the job on or about Friday (as scheduled). I don't see any bingbot/2.0 requests for robots.txt either, so I assume that it's basing its crawling on msnbot/2.0b's robots.txt fetches.

That makes sense, since bingbot/2.0 is probably identical or at least very similar to msnbot/2.0b, but without the "b" for beta. If it was a whole new crawler, they'd more likely have named it bingbot/1.0 -- there'd be no sense in carrying the "2.0" forward if it was new.



 8:30 am on Sep 30, 2010 (gmt 0)

Bingbot, the Sequel


 1:02 pm on Sep 30, 2010 (gmt 0)

Dangbot, the Sequel:

- Despite years-long site verification via meta tag on a "default webpage," Bing's Webmaster Tools now says, "Site ownership has not been verified."

- Despite still using the exact same verification code in the exact same tag (re)given by Bing Tools, .search.msn.com only requests "BingSiteAuth.xml", not the default page.

- Despite performing routine rDNS lookups and okaying bare IPs to confirm .search.msn.com and then limit same to certain MSN UAs (msnbot, now bingbot, etc.; none of their misc. junk), the Bingbot verification UA is -- wait for it --

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)

- Despite uploading the danged .xml file even though I have the danged meta tag and re-re-reparsing .htaccess for over an hour punching holes, I'll be danged if I still can't (RE)verify.

Moral of the story: Make sure you're (still) verified. Good luck.


 8:38 pm on Sep 30, 2010 (gmt 0)

You'll need to get a new verification key I believe, regardless of whether you use the meta-tag or .xml file method.

In all fairness, this is not a bingbot thing, but rather a result of switching from MSN Webmaster Tools to Bing Webmaster Tools.

However, someone should point out to them that if they're going to go through all of this hassle --and put us through it as well-- then they should have named all of this stuff "MSbot" or "MicrosoftBot" and dispensed with the "cute branded name" for Webmaster-facing resources in favor of one that probably won't ever have to change again...

As it is now, we've got msnbot, bingbot, and Yahoo! Slurp all crawling for essentially one index, and no information on how long we'll have to support all these user-agents.



 9:47 pm on Sep 30, 2010 (gmt 0)

They did a half-baked job on switching to Bingbot IMO because the full round-trip DNS verification still says MSNBOT!

Example: - "Mozilla/5.0 (compatible; bingbot/2.0 +http://www.bing.com/bingbot.htm)"

Rev DNS of is msnbot-207-46-195-106.search.msn.com.

How confusing is that?

Ill thought out, ill conceived, making changes for no other purpose than branding, messing with webmasters making them waste time on meaningless updates.

Then you still have them using MSNBOT as the name in reverse DNS so you're still checking for MSNBOT and BINGBOT together!

What a big fat hairy mess for no particular reason and the only thing I'm thankful for is we didn't go down this same path with a Livebot in the middle of it all!

The only problem with jdMorgan's suggestion of MicrosoftBot, which I like, is that the entire internet search unit isn't poised to easily sell if the crawler identifies itself as either Microsoft or even MSN for that matter.

Just a thought ;)
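For reference, the full round-trip DNS check mentioned above can be sketched in Python like this: reverse-resolve the IP, confirm the hostname ends in .search.msn.com (the suffix quoted in this thread), then forward-resolve that hostname and confirm it maps back to the same IP. The function name and the injectable lookup parameters are assumptions added so the logic can be tried without live DNS.

```python
import socket

# Suffix taken from the rDNS hostnames quoted in this thread.
TRUSTED_SUFFIX = ".search.msn.com"


def verify_crawler_ip(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a trusted host that resolves back to `ip`."""
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, addresses).
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, addresses).
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        hostname = reverse_lookup(ip).lower().rstrip(".")
        if not hostname.endswith(TRUSTED_SUFFIX):
            return False
        return ip in forward_lookup(hostname)
    except (socket.herror, socket.gaierror):
        # No rDNS record (a "bare" IP) or a failed forward lookup.
        return False
```

Note that the hostname test only looks at the domain suffix, so it does not care whether the host part says msnbot or bingbot. A bare IP with no rDNS fails this check by design.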


 10:35 pm on Sep 30, 2010 (gmt 0)

Jim: Thus far, on at least two sites, the old msnbot and 'new' bingbot 32-alphanumeric keys/tags/codes are identical. And even though I repeatedly tell Tools I use the tag, bingbot (not msnbot) looks for "BingSiteAuth.xml".


 1:11 am on Oct 1, 2010 (gmt 0)

Ill thought out

Same as when they began in 2003.

You'd think (at least normal people would) that MS would have learned from those previous mistakes?
Sheesh. . . .


 7:26 pm on Oct 8, 2010 (gmt 0)

And now, to further illustrate their QA program, they've apparently deployed another bingbot user-agent:

Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

Note the addition of a semicolon following "bingbot/2.0".
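If you match on the full UA string, a pattern with an optional semicolon covers both variants seen so far. A minimal Python sketch (the function name is made up for illustration):

```python
import re

# Accepts "bingbot/2.0 +http://..." as well as "bingbot/2.0; +http://...".
BINGBOT_UA = re.compile(
    r"^Mozilla/5\.0 \(compatible; bingbot/2\.0;? "
    r"\+http://www\.bing\.com/bingbot\.htm\)$"
)


def is_bingbot_ua(user_agent):
    """Return True when the string is one of the two bingbot/2.0 UA variants."""
    return BINGBOT_UA.match(user_agent) is not None
```

A UA match on its own proves nothing about who sent the request, so it only makes sense combined with an IP or rDNS check.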



 10:41 pm on Oct 8, 2010 (gmt 0)

Which is more correct than the semi-less one but we'll have to see which wins. :)


 12:39 am on Oct 9, 2010 (gmt 0)

Has anyone seen bingbot fetch robots.txt yet? I'm still seeing bingbot simply following on from a hit by msnbot on the robots.txt.


 3:15 am on Oct 9, 2010 (gmt 0)

That's what I see so far as well. The robots.txt fetcher is probably a different program, and its user-agent string probably hasn't been updated yet.



 7:54 pm on Oct 10, 2010 (gmt 0)

Since the first appearance of the semi-colon there has only been a single instance without it here, and that was within the first few minutes. I'm switching to the semi UA.


 1:19 pm on Oct 15, 2010 (gmt 0)

Two sites, first time one visit each with

Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

asked for robots.txt and home page


 6:34 pm on Oct 18, 2010 (gmt 0)

Dear MSN coders: Even if your bots may be 'sharing' robots.txt, it's long past time to get your UA act together, and stop cloaking, too. [webmasterworld.com...] E.g., a mere three minutes apart:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
10/18 01:49:19
(URI: root)

Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
10/18 01:46:43
(URI: robots.txt)


 12:51 am on Oct 22, 2010 (gmt 0)

Bingbot user-agent crawling is more dominant now. I got about 5K hits from it so far today, and only 108 from the deprecated msnbot.


 3:00 am on Oct 22, 2010 (gmt 0)

Yes, and as Staffa pointed out on the 15th, now seeing the bingbot UA fetching robots.txt. I'm looking forward to shortening/simplifying my robots.txt file (and the code that produces it), so I'm hoping they'll pull the plug on msnbot soon...



 2:23 am on Nov 1, 2010 (gmt 0)

Bingbot just hit from a bare (no rDNS) MSN IP...

MSN's many cloaked bots. Again. [webmasterworld.com...]
(link to #msg4224716 may be iffy)


 3:00 am on Nov 1, 2010 (gmt 0)

Bingbot just hit from a bare (no rDNS) MSN IP...

Thanks for the heads-up Pfui. Guess I was asleep at the wheel and blocked this one. - - [30/Oct/2010:01:50:32 -0700] "GET www.example.com HTTP/1.1" 403 479 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"


 10:50 pm on Nov 1, 2010 (gmt 0)

Yep, got those too!

Hits in the general range - seem to be hitting with bingbot (total 13 IPs so far), all blocked.


 12:41 am on Nov 2, 2010 (gmt 0)

@keyplyr: Are you allowing Bing from bare MSN IPs, not just rDNS-confirmable .search.msn.com addresses?


 6:06 am on Nov 2, 2010 (gmt 0)

@Pfui I wasn't, but am now. Who knew?


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved