Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

This 31-message thread spans 2 pages (page 1 of 2).
Now seeing Bingbot
Testing the waters a few days early
jdMorgan




msg:4208509
 3:12 am on Sep 29, 2010 (gmt 0)

Spotted in the logs today.

207.46.195.227 Tue Sep 28 20:24:15 2010 "GET /widgets.html HTTP/1.1" 200 4321 "-" "Mozilla/5.0 (compatible; bingbot/2.0 +http://www.bing.com/bingbot.htm)" Connection="Keep-Alive" Accept="*/*" Accept-Encoding="gzip, deflate"

Jim

 

Pfui




msg:4208781
 3:39 pm on Sep 29, 2010 (gmt 0)

About 90 minutes ago:

msnbot-207-46-195-242.search.msn.com
Mozilla/5.0 (compatible; bingbot/2.0 +http://www.bing.com/bingbot.htm)

robots.txt? NO

That was four minutes after:

msnbot-207-46-13-51.search.msn.com
msnbot/2.0b (+http://search.msn.com/msnbot.htm)

robots.txt? NO

Globetrotter




msg:4208929
 6:46 pm on Sep 29, 2010 (gmt 0)

How do you keep track of what and when the bots are visiting? Which software?

SEOPTI




msg:4208931
 6:56 pm on Sep 29, 2010 (gmt 0)

You don't need any software; the Linux grep command is your friend.
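
For anyone without shell access, a rough Python equivalent of a case-insensitive `grep -c` might look like the sketch below. The function name `count_hits` and the sample lines are made up for illustration; in practice you would iterate over your own access log instead.

```python
def count_hits(lines, needle):
    """Count lines containing `needle`, ignoring case (like grep -ci)."""
    needle = needle.lower()
    return sum(1 for line in lines if needle in line.lower())


# Hypothetical sample lines; in practice iterate over open("access.log").
sample = [
    '207.46.195.227 "GET /widgets.html HTTP/1.1" 200 '
    '"Mozilla/5.0 (compatible; bingbot/2.0 +http://www.bing.com/bingbot.htm)"',
    '207.46.13.51 "GET /robots.txt HTTP/1.1" 200 '
    '"msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.10 "GET / HTTP/1.1" 200 "Mozilla/5.0 (Windows NT 6.1)"',
]

for bot in ("bingbot", "msnbot"):
    print(bot, count_hits(sample, bot))
```

A plain substring match is deliberately crude: it also counts lines where the name appears in a referrer or URL, so treat the totals as approximate.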

incrediBILL




msg:4208938
 7:00 pm on Sep 29, 2010 (gmt 0)

I'm seeing bingbot today as well, but only 13 visits as opposed to msnbot's 3757 visits by noon.

dstiles




msg:4208955
 7:11 pm on Sep 29, 2010 (gmt 0)

First hit around midnight GMT. Wasn't sure of the full UA so only trapped part of it. More difficult to spot in logs now: used to be easy when the bot name was at the start of the UA.
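
Since the bot name now sits in the middle of the UA rather than at the start, one option is to search for the crawler token anywhere in the string. This is only a sketch: the pattern name `BOT_TOKEN` and the function `crawler_token` are hypothetical, and the list of names covers just the two crawlers discussed in this thread.

```python
import re

# Crawler tokens mentioned in this thread; extend the alternation as needed.
BOT_TOKEN = re.compile(r"\b(bingbot|msnbot)/([0-9][\w.]*)", re.IGNORECASE)


def crawler_token(user_agent):
    """Return (name, version) for the first known crawler token, else None."""
    match = BOT_TOKEN.search(user_agent)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2)
```

Because the pattern requires a slash and a version number after the name, the `bingbot.htm` part of the URL inside the UA does not produce a false match.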

JCKline




msg:4208957
 7:12 pm on Sep 29, 2010 (gmt 0)

I'm seeing a few visits today from bingbot as well, yet MSNbot hit me over 2322 times. Still using both I assume...or maybe bingbot is still in beta?

Pfui




msg:4208975
 7:34 pm on Sep 29, 2010 (gmt 0)

Aside to Globetrotter: If you use Unix and can install/execute CGI scripts...

I tried paid-for, gazillion-stats scripts like Summary [summary.net...] and found they provided literally too much data. (Plus server-based updating was a pain.)

So instead I use little Perl scripts that incorporate the 'tail' command and quickly show the last 500 or so lines of my access, error, and mod_rewrite logfiles as web pages I then easily read in any browser. I also use another script that formats the same data by Host/IP, UA, files hit, referrer, etc.
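
The Perl scripts themselves are not shown here. As a rough illustration of the same "last 500 lines" idea, here is a minimal Python sketch; the function name `tail` and the log path in the comment are placeholders, not part of any script mentioned in this thread.

```python
from collections import deque


def tail(lines, count=500):
    """Return the last `count` lines from any iterable of lines."""
    # A bounded deque discards older lines as newer ones arrive,
    # so the whole log never has to sit in memory at once.
    return list(deque(lines, maxlen=count))


# Typical use (the path is a placeholder):
# with open("/path/to/access.log", errors="replace") as log:
#     for line in tail(log, 500):
#         print(line, end="")
```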

The small scripts began with a nondescript error log 'tail' script which I then customized and modified to tail other logs and also match site layouts. The original log tail script I use is no longer available but here's an example of another free one: [perlscriptsjavascripts.com...]

The more complicated script is "Web Activity" by Matt Kruse. (I use an older version.) It's free and also customizable (by hand; do not mess around with scripts if you're new to Perl.) [mattkruse.com...] I depend on that script. I'd feel blind without it.

(Last but not least, I use Google Webmaster Tools, and Google Analytics.)

For additional ideas, check the other forums here. Lots of people will have lots of info about this, that and the other stats programs.

wilderness




msg:4208976
 7:39 pm on Sep 29, 2010 (gmt 0)

How do you keep track of what and when the bots are visiting?


Raw logs are the best, however many find them cumbersome.

jdMorgan




msg:4209098
 11:30 pm on Sep 29, 2010 (gmt 0)

Bingbot/2.0 seems to be following msnbot/2.0b around, just warming up to take over the job on or about Friday (as scheduled). I don't see any bingbot/2.0 requests for robots.txt either, so I assume that it's basing its crawling on msnbot/2.0b's robots.txt fetches.

That makes sense, since bingbot/2.0 is probably identical or at least very similar to msnbot/2.0b, but without the "b" for beta. If it was a whole new crawler, they'd more likely have named it bingbot/1.0 -- there'd be no sense in carrying the "2.0" forward if it was new.

Jim

lexipixel




msg:4209238
 8:30 am on Sep 30, 2010 (gmt 0)


Bingbot, the Sequel
[bing.com...]

Pfui




msg:4209309
 1:02 pm on Sep 30, 2010 (gmt 0)

Dangbot, the Sequel:

- Despite years-long site verification via meta tag on a "default webpage," Bing's Webmaster Tools now says, "Site ownership has not been verified."

- Despite still using the exact same verification code in the exact same tag (re)given by Bing Tools, .search.msn.com only requests "BingSiteAuth.xml", not the default page.

- Despite performing routine rDNS lookups and okaying bare IPs to confirm .search.msn.com and then limit same to certain MSN UAs (msnbot, now bingbot, etc.; none of their misc. junk), the Bingbot verification UA is -- wait for it --

msnbot-[yada-yada].search.msn.com
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)

- Despite uploading the danged .xml file even though I have the danged meta tag and re-re-reparsing .htaccess for over an hour punching holes, I'll be danged if I still can't (RE)verify.

Moral of the story: Make sure you're (still) verified. Good luck.

jdMorgan




msg:4209562
 8:38 pm on Sep 30, 2010 (gmt 0)

You'll need to get a new verification key I believe, regardless of whether you use the meta-tag or .xml file method.

In all fairness, this is not a bingbot thing, but rather a result of switching from MSN Webmaster Tools to Bing Webmaster Tools.

However, someone should point out to them that if they're going to go through all of this hassle --and put us through it as well-- then they should have named all of this stuff "MSbot" or "MicrosoftBot" and dispensed with the "cute branded name" for Webmaster-facing resources in favor of one that probably won't ever have to change again...

As it is now, we've got msnbot, bingbot, and Yahoo! Slurp all crawling for essentially one index, and no information on how long we'll have to support all these user-agents.

Jim

incrediBILL




msg:4209599
 9:47 pm on Sep 30, 2010 (gmt 0)

They did a half-baked job of switching to Bingbot, IMO, because the full round-trip DNS verification still returns the same MSNBOT hostname!

Example:

207.46.13.42 - "Mozilla/5.0 (compatible; bingbot/2.0 +http://www.bing.com/bingbot.htm)"

Rev DNS of 207.46.195.106 is msnbot-207-46-195-106.search.msn.com.


How confusing is that?

Ill thought out, ill conceived, making changes for no other purpose than branding, messing with webmasters making them waste time on meaningless updates.

Then you still have them using MSNBOT as the name in reverse DNS so you're still checking for MSNBOT and BINGBOT together!
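
For readers new to the round-trip check, here is a minimal Python sketch of it: reverse-resolve the IP, confirm the hostname ends in `.search.msn.com`, then forward-resolve that hostname and make sure the original IP comes back. The function names are hypothetical, and the `reverse`/`forward` parameters exist only so the logic can be exercised without live DNS.

```python
import socket


def _reverse(ip):
    """Reverse (PTR) lookup; returns the hostname or None."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def _forward(hostname):
    """Forward lookup; returns the list of IPv4 addresses (possibly empty)."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def is_verified_msn_crawler(ip, reverse=_reverse, forward=_forward,
                            suffix=".search.msn.com"):
    """Round-trip check: IP -> hostname -> IPs must include the original IP."""
    hostname = reverse(ip)
    if not hostname:
        return False  # bare IP with no rDNS
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(suffix):
        return False
    return ip in forward(hostname)
```

Note that the check keys on the hostname suffix, not on the user-agent, which is why the same test covers both msnbot and bingbot for as long as the reverse DNS keeps the msnbot naming.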

What a big fat hairy mess for no particular reason and the only thing I'm thankful for is we didn't go down this same path with a Livebot in the middle of it all!

The only problem with jdMorgan's suggestion of MicrosoftBot, which I like, is that the entire internet search unit isn't poised to easily sell if the crawler identifies itself as either Microsoft or even MSN for that matter.

Just a thought ;)

Pfui




msg:4209632
 10:35 pm on Sep 30, 2010 (gmt 0)

Jim: Thus far, on at least two sites, the old msnbot and 'new' bingbot 32-alphanumeric keys/tags/codes are identical. And even though I repeatedly tell Tools I use the tag, bingbot (not msnbot) looks for "BingSiteAuth.xml".

wilderness




msg:4209678
 1:11 am on Oct 1, 2010 (gmt 0)

Ill thought out


Same as when they began in 2003.

You'd think (at least normal people would) that MS would have learned from those previous mistakes?
Sheesh. . . .

jdMorgan




msg:4213940
 7:26 pm on Oct 8, 2010 (gmt 0)

And now, to further illustrate their QA program, they've apparently deployed another bingbot user-agent:

Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

Note the addition of a semicolon following "bingbot/2.0".
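
If you match on the full UA string, a pattern that tolerates the optional semicolon covers both variants. The sketch below is one way to write that in Python; `BINGBOT_UA` and `is_bingbot_ua` are illustrative names, and the pattern accepts only the two strings reported in this thread.

```python
import re

# Matches both bingbot/2.0 strings reported in this thread; the only
# difference tolerated is the optional semicolon after "bingbot/2.0".
BINGBOT_UA = re.compile(
    r"Mozilla/5\.0 \(compatible; bingbot/2\.0;? "
    r"\+http://www\.bing\.com/bingbot\.htm\)"
)


def is_bingbot_ua(user_agent):
    """True only if the whole string is one of the two known variants."""
    return BINGBOT_UA.fullmatch(user_agent) is not None
```

A strict full-string match like this will stop matching if the version number or URL changes again, so it trades flexibility for precision.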

Jim

dstiles




msg:4214010
 10:41 pm on Oct 8, 2010 (gmt 0)

Which is more correct than the semi-less one but we'll have to see which wins. :)

encyclo




msg:4214042
 12:39 am on Oct 9, 2010 (gmt 0)

Has anyone seen bingbot fetch robots.txt yet? I'm still seeing bingbot simply following on from a hit by msnbot on the robots.txt.
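
One way to check this in your own logs is to count robots.txt requests per crawler. The sketch below assumes the Apache "combined" log format, which may not match your server's configuration; `COMBINED` and `robots_fetches` are hypothetical names.

```python
import re
from collections import Counter

# Apache "combined" format is an assumption here: ip, identd, user, [time],
# "request", status, size, "referer", "user-agent".
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def robots_fetches(lines, bots=("bingbot", "msnbot")):
    """Count /robots.txt requests per crawler name found in the user-agent."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # line is not in the assumed format
        if match.group("path").split("?")[0] != "/robots.txt":
            continue
        ua = match.group("ua").lower()
        for bot in bots:
            if bot in ua:
                counts[bot] += 1
    return counts
```

A crawler that is missing from the result never requested robots.txt under that name in the lines you scanned.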

jdMorgan




msg:4214080
 3:15 am on Oct 9, 2010 (gmt 0)

That's what I see so far as well. The robots.txt fetcher is probably a different program, and its user-agent string probably hasn't been updated yet.

Jim

dstiles




msg:4214841
 7:54 pm on Oct 10, 2010 (gmt 0)

Since the first appearance of the semi-colon there has only been a single instance without it here, and that was within the first few minutes. I'm switching to the semi UA.

Staffa




msg:4217146
 1:19 pm on Oct 15, 2010 (gmt 0)

Two sites, first time one visit each with

Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

asked for robots.txt and home page

Pfui




msg:4218389
 6:34 pm on Oct 18, 2010 (gmt 0)

Dear MSN coders: Even if your bots may be 'sharing' robots.txt, it's long past time to get your UA act together, and stop cloaking, too. [webmasterworld.com...] E.g., a mere three minutes apart:

msnbot-207-46-204-170.search.msn.com
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; SLCC1; .NET CLR 1.1.4322; .NET CLR 2.0.40607; .NET CLR 3.0.04506.648)
10/18 01:49:19
(URI: root)

msnbot-207-46-12-238.search.msn.com
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
10/18 01:46:43
(URI: robots.txt)

incrediBILL




msg:4220078
 12:51 am on Oct 22, 2010 (gmt 0)

Crawling under the bingbot user agent is more dominant now: I got about 5K hits from it so far today, and only 108 from the deprecated msnbot.

jdMorgan




msg:4220110
 3:00 am on Oct 22, 2010 (gmt 0)

Yes, and as Staffa pointed out on the 15th, I'm now seeing the bingbot UA fetching robots.txt. I'm looking forward to shortening/simplifying my robots.txt file (and the code that produces it), so I'm hoping they'll pull the plug on msnbot soon...

Jim

Pfui




msg:4224717
 2:23 am on Nov 1, 2010 (gmt 0)

Bingbot just hit from a bare (no rDNS) MSN IP...

MSN's many cloaked bots. Again. [webmasterworld.com...]
(link to #msg4224716 may be iffy)

keyplyr




msg:4224733
 3:00 am on Nov 1, 2010 (gmt 0)

Bingbot just hit from a bare (no rDNS) MSN IP...


Thanks for the heads-up Pfui. Guess I was asleep at the wheel and blocked this one.

157.55.16.231 - - [30/Oct/2010:01:50:32 -0700] "GET www.example.com HTTP/1.1" 403 479 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

dstiles




msg:4225169
 10:50 pm on Nov 1, 2010 (gmt 0)

Yep, got those too!

Hits in the general range 157.55.16.0 - 157.55.18.255 seem to be hitting with bingbot (total 13 IPs so far), all blocked.
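
To test whether an address falls inside that span, Python's standard `ipaddress` module is enough. This is only a sketch: the range is the one observed above, not an official Bing list, and the names `in_observed_range` and `CIDR_BLOCKS` are made up for illustration.

```python
import ipaddress

# Range quoted above; it is an observed range, not an official Bing list.
RANGE_START = ipaddress.ip_address("157.55.16.0")
RANGE_END = ipaddress.ip_address("157.55.18.255")


def in_observed_range(ip):
    """True if `ip` falls inside 157.55.16.0 - 157.55.18.255 inclusive."""
    return RANGE_START <= ipaddress.ip_address(ip) <= RANGE_END


# The same span expressed as CIDR blocks, e.g. for a firewall or allow list.
CIDR_BLOCKS = list(ipaddress.summarize_address_range(RANGE_START, RANGE_END))
```

The span collapses to two CIDR blocks, 157.55.16.0/23 and 157.55.18.0/24, which is usually the form an allow or deny rule wants.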

Pfui




msg:4225211
 12:41 am on Nov 2, 2010 (gmt 0)

@keyplyr: Are you allowing Bing from bare MSN IPs, not just rDNS-confirmable .search.msn.com addresses?

keyplyr




msg:4225279
 6:06 am on Nov 2, 2010 (gmt 0)

@Pfui I wasn't, but am now. Who knew?

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved