Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Seoanalyzer Bingbot
bingbot user agent
grandma genie




msg:4509416
 2:51 pm on Oct 18, 2012 (gmt 0)

In a discussion on Bing's webmaster tool seoanalyzer, I mentioned that when I used it my site was showing a 403 forbidden error. Lucy suggested checking the logs to see why I was getting the error.

I discovered this log entry, showing the user agent string for that visit. Here it is:

131.253.38.nn - - [15/Oct/2012:14:10:33 -0400] "GET /example.html HTTP/1.1" 403 - "-" "Mozilla/5.0 (seoanalyzer; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

I was blocking seo as a user agent in htaccess.

--GG
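
A minimal Python sketch of why that request was refused. The exact htaccess rule is not shown in the thread, so this assumes a case-insensitive substring match on "seo"; the function name `is_blocked` is made up for illustration, and the second user agent is the regular bingbot string, included only for contrast.

```python
import re

# Assumption: the htaccess block behaves like a case-insensitive
# substring match on "seo" against the User-Agent header.
BLOCKED_UA = re.compile(r"seo", re.IGNORECASE)

def is_blocked(user_agent: str) -> bool:
    """Return True if the user agent contains the blocked term."""
    return BLOCKED_UA.search(user_agent) is not None

# User agent copied from the log entry above.
seoanalyzer_ua = "Mozilla/5.0 (seoanalyzer; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
# Regular bingbot user agent, for comparison.
plain_bingbot_ua = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

print(is_blocked(seoanalyzer_ua))
print(is_blocked(plain_bingbot_ua))
```

Under that assumption, only the seoanalyzer variant trips the block, which would explain why ordinary bingbot crawling is unaffected by the same rule.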

 

wilderness




msg:4509464
 5:17 pm on Oct 18, 2012 (gmt 0)

gg,
I had a visit from this thing the other day, and I'm not a participant of the Bing analyzer program.

grandma genie




msg:4509644
 2:58 am on Oct 19, 2012 (gmt 0)

Well that's interesting. I've never seen the thing before I tried the tool in Bing Webmaster Tools.

Also had this in the logs; notice the user agents:

131.253.38.nn - - [15/Oct/2012:16:17:38 -0400] "GET / HTTP/1.1" 301 - "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)"
131.253.38.nn - - [15/Oct/2012:16:17:39 -0400] "GET /example HTTP/1.1" 301 249 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)"
131.253.38.nn - - [15/Oct/2012:16:17:39 -0400] "GET /example/ HTTP/1.1" 200 24215 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)"
131.253.38.nn - - [15/Oct/2012:16:17:40 -0400] "GET /example HTTP/1.1" 200 85 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)"
131.253.38.nn - - [15/Oct/2012:16:19:46 -0400] "GET / HTTP/1.1" 403 - "-" "Mozilla/5.0 (seoanalyzer; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

The IP does belong to msn.

Was it the same IP you saw or the seoanalyzer?

-- GG

keyplyr




msg:4509673
 4:27 am on Oct 19, 2012 (gmt 0)

I block "seo" because there are many data mining tools with "seo" in the UA. I also block "analyzer" for the same reason.

I don't use Bing's webmaster tool seoanalyzer so the block doesn't interfere with my own work.

grandma genie




msg:4511806
 5:02 pm on Oct 24, 2012 (gmt 0)

Uhh, sorry, but I can't figure out what you are talking about. The samples I posted are copied and pasted. I didn't retype them. So, what you see is what you get. I had to use the "example" entries because that is what webmasterworld requires. Let's try it again:

131.253.38.67 - - [15/Oct/2012:16:17:38 -0400] "GET / HTTP/1.1" 301 - "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)"
131.253.38.nn - - [15/Oct/2012:16:17:39 -0400] "GET /storefront HTTP/1.1" 301 249 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)"
131.253.38.nn - - [15/Oct/2012:16:17:39 -0400] "GET /storefront/ HTTP/1.1" 200 24215 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)"
131.253.38.nn - - [15/Oct/2012:16:17:40 -0400] "GET /BingSiteAuth.xml HTTP/1.1" 200 85 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)"
131.253.38.nn - - [15/Oct/2012:16:19:46 -0400] "GET / HTTP/1.1" 403 - "-" "Mozilla/5.0 (seoanalyzer; compatible; bingbot/2.0; +h**p://www.bing.com/bingbot.htm)"

This is what appeared in the logs, one hit after another. I'm assuming the GET / HTTP/1.1 is the index.php page. It was the first entry. The next two are the storefront page. Then you have the xml document and then the last entry got the 403 because of the seoanalyzer in the user agent.

I'm assuming you are looking at those 249, 24215 and 85 numbers. I don't know what those are or what they mean. I only check my logs for the IPs, referers, user agents, GET, POST, HEAD, etc. and what they are requesting.

So, you are talking over my head. What are you looking at and why is it such a big deal?

-- GG

lucy24




msg:4511966
 9:25 pm on Oct 24, 2012 (gmt 0)

What are you looking at and why is it such a big deal?


Not a big deal really, just interesting to a certain kind of warped mind ;)

The second number-- the one after the 301 or 403 or whatever-- is the total size of the file your server sent in response to the request. In fact you've got a beautifully illustrative snippet there.

The request for the index page comes through as no size at all; can I assume you've got the host redirecting requests that don't have "www" the way you like it?

The 403 also comes through as no size at all, although this one is your own 403 based on your own htaccess rules. Interesting. Guess it didn't even pretend to read your 403 page, assuming you've got one.

The request for /storefront is a directory-slash redirect. Bing seems to be fond of those too, as if they're actively seeking out Duplicate Content. (You may not know about this category of redirect. It happens automatically through the kind graces of mod_dir unless you have explicitly told it not to.) This one has to go all the way down to your site to verify that the directory exists, so somewhere along the line it picked up 249 bytes. Part of that may be the formal filename.

The third line /storefront/ is the actual size of your page, plus a bit for headers and so on. If the size had been, say, less than 1000, we'd assume the asker was getting rewritten to some custom error page. Rewrites come through as 200 (or 304); filesize is the only clue your logs give you.

And now you see where the 85 is coming from. I didn't notice before that it's xml rather than plain txt, so 85 may again be the actual filesize. You can easily check.

Lesson: no two servers count bytes the same way.
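
For anyone who wants to pull those two numbers out of a log programmatically, here is a rough Python sketch. It assumes the server writes Apache's stock "combined" log format, where the field after the status code is the response size in bytes and "-" means no bytes were logged; a custom LogFormat would need a different pattern. The names `LOG_LINE` and `parse` are illustrative, and the sample lines are taken from the thread.

```python
import re

# Assumed layout (Apache "combined"):
# host ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def parse(line):
    """Split one log line into named fields, or return None if it does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["status"] = int(fields["status"])
    # "-" in the size column means no byte count was logged.
    fields["size"] = None if fields["size"] == "-" else int(fields["size"])
    return fields

sample = [
    '131.253.38.nn - - [15/Oct/2012:16:17:39 -0400] "GET /storefront HTTP/1.1" 301 249 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)"',
    '131.253.38.nn - - [15/Oct/2012:16:17:39 -0400] "GET /storefront/ HTTP/1.1" 200 24215 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)"',
    '131.253.38.nn - - [15/Oct/2012:16:19:46 -0400] "GET / HTTP/1.1" 403 - "-" "Mozilla/5.0 (seoanalyzer; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
]

for line in sample:
    rec = parse(line)
    print(rec["status"], rec["size"], rec["request"])
```

Comparing the logged size of a 200 response against the known size of the page is one way to spot the rewritten-to-an-error-page case described above.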

grandma genie




msg:4511987
 10:14 pm on Oct 24, 2012 (gmt 0)

Thank you, Lucy. That was very helpful.

I do not have a custom 403 or 404 page. The server is responding with its standard version.

My index.php file is tiny. All it does is redirect to the storefront.

This series of hits was due to my inputting my home page URL into the Bing Webmaster Tools Seo Analyzer area. When I did that, the tool was giving me a 403 error message, which I didn't understand because I was not blocking BING, but when I checked the logs (suggested by Lucy) I discovered the seoanalyzer user agent. I was blocking seo as a user agent in htaccess. Then I removed the seo block from my htaccess file and tried the tool again. This time it returned info about my site, so I knew it was the seo block that was not allowing the Bing tool to work.

I do not see seoanalyzer as a user agent in my logs unless I have used the tool in the Bing Webmaster Tools area. So, I suppose I could put the seo block back into my htaccess file. But I would have to remove it if I wanted to use the tool again.

Hope all this info is helping somebody.

-- GG

wilderness




msg:4512001
 11:04 pm on Oct 24, 2012 (gmt 0)

So, I suppose I could put the seo block back into my htaccess file. But I would have to remove it if I wanted to use the tool again.


gg,
a possible solution is a multi-conditional based upon both the UA-deny you were using previously and "except" the Bing IP's.
Ex (using seo as UA term)

#UA contains seo, except Bing IP's
RewriteCond %{HTTP_USER_AGENT} seo [NC]
RewriteCond %{REMOTE_ADDR} !^131\.253\.(2[1-9]|3[0-9]|4[0-7])\.
RewriteRule .* - [F]
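
To sanity-check a rule like this before putting it on a live site, the same logic can be mirrored in a few lines of Python. This is only a sketch of the intended behaviour (UA contains "seo" and the address is outside 131.253.21.x through 131.253.47.x), not a substitute for testing in Apache. The function name, the "SEO-Crawler/1.0" user agent, and the 203.0.113.5 address are made up for illustration.

```python
import re

# UA condition: case-insensitive match on "seo", like the [NC] flag.
UA_TERM = re.compile(r"seo", re.IGNORECASE)
# IP exception: third octet 21 through 47 within 131.253.x.x.
ALLOWED_IP = re.compile(r"^131\.253\.(2[1-9]|3[0-9]|4[0-7])\.")

def is_forbidden(user_agent: str, remote_addr: str) -> bool:
    """True when both conditions hold: UA contains 'seo' AND IP is not in the excepted range."""
    return bool(UA_TERM.search(user_agent)) and not ALLOWED_IP.match(remote_addr)

bing_ua = "Mozilla/5.0 (seoanalyzer; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

print(is_forbidden(bing_ua, "131.253.38.67"))          # Bing tool from the excepted range
print(is_forbidden("SEO-Crawler/1.0", "203.0.113.5"))  # hypothetical scraper elsewhere
print(is_forbidden(bing_ua, "131.253.48.1"))           # same UA, just outside the range
```

Note that the exception only covers the one address range named in the rule; if the tool ever visits from another range, it would be blocked again.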

phranque




msg:4512030
 12:44 am on Oct 25, 2012 (gmt 0)

I had to use the "example" entries because that is what webmasterworld requires.


sorry - i misunderstood about the two "GET /example HTTP/1.1" requests actually being different "examples".

grandma genie




msg:4512059
 3:11 am on Oct 25, 2012 (gmt 0)

Thank you, Don. I will try that.

phranque, Maybe I'll try using example1 or example2 the next time to make things clearer for you detail oriented folks.

What I actually thought was interesting about this whole incident was the two different user agents Bing was using.

131.253.38.nn "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)"
131.253.38.nn "Mozilla/5.0 (seoanalyzer; compatible; bingbot/2.0; +h**p://www.bing.com/bingbot.htm)"

Is that a stealth user agent?

-- GG

phranque




msg:4512076
 4:35 am on Oct 25, 2012 (gmt 0)

did this "MSIE 8.0" user agent also request additional page resources such as scripts and images as a human browser would normally do?

it's rare to see a "real live human" IE8 user agent string without a .NET framework version specification.
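
That rule of thumb can be turned into a quick log filter. The Python sketch below only encodes the heuristic as stated (an IE8 string with no .NET token is suspicious) and is not a reliable bot detector; the function name is hypothetical, and the second user agent is an illustrative example of what a typical desktop IE8 string looks like, not one taken from the logs.

```python
def looks_like_bare_ie8(user_agent: str) -> bool:
    """Heuristic: claims to be MSIE 8.0 but carries no .NET framework token."""
    return "MSIE 8.0" in user_agent and ".NET" not in user_agent

# User agent from the log entries earlier in the thread.
thread_ua = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Win64; x64; Trident/4.0)"
# Illustrative IE8 string with .NET tokens (assumed, for contrast only).
typical_ua = ("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; "
              "SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729)")

print(looks_like_bare_ie8(thread_ua))
print(looks_like_bare_ie8(typical_ua))
```

A flagged user agent is only a hint; checking whether the same visitor also fetched images and scripts, as asked above, is the stronger signal.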

grandma genie




msg:4512870
 12:20 am on Oct 27, 2012 (gmt 0)

No, just the five lines of code like you see them. That's why I thought it was odd. You'd think they would all show the bingbot ua.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved