
Forum Moderators: mack


Bing adds directory to file path?

     

keyplyr

9:27 pm on Oct 1, 2011 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



This has been going on for almost a year. Bing, and only Bing, intermittently adds a (non-existent) directory when requesting files. Example:


"GET www.example.com/YXNlT/page.html HTTP/1.1" 404 4005 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"


Bing does a full crawl of the 150-page site every single day, getting 150 "200" responses and also 40 to 50 "404" responses for non-existent file paths. It hasn't hurt SERPs, but I'm getting tired of seeing all these 404s in the logs.
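To see how many of those 404s come from Bingbot and which bogus directory each one used, something like the following could be run over the access log. It is a minimal sketch that assumes the Apache "combined" log format (request, status, size, referer, user agent in quotes, as in the sample line above); the function names `request_path` and `bingbot_404s` are hypothetical.

```python
import re
from collections import Counter

# Matches the tail of an Apache "combined" log line: request, status, size,
# referer and user agent. re.search() is used so the pattern works whether or
# not the IP address and timestamp fields come first.
LOG_TAIL = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def request_path(target):
    """Return the path part of a request target, dropping scheme, host and query."""
    target = target.split("?", 1)[0]
    if "://" in target:
        target = target.split("://", 1)[1]
    if target.startswith("/"):
        return target
    # Targets logged as "www.example.com/dir/page.html", as in the sample line.
    _, _, rest = target.partition("/")
    return "/" + rest


def bingbot_404s(lines):
    """Count Bingbot 404s per first path segment (i.e. the bogus directory)."""
    counts = Counter()
    for line in lines:
        m = LOG_TAIL.search(line)
        if not m or m["status"] != "404" or "bingbot" not in m["agent"].lower():
            continue
        segments = [s for s in request_path(m["target"]).split("/") if s]
        # File requested inside a directory: record that directory's name.
        counts[segments[0] if len(segments) > 1 else "(root)"] += 1
    return counts
```

Reading the log with `bingbot_404s(open("access.log"))` and printing `most_common()` would show at a glance whether the directories are a handful of repeated names or a different random string every time.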


Anyone else seeing this from Bing?

lucy24

1:11 am on Oct 2, 2011 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



It probably isn't a Bing issue as such. I mean, Bing didn't simply invent this directory out of its fevered robotic imagination ;) Now and then there are posts in the Apache forums about Google doing the same kind of thing-- or, for variety's sake, adding on a wholly imaginary query string.

Is there a place in Bing Webmaster Tools that says where they got the url from? Can't find one myself, but that's not to say it doesn't exist.

If 404 responses set your teeth on edge, you could always do a global redirect, simply eliminating the bogus directory. That's assuming it's some specific name, not a batch of 40 to 50 completely random ones. I hope not, because then either Bing does have a problem, or you're getting masses of links from someone you'd rather not get links from.
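The decision such a redirect has to make can be modelled in a few lines. This is a sketch of the logic only, not an .htaccess rule: it assumes a flat site where every page sits at the root, and `strip_bogus_directory`, `real_files` and `real_dirs` are hypothetical names. Because it checks against the site's real directories instead of one specific bogus name, it would also cover randomly named directories.

```python
from typing import Optional, Set


def strip_bogus_directory(path: str, real_files: Set[str], real_dirs: Set[str]) -> Optional[str]:
    """Return the URL to 301-redirect to, or None to leave the request alone.

    Handles /<dir>/<file> where <dir> is not a real directory on the site
    but /<file> is a real top-level page (assumes a flat site layout).
    """
    segments = [s for s in path.split("/") if s]
    if len(segments) != 2:
        return None
    directory, filename = segments
    candidate = "/" + filename
    if directory not in real_dirs and candidate in real_files:
        return candidate
    return None
```

Requests for genuine directories, and bogus paths whose file doesn't exist at the root either, fall through and still get their normal response.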

Come to think of it, it won't hurt to fine-tooth-comb your htaccess anyway. I just had a search engine get hit with a 404 on a .js file. After poring over every html file that could conceivably have made the mistake, I found that I'd misspelled its name in htaccess when redirecting from a previous url. Oops. Fortunately this kind of mistake only affects robots asking for the file "cold". In Bing's case you want to make sure it's walking in off the street and asking for the spurious url, not getting accidentally redirected from elsewhere on your site.
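Part of that htaccess check can be automated for plain mod_alias `Redirect` lines: collect each redirect target and flag any whose path isn't a real file on the site. A minimal sketch, assuming you can supply the set of existing paths yourself; `redirect_targets` and `broken_redirects` are hypothetical names, and `RedirectMatch` and `RewriteRule` lines are deliberately skipped.

```python
from urllib.parse import urlsplit

# Keyword forms Apache accepts for the optional status argument of Redirect.
STATUS_KEYWORDS = {"permanent", "temp", "seeother", "gone"}


def redirect_targets(htaccess_text):
    """Yield (old_path, target) for each plain mod_alias Redirect line."""
    for raw in htaccess_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # Directive names are case-insensitive; RedirectMatch is not handled.
        if parts[0].lower() != "redirect":
            continue
        args = parts[1:]
        # Drop the optional status argument (keyword or numeric code).
        if args and (args[0].lower() in STATUS_KEYWORDS or args[0].isdigit()):
            args = args[1:]
        if len(args) == 2:
            yield args[0], args[1]


def broken_redirects(htaccess_text, existing_paths):
    """Return redirects whose target path is not one of the site's real files."""
    broken = []
    for old, target in redirect_targets(htaccess_text):
        path = urlsplit(target).path or "/"
        if path not in existing_paths:
            broken.append((old, target))
    return broken
```

A misspelled target such as `/js/mneu.js` would be reported, which is exactly the kind of slip that only shows up when a robot asks for the old URL "cold".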

keyplyr

1:37 am on Oct 2, 2011 (gmt 0)




Yes, the bogus directories are completely random. The only explanations I can think of are:

1.) Some database-driven link directory somewhere presenting a crawlable index of all my pages, but screwing up the paths.

2.) My wonderful, glorious server farm (Gdaddy) somehow screwing up requests only from Bingbot, and only some of the time.



Actually, I did just overhaul my .htaccess file. Feels like I lost 20 lbs. You were responsible for 1 lb of the loss :)

lucy24

2:52 am on Oct 2, 2011 (gmt 0)




"My wonderful, glorious server farm (Gdaddy) somehow only screwing up requests from Bingbot, and only some of the time."

I wouldn't put it past them.

My own agribusiness, dreamhost, picked up a gremlin a while back as a byproduct of adding some security measures to log-file access. For about a month it was impossible to add new "Deny from {number}" lines except at the expense of having all your logs come out looking like Pfui's with resolved IPs instead of plain numbers.

When the gremlin got bored with that, it decided instead to make it impossible to create new mod_rewrite rules ending in [F]. Existing rules work fine, but if new rules are invoked, they go into infinite-redirect mode-- ending up with the desired 403, but only at the expense of an extra load on the server before it shuts off at 10 redirects.

Tech support spent a long time trying to figure it out and finally gave up on the grounds that it's not really causing me inconvenience. Translation: human time (theirs) costs more than server time (also theirs, since they don't charge for bandwidth) ;)