Home / Forums Index / Microsoft / Bing Search Engine News
Forum Library, Charter, Moderators: mack

Bing Search Engine News Forum

    
Bing is finding hidden folders on server
bingbing




msg:4500521
 9:09 am on Sep 27, 2012 (gmt 0)

Hi,

We have a hidden folder on our server, using a long and unguessable name. Bing has found this folder twice now and every time we change it, it re-finds it. The Bing bot is shown in the web server logs.

This folder is not referenced in any file, in an .htaccess, in a sitemap, or anywhere else. We have scanned the server for the string. It does not exist.

How can Bing find this folder? Is it possible that a process on the local PC is reporting URLs back to Bing? This seems the only way, and if so it would be a serious privacy breach.

Anyone got any comments?

 

lucy24




msg:4500805
 7:23 pm on Sep 27, 2012 (gmt 0)

Just to get the obvious out of the way: If you're on Apache, you haven't overridden the directory-slash redirect in combination with enabling auto-indexing have you?

Can we safely assume that nothing on the site or anywhere else uses any resources in your /fzzbwt/ directory? So its name does not occur within any document of any kind whatsoever, including server headers? Not only as <a or <link but anything of any kind?

If you're screaming Yes, you idiot, I checked all that already:

Good. This is the point where some people come slinking back "Oh, ###, it never entered my mind that bing could read {suchandsuch}." Or "I swear I checked for matching parentheses 87 times already" or equivalent.

Gotta check and double-check.

bingbing




msg:4500810
 7:32 pm on Sep 27, 2012 (gmt 0)

Hi Lucy,

Thanks for the reply. I have scanned all files on the server, and none reference the folder. We do have a rewrite config file that mentions the folder, but that is not stored in the web root. Also, Bing found various files *inside* the folder (which are definitely not mentioned anywhere) and even passed valid CGI parameters to them.

It is most suspicious and it really seems something is sending this information to Bing. Since only Microsoft IPs access the files, it does not seem to be a malicious user. Different Bing IPs access the files at around the same time of day.

We are not on Apache. This is a Windows box. Nothing in the headers either, apart from what we'd expect to see, e.g. a session cookie. Nothing in the robots.txt file, sitemaps, etc. (we scanned all files).

This is driving me mad since there's just no "leak" from our side of things.

Leosghost




msg:4500818
 7:55 pm on Sep 27, 2012 (gmt 0)

Are you running any kind of MS security system on your server..is there any "built in"..that would be indexing all folders and their contents..?

Rosalind




msg:4500819
 7:56 pm on Sep 27, 2012 (gmt 0)

There's a bot that follows UK TalkTalk users around wherever they go, usually a few seconds or minutes after they've visited a page, allegedly for virus and malware checking. Because it's done at the ISP level there's no hiding from it, no matter where you put your files.

Is the IP address really one of Bing's ranges?

bingbing




msg:4500823
 8:06 pm on Sep 27, 2012 (gmt 0)

@Leosghost, we run an AV scanner - ESET. That is all. How would someone get data from that though, and why Microsoft?

@Rosalind, here are the IPs:

157.55.32.190
157.55.34.171
157.55.34.171
131.253.47.164
157.55.35.105
157.55.33.251
157.55.35.48
157.55.35.41

HTTP_USER_AGENT is Mozilla/5.0+(compatible;+bingbot/2.0;++http://www.bing.com/bingbot.htm)
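
For anyone wanting to answer Rosalind's question programmatically: a user-agent string can be faked, so the usual check is forward-confirmed reverse DNS. Look up the hostname for the IP, check that it ends in the crawler's domain, then resolve that hostname and confirm it maps back to the same IP. Below is a minimal Python sketch. The `.search.msn.com` suffix follows Bing's published guidance for Bingbot hostnames but should be treated as an assumption to confirm against current documentation, and the function names are hypothetical.

```python
import socket


def default_reverse(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def default_forward(hostname):
    """Return the IPv4 addresses hostname resolves to (empty list on failure)."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def is_verified_bingbot(ip, reverse=default_reverse, forward=default_forward,
                        suffix=".search.msn.com"):
    """Forward-confirmed reverse DNS check.

    suffix is an assumption based on Bing's published guidance; the
    reverse/forward parameters exist so the resolvers can be swapped out.
    """
    hostname = reverse(ip)
    if not hostname or not hostname.lower().rstrip(".").endswith(suffix):
        return False
    # The claimed hostname must resolve back to the same IP.
    return ip in forward(hostname)
```

Calling `is_verified_bingbot("157.55.32.190")` with the default resolvers performs live DNS lookups, so the result depends on the DNS records at the time of the check.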

bingbing




msg:4500848
 9:02 pm on Sep 27, 2012 (gmt 0)

I have added a password challenge response on the folder now, so this will stop it. Most strange...

phranque




msg:4500877
 10:37 pm on Sep 27, 2012 (gmt 0)

welcome to WebmasterWorld, bingbing!

any request that supplies a 200 OK response is "available" and "hidden folders" are wishful thinking.

I have added a password challenge response on the folder now

i assume you mean basic authentication?

the 401 response is one solution.
you can also provide a 403 Forbidden response but allow requests from your IP address if that's an option.
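
The two options above can be sketched as one decision function. This is an illustration in Python, not the poster's actual IIS setup; the allowlisted network, the `choose_status` name and the `has_valid_credentials` flag are all hypothetical.

```python
import ipaddress

# Hypothetical allowlist; 203.0.113.0/24 is a documentation-only range.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def choose_status(client_ip, has_valid_credentials, require_auth=True):
    """Return the HTTP status for a request to the protected folder.

    403 if the client IP is outside the allowlist, 401 if it is inside
    but has not authenticated (when require_auth is set), else 200.
    """
    addr = ipaddress.ip_address(client_ip)
    if not any(addr in net for net in ALLOWED_NETWORKS):
        return 403
    if require_auth and not has_valid_credentials:
        return 401
    return 200
```

Either response tells a crawler explicitly that the resource is off limits, which a secret path alone never does.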

bingbing




msg:4501021
 7:40 am on Sep 28, 2012 (gmt 0)

"hidden folders are wishful thinking."

I would not agree; a folder with a 50-character name is never going to be guessed. This is malware or a process sending server browsing details back without our knowledge.

Lawsuit against MS pending now :)

lucy24




msg:4501033
 8:30 am on Sep 28, 2012 (gmt 0)

Bing didn't get there by guessing. Somewhere there is a link to either the folder or one of the files inside it.

Bing found various files *inside* the folder (which are definitely not mentioned anywhere) and even passed valid CGI parameters to them

It definitely didn't achieve that by guessing. Somewhere in the universe there exists a reference to the pages. If it were, ahem, a different search engine, I'd look more closely at gmail. Bing isn't furtively trying the same thing with hotmail are they? Or whatever mail it is they own this week :( Matter of fact it would almost have to be something that's only visible in Redmond, because otherwise the googlebot would be all over the pages too.

I don't suppose they're doing anything fantastically useful like listing the pages in bing wmt with details on Who Links Here. Nah, too easy.

bingbing




msg:4501036
 8:36 am on Sep 28, 2012 (gmt 0)

No links to any files. We even changed the folder name remember, and bing found it again in a week.

phranque




msg:4501061
 9:11 am on Sep 28, 2012 (gmt 0)

it doesn't really matter how the url was discovered.
"security through obscurity" is not a sound strategy.
if you want to forbid access or require credentials for access to a resource then you must send the appropriate response.

bing won't hear you whining when they accidentally stumble upon your secret path and get a 200 OK upon request.
=8)

bingbing




msg:4501068
 9:56 am on Sep 28, 2012 (gmt 0)

"it doesn't really matter how the url was discovered"

Not even if it was malware or a malicious act. That would matter to us.

"accidentally stumble"

As has been stated, Bing does not guess URLs so no "accidental stumbling" has occurred either. Other posters above concur with this.

Microsoft are investigating and say they are "concerned". We know it is not an issue at our end, as all files have been scanned and these URLs do not exist anywhere publicly (and they are not in the Bing/Google index either).

There was no 200 OK response. They got 404 in fact, according to the server logs.

"security through obscurity"

The folder had an IP restriction, but not a 401/403, so there is no security breach. We just want to know how it finds the addresses in the first place.

No one's whining here Phranque =;)

phranque




msg:4501091
 10:54 am on Sep 28, 2012 (gmt 0)

i would not start with the assumption that bing uses malware or malice to discover unguessable urls.

bingbing




msg:4501094
 10:58 am on Sep 28, 2012 (gmt 0)

I didn't say it was malware from Microsoft. It might be a snooper on an office PC taking sensitive URLs. Strange why only Microsoft IPs are accessing the URLs though.

phranque




msg:4501096
 11:00 am on Sep 28, 2012 (gmt 0)

how would bing obtain those from a snooper?

lucy24




msg:4501102
 11:41 am on Sep 28, 2012 (gmt 0)

They got 404 in fact, according to the server logs.

? ? ? I thought the point was that they are finding pages that actually do exist. Or did you mean that your config file is nipping in with a quick 404 if the wrong person tries to get at Bluebeard's directory?

What is the directory really used for? Just random storage? Or does anything in there perform any function?

Speaking of logs, has anyone been snuffling around them? The log files themselves, that is. Feed the right search strings into g### and you'll find jaw-dropping numbers of raw logs out there on the web for anyone and everyone to paw through. And they're not all demos for "How to understand your logs" tutorials. I really don't want to think about how they all got out there.

bingbing




msg:4501105
 11:43 am on Sep 28, 2012 (gmt 0)

Exactly, this is what we are trying to ascertain. Bing crawls websites to build an index. If the URLs are not in the website, how can it find URLs and also post valid CGI parameters to them? We've changed that URL twice now, so that proves there is no link to the URL, otherwise we'd need to have changed the offending link too (and we did not).

bingbing




msg:4501118
 12:15 pm on Sep 28, 2012 (gmt 0)

Lucy, the IIS server logs and my own logging captured the visits. As you say, a custom IP check pushes the user to 404 if their IP is not permitted. We have a catch-all file in the folder that is loaded upon all page requests.

So what is happening is that Bing is requesting valid URLs, which we still don't know how they are getting, and is then pushed a 404.

I'm just trying to find out how it knows parts of a file system that are not published anywhere. Also, Bing has requested totally different files, which are not even related (linked) to some of the URLs in question, so it has not even spidered these in order to find them. It's like Bing is psychic lol.
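
A check like the one described here can be repeated offline against the IIS logs. The sketch below assumes W3C extended format logs whose `#Fields:` header includes `c-ip`, `cs-uri-stem`, `cs(User-Agent)` and `sc-status`; the function name and folder prefix are made up for illustration.

```python
def bingbot_hits(lines, folder_prefix):
    """Yield (client_ip, uri, status) for bingbot requests under folder_prefix.

    Expects W3C extended log lines, where a '#Fields:' directive names the
    space-separated columns that follow.
    """
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        # Skip blanks, other directives, and data seen before any header.
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        agent = row.get("cs(User-Agent)", "").lower()
        uri = row.get("cs-uri-stem", "")
        if "bingbot" in agent and uri.lower().startswith(folder_prefix.lower()):
            yield row.get("c-ip"), uri, row.get("sc-status")
```

Listing the hits by timestamp alongside the office's own requests to the same URLs is one way to see whether the crawler is trailing a particular visitor.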

phranque




msg:4501120
 12:26 pm on Sep 28, 2012 (gmt 0)

do you link to awstats or similar anywhere?
does your host provide links to any type of log file analysis results?

bingbing




msg:4501123
 12:31 pm on Sep 28, 2012 (gmt 0)

No phranque, we use Google analytics, but not for any files in this special folder as they are not public.

lucy24




msg:4501421
 2:03 am on Sep 29, 2012 (gmt 0)

Bing crawls websites to build an index. If the URLs are not in the website, how can it find URLs and also post valid CGI parameters to them?

parts of a file system that are not published anywhere

"not published on my own site" is not the same as "not published anywhere".

Somewhere earlier in this thread you said that bing knows the names of pages in the directory. If so, the directory itself becomes irrelevant. Once you've got example.com/directory/secretdirectory/filename.html you don't need a separate link to know that /directory/secretdirectory/ exists. Robots don't have a lot of brains, but they-- or, ahem, their human programmers-- can figure out that much. In my own logs, I routinely see bing asking for /ebooks/sometitle/ although these files (that is, an /ebooks/title/index.html file for the assorted titles) don't exist. At one time I assumed it was my fault for ::cough-cough:: goofing on some relative links. But now I realize they'd be asking for these files anyway. Heck, even humans will sometimes try it. Same goes for other directories that don't have an index file* -- and they're not all attributable to mistakes I made in the past.


* <ot>In cases where an index.html file seems a reasonable thing to look for, I've put in individual redirects.</ot>
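
To show how little effort that inference takes, here is a short Python sketch that derives every parent directory URL from one known file URL. It is only an illustration, not a description of how Bing's crawler actually works, and `parent_directories` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_directories(url):
    """Return the directory URLs implied by url, deepest first."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Drop the last segment unless the path already ends in a slash.
    if not parts.path.endswith("/"):
        segments = segments[:-1]
    results = []
    while segments:
        path = "/" + "/".join(segments) + "/"
        results.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
        segments = segments[:-1]
    return results
```

For `http://example.com/directory/secretdirectory/filename.html` this returns the `/directory/secretdirectory/` URL followed by the `/directory/` URL.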

bingbing




msg:4501482
 7:11 am on Sep 29, 2012 (gmt 0)

So you are saying that Bing does "guess" URLs?

I know the directory is irrelevant, once found, but finding all sorts of template names that have never been published is very suspicious. How did it make the leap to filenames that are quite long and look_like_this.ext etc. and even pass valid CGI params?

Perhaps, as you say, someone has posted some links publicly somewhere? But why is only Microsoft calling those links in that case?

Jonesy




msg:4501659
 4:57 pm on Sep 29, 2012 (gmt 0)

I had such a problem some time back.
It was (eventually) easy to trace due to the fact that one
other fella (several states away) and I were working/testing
a PHP script in a "hidden", unguessable directory.
He had a (Alexa? -- pretty sure) toolbar installed in his browser.
His browsing activity leaked out that way.

bingbing




msg:4501661
 5:07 pm on Sep 29, 2012 (gmt 0)

Hi Jonesy, that sounds very interesting and I'm sure this is the sort of thing that has happened here. Our "secret" folder has been secret for 12 years this month, so this looks to be the sort of thing that has happened, but I'll await Microsoft's explanation.

[imilly.com...] covers the issue you mention, and even documents data being sent to MSN. Very stealthy indeed, almost bordering on spyware in my book.

bingbing




msg:4501764
 10:11 pm on Sep 29, 2012 (gmt 0)

This looks to be the issue. Bing (and some other browser toolbars) report back URLs:

[convonix.com...]

Leosghost




msg:4501769
 10:21 pm on Sep 29, 2012 (gmt 0)

Why do people (especially anyone working on sites/pages that they don't want public or indexed yet) use search toolbars anyway? Each search engine has its own URL, why don't people just type them in the address bar..

First rule of anyone or any team working on a site/pages..absolutely no toolbars..with the exception of the firefox web developer toolbar..If this was your "problem" ?, I'm amazed you didn't insist on no search engine, shopping, alexa or other crap toolbars from the beginning..

bingbing




msg:4502010
 2:43 pm on Sep 30, 2012 (gmt 0)

@Leosghost, I (the developer) have no toolbars installed. The toolbar is installed in an office that uses the software via the web.

Some people install toolbars without knowing. Many toolbars come bundled with other software downloads. In this case the office did not even know that the toolbar was installed. It actually came pre-installed on an office PC.

Toolbars that do this so-called useful "URL discovery" are spyware plain and simple. Only very recently have mainstream articles appeared covering this stealthy behaviour of toolbars such as Alexa and Bing.

Microsoft have now confirmed that BingBar is the issue in our case, and it was reporting URLs back to MSN. Having read the privacy statements of the toolbar in question, it does not even mention "URL discovery".

More seriously, it does not matter if you password protect your folders or not, since Bing will still try and contact the URLs. In our case the accesses generated 404s because we had an IP check in place, but that's not the issue. The issue is that Bing was monitoring, in real time, our browser habits and reporting it back to see if it could gain access.

Read this (http://privacy.microsoft.com/en-us/bing.mspx#EZ) and try and find where it tells the user that the toolbar will not just use your searches in Bing, but will actually monitor all the URLs you enter and send them back to Microsoft central for possible search engine inclusion. They can't seem to answer this...

phranque




msg:4502266
 5:10 am on Oct 1, 2012 (gmt 0)

you mean this part?

http://privacy.microsoft.com/en-us/bing.mspx#ESC
Bing Bar Experience Improvement Program. If you’ve chosen to participate in the Bing Bar Experience Improvement program, information about the websites that you visit and how you use Bing Bar will be sent to Microsoft. This information helps us to improve the quality and performance of our products and services and to help prevent spam and phishing scams from reaching your inbox. While the Bing Bar is enabled, the information about the websites accessed by the browser Bing Bar is installed on is collected to improve search ranking and relevance and the performance of Bing services.

bingbing




msg:4502343
 6:33 am on Oct 1, 2012 (gmt 0)

Phranque,

When Bingbar is installed it has a search bar. That search bar should be the focus of the data collection, but it is not. Instead, Bing just monitors all browsing history, collects URLs and sends them back to MSN. We never used the search bar to search for our own website.

Also, we checked with Microsoft and we are not signed up for the "Bing Bar experience program".

Thanks for all your replies and to Jonesy for actually finding out what the problem was.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved