http://example.com/directory/file.html
That page has links to:
/
/directory/otherfile.html
/page.html
All the links on the page start with a slash.
Clients that use Internet Explorer frequently end up requesting
http://example.com/directory/
with a referrer of
http://example.com/directory/file.html
despite the fact that there are no links to the directory on the page.
We don't have an index page, and for technical reasons, it is difficult for us to create one. So, IE users end up getting lots of 403 Forbidden errors.
Yahoo Slurp also requests http://example.com/directory/ but Safari, Firefox, Googlebot, msnbot, and other user agents *never* seem to fetch the directory URL.
What could IE and Slurp possibly be seeing on these pages as a link to the directory that no other user agent sees? Has anybody run across anything like this before?
We don't have an index page, and for technical reasons, it is difficult for us to create one. So, IE users end up getting lots of 403 Forbidden errors.
Can you set up a redirect so the users end up at a valid page instead?
Yahoo Slurp also requests http://example.com/directory/ but Safari, Firefox, Googlebot, msnbot, and other user agents *never* seem to fetch the directory URL.
There must be a link somewhere. Or a Toolbar that is phoning home with someone's dev movements? Who knows...
It's also possible that someone is hacking the URI: landing here http://example.com/directory/file.html and then trimming back to here http://example.com/directory/ to see what's there. Depends on your audience and whether they are nosy or not. I do it quite frequently, but I'm probably not the average site visitor either. And when I do it, my Google Toolbar is active, so it's sending information. ;)
People who explore the URL by hand don't end up sending a referrer string, so I don't think it is that either. Also, about 10% of IE users get to the directory level, which is a very high percentage for URL exploration.
The toolbar theory is a good one, but I'm not sure what toolbar that would be or what it would be trying to do. If there is such a beast, I would like to somehow prevent it from doing that.
I would think there was a link too, but I've searched the page source and clicked on every link on the page. If there were a link, I would expect that Firefox and Safari users would also follow it.
Ah, is there an external link though? Did someone maybe link to that page by accident? Or purposely? Do a site: search for that particular URI and see if the search engines have the reference indexed.
You will probably find, after looking at your raw server logs, that these are HTTP OPTIONS and PROPFIND requests for the directory "page," rather than GET requests. These are often accompanied by requests such as "GET /_vti_bin/owssvr.dll" with an IE user-agent, and "GET /_vti_inf.html" and "POST /_vti_bin/shtml.exe/_vti_rpc" with a FrontPage user-agent.
Because you have no directory index page, and your server likely has "Options -Indexes" set (or equivalent for IIS), the server responds with a 403 for the index requests.
The non-Microsoft browsers have no such integration with Office, so you won't see this problem with them.
Yahoo? What can I say? Their 'digging' for unlinked directory index pages is just annoying.
Jim
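To check the suggestion above against the raw logs, something along these lines might help. It is only a rough sketch: it assumes the server writes the common "combined" access log format, and the names LOG_RE and count_methods are made up for illustration. It tallies which HTTP methods and status codes were logged for a single path, so you can see whether the directory hits are GET, OPTIONS, or PROPFIND requests.

```python
import re
from collections import Counter

# Assumed layout (combined log format):
# ip ident user [time] "METHOD path PROTOCOL" status size "referrer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_methods(lines, path="/directory/"):
    """Count (method, status) pairs for requests to exactly one path."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        # Lines that do not fit the assumed format are skipped silently.
        if match and match.group("path") == path:
            counts[(match.group("method"), int(match.group("status")))] += 1
    return counts
```

Usage would be roughly `count_methods(open("access.log"))`, where access.log stands in for wherever your server keeps its log.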
Something like this:
124.#*$!.#*$!.#*$! - - [16/May/2008:05:25:54 -0400] "GET /directory/ HTTP/1.1" 403 347 "http://www.example.com/directory/file.html" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 3.1)"
However, I do also see the _vti_ requests from the same IP addresses. So I think that your MS Office suite suggestion is correct.
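For anyone wanting to repeat that comparison on a larger log, here is a minimal sketch. It assumes each log line starts with the client IP and contains the quoted request line, as in the entry above; REQ_RE and ips_with_both are hypothetical names, and "/_vti_" is simply the prefix shared by the FrontPage-style requests mentioned earlier in the thread.

```python
import re

# Only the leading fields are needed: ip ident user [time] "METHOD path ...
REQ_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<path>\S+)'
)


def ips_with_both(lines, directory="/directory/", marker="/_vti_"):
    """Return (IPs that requested the directory, those that also hit a marker path)."""
    dir_ips = set()
    vti_ips = set()
    for line in lines:
        match = REQ_RE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        ip, path = match.group("ip"), match.group("path")
        if path == directory:
            dir_ips.add(ip)
        if path.startswith(marker):
            vti_ips.add(ip)
    return dir_ips, dir_ips & vti_ips
```

If most of the addresses in the first set also show up in the second, that would be consistent with the Office explanation, though it would not prove it.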
And Yahoo may just be digging; they never send a referrer.
Thanks!
Is there a way to identify these background requests? Are there special headers that get sent for example?