This can actually happen to anyone, not just IIS. Although those of us on Windows seem to have more bases to cover when we get into the technical aspects of things.
This is all part of my "Sabotage a Competitor" series. What's to stop someone from finding and exploiting a flaw in your technical implementation to wreak havoc on your indexing? Nothing other than knowledge and experience on the technical side.
Hack that URI
Always, always, Hack that URI. With IIS, anytime you install an ISAPI filter, you break all of the default behaviors of the server (I believe this is a correct statement). You now need to account for everything that could occur from your httpd.ini file. That means taking every URI and hacking it until you are certain that you've got all the proper server headers, content, etc. being served based on the request.
What happens at each level of the URI? What server responses and content are being returned to the visitor? Can I inject something into the URI and have it resolve (return a 200 status)? All sorts of security issues to deal with. And yes, I classify these as security issues.
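The "each level of the URI" check can be made mechanical. A minimal sketch in Python (the path is an illustrative example, not from any real site); it only enumerates the levels — actually probing each one for its status code and headers is left to whatever HTTP client or sniffer you prefer:

```python
def uri_levels(path):
    """Return every ancestor level of `path`, root first, so each
    one can be requested and its status code/headers verified."""
    segments = [s for s in path.split("/") if s]
    levels = ["/"]
    for i in range(1, len(segments) + 1):
        levels.append("/" + "/".join(segments[:i]))
    return levels

# Each of these should be fetched and checked against your URL policy:
print(uri_levels("/sub/sub/newfilename.asp"))
# → ['/', '/sub', '/sub/sub', '/sub/sub/newfilename.asp']
```

The point of walking every level is that a rewrite filter can silently change what any intermediate path returns, so none of them can be assumed to behave like a stock server.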
Another one? Case sensitivity. On IIS, we are not case sensitive. So, in theory and in practice, you could probably wreak total havoc on someone's IIS-hosted site. http://www.example.com/SUB/SUB/NeWfIlEnAmE.asp
I'd rather not talk about these things at the public level but I guess they need to be brought to the forefront every now and then. I'm sure there are a few running off right now to wreak some havoc somewhere just to test my theory. ;)
|I'm not sure what this is. I'm sure it is a "feature". Does anybody know what it is and how to turn it off? |
I'm not either, but it may be related to the use of ASP.NET's rewrite utilities. Those things are monsters!
I tried this on an IIS site that was not using any server side language and it worked.
Also, you might notice that if you do it on a site that uses relative URLs, this inserts the code into the URLs on the page. So just one link can send Google to spider your whole site with the new URLs.
Yuck, what the heck is that? How did you find that?
A site that I was doing an evaluation for is completely indexed in Google with URLs like that. We have no idea how it happened. We can't make our site create that type of URL, and we have no idea how Google found them that way.
|With IIS, anytime you install an ISAPI filter, you break all of the default behaviors of the server (I believe this is a correct statement) |
It would also be accurate to note that you break the default behaviour of apache anytime you choose to install mod_whatever.
Well, so what? User installed code is always suspect and must always be tested.
The reason that the microsoft.com url string insertion works at the / level is that they have a topic redirection modification installed to allow users to hit the site with a product technology name for redirection purposes.
If the product phrase is not recognised, as in:
it redirects to a sitemap at /index.htm
Ever heard of a custom 404?
The example as given:
gives a straight 404 as expected on iis servers that are known to me, where widgets.com is replaced with the real server name.
and they have *multiple* custom isapi filters installed.
|SO just one link can send google to spider your whole site with the new urls. |
And they'll all end in 404's.
Who knows, maybe google has a patent on:
"Reputation scoring of originating site via 404 counting"
Since the claim as made does not work, except where the site has deliberately installed a redirection filter, I will score this as FUD, aka Fear, Uncertainty, and Doubt.
What are you talking about? Check the header on
It returns 200 OK. Also, the reason I brought this up is that I have seen a site get indexed in Google with only these URLs. That is a bad thing. These URLs are not on this site except in Google. It is true that some sites will handle it correctly, but a lot don't. It is not FUD; it is a legitimate issue that people need to know about and protect themselves from.
I already told you that the microsoft.com site has a custom redirection capability installed, and has had it for years. The fact that it returns a 200 is entirely within that custom redirection facility.
A stock install of IIS does not do this.
Furthermore, I have custom 404's that return 200,301,302, and 500, depending on what *my* purpose is. That does not mean that IIS is broken. It is doing what I told it to do.
Therefore, I still say it is FUD.
The fact that a webmaster can completely BORK a web server, of any brand is not news. If it came time that most webmasters could do a pretty good job, now *that* would be news.
Don't hold your breath.
I understand your point. I appreciate you pointing out what this is.
Although what you said about MS is not true. My URL does not return a 404. It goes to the exact same page. There is no redirection. That is not on purpose. It is very common for IIS servers to be set up this way. My point that it should be taken care of is still valid. I am very big on setting a URL policy and enforcing it by not allowing any URL that you don't know about to return a 200.
|Since, the claim as made does not work, except where the site has deliberately installed a redirection filter. |
Okay, but we are talking about a site that "has installed a redirection filter."
I've checked other IIS sites and that URI string works and returns a 200.
|I already told you that the microsoft.com site has a custom redirection capability installed, and has had it for years. The fact that it returns a 200 is entirely within that custom redirection facility. |
The fact that it returns a 200 is a problem. The status of that URI is not OK; it should be 404 Not Found. Now, why is MS returning a 200 for that URI string? And why do other IIS sites do the same thing? It's a flaw, as originally pointed out.
OK, I did more testing. I should have done this in the first place. plumsauce, you are completely wrong. This happens if you just install the .NET Framework 2.0. I have a brand new Windows 2003 server with SP1 and default IIS. I tried the URL trick with default IIS and it did not work. I installed the .NET Framework and it now works. I tried with it set to 1.1.4322 and 2.0.50727. There is no mod installed. The .NET Framework is not a mod; it is a part of IIS. Microsoft wants you to install it.
If you are not using .NET, I would recommend uninstalling it. I'm not sure of the fix just yet. I am still testing and may get on the phone with MS today.
Seems like something gets hosed with the httpModule for cookieless sessionState. It might be worthwhile sending it off to MS.
I posted about a similar problem a few days ago, and you won't believe it: I contacted MS as well, and in fact they don't have any solid solution for it.
If you are talking about the case insensitive problem with IIS that is a very old problem. IIS has been that way from the beginning.
One thing I like to put on my pages is code that detects the URL used and, if there is anything wrong with it, corrects the URL and 301s it to the correct one.
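That kind of check amounts to comparing the requested URL against a single canonical form. A language-agnostic sketch of the logic in Python (the canonicalization policy here — lowercase scheme, host, and path, drop fragments, leave the query string alone — is an assumption; substitute your own URL policy):

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_redirect(requested):
    """Return (301, canonical_url) if the requested URL deviates
    from the canonical form, else (200, None) to serve normally."""
    parts = urlsplit(requested)
    canonical = urlunsplit((
        parts.scheme.lower(),   # http, not HTTP
        parts.netloc.lower(),   # www.example.com, not WWW.EXAMPLE.COM
        parts.path.lower(),     # IIS is case-insensitive, so pick one case
        parts.query,            # query string left untouched
        "",                     # drop any fragment
    ))
    if canonical != requested:
        return 301, canonical
    return 200, None

# canonical_redirect("http://www.example.com/SUB/NeWfIlEnAmE.asp")
# → (301, "http://www.example.com/sub/newfilename.asp")
```

On an actual IIS/ASP.NET site this comparison would run per-request (e.g. in a page include or an HttpModule), but the decision logic is the same: exactly one spelling of each URL is allowed to return a 200.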
In reference to the case sensitivity issue, if you have ISAPI_Rewrite installed...

#convert all upper case to lower
# Pass 1: if the URL contains an upper-case character, stash a
# lower-cased copy of it in a custom header ([CL] = case lower)
RewriteCond URL ([^?]+[[:upper:]][^?]*).*
RewriteHeader X-LowerCase-URI: .* $1 [CL]
# Pass 2: if that header is present, permanently redirect ([RP])
# to the host plus the lower-cased path
RewriteCond Host: (.+)
RewriteCond X-LowerCase-URI: (.+)
RewriteRule [^?]+(.*) http\://$1$2$3 [I,RP]
this just unmade my day
Well, this may turn into a huge mess before long. My sites all have this 'feature'.
|Well, this may turn into a huge mess before long. My sites all have this 'feature'. |
I'm not too certain it is really something to worry about right now. I'm sure it will get noticed by MS and investigated.
If you are on a Windows Server and you've installed any sort of rewrite utility like ISAPI_Rewrite and haven't taken into account the fact that the default server behaviors are broken as soon as you implement a rewrite, you have more important issues to contend with.
Just the fact that we are case-insensitive on Windows really opens up a can of worms.
also works and returns a 200. I have seen this before, a year or more ago, when I found a site that was indexed with a string not exactly like this, so to speak, but close. I asked Matt about it and he replied that Google was aware of it and that it would not cause issues with the algo.
I tried Google; it works on them as well. I made a mistake: it takes two slashes (//) for it to work on Google.
WebmasterWorld has it fixed on their site; would anybody care to say how?
We have two IIS servers and can confirm previous suspicions that the unaffected server runs .NET framework 1 and the other server, which suffers from this bug, runs .NET framework 2. Unfortunately, we are not able to uninstall .NET framework 2 on the afflicted server any time soon.
|case-insensitive on Windows |
Tanking your "friends" works like a charm; I am one of them. Did it to myself too.
Let's take it to the G.Y.M.; what do they have to say?
I've had this problem for some time and it started when I briefly enabled cookieless sessions on IIS, which causes URLs like this:
to be redirected to something like:
I first noticed when Google was reporting 10,000 of my pages in its index, even though my site only has about 1,000 pages.
The fix was to put this in my robots.txt:
Seemed to eventually fix the problem.
That's encouraging. However, IIRC, the parenthesis could appear anywhere after the first / after the domain, e.g.
Would this be a valid robots.txt entry:
EDIT: Wait a minute! I just reread your post - How did you enable/disable cookieless sessions? Are they enabled by default?
Interesting read: [developerfusion.co.uk...]
|The fact that it returns a 200 is a problem. The status of that URI is not OK; it should be 404 Not Found. Now, why is MS returning a 200 for that URI string? And why do other IIS sites do the same thing? It's a flaw, as originally pointed out. |
No, it is not a problem. It was a decision. I knew the test returned a 200. My original test was done with a sniffer inline. I was not ignoring it at all.
The point is, a site can return any status code that the designer decides will bring the desired result. If that status code is not what you think it should be, it is still a question of interpretation.
.NET Framework: user-installed code.
isapi_rewrite: again, user-installed code.
A stock installation of IIS cannot and does not behave like this.
|Wait a minute! I just reread your post - How did you enable/disable cookieless sessions? Are they enabled by default? |
No, they aren't enabled by default. I can't quite remember how I did it initially.
|I asked Matt about it and he replied that Google was aware of it and that it would not cause issues with the algo. |
Well, they may be aware of it, but that doesn't seem to stop them spidering the one page dozens of times a day using different session variables and including them all in their index as different pages. MSN and Y! do the same.
|If you are talking about the case insensitive problem with IIS that is a very old problem. IIS has been that way from the beginning. |
Don't you mean the case sensitivity problem with Apache? :-)
|I'm not too certain it is really something to worry about right now. I'm sure it will get noticed by MS and investigated. |
I just tested my site and found that:
[mysite...] returns a 404
takes me to the default page.
It would appear that my server is checking that the session variable is exactly 24 characters.
I know that this didn't used to be the case. Perhaps this has just come about in a recent update to the .Net framework...
.NET is not user-installed code any more than MS Word or Service Packs or the 64-bit version of Windows Server or IE7 are. MS wants you to install it. .NET is part of IIS. Anything that can be installed using MS Update is not user-installed code.
The ISAPI filter aspnet_filter.dll appears to be the culprit. Even if the sessionState is removed, it still rewrites a header that matches the patterns above. The problem is that you can't remove that filter, or the protection of certain folders (/bin, /app_data, /app_code) is removed too.
A workaround might be to not allow cookieless sessions and to test whether the URL does not match the RawUrl. Then you can rewrite using a 404 or 301.
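The shape of the munged URLs makes that check straightforward. A sketch of the detection logic in Python — the /(S(...))/ form with a 24-character token is an assumption based on what posters in this thread observed, and in ASP.NET itself you would do the equivalent comparison against Request.RawUrl rather than pattern-match by hand:

```python
import re

# Matches an embedded ASP.NET cookieless-session token, e.g.
# /(S(lit3py55t21z5v55vlm25s55))/ -- 24 alphanumeric characters,
# per the observation earlier in this thread.
TOKEN = re.compile(r"/\(S\([a-z0-9]{24}\)\)")

def strip_session_token(raw_url):
    """If raw_url carries a cookieless-session token, return the
    clean URL to 301 to; otherwise return None (serve normally)."""
    cleaned = TOKEN.sub("", raw_url)
    return cleaned if cleaned != raw_url else None
```

Issuing a 301 to the stripped URL (or a 404, per your policy) means the token-bearing duplicates collapse back to one canonical page instead of multiplying in the index.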
|A workaround might be to not allow cookieless sessions and to test whether the URL does not match the RawUrl. Then you can rewrite using a 404 or 301. |
I would have thought an extra line in your robots.txt would be an easier workaround. That fixes the problem of duplicate content in the SERPs.
Unless there is some other issue I haven't considered?
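For reference, the kind of robots.txt entry being discussed would presumably look something like the following — an untested sketch, assuming the session token takes the /(S(...))/ form at the root of the path (as noted earlier in the thread, a prefix rule like this would not catch tokens inserted deeper in the URL):

```text
User-agent: *
Disallow: /(S(
```

Since robots.txt Disallow rules are plain prefix matches, this only keeps well-behaved spiders away from token-bearing URLs that start at the site root.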
Did you test this robots.txt entry through the Google webmaster tools to see if it would disallow the indexing?
I forgot WebmasterWorld is on an Apache server ("duh"), and this issue is confined to Windows servers.
The robots.txt combined with disabling cookieless sessions might work to keep the spiders out. If you don't set cookieless to false, I think ASP.NET will be more than happy to start a new cookieless session for Googlebot with a munged URL.
I did not test it, but adding the following line (it belongs under <system.web> in web.config) should keep ASP.NET from producing cookieless URLs:
<sessionState mode="InProc" cookieless="false" timeout="20" />
Only the asshats trying to sabotage a site will remain, but adding a 301 to those requests might help to minimize the damage.