You can play with it on Microsoft's own site. Just add (A(abcdefg-1234567)) between any two slashes. You can change it as much as you like to create lots of different URLs. The only rules are that the first part (the "A") has to be a single letter and it has to be a capital. The second part can be just about anything, although there are some characters, like # or %, that you can't use. You could get really fancy with it and add something like (I(id=1546973)). This could make Google think it is a session ID.
[microsoft.com...]
I'm not sure what this is. I'm sure it is a "feature". Does anybody know what it is and how to turn it off?
This is all part of my "Sabotage a Competitor" series. What's to stop someone from finding and exploiting a flaw in your technical implementation to wreak havoc on your indexing? Nothing other than knowledge and experience on the technical side.
Hack that URI
Always, always, Hack that URI. With IIS, anytime you install an ISAPI filter, you break all of the default behaviors of the server (I believe this is a correct statement). You now need to account for everything that could occur from your httpd.ini file. That means taking every URI and hacking it until you are certain that the proper server headers, content, etc. are being served for each request.
http://www.example.com/sub/sub/product/
http://www.example.com/sub/sub/product
http://www.example.com/sub/sub/produc
http://www.example.com/sub/sub/produ
http://www.example.com/sub/sub/prod
http://www.example.com/sub/sub/pro
http://www.example.com/sub/sub/pr
http://www.example.com/sub/sub/p
Etc.
What happens at each level of the URI? What server responses and content are being returned to the visitor? Can I interject something into the URI and have it resolve (return a 200 status)? There are all sorts of security issues to deal with. And yes, I classify these as security issues.
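The truncation test above is easy to script. Here is a minimal Python sketch, not anyone's production tool: `truncated_variants` and `status_of` are names made up for this illustration, and the example.com address is just the placeholder from the list above. The first function generates the same shortened URIs; the second asks the server for the raw status code of one URI without following redirects.

```python
import http.client
from urllib.parse import urlsplit, urlunsplit


def truncated_variants(url):
    """Yield the URL with its path shortened one character at a time,
    stopping once only the first character of the last segment is left."""
    parts = urlsplit(url)
    path = parts.path
    # Index just past the slash that opens the final non-empty segment.
    stop = path.rstrip("/").rfind("/") + 1
    for end in range(len(path), stop, -1):
        yield urlunsplit((parts.scheme, parts.netloc, path[:end], "", ""))


def status_of(url):
    """Send a HEAD request and return the status code as served.
    http.client does not follow redirects, so a 301 or 302 shows up as such."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("HEAD", parts.path or "/")
        return conn.getresponse().status
    finally:
        conn.close()
```

To use it, loop over `truncated_variants(...)` for one of your own URLs and print `status_of(u)` for each. Anything other than the full URL coming back as 200 deserves a closer look.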
Another one? Case sensitivity. On IIS, we are not case sensitive, so every variation in case resolves to the same page. In theory, and in practice, you could probably wreak total havoc on someone's IIS-hosted site. http://www.example.com/SUB/SUB/NeWfIlEnAmE.asp
I'd rather not talk about these things at the public level but I guess they need to be brought to the forefront every now and then. I'm sure there are a few running off right now to wreak some havoc somewhere just to test my theory. ;)
I'm not sure what this is. I'm sure it is a "feature". Does anybody know what it is and how to turn it off?
I'm not either, but it may be related to the use of ASP.NET's rewrite utilities. Those things are monsters!
mydomain.com/(A(asdf))/index.htm
Also, you might notice that if you do it on a site that uses relative URLs, this inserts the code into the URLs on the page. So just one link can send Google to spider your whole site with the new URLs.
With IIS, anytime you install an ISAPI filter, you break all of the default behaviors of the server (I believe this is a correct statement)
It would also be accurate to note that you break the default behaviour of Apache anytime you choose to install mod_whatever.
Well, so what? User-installed code is always suspect and must always be tested.
The reason that the microsoft.com url string insertion works at the / level is that they have a topic redirection modification installed to allow users to hit the site with a product technology name for redirection purposes.
If the product phrase is not recognised, as in:
microsoft.com/(A(asdf))/index.htm
it redirects to a sitemap at /index.htm
Ever heard of a custom 404?
The example as given:
widgets.com/(A(asdf))/index.htm
gives a straight 404, as expected, on the IIS servers that are known to me (where widgets.com is replaced with the real server name), and they have *multiple* custom ISAPI filters installed.
So just one link can send Google to spider your whole site with the new URLs.
And they'll all end in 404s.
Who knows, maybe Google has a patent on:
"Reputation scoring of originating site via 404 counting"
Since the claim as made does not work, except where the site has deliberately installed a redirection filter, I will score this as FUD, aka Fear, Uncertainty and Doubt.
[microsoft.com...]
It returns 200 OK. Also, the reason I brought this up is that I have seen a site get indexed in Google with only these URLs. That is a bad thing. These URLs do not exist anywhere on the site itself, only in Google. It is true that some sites will handle it correctly, but a lot don't. It is not FUD; it is a legitimate issue that people need to know about and protect themselves against.
I already told you that the microsoft.com site has a custom redirection capability installed, and has had it for years. The fact that it returns a 200 is entirely within that custom redirection facility.
A stock install of IIS does not do this.
Furthermore, I have custom 404s that return 200, 301, 302, and 500, depending on what *my* purpose is. That does not mean that IIS is broken. It is doing what I told it to do.
Therefore, I still say it is FUD.
The fact that a webmaster can completely BORK a web server of any brand is not news. If the time came when most webmasters could do a pretty good job, now *that* would be news.
Don't hold your breath.
What you said about MS is not true, though. My URL does not return a 404. It goes to the exact same page. There is no redirection. That is not on purpose. It is very common for IIS servers to be set up this way. My point that it should be taken care of is still valid. I am very big on setting a URL policy and enforcing it by not allowing any URL that you don't know about to return a 200.
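As a rough illustration of that kind of URL policy, here is a small Python sketch built around an allow-list. Everything in it is hypothetical: the `KNOWN_PATHS` set, the `decide` function and the choice to store canonical paths in lower case are assumptions made for the example, not how any particular server does it. A request gets a 200 only if its path is known exactly; a path that differs only in case gets a 301 to the canonical form; everything else is a 404.

```python
# Hypothetical list of the only paths this site is meant to serve.
# Assumption: canonical paths are stored in lower case.
KNOWN_PATHS = {"/", "/index.htm", "/sub/sub/product/"}


def decide(path):
    """Return (status, location) for a requested path.

    200 only for an exact match against the allow-list, 301 to the
    lower-case form when only the case differs, 404 for everything else.
    """
    if path in KNOWN_PATHS:
        return 200, None
    lowered = path.lower()
    if lowered in KNOWN_PATHS:
        return 301, lowered
    return 404, None
```

With this policy, a request for /(A(asdf))/index.htm is a 404 because that exact path was never published, even though /index.htm itself is known.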
Since the claim as made does not work, except where the site has deliberately installed a redirection filter.
Okay, but we are talking about a site that "has installed a redirection filter."
I've checked other IIS sites and that URI string works and returns a 200.
I already told you that the microsoft.com site has a custom redirection capability installed, and has had it for years. The fact that it returns a 200 is entirely within that custom redirection facility.
The fact that it returns a 200 is a problem. The status of that URI is not OK. It should be 404 Not Found. Now, why is MS returning a 200 for that URI string? And why do other IIS sites do the same thing? It's a flaw, as originally pointed out.
Good grief!
If you are not using .NET I would recommend uninstalling it. I'm not sure of the fix just yet. I am still testing and may get on the phone with MS today.
Well, this may turn into a huge mess before long. My sites all have this 'feature'.
I'm not too certain it is really something to worry about right now. I'm sure it will get noticed by MS and investigated.
If you are on a Windows Server and you've installed any sort of rewrite utility like ISAPI_Rewrite and haven't taken into account the fact that the default server behaviors are broken as soon as you implement a rewrite, you have more important issues to contend with.
Just the fact that we are case-insensitive on Windows really opens up a can of worms.
also works and returns a 200. I saw this a year or more ago, when I found a site that was indexed with a string that was not quite like this one but close. I asked Matt about it and he replied that Google was aware of this and that it would not cause issues with the algo.
I tried Google and it works on them as well. I made a mistake: it takes two slashes (//) for it to work on Google.
Webmaster has it fixed on their site. Would anybody care to say how?
[edited by: Panic_Man at 9:28 pm (utc) on Aug. 7, 2007]
[widgets.com...]
to being redirected to something like:
[widgets.com...]
I first noticed when Google was reporting 10,000 of my pages in its index, even though my site only has about 1,000 pages.
The fix was to put this in my robots.txt:
User-agent: *
Disallow: /(*
Seemed to eventually fix the problem.
www.mydomain.com/abc/123/(A(abc))/default.htm
Would this be a valid robots.txt entry:
User-agent: *
Disallow: */(*
?
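One way to sanity-check rules like these before relying on them is to model the wildcard matching by hand. The Python sketch below assumes the extended syntax that Google documents, where `*` matches any run of characters, a trailing `$` anchors the end, and a rule is matched from the start of the path. The function names are invented for the example, and whether a given crawler accepts a rule that begins with `*` instead of `/` is not something this sketch can confirm.

```python
import re


def rule_to_regex(rule):
    """Translate a Disallow value into a compiled regex under the assumed
    semantics: '*' is any run of characters, a trailing '$' means end of
    path, and every other character is literal."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path, rule):
    """True if the rule matches the path, starting from its first character."""
    return rule_to_regex(rule).match(path) is not None
```

Under those assumptions, `Disallow: /(*` only catches paths that start with `/(`, so it would miss www.mydomain.com/abc/123/(A(abc))/default.htm, while `*/(*` would match that path and the root-level form too.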
EDIT: Wait a minute! I just reread your post - How did you enable/disable cookieless sessions? Are they enabled by default?
Interesting read: [developerfusion.co.uk...]
[edited by: Panic_Man at 10:35 am (utc) on Aug. 8, 2007]
The fact that it returns a 200 is a problem. The status of that URI is not OK. It should be 404 Not Found. Now, why is MS returning a 200 for that URI string? And why do other IIS sites do the same thing? It's a flaw, as originally pointed out.
No, it is not a problem. It was a decision. I knew the test returned a 200. My original test was done with a sniffer inline. I was not ignoring it at all.
The point is, a site can return any status code that the designer decides will bring the desired result. If that status code is not what you think it should be, it is still a question of interpretation.
.NET Framework: user-installed code.
ISAPI_Rewrite: again, user-installed code.
A stock installation of IIS cannot and does not behave like this.
Wait a minute! I just reread your post - How did you enable/disable cookieless sessions? Are they enabled by default?
I asked Matt about it and he replied that Google was aware of this and that it would not cause issues with the algo.
If you are talking about the case-insensitivity problem with IIS, that is a very old problem. IIS has been that way from the beginning.
I'm not too certain it is really something to worry about right now. I'm sure it will get noticed by MS and investigated.
I just tested my site and found that:
[mysite...] returns a 404
but
[mysite...]
takes me to the default page.
It would appear that my server is checking that the session variable is exactly 24 characters.
I know that this didn't use to be the case. Perhaps it has just come about in a recent update to the .NET Framework...
A workaround might be to not allow cookieless sessions and to test whether the URL does not match the raw URL. Then you can respond with a 404 or a 301.
A workaround might be to not allow cookieless sessions and to test whether the URL does not match the raw URL. Then you can respond with a 404 or a 301.
I would have thought an extra line in your robots.txt would be an easier workaround. That fixes the problem of duplicate content in the SERPs.
Unless there is some other issue I haven't considered?
I did not test it, but adding the following line to web.config (inside the <system.web> section) should keep ASP.NET from producing cookieless URLs:
<sessionState mode="InProc" cookieless="false" timeout="20" />
After that, only the asshats trying to sabotage a site will remain, but adding a 301 for those requests might help to minimize the damage.
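The 301 idea can be sketched in a few lines of Python. This only shows the logic, not ASP.NET code: `COOKIELESS_SEGMENT`, `canonical_path` and `redirect_for` are names invented here, and the pattern is an assumption based on the examples in this thread (a path segment made of one or more capital-letter-plus-parentheses groups wrapped in an outer pair of parentheses).

```python
import re

# Assumed shape of the injected segment, taken from the thread's examples:
# "/(A(asdf))", "/(I(id=1546973))", possibly several groups in one segment.
COOKIELESS_SEGMENT = re.compile(r"/\((?:[A-Z]\([^()/]*\))+\)")


def canonical_path(path):
    """Return the path with every injected segment removed."""
    return COOKIELESS_SEGMENT.sub("", path) or "/"


def redirect_for(path):
    """Return (301, clean_path) if the request carried an injected segment,
    or None if the path is already clean and can be served as usual."""
    clean = canonical_path(path)
    if clean != path:
        return 301, clean
    return None
```

For example, a request for /(A(asdf))/index.htm would be answered with a 301 to /index.htm, so any link planted by a third party collapses back onto the real URL instead of creating a duplicate.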