IIS flaw allows others to create dup content

Forum Moderators: open

Message Too Old, No Replies

IIS flaw allows others to create dup content

you could mess somebody up with this

ogletree

9:15 pm on Aug 6, 2007 (gmt 0)

If somebody is running a site on IIS you can create new URL's for them. You can also link to these new URL's and even submit them to a bunch of directories. It would create as many copies of the same pages as you like.

You can play with it on Microsoft's own site. Just add (A(abcdefg-1234567) between any two slashes. You can change it as much as you like to create lots of different URL's. The only rules are that the letter "A" has to be only one letter and it has to be a capital. The 2nd part can be just about anything. There are some characters like # or % that you can't use. You could get real fancy with it and add something like (I(id=1546973)). This could make Google think it is a session id.

[microsoft.com...]

I'm not sure what this is. I'm sure it is a "feature". Does anybody know what it is and how to turn off?

pageoneresults

1:56 pm on Aug 8, 2007 (gmt 0)

User-agent: *
Disallow: /(*

Personally I wouldn't use the robots.txt to control this. Anything you put in the robots.txt is going to become indexed with a URI only by Google. There is no need to let them know about the problem via robots.txt. It should be solved at the source and not with a bandaid workaround.

ogletree

2:10 pm on Aug 8, 2007 (gmt 0)

You have to be careful with robots.txt. I am very careful of what I put in there. You are announcing things you may not want people to know about. A website should be locked down. Nothing should be able to happen unless you want it to. You would not leave a side door open at your house why do it on your website.

StuWhite

2:17 pm on Aug 8, 2007 (gmt 0)

StuWhite
Did you test this entry robots text through the google webmaster tools to see if would disallow the indexing?

Yep, I tested it in webmaster console before implementing it.

The console also reports Googlebot attempts which are actually being blocked by the robots.txt so I know it is working.

There is no need to let them know about the problem via robots.txt

I don't think you need to classify it as a 'problem'. All you are doing is telling search bots not to index pages which include a session variable. I don't see it as advertising some huge security flaw in your system.

[edited by: StuWhite at 2:20 pm (utc) on Aug. 8, 2007]

blend27

1:48 am on Aug 9, 2007 (gmt 0)

so is it

Disallow: /(*

Disallow: */(*

Panic_Man

8:42 am on Aug 9, 2007 (gmt 0)

Both appear to work using the robots.txt checker in Google Webmaster Tools. However, the use of asterisks in this context is not valid syntax and therefore not widely supported.

User-agent: *
Disallow: /(

...is the valid (and presumably widely-supported) option I've chosen until something better arrives. There is apparently no need for a trailing *, owing to the way the robots syntax works.

Note that this is only a partial fix though as the bracketed part doesn't necessarily need to be right after the first backslash after the domain.

[edited by: Panic_Man at 8:53 am (utc) on Aug. 9, 2007]

pageoneresults

1:42 pm on Aug 9, 2007 (gmt 0)

How about Yahoo!, MSN and Ask, are they honoring the robots.txt file? Do they support the syntax provided? If not, where does that leave you?

blend27

2:17 pm on Aug 9, 2007 (gmt 0)

--where does that leave you? --

worried, very worried.

Maybe if SEs teams would provide an extra future for us in Sitemap Tools that would pretty much say: Optout from '(' in site URIs

Easy_Coder

8:21 pm on Aug 9, 2007 (gmt 0)

Or add a feature that let's us indicate to Index in...

- All Lower Case
- All Upper Case
- Mixed Case

Regardless of how you found the URL

[edited by: Easy_Coder at 8:21 pm (utc) on Aug. 9, 2007]

IanTurner

8:44 am on Aug 13, 2007 (gmt 0)

This appears to be some kind of bad behaviour inside the 404 handler.

In general the URLs with thedubious characters in are being handled as 404, but are returning a 200 in the headers.

Putting the following ASP code in the 404 Error Document will fix the behaviour.

***
Response.status = "404 File not found"
server.transfer "/404location.htm"
***

ogletree

2:04 pm on Aug 13, 2007 (gmt 0)

When I do this my 404 no longer works. I created a new file and called it 404.asp and set it up as my custom 404 page in IIS. The only thing in that file is

<%
Response.status = "404 File not found"
server.transfer "/404.htm"
%>

404.htm is the old custom 404 page I had that works fine when it is set up as the custom 404 in IIS. In IE I get a default 404 page and firefox trys to get me to down load a file. If I browse the file 404.asp it works as expected and brings up the contents in 404.htm.

IanTurner

2:34 pm on Aug 13, 2007 (gmt 0)

Which version of IIS are you using?

It is working fine for me in IIS6.0

(Are you sure you haven't got a typo in there somewhere?)

[edited by: IanTurner at 3:10 pm (utc) on Aug. 13, 2007]

ogletree

3:21 pm on Aug 13, 2007 (gmt 0)

IIS6. Like I said 404.asp works just fine. If I just browse to 404.asp it brings up the contents of 404.htm and when I check in a header checker it returns a 404 error. When I go into IIS I change the custom 404 from 404.htm to 404.asp.

When I check the 404 code with the header checker I get a 404 for each method. The problem is that the new method does not seem to work in a browser.

IanTurner

4:18 pm on Aug 13, 2007 (gmt 0)

Have you got the /404.asp in your custom 404 file properties.

If you miss the forward slash it treats the address as being relative to the current folder, rather than being relative to site root - given that the problem URLs are effectively slipping the folder structure down a level yoou should really look at making the files relative to the site root.

ogletree

6:17 pm on Aug 13, 2007 (gmt 0)

All I did was go into IIS properties to a custom 404 page that works just fine. I put the new file in the same directory that is working. All I did was go in and change htm to asp.

jenilia

9:47 am on Aug 21, 2007 (gmt 0)

In my system have a microsoft windows explorer-2004 it does not have a IIS but i want to run ASP file how to do. give me some valuable suggestions.

Panic_Man

8:39 am on Aug 28, 2007 (gmt 0)

I see .NET Framework v3.0 in my list of awaiting updates - maybe this will put an end to the problems?

sunpost

12:02 pm on Aug 28, 2007 (gmt 0)

No. From what I have been reading, v3.5 won't address this issue too.

ogletree

5:28 pm on Aug 28, 2007 (gmt 0)

One of the things I like to do to a website is enforce URL's. Every page is aware of itself and will not allow access to itself unless the proper url is used. Otherwise it will send out a 301 to the proper URL. Any page that is truly dynamic I put a noindex, nofollow on it. You should not have pages in google that are dependent on user interaction. Like search pages. Proper SEO requires this. The most important rule is never let googlebot guess. Lead it around like a horse. Never allow it to just roam on it's own.

g1smd

9:20 pm on Aug 28, 2007 (gmt 0)

This is junk:

Disallow: /(*

The * should be on the left as a wildcard.

The Disallow directive, matches URL that begin with whatever is disallowed.

So:

Disallow: /abcd

stops any URL that begins /abcd

and:

User-agent: Googlebot
Disallow: *(

disallows any URL that starts with anything and then includes a ( bracket and then continues with either anything or nothing else.

Wildcards should be aimed only at Googlebot.

ogletree

9:19 pm on Aug 31, 2007 (gmt 0)

Ok found out what is causing this. When I turn off cookies in firefox it generates that URL. I think my server is hading those URL's out to Google. The same thing does not happen in IE. Google must behave more like ff than ie.

Ocean10000

1:33 am on Sep 4, 2007 (gmt 0)

This is more of an FYI post do what you will with it.

If you want to turn off Cookieless State in Asp.net I have made example Configuration section to disable Cookieless Sessions. This will not stop all the unwanted behavior that has been noted. But it will stop the bots from getting new Cookieless urls handed to them automatically, does nothing for the old urls with the cookieless session keys in them.

Definition:
Cookieless. The cookieless option for ASP.NET is configured with this simple Boolean setting.

<configuration>
<system.web>
<sessionState cookieless="false" />
</system.web>
</configuration>

References:
ASP.NET Session State [msdn2.microsoft.com]
sessionState Element (ASP.NET Settings Schema) [msdn2.microsoft.com]
Underpinnings of the Session State Implementation in ASP.NET [msdn2.microsoft.com]

Drew_Black

6:56 pm on Sep 5, 2007 (gmt 0)

ogletree said:

One of the things I like to do to a website is enforce URL's. Every page is aware of itself and will not allow access to itself unless the proper url is used. Otherwise it will send out a 301 to the proper URL.

This is the best way to handle it and a lesson I learned the hard way a few years ago.

BTW, it's not just a case-sensitivity issue. You can rearrange the parameters in the querystring of a URL and most bots will look at each combination as a different page with the same content. ie. somepage.asp?a=1&b=2 will render the same page as somepage.asp?b=2&a=1 but the URLs are different and are therefor two different pages with the exact same content. In order to prevent this situation you need to inspect the URL and 301 to the correct one if the values are out of proper order. It's not trivial but fairly straightforward once you figure it all out.

ogletree

3:34 pm on Sep 17, 2007 (gmt 0)

One conclusion that we can get from this is that Googlebot spiders your site more like Firefox than IE. I think this "feature" of .NET is something that it shares only with IE. IE can read the code and creates a cookie with it FF does not. FF just puts it in the URL.

ogletree

5:59 pm on Sep 17, 2007 (gmt 0)

Never mind it does show up in IE. I had to delete my cookies and restart IE for it to show up. The URL does look a little different

(X(1)A(some_random_92_char_alphanumeric_code))

ogletree

4:06 pm on Sep 20, 2007 (gmt 0)

Ok it gets even weirder. I visited the site with cookies on but had my UA set to googlebot and the wierd url showed up.

This 55 message thread spans 2 pages: 55