
Microsoft IIS Web Server and ASP.NET Forum

IIS flaw allows others to create dup content
you could mess somebody up with this
ogletree

WebmasterWorld Senior Member ogletree us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3415232 posted 9:15 pm on Aug 6, 2007 (gmt 0)

If somebody is running a site on IIS, you can create new URLs for them. You can also link to these new URLs and even submit them to a bunch of directories. This lets you create as many copies of the same pages as you like.

You can play with it on Microsoft's own site. Just add (A(abcdefg-1234567)) between any two slashes. You can change it as much as you like to create lots of different URLs. The only rules are that the first part, like the "A" here, has to be a single capital letter, and the second part can be just about anything, though some characters such as # or % won't work. You could get real fancy with it and add something like (I(id=1546973)). This could make Google think it is a session id.
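
For example (these example.com URLs are hypothetical), both of the following should return the same page with a 200 status:

http://www.example.com/widgets/page.aspx
http://www.example.com/widgets/(A(abcdefg-1234567))/page.aspx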

[microsoft.com...]

I'm not sure what this is. I'm sure it is a "feature". Does anybody know what it is and how to turn it off?

 

pageoneresults

WebmasterWorld Senior Member pageoneresults us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3415232 posted 1:56 pm on Aug 8, 2007 (gmt 0)

User-agent: *
Disallow: /(*

Personally, I wouldn't use robots.txt to control this. Anything you put in robots.txt is going to end up indexed by Google as a URL-only listing. There is no need to let them know about the problem via robots.txt. It should be solved at the source, not with a band-aid workaround.

ogletree

WebmasterWorld Senior Member ogletree us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3415232 posted 2:10 pm on Aug 8, 2007 (gmt 0)

You have to be careful with robots.txt. I am very careful about what I put in there; you are announcing things you may not want people to know about. A website should be locked down: nothing should be able to happen unless you want it to. You would not leave a side door open at your house, so why do it on your website?

StuWhite

5+ Year Member



 
Msg#: 3415232 posted 2:17 pm on Aug 8, 2007 (gmt 0)

Did you test this entry in robots.txt through the Google Webmaster Tools to see if it would disallow the indexing?

Yep, I tested it in webmaster console before implementing it.

The console also reports the Googlebot attempts that are actually being blocked by the robots.txt, so I know it is working.

There is no need to let them know about the problem via robots.txt

I don't think you need to classify it as a 'problem'. All you are doing is telling search bots not to index pages which include a session variable. I don't see it as advertising some huge security flaw in your system.

[edited by: StuWhite at 2:20 pm (utc) on Aug. 8, 2007]

blend27

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3415232 posted 1:48 am on Aug 9, 2007 (gmt 0)

so is it

Disallow: /(*

or

Disallow: */(*

?

Panic_Man

10+ Year Member



 
Msg#: 3415232 posted 8:42 am on Aug 9, 2007 (gmt 0)

Both appear to work using the robots.txt checker in Google Webmaster Tools. However, the use of asterisks in this context is not valid syntax and therefore not widely supported.

User-agent: *
Disallow: /(

...is the valid (and presumably widely-supported) option I've chosen until something better arrives. There is no need for a trailing *, since a Disallow rule matches any URL that begins with the given path.

Note that this is only a partial fix though, as the bracketed part doesn't necessarily come right after the first slash after the domain.
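
To illustrate with hypothetical paths: Disallow: /( blocks /(A(abcdefg-1234567))/page.aspx, but it does not block /widgets/(A(abcdefg-1234567))/page.aspx, because a Disallow rule only matches from the start of the path.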

[edited by: Panic_Man at 8:53 am (utc) on Aug. 9, 2007]

pageoneresults

WebmasterWorld Senior Member pageoneresults us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3415232 posted 1:42 pm on Aug 9, 2007 (gmt 0)

How about Yahoo!, MSN and Ask? Are they honoring the robots.txt file? Do they support the syntax provided? If not, where does that leave you?

blend27

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3415232 posted 2:17 pm on Aug 9, 2007 (gmt 0)

where does that leave you?

Worried, very worried.

Maybe the SE teams could provide an extra feature for us in the Sitemap Tools that would pretty much say: opt out of '(' in site URIs.

Easy_Coder

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3415232 posted 8:21 pm on Aug 9, 2007 (gmt 0)

Or add a feature that lets us indicate to index in...

- All Lower Case
- All Upper Case
- Mixed Case

...regardless of how you found the URL.

[edited by: Easy_Coder at 8:21 pm (utc) on Aug. 9, 2007]

IanTurner

WebmasterWorld Administrator ianturner us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3415232 posted 8:44 am on Aug 13, 2007 (gmt 0)

This appears to be some kind of bad behaviour inside the 404 handler.

In general, the URLs with the dubious characters in them are being handled as 404s, but are returning a 200 in the headers.

Putting the following ASP code in the 404 error document will fix the behaviour:

<%
' Send a real 404 status before serving the custom error page
Response.Status = "404 File not found"
Server.Transfer "/404location.htm"
%>
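
In IIS 6 you wire this up under the site's Custom Errors properties: point the 404 error at an ASP page containing the code above. The /404location.htm name is just a placeholder for your existing error page.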

ogletree

WebmasterWorld Senior Member ogletree us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3415232 posted 2:04 pm on Aug 13, 2007 (gmt 0)

When I do this my 404 no longer works. I created a new file, called it 404.asp, and set it up as my custom 404 page in IIS. The only thing in that file is:

<%
Response.status = "404 File not found"
server.transfer "/404.htm"
%>

404.htm is the old custom 404 page I had, which works fine when it is set up as the custom 404 in IIS. In IE I get a default 404 page, and Firefox tries to get me to download a file. If I browse to 404.asp directly, it works as expected and brings up the contents of 404.htm.

IanTurner

WebmasterWorld Administrator ianturner us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3415232 posted 2:34 pm on Aug 13, 2007 (gmt 0)

Which version of IIS are you using?

It is working fine for me in IIS6.0

(Are you sure you haven't got a typo in there somewhere?)

[edited by: IanTurner at 3:10 pm (utc) on Aug. 13, 2007]

ogletree

WebmasterWorld Senior Member ogletree us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3415232 posted 3:21 pm on Aug 13, 2007 (gmt 0)

IIS 6. Like I said, 404.asp works just fine. If I just browse to 404.asp it brings up the contents of 404.htm, and when I check it in a header checker it returns a 404. Then I go into IIS and change the custom 404 from 404.htm to 404.asp.

When I check the 404 with the header checker I get a 404 for each method. The problem is that the new method does not seem to work in a browser.

IanTurner

WebmasterWorld Administrator ianturner us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3415232 posted 4:18 pm on Aug 13, 2007 (gmt 0)

Have you got /404.asp in your custom 404 file properties?

If you miss the forward slash, it treats the address as relative to the current folder rather than relative to the site root. Given that the problem URLs are effectively slipping the folder structure down a level, you should really make the paths relative to the site root.

ogletree

WebmasterWorld Senior Member ogletree us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3415232 posted 6:17 pm on Aug 13, 2007 (gmt 0)

All I did was go into the IIS properties for a custom 404 page that works just fine. I put the new file in the same directory as the one that works, and just changed htm to asp.

jenilia

5+ Year Member



 
Msg#: 3415232 posted 9:47 am on Aug 21, 2007 (gmt 0)

My system has Microsoft Windows Explorer 2004. It does not have IIS, but I want to run an ASP file. How do I do that? Please give me some suggestions.

Panic_Man

10+ Year Member



 
Msg#: 3415232 posted 8:39 am on Aug 28, 2007 (gmt 0)

I see .NET Framework v3.0 in my list of awaiting updates - maybe this will put an end to the problems?

sunpost

5+ Year Member



 
Msg#: 3415232 posted 12:02 pm on Aug 28, 2007 (gmt 0)

No. From what I have been reading, v3.5 won't address this issue either.

ogletree

WebmasterWorld Senior Member ogletree us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3415232 posted 5:28 pm on Aug 28, 2007 (gmt 0)

One of the things I like to do on a website is enforce URLs. Every page is aware of itself and will not allow access to itself unless the proper URL is used; otherwise it sends out a 301 to the proper URL. Any page that is truly dynamic gets a noindex, nofollow. You should not have pages in Google that depend on user interaction, like search pages. Proper SEO requires this. The most important rule is never let Googlebot guess. Lead it around like a horse; never allow it to just roam on its own.
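
A minimal Classic ASP sketch of that idea; the canonical host www.example.com and the all-lower-case rule are assumptions for illustration:

<%
' Hypothetical sketch of self-enforcing URLs: 301 any request
' that is not the canonical form of this page's address.
Dim requested, canonical
requested = Request.ServerVariables("URL")   ' virtual path of this request
canonical = LCase(requested)                 ' assumed canonical form: lower case

If requested <> canonical Or _
   LCase(Request.ServerVariables("HTTP_HOST")) <> "www.example.com" Then
    Response.Status = "301 Moved Permanently"
    Response.AddHeader "Location", "http://www.example.com" & canonical
    Response.End
End If
%>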

g1smd

WebmasterWorld Senior Member g1smd us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3415232 posted 9:20 pm on Aug 28, 2007 (gmt 0)

This is junk:

Disallow: /(*

The * should be on the left as a wildcard.

The Disallow directive matches URLs that begin with whatever is disallowed.

So:

Disallow: /abcd

stops any URL that begins with /abcd

and:

User-agent: Googlebot
Disallow: *(

disallows any URL that contains a ( bracket anywhere: anything (or nothing) can come before or after it.

Wildcards should be aimed only at Googlebot.
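
Putting those two together, one way to cover both the wildcard-aware crawler and everything else would be (this combined file is just a sketch of the rules above):

User-agent: Googlebot
Disallow: *(

User-agent: *
Disallow: /(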

ogletree

WebmasterWorld Senior Member ogletree us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3415232 posted 9:19 pm on Aug 31, 2007 (gmt 0)

OK, found out what is causing this. When I turn off cookies in Firefox it generates that URL. I think my server is handing those URLs out to Google. The same thing does not happen in IE. Google must behave more like Firefox than IE.

Ocean10000

WebmasterWorld Administrator 10+ Year Member



 
Msg#: 3415232 posted 1:33 am on Sep 4, 2007 (gmt 0)

This is more of an FYI post; do what you will with it.

If you want to turn off cookieless state in ASP.NET, I have made an example configuration section to disable cookieless sessions. This will not stop all the unwanted behaviour that has been noted, but it will stop the bots from getting new cookieless URLs handed to them automatically; it does nothing for the old URLs that already have cookieless session keys in them.

Definition:
Cookieless. The cookieless option for ASP.NET is configured with this simple Boolean setting.

<configuration>
  <system.web>
    <sessionState cookieless="false" />
  </system.web>
</configuration>
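
Worth noting: on ASP.NET 2.0 the cookieless attribute also accepts named HttpCookieMode values (UseCookies, UseUri, AutoDetect, UseDeviceProfile), so the explicit equivalent of the setting above would be:

<configuration>
  <system.web>
    <sessionState cookieless="UseCookies" />
  </system.web>
</configuration>

AutoDetect is the mode that hands a URL-embedded session key to clients that appear not to accept cookies, which would match what ogletree saw with cookies disabled in Firefox.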

References:
ASP.NET Session State [msdn2.microsoft.com]
sessionState Element (ASP.NET Settings Schema) [msdn2.microsoft.com]
Underpinnings of the Session State Implementation in ASP.NET [msdn2.microsoft.com]

Drew_Black

10+ Year Member



 
Msg#: 3415232 posted 6:56 pm on Sep 5, 2007 (gmt 0)

ogletree said:
One of the things I like to do to a website is enforce URL's. Every page is aware of itself and will not allow access to itself unless the proper url is used. Otherwise it will send out a 301 to the proper URL.

This is the best way to handle it and a lesson I learned the hard way a few years ago.

BTW, it's not just a case-sensitivity issue. You can rearrange the parameters in the query string of a URL, and most bots will look at each combination as a different page with the same content. I.e., somepage.asp?a=1&b=2 will render the same page as somepage.asp?b=2&a=1, but the URLs are different and are therefore two different pages with the exact same content. To prevent this you need to inspect the URL and 301 to the correct one if the values are out of proper order. It's not trivial, but fairly straightforward once you figure it all out.
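
A minimal Classic ASP sketch of that inspect-and-301 check, using alphabetical order as the assumed canonical parameter order (the helper name is hypothetical):

<%
' Hypothetical sketch: 301 when parameters are out of alphabetical
' order, so somepage.asp?b=2&a=1 redirects to somepage.asp?a=1&b=2.
Function SortedQuery(qs)
    Dim parts, i, j, tmp
    parts = Split(qs, "&")
    For i = 0 To UBound(parts) - 1           ' simple bubble sort on "key=value"
        For j = i + 1 To UBound(parts)
            If parts(j) < parts(i) Then
                tmp = parts(i) : parts(i) = parts(j) : parts(j) = tmp
            End If
        Next
    Next
    SortedQuery = Join(parts, "&")
End Function

Dim qs, sorted
qs = Request.ServerVariables("QUERY_STRING")
sorted = SortedQuery(qs)
If qs <> "" And qs <> sorted Then
    Response.Status = "301 Moved Permanently"
    Response.AddHeader "Location", Request.ServerVariables("URL") & "?" & sorted
    Response.End
End If
%>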

ogletree

WebmasterWorld Senior Member ogletree us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3415232 posted 3:34 pm on Sep 17, 2007 (gmt 0)

One conclusion we can draw from this is that Googlebot spiders your site more like Firefox than IE. I think this "feature" of .NET is something it shares only with IE: IE can read the code and creates a cookie from it; FF does not, and just puts it in the URL.

ogletree

WebmasterWorld Senior Member ogletree us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3415232 posted 5:59 pm on Sep 17, 2007 (gmt 0)

Never mind, it does show up in IE. I had to delete my cookies and restart IE for it to show up. The URL does look a little different:

(X(1)A(some_random_92_char_alphanumeric_code))

ogletree

WebmasterWorld Senior Member ogletree us a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 3415232 posted 4:06 pm on Sep 20, 2007 (gmt 0)

OK, it gets even weirder. I visited the site with cookies on but had my UA set to Googlebot, and the weird URL showed up.
