Forum Moderators: Robert Charlton & goodroi
Today I've sadly discovered that Google has indexed my front page both as:
http://www.example.com/ and http://www.example.com/default.asp
I have the www vs. non-www issue covered by some ASP code I found here: [webmasterworld.com...]
But how in the ... do I also prevent the SEs from indexing my pages as both www.example.com/ and www.example.com/default.asp?
I really hope someone has a useful ASP solution to prevent this issue.
[edited by: tedster at 10:55 pm (utc) on Oct. 3, 2006]
[edit reason] use example.com [/edit]
The issue here, which I have not been able to resolve on a Windows server so far, is that any redirect method I try on Windows seems to create an infinite loop. There is an approach for an index.html redirect on Apache [webmasterworld.com] that avoids the redirect loop, and I assume something similar can be done on Windows, but I haven't been clever enough, so far, to come up with it.
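For reference, the Apache approach avoids the loop by testing the raw request line instead of the internally mapped filename. A sketch (www.example.com is a placeholder for the canonical host):

```apache
RewriteEngine On
# Redirect only when the visitor literally asked for /index.html.
# THE_REQUEST holds the raw HTTP request line ("GET /index.html HTTP/1.1"),
# so Apache's internal DirectoryIndex lookup for "/" never matches
# this condition and the rule cannot loop.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.html[\ ?]
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]
```

The loop on Windows happens precisely because IIS gives an ASP page no equivalent distinction: by the time default.asp runs, "/" has already been mapped to "/default.asp".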
That said, I am also not currently noticing any problems on any of the Windows domains I work with. This doesn't mean, for example, that there isn't a low-level "split PR issue" or something like that -- but I don't see signs of it in the SERPs or in actual search traffic.
I have had the sense recently that Google may be working to fix and avoid this particular type of "duplicate" url issue behind the scenes. It is extremely common, but I can only hope! Meanwhile I keep studying up on IIS, ASP, .NET and VBscript -- looking to find an answer that I can be responsible for on my side of things.
I believe that I've changed every link pointing at /default.asp, and I suspect the reason to be my 'www vs. non-www 301 redirect' - which until yesterday - by mistake - pointed at /default.asp instead of pointing at /.
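For anyone following along, a minimal sketch of the corrected www-canonicalization in classic ASP, for the home page only (example.com is a placeholder; the important bit is that the Location header ends in '/' rather than '/default.asp'):

```asp
<%
' Sketch only: 301 non-www requests to the www host.
' Caveat: for a request to "/", IIS executes default.asp internally,
' so Request.ServerVariables("URL") may already read "/default.asp" --
' building the Location from that variable is exactly how a redirect
' target of "/default.asp" sneaks in. Hardcoding "/" avoids it.
If LCase(Request.ServerVariables("HTTP_HOST")) = "example.com" Then
    Response.Status = "301 Moved Permanently"
    Response.AddHeader "Location", "http://www.example.com/"
    Response.End
End If
%>
```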
Anyway, I better check all my links again, just in case.
What will happen if I rename /default.asp to /index.asp - the robots will still get the 404 and should learn (in time) that the page doesn't exist even if somebody links to it, right?
Interesting. I can understand the difficulties with the www vs. non-www issue (different owners and so on). But I cannot see why the bots can't understand that / and default.asp (or index.htm, index.html and so on) are the very same page. I guess it's algorithmic.
I have seen a site that had index.html and index.htm and default.asp and home.asp all active at the exact same time in the root.
But then there should be an option in, e.g., robots.txt, something like:
user-agent: Googlebot
index-page: none
For those who don't want index.html and the like indexed in folders or the root.
or:
user-agent: Googlebot
index-page: index.html (or index.asp, default.asp aso.)
For those who specifically want to name their choices.
Then it would be a lot easier for both Google and webmasters, I think.
www.Google.com/
www.Google.com/index.html
they DO do the WWW deal though... [Google.com...]
Use ISAPI_Rewrite to catch and 301 the default.asp to /
It bites having to use an add-on for IIS for something that comes built in to Apache, but it works, and the support folks in their forums will even write the rules for you.
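For anyone who does have server access, a sketch of the kind of rule involved (ISAPI_Rewrite v2-style httpd.ini syntax; www.example.com is a placeholder). Because the filter sees the URL before IIS maps '/' to the default document, it only fires on explicit requests for /default.asp and doesn't loop:

```ini
[ISAPI_Rewrite]
# Permanently redirect explicit requests for /default.asp to /
# [I] = case-insensitive match, [RP] = permanent (301) redirect
RewriteRule ^/default\.asp$ http\://www\.example\.com/ [I,RP]
```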
I don't have administrative rights on the server (shared hosting), so unfortunately that's not an option for me :o(
Funny thing is that Google doesn't even try to redirect theirs...try
www.Google.com/
www.Google.com/index.html
People who make the 'laws' are often the first ones to break them ;o)
[webmasterworld.com...]
It isn't a law, it just helps you to rank better. Just because you can reach their index.html doesn't mean that many people link to it.
The simple fact is that most of the index.html (or whatever) problems out there are caused by internal linking on the site. In your case the 301 caused it. If you don't link to it that way, other people will not copy and paste it from their address bar.
Now that you have fixed your 301, you probably don't even have to rename your file, but it might not be such a bad idea to do it anyway.
You could also rewrite your code in something like PHP or Perl that runs on Apache. That will get you into a much more controllable environment that is not subject to the biggest duplicate content scourge on the internet: case issues.
The only one that (now) links to my /default.asp is Google. Do you have any idea how long it will take for Google to drop that URL from its index?
I have added this into my robots.txt:
user-agent: *
Disallow: /default.asp$
Which should prevent robots from further indexing that particular URL. (Note that the trailing $ is a pattern-matching extension that Googlebot supports; it isn't part of the original robots.txt standard, so other robots may ignore it.)
I'm not much of a PHP or Perl man. I can just barely manage ASP ;)
Like you, I don't have access to IIS, so I think the problem will reappear; it's bound to. Google will recognise it as duplicate content and it will go supplemental again. It's a limitation of the Windows server, and no one has been able to provide a workaround that I know of.
Asusplay > Why do you think the problem will reappear?
If Googlebot obeys the robots.txt, then Google shouldn't be able to index my default.asp or my index.asp (I hope not).
(it seems that default.asp had a higher priority in being recognised as the homepage than index.asp)
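That priority is just the server's Default Document list: IIS tries the configured filenames in order when '/' is requested and serves the first one that exists. Many shared hosts expose the list in a control panel; with admin access on IIS 6 it can also be set from the AdminScripts folder, roughly like this (the site ID 1 and filename order here are placeholders):

```text
cscript adsutil.vbs set w3svc/1/root/DefaultDoc "index.asp,default.asp"
```

With that order, index.asp would answer '/' and default.asp would only be used as a fallback.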
Could be right. Google last indexed my '/default.asp' on October 4th. My '/' was indexed on August 31st.
It seems to me that Google likes my '/default.asp' more than it likes my domain itself (even though I have inbound links which point directly at the domain and no inbound links to '/default.asp').
Strange.
Recognised by the server? or by Google?
I assume you mean by the server: this is certainly true for Apache, where the DirectoryIndex can be set to:
DirectoryIndex index.php index.html index.htm
What happens here is that the first file that exists in that list when parsed from left to right is the one that gets used when you ask for "/".
[webmasterworld.com...]
When I type site:example.com and hit enter, the only result that comes up is: http://www.example.com/
http://www.example.com/default.asp seems to have gone away :)
But now Google Webmaster Tools can't find my robots.txt - even though it exists in the same location where it always has been.