Forum Moderators: Robert Charlton & goodroi
Today I've sadly discovered that Google has indexed my front page both as:
http://www.example.com/ and http://www.example.com/default.asp
I have the www vs. non-www issue covered by some ASP code I found here: [webmasterworld.com...]
But how in the ... do I also prevent the SEs from indexing my pages as both www.example.com/ and www.example.com/default.asp?
I really hope someone has a useful ASP solution to prevent this issue.
[edited by: tedster at 10:55 pm (utc) on Oct. 3, 2006]
[edit reason] use example.com [/edit]
The issue here, which I have not been able to resolve on a Windows server so far, is that any redirect method I try on Windows seems to create an infinite loop. There is an approach for an index.html redirect on Apache [webmasterworld.com] that avoids the redirect loop, and I assume something similar can be done on Windows, but I haven't been clever enough, so far, to come up with it.
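For reference, the Apache approach avoids the loop by testing the raw request line instead of the internally mapped filename. A sketch (www.example.com is a placeholder for the canonical host):

```apache
RewriteEngine On
# Redirect only when the visitor literally asked for /index.html.
# THE_REQUEST holds the raw HTTP request line ("GET /index.html HTTP/1.1"),
# so Apache's internal DirectoryIndex lookup for "/" never matches
# this condition and the rule cannot loop.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.html[\ ?]
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]
```

The loop on Windows happens precisely because IIS gives an ASP page no equivalent distinction: by the time default.asp runs, "/" has already been mapped to "/default.asp".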
That said, I am also not currently noticing any problems on any of the Windows domains I work with. This doesn't mean, for example, that there isn't a low-level "split PR issue" or something like that -- but I don't see signs of it in the SERPs or in actual search traffic.
I have had the sense recently that Google may be working to fix and avoid this particular type of "duplicate" url issue behind the scenes. It is extremely common, but I can only hope! Meanwhile I keep studying up on IIS, ASP, .NET and VBscript -- looking to find an answer that I can be responsible for on my side of things.
I believe that I've changed every link pointing at /default.asp, and I suspect the reason to be my 'www vs. non-www 301 redirect' - which until yesterday - by mistake - pointed at /default.asp instead of pointing at /.
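For anyone following along, a minimal sketch of the corrected www-canonicalization in classic ASP, for the home page only (example.com is a placeholder; the important bit is that the Location header ends in '/' rather than '/default.asp'):

```asp
<%
' Sketch only: 301 non-www requests to the www host.
' Caveat: for a request to "/", IIS executes default.asp internally,
' so Request.ServerVariables("URL") may already read "/default.asp" --
' building the Location from that variable is exactly how a redirect
' target of "/default.asp" sneaks in. Hardcoding "/" avoids it.
If LCase(Request.ServerVariables("HTTP_HOST")) = "example.com" Then
    Response.Status = "301 Moved Permanently"
    Response.AddHeader "Location", "http://www.example.com/"
    Response.End
End If
%>
```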
Anyway, I better check all my links again, just in case.
What will happen if I rename /default.asp to /index.asp - the robots will still get the 404 and should learn (in time) that the page doesn't exist even if somebody links to it, right?
Interesting. I can understand the difficulties with the www vs. non-www issue (different owners and so on). But I cannot see why the bots can't understand that / and default.asp (or index.htm, index.html and so on) are the very same page. I guess it's algorithmic.
I have seen a site that had index.html and index.htm and default.asp and home.asp all active at the exact same time in the root.
But then there should be an option in, e.g., robots.txt, something like:
user-agent: Googlebot
index-page: none
For those who don't want index.html and the like indexed in folders or the root.
or:
user-agent: Googlebot
index-page: index.html (or index.asp, default.asp aso.)
For those who specifically want to name their choices.
Then it would be a lot easier for both Google and webmasters, I think.
www.Google.com/
www.Google.com/index.html
they DO do the WWW deal though... [Google.com...]
Use ISAPI_Rewrite to catch and 301 the default.asp to /
It bites having to use an add-on for IIS for something that comes built in to Apache, but it works, and the support folks in their forums will even write the rules for you.
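For anyone who does have server access, a sketch of the kind of rule involved (ISAPI_Rewrite v2-style httpd.ini syntax; www.example.com is a placeholder). Because the filter sees the URL before IIS maps '/' to the default document, it only fires on explicit requests for /default.asp and doesn't loop:

```ini
[ISAPI_Rewrite]
# Permanently redirect explicit requests for /default.asp to /
# [I] = case-insensitive match, [RP] = permanent (301) redirect
RewriteRule ^/default\.asp$ http\://www\.example\.com/ [I,RP]
```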
I don't have administrative rights on the server (shared hosting), so unfortunately that's not an option for me :o(
Funny thing is that Google doesn't even try to redirect theirs...try
www.Google.com/
www.Google.com/index.html
People who make the 'laws' are often the first ones to break them ;o)
[webmasterworld.com...]
It isn't a law, it just helps you to rank better. Just because you can reach their index.html doesn't mean that many people link to it.
The simple fact is that most of the index.html (or whatever) problems out there are caused by internal linking on the site. In your case the 301 caused it. If you don't link to it that way, other people will not copy and paste it from their address bar.
Now that you have fixed your 301, you probably don't even have to rename your file, but it might not be such a bad idea to do it anyway.
You could also rewrite your code in something like PHP or Perl that runs on Apache. That will get you into a much more controllable environment that is not subject to the biggest duplicate content scourge on the internet: case issues.
The only one that (now) links to my /default.asp is Google. Do you have any idea how long it will take for Google to drop that URL from its index?
I have added this into my robots.txt:
user-agent: *
Disallow: /default.asp$
Which should prevent robots from further indexing that particular URL. (Note that the trailing $ is a pattern-matching extension that Googlebot supports; it isn't part of the original robots.txt standard, so other robots may ignore it.)
I'm not much of a PHP or Perl man. I can just barely manage ASP ;)
Like you, I don't have access to IIS, so I think the problem will reappear; it's bound to. Google will recognise it as duplicate content and it will go supplemental again. It's a limitation of the Windows server, and no one has been able to provide a workaround that I know of.
Asusplay > Why do you think the problem will reappear?
If Googlebot obeys the robots.txt, then Google shouldn't be able to index my default.asp or my index.asp (I hope not).
(it seems that default.asp had a higher priority in being recognised as the homepage than index.asp)
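That priority is just the server's Default Document list: IIS tries the configured filenames in order when '/' is requested and serves the first one that exists. Many shared hosts expose the list in a control panel; with admin access on IIS 6 it can also be set from the AdminScripts folder, roughly like this (the site ID 1 and filename order here are placeholders):

```text
cscript adsutil.vbs set w3svc/1/root/DefaultDoc "index.asp,default.asp"
```

With that order, index.asp would answer '/' and default.asp would only be used as a fallback.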
Could be right. Google last indexed my '/default.asp' on October 4th. My '/' was indexed on August 31st.
It seems to me that Google likes my '/default.asp' more than it likes my domain itself (even though I have inbound links which point directly at the domain and no inbound links to '/default.asp').
Strange.
Recognised by the server? or by Google?
I assume you mean by the server: this is certainly true for Apache, where the DirectoryIndex can be set to:
DirectoryIndex index.php index.html index.htm
What happens here is that the first file that exists in that list when parsed from left to right is the one that gets used when you ask for "/".
[webmasterworld.com...]
When I type site:example.com and hit enter, the only result that comes up is: http://www.example.com/
http://www.example.com/default.asp seems to have gone away :)
But now Google Webmaster Tools can't find my robots.txt - even though it exists in the same location where it always has been.