I simply changed the homepage from index.asp to default.asp and made sure there were no direct references to the file name, and that all links pointed to the domain name. The site is ASP on shared hosting, so I have no access to IIS.
However, I have since realised that Google has spidered the default.asp page separately, and it shows under the "omitted" results when doing a site: search, so I'm sure it's incurring a duplicate content penalty. It is also once again showing non-www versions of some pages, which is frustrating.
I thought that Google had somehow sorted this whole situation out. How can it spider a page that is not referenced directly (i.e. www.example.com/default.asp) when all links point to www.example.com?
Is there any advice anyone can give regarding my situation? It's doing my head in once again...
Second, check that there are no old pages on your server, even orphans, which could conceivably hold the faulty link.
Third, be sure you have a user-friendly 404 page to catch those deflected visitors.
Hard to be certain what has happened, but it sounds like you have a ghost somewhere: a link to the old page that Google can't quite bring itself to stop believing in. Provided the correct page also appears, the old one will eventually drop out. Sometimes these are reinforced by long-forgotten pages that happen to have an external link pointing to them. Good housekeeping often helps them on their way. Xenu is your friend.
But it's much better to have no links at all to index or default pages (always link to domain.com/ or folder/), so there's never a need for those 301s.
I ran Xenu and it seems fine, plus I made sure there are no links to default.asp from inside the site.
It's as if Google has identified the path information (default.asp) and indexed it of its own accord. This does not happen with the other search engines, so I have no idea why Google has done it.
Would it help if I added an exclusion in robots.txt to disallow www.example.com/default.asp, or would that also block indexing of the homepage as www.example.com?
Am I completely off the mark here?
I thought that Google had somehow sorted this whole situation out. How can it spider a page that is not referenced directly (i.e. www.example.com/default.asp) when all links point to www.example.com?
I noticed the same thing as well. We have never linked directly to our default.asp in the root directory, and it had never shown in the SERPs. However, once we started using Google Analytics, all of a sudden it showed up.
After seeing this show up in the SERPs, I'd have to say there is a direct connection between the Google spider and Analytics.
The following is the code I've set up. It can be used on any ASP page on any domain, since it reads the server variables, and it works in any subfolder (www.example.com/products/) as well, sending visitors to the "/" rather than default.asp in subfolders too.
When setting up a new site I just include it at the top of the pages.
To redirect to the www version:
<%
Dim Domain_Name, theURL, QUERY_STRING, HTTP_PATH, TEMP_NUM

' Get the domain that the page is on
Domain_Name = lcase(Request.ServerVariables("HTTP_HOST"))

' Only act if the URL is not the www version
If left(Domain_Name, 3) <> "www" Then
    HTTP_PATH = Request.ServerVariables("PATH_INFO")

    ' If the page is default.asp, redirect to "/". If another index page is used,
    ' such as index.asp, the numbers in the right and len statements need to be
    ' changed, as well as the If statement, to match that index page.
    If right(HTTP_PATH, 12) = "/default.asp" Then
        TEMP_NUM = len(HTTP_PATH) - 11
        HTTP_PATH = left(HTTP_PATH, TEMP_NUM)
    End If

    ' Build the new URL with the correct page
    QUERY_STRING = Request.ServerVariables("QUERY_STRING")
    theURL = "http://www." & Domain_Name & HTTP_PATH

    ' Pass on any query string variables
    If len(QUERY_STRING) > 0 Then
        theURL = theURL & "?" & QUERY_STRING
    End If

    ' Send the 301 response and the new location
    Response.Clear
    Response.Status = "301 Moved Permanently"
    Response.AddHeader "Location", theURL
    Response.Flush
    Response.End
End If
%>
To redirect to the non-www version, I use:
<%
Dim Domain_Name, theURL, QUERY_STRING, HTTP_PATH, TEMP_NUM

' Get the domain that the page is on
Domain_Name = lcase(Request.ServerVariables("HTTP_HOST"))

' Only act if the URL is the www version
If left(Domain_Name, 3) = "www" Then
    ' Strip the leading "www." from the domain
    TEMP_NUM = len(Domain_Name) - 4
    Domain_Name = right(Domain_Name, TEMP_NUM)
    HTTP_PATH = Request.ServerVariables("PATH_INFO")

    ' If the page is default.asp, redirect to "/". If another index page is used,
    ' such as index.asp, the numbers in the right and len statements need to be
    ' changed, as well as the If statement, to match that index page.
    If right(HTTP_PATH, 12) = "/default.asp" Then
        TEMP_NUM = len(HTTP_PATH) - 11
        HTTP_PATH = left(HTTP_PATH, TEMP_NUM)
    End If

    ' Build the new URL with the correct page
    QUERY_STRING = Request.ServerVariables("QUERY_STRING")
    theURL = "http://" & Domain_Name & HTTP_PATH

    ' Pass on any query string variables
    If len(QUERY_STRING) > 0 Then
        theURL = theURL & "?" & QUERY_STRING
    End If

    ' Send the 301 response and the new location
    Response.Clear
    Response.Status = "301 Moved Permanently"
    Response.AddHeader "Location", theURL
    Response.Flush
    Response.End
End If
%>
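For completeness, here's a rough sketch of how the include might sit at the top of a page when setting up a new site; the file name /inc/www-redirect.asp is just a placeholder for wherever you save the redirect code above.
<%@ Language="VBScript" %>
<!--#include virtual="/inc/www-redirect.asp" -->
<%
' By the time execution reaches this point, the included code above has already
' answered any non-www or default.asp request with a 301 and Response.End,
' so only requests for the canonical URL get this far.
Response.Write "Home page content goes here."
%>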
If any of those pages are shown as supplemental results, they will hang around for a year after the redirect is set up. Don't worry about that; they will not be harming things at all.
The code given above does not work for this purpose (I tried it), and I have similar code on my websites. I thought I had it covered when I specifically did not reference the default.asp page in any link whatsoever, but somehow Google has spidered and indexed this page.
I feel that the site this has happened to will always suffer from duplicate content, then, because I don't see what else I can do. There's a massive limitation on what this type of hosting provides (and there are a hell of a lot of sites on shared hosting). The other search engines do not have this problem. I don't think it would be too difficult to have something along these lines in the algorithm:
If default.asp, index.asp, home.asp (or any other default page) = "/" then ignore and don't index.
Can anyone tell me if specifically disallowing www.example.com/default.asp in the robots.txt file will stop the homepage from being indexed as www.example.com?
You are right about not being able to forward default.asp to "/" through on-page code, as far as I know. Even when "/" is all that shows in the browser, the server variables below will all report default.asp, so there is nothing to check against. Thus the loop you spoke of.
PATH_INFO
SCRIPT_NAME
URL
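As a quick illustration (just a throwaway test page, not part of the redirect include), request the snippet below once as www.example.com/ and once as www.example.com/default.asp and you should see the same values both times.
<%
' These are the server variables mentioned above. When IIS serves the default
' document for a request to "/", they all report "/default.asp", so the page
' cannot tell whether the visitor actually asked for "/" or for "/default.asp".
Response.Write "PATH_INFO: " & Request.ServerVariables("PATH_INFO") & "<br>"
Response.Write "SCRIPT_NAME: " & Request.ServerVariables("SCRIPT_NAME") & "<br>"
Response.Write "URL: " & Request.ServerVariables("URL") & "<br>"
' Because the reported path is identical for both requests, redirecting
' "/default.asp" to "/" from inside the page just redirects "/" to itself - the loop.
%>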
User-agent: *
Disallow: /default.asp
Disallow: /Default.asp
That will allow the home page but block /default.asp and /Default.asp. However, it does not pass on any PageRank those URLs may have, and there must be some: in my case, which is probably the most common one, I have no internal links at all to default or Default, so the indexed copies must come from external links out there that I'd like to get credit for.