Forum Moderators: Robert Charlton & goodroi
For the sake of Google, last year, I changed my htaccess file so that http://example.com redirected to the www version.
Should I also be redirecting www.example.com/index.html to just www.example.com? I have changed my internal links to remove the "index.html" but was wondering if I should redirect it completely?
If so, how do I do this?
[edited by: tedster at 7:24 pm (utc) on Oct. 2, 2006]
[edit reason] use example.com [/edit]
Particularly since I added the redirect from index.htm to / and I don't want all the robots to have to spider through the redirect....
added the base href to my index page
I still think the use of the <base> tag in this case is completely superfluous and unnecessary.
But, in any case, it should refer to a directory, not a file. If you really did put in your index page, then for every URL, your server is prepending "/index.html" to it. Almost certainly not what you want.
That is, it will try to retrieve /example.jpg from /index.html/example.jpg.
You could specify "/" as your base, but that is the same as not using the base tag at all. Which is why I say it is not needed.
As it stands now, I have the redirect from www.example.com/index.htm to www.example.com/
My internal linking is "index.htm" "aboutus.htm", "privacy.htm"
So if I add the base, do I need to make the links "/", "/aboutus.htm" and "/privacy.htm"?
I tried that, it didn't work.
If you don't include the <base> tag on the page, Google will certainly add one to their cached copy of the page. Check the cache for any page in their results.
Google adds the <base> tag to their cached copy because they are serving it from their server.
If they didn't add a <base> tag, none of the links or images on the page that use relative URLs would work.
The alternative would be that Google would have to parse the page and modify all of the URLs. Much easier to just add a <base> tag.
I still don't see the point of this. If you've done a 301 redirect from your secondary site(s), and your URLs are all relative, there's just no need for the <base> tag.
I guess it's a useful hack if for some reason you can't do a redirect, and don't want to modify all your URLs to make them absolute.
I have been trying to follow this as best I could but my coding skills are limited to making my name scroll across my screen saver..
So my question is....
Google has indexed a lot of my pages without the WWW - meaning http://mysite.com
Google has indexed about 25% of my pages with WWW - meaning http://www.mysite.com
I realized (Thanks to you guys) that my navigation site links and images all were missing the WWW so I basically went in and corrected each and every link to include WWW
With no knowledge of Apache and htaccess (I have no idea of where even to find that in my control panel) have I resolved the issue?
Any help would be greatly appreciated!
Thanks in advance.
ARC
There is no secondary site. Just www.example.com.
I set up a redirect from example.com to www.example.com
example.com is the secondary (or alias) site. Alias is probably a better term here. When somebody enters example.com/anything into a browser, you want them to go to your primary site, www.example.com. It doesn't matter whether it's a completely separate domain or a subdomain; the same principles apply. You want to direct users to your primary site, and you want search engines to give all the "credit" to your primary site.
Google has indexed a lot of my pages without the WWW - meaning http://mysite.com
Google has indexed about 25% of my pages with WWW - meaning http://www.mysite.com
I realized (Thanks to you guys) that my navigation site links and images all were missing the WWW so I basically went in and corrected each and every link to include WWW
With no knowledge of Apache and htaccess (I have no idea of where even to find that in my control panel) have I resolved the issue?
To a degree.
You still have a problem when others link to you. Some other sites might link to you using www, some without. When Google follows those links, they are going to land wherever the link said to go (www or non-www).
At least now when Google follows links around on your own site, they are going to go to your preferred version (www/non-www).
Using a 301 redirect, you ensure that Google gets it right, even if others linking to you have gotten it wrong, because your site will immediately redirect them to the right version.
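For anyone who can edit .htaccess, the redirect described above usually looks something like the following. This is only a minimal sketch: it assumes Apache with mod_rewrite enabled, .htaccess overrides allowed by the host, a file placed in the site root, and www.example.com as the preferred hostname.

```apache
# Sketch only: assumes mod_rewrite is available and this file sits in the site root.
RewriteEngine On

# Match requests whose Host header is the bare domain (case-insensitive).
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]

# Send the same path to the www hostname with a permanent (301) redirect.
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Requests that already arrive on www.example.com do not match the condition, so they are served normally and there is no redirect loop.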
I am still trying to get my head around the htaccess file. I have researched it with my hosting company, and they do not have a way for me to alter the server side...
Can I use a CNAME or A record to do the same?
Mind you, all I want to do is direct all non WWW to my WWW pages.
Thanks in advance!
ARC
I am still trying to get my head around the htaccess file. I have researched it with my hosting company, and they do not have a way for me to alter the server side... Can I use a CNAME or A record to do the same?
No. That won't do the same thing.
First, let's establish which problem you are trying to solve. If it's www/non-www it's possible your host has already taken care of this for you. And, if not, there is still a solution.
If it is index.html -> /, you are out of luck. But this is a much less significant issue.
What happens when you type www.example.com vs. example.com into your browser? First of all, does your site come up in both cases? And, if so, what is in the URL bar once you are at your site? Does it stay the same as what you typed-in, or does it always change to one or the other?
Frankly, if your host DOESN'T take care of this for you, AND they don't give you the ability to do it yourself, you need another hosting company, because they are clueless...
If it stays the same as what you typed in, you can just use an external redirect solution. Many registrars provide this service for free. If your registrar does not, there are free and low-cost DNS providers that will provide this for you. (You will have to change your DNS pointers at your registrar, and set up your DNS at the DNS provider.)
While the registrars wave their hands and sometimes call this "DNS redirect", here is what is really happening: they set up an A record that points your secondary domain(s) to THEIR web server. Their web server does a 301 redirect to your primary web server.
Simple as that.
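To make that concrete, here is roughly what the provider's side could look like if their redirect server happened to run Apache. This is purely an illustration under that assumption: the actual software and setup a registrar or DNS provider uses are not known, and the virtual host below is hypothetical.

```apache
# Hypothetical virtual host on the redirect provider's web server.
# The A record for example.com points at this server, not at your own host.
<VirtualHost *:80>
    ServerName example.com

    # mod_alias: permanently redirect every path to the same path on the www host.
    Redirect 301 / http://www.example.com/
</VirtualHost>
```

Because Redirect matches by path prefix, a request for example.com/anything is sent on to www.example.com/anything.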
Thanks so very much for the detailed explanation and self test!
Guys - please excuse me with nonsense but I am truly trying...
Here is what's happening based on your questions...
When I type http://example.com it stays the same in my browser, but of course when I click on any of my navigation links which I have now changed to WWW I get those.
When I type http://www.example.com that stays the same in browser.
My host - Yahoo - does provide me with a control panel that allows me to add A and CNAME records.
Right now I seem to have the following 4 records:
Type           Source            Destination          Actions
A Record       mysite.com        Yahoo! IP Address    -- ¦ --
CNAME Record   *.mysite.com      Yahoo! Hostname      -- ¦ --
CNAME Record   ftp.mysite.com    Yahoo! Hostname      -- ¦ --
CNAME Record   mail.mysite.com   Yahoo! Hostname      -- ¦ --
Should I add another record?
Thanks
ARC
[edited by: tedster at 7:02 pm (utc) on Oct. 12, 2006]
[edit reason] use example.com [/edit]
>> but i don't yet know how to 404 a php url <<
If it is a missing thread, then when there is nothing in the database for that thread you just do:
<?php
header("HTTP/1.0 404 Not Found");
?>
You'll just need some logic to test when to send that.
You still send out the page of HTML code with forum navigation on it for the person to read. The 404 code tells the bot that there is actually nothing there to be indexed.
This search found the code in seconds:
[google.com...]
RewriteCond %{THE_REQUEST} ^.*\/index\.html?
RewriteRule ^(.*)index\.html?$ http://example.com/$1 [R=301,L]
RewriteCond %{THE_REQUEST} ^.*\/index\.php?
RewriteRule ^(.*)index\.php?$ http://example.com/$1 [R=301,L]
Should I combine the two, or will I be looking at potential problems if I do? I am not sure if (html|php) would work as an "OR" or not, although it seems to make sense.
[edited by: Doucette at 4:55 pm (utc) on Oct. 15, 2006]
The test html? matches both html and htm. In the same way, your test php? matches both php and ph, so remove the ? from the php part.
You could combine them in the way you suggested, but I would use (html?|php) instead, if that is valid.
I am no Rewrite expert. Hope that jdMorgan will comment.
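For what it's worth, a combined version along those lines might look like the sketch below. It assumes mod_rewrite, a root-level .htaccess, and the same http://example.com/ target used in the two rules posted above; swap in the www hostname if that is your preferred version. The tighter THE_REQUEST pattern is my own variation, not the original poster's.

```apache
RewriteEngine On

# Only act on what the client literally asked for, e.g. "GET /dir/index.html HTTP/1.1".
# Checking THE_REQUEST keeps the internal DirectoryIndex lookup for "/" from
# triggering the rule again and looping.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.(html?|php)(\?[^\ ]*)?\ HTTP/

# html? covers index.html and index.htm; php has no "?" so index.ph is not matched.
RewriteRule ^(([^/]+/)*)index\.(html?|php)$ http://example.com/$1 [R=301,L]
```

The alternation (html?|php) does work as an "OR" in these patterns, and $1 carries any directory path through, so /dir/index.php redirects to /dir/.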
One modification I made was to handle both "index.html" and "index.php"
This brings up another issue. How much rewriting of index files is too much? That is, how many and which common index-files extensions should you rewrite? When do you cross the line?
I think there are only two good reasons to do this:
1. You previously had index.abc files, and you now have index.def files, or else you previously had index.abc files, and you now wish to just use references to the directories, leaving off the index file in your URLs. (The latter is preferable in any case, IMO.)
2. You want to SEO optimize the case where others have incorrectly used "index.whatever" when linking to you from their website.
Outside of extensions that you have actually used on your website in the past, I think the only one that should be rewritten is .html, and possibly .htm. These are the only ones that others are going to automatically add to your URL when linking to you because they somehow think that this is the "correct" form.
If somebody types in (or links to from their site), say, index.php, and you've never used index.php on your site, it's almost certainly an error. They were probably trying to get to some OTHER site. And for anything more obscure than .php, that's almost certainly the case.
So, in those cases, just 404 it. If your site has a "friendly" 404, that's nice. But no need to simply transparently forward to your index page, and you might actually be doing the user a disservice by doing so. Better to make it clear that this page isn't here, and never was.
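If the server does not already return a 404 for such a request, one way to force it is sketched below. It assumes Apache with mod_alias, and that index.php (used here only as an illustration) has never existed on the site and is not listed in DirectoryIndex.

```apache
# Sketch: answer any request for an index.php, in any directory, with 404 Not Found.
# With a non-3xx status, RedirectMatch takes no target URL.
RedirectMatch 404 (^|/)index\.php$
```

Visitors still get whatever "friendly" error page is configured through ErrorDocument; only the status code is forced.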
The whole index.xyz thing is an unfortunate remnant, and I'm not sure that there was EVER any need for it! I can't think of a historic web server that didn't support serving index.html when the directory is requested. The problem is, they didn't "hide" the index.html, which gave two ways to access the same thing. It confused a lot of people, and it seems it's split about 50/50. (Similar to www/non-www!)
IMO, that design was flawed, and unfortunately, we still live with it today.
Perhaps the problem came when server-side scripting started creeping out of the cgi-bin directory. Early web servers could serve from scripts, but initially, they had to be in the cgi-bin directory. Later, they allowed them outside of that directory, but then they had to be a certain extension, other than .html. I don't think we had the DirectoryIndex directive.
If my memory serves me right, where we went wrong was with .asp. Perhaps this was a limitation of IIS at the time, or else it was simply Microsoft pulling a subtle publicity stunt by encouraging people (by the way they wrote their examples and documentation) to name their ASP scripts ending in ".asp".
Which gets me back to one of the reasons not to use index.anything in your URLs: your users don't need to know if a page is produced by a script or not. And hackers, especially, don't need to know.
Redirect only the stuff that you actually use, plus .htm and .html, for people who just have to type index.html/htm into their browser or have it in their links, even though you never state it in your URLs yourself.
If the site has a mixture of .html and .php pages then a global redirect for both forms is certainly a good idea; and you might as well include .htm for those odd occasions that you upload a file as .htm by mistake, or someone cuts and pastes a link and misses off the last letter.
Yes, I agree that there could be a problem with redirecting index.everything to / and I would certainly avoid doing that.
Perhaps somebody more familiar with Yahoo hosting can comment on whether or not Yahoo supports redirects. I've used them (for domain registration only), but moved those domains out, so I can't go in and fiddle to see if this can be done there.
I assume that you are using Yahoo as both your registrar (actually, Melbourne IT) and for your webhosting?
Let's just say I am not a fan of Melbourne IT. At least with their registrar and DNS services sold through Yahoo, they have some technical restrictions that are entirely unnecessary, and out of line with standard practices. (For example, you can have no more than 4 nameservers.)
If they do not support redirects - either in the DNS setup or in the webhosting package, then your only choice is to go with an external redirect service. This is most commonly offered as a no-charge extra along with third-party DNS hosting.
WebmasterWorld won't let me name names, but there is such a service that is free for up to 5 domains. There may be others that are free as well.
What you would do, then, is (on the new DNS service) set up an A record for www.example.com that points to your website. For example.com, you would set up a "redirect". Really, what this is doing is setting up an A record that points to the DNS provider's web server. They then set up the web server to redirect to www.example.com.
Once this is set up, you would go into your domain setup, and change your DNS servers to point to the external DNS service.
To add to the discussion, the reason I wish to do these redirects is to have my URLs appear as non-ugly as possible. It is much easier to understand (and communicate) a URL that is example.com/product than example.com/product/index.html. Perhaps my motivation behind my redirects is not the same as everyone else's.
I actually use CityDesk as a CMS, which always links to "/index.html", and I cannot set it to link to "/" instead. It is understandable, as the CMS should not be dependent on server settings it cannot change. But this little annoyance causes "index.html" to appear all over my site as you surf through it. This redirect is also a hack to fix this little "bug", if you can even call it that.