I'm thinking about URL strategy for my new site, and I have a dilemma.
I can put index.php (or index.html after mod_rewrite) into every directory of the site, so that users can type shorter URLs ending in a trailing slash, such as www.example.com/products/, which serves the same page as www.example.com/products/index.php.
My question is: can this trip the duplicate content filter? If I add index files to directories, two URLs will lead to the same content:
www.example.com/products/
www.example.com/products/index.php
Two URLs, same page, same content. Is it dangerous?
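To see why a crawler can treat these as one page or two, here is a minimal Python sketch that maps both spellings to a single key by stripping a trailing directory-index filename. The list of index filenames is an assumption; match it to whatever your server's DirectoryIndex actually serves.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed directory-index filenames; adjust to your server's DirectoryIndex setting.
INDEX_FILES = ("index.php", "index.html", "index.htm")


def canonical_url(url: str) -> str:
    """Collapse /dir/index.xyz to /dir/ so both spellings map to one URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    for name in INDEX_FILES:
        # Require the slash so "myindex.php" is not mistaken for "index.php".
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    # Host names are case-insensitive, so lowercase the netloc as well.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, parts.fragment))


print(canonical_url("http://www.example.com/products/index.php"))
print(canonical_url("http://www.example.com/products/"))
```

Both calls print the same URL, which is exactly the situation the question describes.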
[edited by: ciml at 11:11 am (utc) on July 14, 2005]
[edit reason] Examplified [/edit]
A long time ago I had a host that offered an option to protect directory listings. Without it, if you went to .com/images/, all of your images were exposed; with it, you could turn on 403 Forbidden via CHMOD. I changed hosts, and the new host didn't offer this, so I created a custom "forbidden" page called index.html and placed it in every folder. My current host does have the option, but I just left these "forbidden" index.html pages in each folder.
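A rough sketch of the placeholder approach, assuming you work on a local copy of the document root and using a hypothetical placeholder body: it writes index.html into every folder that has no index file yet and leaves existing index files alone. Note that a static placeholder like this is served with a 200 status, not a real 403.

```python
import os

# Hypothetical placeholder body; the post used a custom "forbidden" page.
PLACEHOLDER = "<html><body><h1>Forbidden</h1></body></html>\n"
# Any of these counts as an existing index file (assumed list).
INDEX_NAMES = ("index.html", "index.htm", "index.php")


def add_placeholder_indexes(docroot: str, name: str = "index.html") -> list[str]:
    """Write a placeholder index file into every directory that lacks one.

    Returns the paths of the files created.
    """
    created = []
    for dirpath, _dirnames, filenames in os.walk(docroot):
        if not any(index in filenames for index in INDEX_NAMES):
            target = os.path.join(dirpath, name)
            with open(target, "w", encoding="utf-8") as fh:
                fh.write(PLACEHOLDER)
            created.append(target)
    return created
```

Skipping folders that already contain index.php avoids adding a second index file that might take precedence, depending on DirectoryIndex order.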
You could do this (or deny directory listing access via CHMOD if your host supports it), unless you have pages with good PR that are indexed in the SEs under URLs like yourdomain.com/FolderName/; in that case you don't want to change anything. Instead, take the version with the higher PR, or the one indexed in the SEs, make it the redirect target, and redirect the other versions to it via .htaccess. For example, if www.yourdomain.com/FolderName/ has higher PR or better SERPs than www.yourdomain.com/FolderName/index.html, you'd redirect the latter to the former. The inverse would of course apply if the pages' rankings were reversed.
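The choice of redirect target can be written down as a tiny helper. This is only a sketch: `plan_redirects` is a hypothetical name, and the scores are whatever ranking signal you trust (PR, traffic), supplied by you rather than fetched from anywhere.

```python
def plan_redirects(versions: dict[str, float]) -> tuple[str, dict[str, str]]:
    """Pick the strongest URL version of one page as the redirect target.

    `versions` maps each URL spelling to a score you supply (hypothetical input).
    Returns the target and a map of every other spelling -> target; each entry
    would become a 301 redirect in .htaccess.
    """
    if not versions:
        raise ValueError("no URL versions given")
    # On a tie, max() keeps the first spelling listed.
    target = max(versions, key=versions.get)
    redirects = {url: target for url in versions if url != target}
    return target, redirects


target, redirects = plan_redirects({
    "http://www.yourdomain.com/FolderName/": 5,
    "http://www.yourdomain.com/FolderName/index.html": 3,
})
print(target)
print(redirects)
```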
This avoids any possible duplicate content penalty. As for whether one would actually be imposed, that is unfortunately anyone's guess; either method keeps you on the safe side, so no one has to guess.
As "good" or "bad" as seats in a car. You can do without them, but why on earth would you do something as silly as that?
>> Two URLs, same page, same content. Is it dangerous?
Think about it for a moment. How will anyone (including Google) ever find one or the other URL?
If you link to it, that's how. So don't link to both versions: always link only to the folder name (with the trailing slash), never to the file name ("/index.xyz").
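One way to enforce this is to scan your own pages for links that end in "/index.xyz". A minimal sketch using Python's standard html.parser; the pattern treats any "index.<extension>" at the end of a link's path as offending, which is an assumption you may want to narrow.

```python
import re
from html.parser import HTMLParser

# Matches paths ending in an index file, e.g. "/products/index.php" or "index.html".
INDEX_LINK = re.compile(r"(^|/)index\.[a-z]+$", re.IGNORECASE)


class IndexLinkFinder(HTMLParser):
    """Collect <a href> values that point at /index.xyz instead of the folder."""

    def __init__(self) -> None:
        super().__init__()
        self.offending: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Ignore any fragment or query string when checking the path.
                path = value.split("#", 1)[0].split("?", 1)[0]
                if INDEX_LINK.search(path):
                    self.offending.append(value)


def find_index_links(html: str) -> list[str]:
    finder = IndexLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.offending
```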
Use a sitemap listing all pages, broken down into sections if the site has very many pages. Google recommends fewer than 100 links per page, but can spider pages of at least 100 KB (for a short time a few months ago, it spidered up to 250 KB).
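A sketch of splitting a sitemap by section while capping each page's link count. Treating the first folder as the "section", capping at 99 links, and the "section-N" page names are all assumptions for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def sitemap_pages(urls: list[str], max_links: int = 99) -> dict[str, list[str]]:
    """Group URLs by top-level folder, then split each group into pages."""
    if max_links < 1:
        raise ValueError("max_links must be positive")
    sections: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        path = urlsplit(url).path or "/"
        directory = path[: path.rfind("/") + 1]  # "/products/a.html" -> "/products/"
        folders = [p for p in directory.split("/") if p]
        sections[folders[0] if folders else "root"].append(url)

    pages: dict[str, list[str]] = {}
    for section, members in sorted(sections.items()):
        for n, start in enumerate(range(0, len(members), max_links), 1):
            pages[f"{section}-{n}"] = members[start : start + max_links]
    return pages
```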
For a small site (fewer than 20 pages), link every page to every other page.
For a larger site:
- The main index page should link to all of your section indexes.
- Each section index should link to every other section index, to every content page in that section, and back to the main index page.
- Every content page should link back to its section index, back to the main index page, and (optionally) to every other section index too (this last option spreads PR better).
- Some content pages might link to other content pages, in the same section or another one, on a selective basis ("related product" or "upsell" links on a commerce site, for example).
- The main index page should also link directly to some deep content pages, such as "featured products" or "story of the day".
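The small-site and larger-site rules above can be sketched as a function that returns, for each page, the set of pages it should link to. It assumes you supply section index URLs mapped to their content pages; the optional rules (cross-section links from content pages, related-product links, featured deep links) are left out of this sketch.

```python
def link_plan(sections: dict[str, list[str]], home: str = "/") -> dict[str, set[str]]:
    """Return page -> set of pages it should link to.

    `sections` maps a section index URL (e.g. "/products/") to its content pages.
    """
    all_pages = [home, *sections, *(p for pages in sections.values() for p in pages)]
    if len(all_pages) < 20:
        # Small site: every page links to every other page.
        return {p: set(all_pages) - {p} for p in all_pages}

    # Main index -> all section indexes.
    plan: dict[str, set[str]] = {home: set(sections)}
    for index, pages in sections.items():
        # Section index -> other section indexes, its own content pages, and home.
        plan[index] = (set(sections) - {index}) | set(pages) | {home}
        for page in pages:
            # Content page -> its own section index and home.
            plan[page] = {index, home}
    return plan
```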
The URL structure should remain constant: the same deep content should be found at the same URL every time I come back. The folder structure should produce URLs that tell me where I am going.
Breadcrumb links can be useful. They can introduce keywords as anchor text.
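Because the folder structure tells visitors where they are, breadcrumbs can be derived directly from the URL path. A minimal sketch; turning folder names into anchor text (hyphens to spaces, title case) is an assumed naming convention.

```python
def breadcrumbs(path: str) -> list[tuple[str, str]]:
    """Build (anchor text, href) pairs from a folder-structured URL path."""
    crumbs = [("Home", "/")]
    segments = [s for s in path.split("/") if s]
    href = "/"
    for i, segment in enumerate(segments):
        if i == len(segments) - 1 and not path.endswith("/"):
            # Final segment is a file, e.g. "large-widget.html": link to the file itself.
            href += segment
            segment = segment.rsplit(".", 1)[0]
        else:
            href += segment + "/"
        # Keyword-bearing folder names become the anchor text.
        crumbs.append((segment.replace("-", " ").title(), href))
    return crumbs


print(breadcrumbs("/products/blue-widgets/large-widget.html"))
```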
Google follows the "click structure" of the site (how many clicks the content is from the homepage), not the actual folder structure.
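Click depth is a breadth-first search over the link graph starting at the homepage. In this sketch `links` is a hypothetical crawl result; note that a page in a deep folder can still be one click from home.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str = "/") -> dict[str, int]:
    """Return page -> minimum number of clicks from `home`.

    Pages not reachable from `home` are absent from the result.
    """
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                # First time reached in BFS order, so this is the shortest path.
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


links = {
    "/": ["/products/", "/products/widgets/blue.html"],
    "/products/": ["/products/widgets/", "/"],
    "/products/widgets/": ["/products/widgets/red.html"],
}
print(click_depths(links))
```

Here blue.html sits three folders deep but is only one click from the homepage.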
I'm just wondering: if I occasionally used both versions of the link (www.mysite.com/dir/ and www.mysite.com/dir/index.php), how would Google react?
When we realized our error and removed the default.html page, the index.html page's ranking went up to PR6 and traffic increased substantially.
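# Enable mod_rewrite (e.g. in .htaccess).
RewriteEngine On
# Match the client's raw request line, e.g. "GET /products/index.html HTTP/1.1"; %1 = "products/".
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]*)index\.html\ HTTP/
# 301-redirect to the directory URL and stop processing further rules.
RewriteRule index\.html$ /%1 [R=301,L]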
This should redirect only an original request that ends in index.html to the directory it is in. For example, yoursite.com/index.html will be redirected if a user clicks a link to it or types it into the browser, but there will be no infinite loop when they navigate to yoursite.com/, which serves them index.html. That works because %{THE_REQUEST} holds the request line the browser sent, not the index.html file that the server substitutes internally.
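To check the no-loop reasoning, the two rewrite lines can be modeled in Python. This approximates only this rule, not mod_rewrite as a whole: `the_request` stands in for %{THE_REQUEST}, and `url_path` for the path the RewriteRule pattern is tested against, which may already be "index.html" after the server resolves a bare "/" internally.

```python
import re
from typing import Optional

# Same patterns as the RewriteCond and RewriteRule above (Apache's "\ " is a literal space).
REQUEST_LINE = re.compile(r"^[A-Z]{3,9} /([^.]*)index\.html HTTP/")
RULE = re.compile(r"index\.html$")


def redirect_target(the_request: str, url_path: str) -> Optional[str]:
    """Return the 301 target, or None when the rule does not fire."""
    cond = REQUEST_LINE.match(the_request)
    if cond and RULE.search(url_path):
        return "/" + cond.group(1)  # mirrors the "/%1" substitution
    return None


print(redirect_target("GET /products/index.html HTTP/1.1", "products/index.html"))
print(redirect_target("GET /products/ HTTP/1.1", "products/index.html"))
```

The second call returns None: the client asked for "/products/", so the condition fails even though the server is internally serving index.html. One side effect of `[^.]*` is that a path containing a dot before index.html (e.g. "/v1.2/index.html") is not redirected.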
Justin