Forum Moderators: Robert Charlton & goodroi

Index file in each site directory -- good or bad?

Usability of URLs versus duplicate content penalty

         

xenoborg

10:46 am on Jul 14, 2005 (gmt 0)

10+ Year Member



Hi everyone,

I'm thinking about URL strategy for my new site, and I have a dilemma.

I can put index.php (or index.html, using mod_rewrite) into every directory of the site, so that users can type short URLs ending with a trailing slash, such as www.example.com/products/, which serves the same page as www.example.com/products/index.php

My question is: can this trip the duplicate content filter? If I add index files to directories, two URLs will lead to the same content:

www.example.com/products/
www.example.com/products/index.php

Two URLs, same page, same content. Is it dangerous?

[edited by: ciml at 11:11 am (utc) on July 14, 2005]
[edit reason] Examplified [/edit]

Clint

1:19 pm on Jul 14, 2005 (gmt 0)



Depending on how far along you are, or how many folders you have, you may want to try what I did.

A long time ago I had some hosts that offered an option to protect directory listings. Without it, anyone who went to .com/images/ could see every image in the folder; with it, the request returned a 403 Forbidden, set via CHMOD. I changed hosts, and the new hosts didn't have this option, so I created a custom "forbidden" page called index.html and placed it in every folder. My current hosts do have the option, but I have left the "forbidden" index.html pages in each folder.

You could do the same (or deny directory listing access via CHMOD if your hosts support it). The exception is if you already have pages with good PR that are indexed in search engines under URLs of the form yourdomain.com/FolderName/; in that case you don't want to change anything. Instead, take the version with the higher PR, or the one that is indexed, make it the redirect target, and redirect the other version to it via htaccess. For example, if www.yourdomain.com/FolderName/ has a higher PR or better SERPs than www.yourdomain.com/FolderName/index.html, you'd want the latter redirected to the former. The inverse would of course apply if the pages' ranks were the other way around.

This would avoid any possible dupe content penalty. As to whether one would actually be imposed, that is unfortunately anyone's guess. Either of these methods keeps you on the safe side, so nobody has to guess.

xenoborg

3:25 pm on Jul 14, 2005 (gmt 0)

10+ Year Member



Clint, the site is not online yet, so I can choose either of these two approaches. Which one would you recommend for a new site?

claus

10:33 pm on Jul 18, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> Index file in each site directory -- good or bad?

As "good" or "bad" as seats in a car. You can do without them, but why on earth would you do something as silly as that?

>> Two URLs, same page, same content. Is it dangerous?

Think about it for a moment. How will anyone (including Google) ever find one or the other URL?

If you link to it, that's how. So, don't link to both versions. Make sure that you always and exclusively link to the folder name (with the trailing slash) and never to the file name ("/index.xyz").
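
One way to enforce this when generating links is to run every internal URL through a small helper before it is written into a page. The following is only a sketch: the function name folder_link and the INDEX_NAMES set are made up for illustration, and the set should match whatever your server actually treats as a directory index.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed list of directory-index filenames (hypothetical; match your server's setup).
INDEX_NAMES = {"index.html", "index.htm", "index.php", "default.html"}


def folder_link(url: str) -> str:
    """Strip a trailing index filename so the link points at the folder URL."""
    parts = urlsplit(url)
    # Split the path into everything before the last slash and the final segment.
    head, _, last = parts.path.rpartition("/")
    if last.lower() in INDEX_NAMES:
        parts = parts._replace(path=head + "/")
    return urlunsplit(parts)


print(folder_link("http://www.example.com/products/index.php"))
print(folder_link("http://www.example.com/products/"))
```

Both calls print the same folder URL, so only one version of the link ever appears on the site.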


Btw.: Welcome to WebmasterWorld xenoborg :)

g1smd

10:42 pm on Jul 18, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Make all pages index pages, placed inside folders and subfolders, with all links ending in the folder name followed by a trailing / on the URL (so index.html can become index.php without having to change any links at all).

Add a sitemap listing all pages, or broken down into sections if the site consists of very many pages. Google recommends fewer than 100 links per page, but can spider pages up to at least 100 KB (they did spider up to 250 KB for a short time a few months ago).

For a small site, fewer than 20 pages, link all pages to all pages.

For a larger site make sure that:
- The index page links to all of your section indexes.
- Each section index links to every other section index, to every content page of that section, and back to the main index page.
- Every content page links back to the index page for that section, back to the main index page, and (optionally) to every other section index too (this latter option spreads PR better).
- Some of the content pages link to other content pages, from the same section or another section, but on a random basis ("related product" or "upsell" links on a commerce site, for example).
- The index page links directly to some deep content pages, such as "featured products", "story of the day", etc.
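
As a rough illustration of the mandatory rules in that list, the link structure can be generated from a table of sections. This is a sketch only: build_links and the example section names are hypothetical, and the optional links (cross-section links from content pages, "related product" links, featured deep links from the index page) are left out.

```python
from __future__ import annotations


def build_links(sections: dict[str, list[str]], home: str = "/") -> dict[str, set[str]]:
    """Map each page URL to the set of URLs it links to."""
    # The main index page links to every section index.
    links: dict[str, set[str]] = {home: set(sections)}
    for section, pages in sections.items():
        # A section index links to the other section indexes, its own pages, and home.
        links[section] = (set(sections) - {section}) | set(pages) | {home}
        for page in pages:
            # A content page links back to its section index and to home.
            links[page] = {section, home}
    return links


# Hypothetical site layout: section index URL -> content page URLs.
site = {
    "/products/": ["/products/widget/", "/products/gadget/"],
    "/news/": ["/news/launch/"],
}
links = build_links(site)
for page in sorted(links):
    print(page, "->", sorted(links[page]))
```

Every URL in the table ends with a trailing slash, in line with the advice on linking to folder names only.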

The URL structure should remain constant: the same deep content should be found at the same URL each time I come back. The folder structure should generate a URL that tells me where I am going.

Breadcrumb links can be useful. They can introduce keywords as anchor text.

Google follows the "click structure" of the site - the number of clicks away from the homepage the content is - not the actual folder structure.
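
To make the difference between click structure and folder structure concrete, here is a small sketch that computes how many clicks each page is from the homepage using a breadth-first search. The function name click_depth and the example link graph are invented for illustration; nothing here claims to be how Google actually measures it.

```python
from collections import deque


def click_depth(links, home="/"):
    """Return the minimum number of clicks from home to each reachable page."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depth:  # first visit is the shortest path in a BFS
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical link graph: page URL -> pages it links to.
links = {
    "/": ["/products/", "/products/archive/2004/old-widget/"],
    "/products/": ["/products/widget/"],
    "/products/widget/": [],
    "/products/archive/2004/old-widget/": [],
}
for page, clicks in sorted(click_depth(links).items()):
    print(clicks, page)
```

In this example the page four folders deep is only one click from the homepage, because the homepage links to it directly, while the shallower /products/widget/ is two clicks away.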

xenoborg

2:55 pm on Jul 19, 2005 (gmt 0)

10+ Year Member



OK, so I can have the directory indexes, and as long as I don't use both link styles simultaneously it won't be considered duplication.

I'm just wondering: if I occasionally used both versions of the link (www.mysite.com/dir/ and www.mysite.com/dir/index.php), how would Google react?

g1smd

3:57 pm on Jul 19, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Separate indexing and separate PR for the two "pages".

You often see this effect on PHP driven forums...

travelin cat

4:03 pm on Jul 19, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Here's an interesting addition to your discussion: we had an index.html page and a default.html page in the same directory of our site with identical content. Both had a PR of 5.

When we realized our error and removed the default.html page, the index.html page went up to PR6 and traffic increased substantially.

claus

9:20 pm on Jul 20, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Great post (#5) g1smd :)

xenoborg, it is very important that whatever you do, you always do it consistently. So, do your very best to make sure that you don't occasionally do something other than what you usually do.

jd01

10:41 pm on Jul 20, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I would say put the index file in, and add a little .htaccess work to it.

RewriteEngine On
# Match only the original client request line, and only when it ends in index.html
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^.]*)index\.html\ HTTP/
# Send a 301 to the containing directory, captured above as %1
RewriteRule index\.html$ /%1 [R=301,L]

This should redirect only an original request that ends in index.html to the directory it is in. For example, yoursite.com/index.html will be redirected if a user clicks on a link to it or types it into the browser, but there will be no infinite loop when they navigate to yoursite.com/, which serves them index.html.
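
To see which requests the RewriteCond above would catch, the same pattern can be tried out as a Python regex against a few raw request lines. This is only a simulation of the condition, not of Apache itself; the helper name redirect_target is made up, and in a real .htaccess the RewriteRule pattern has to match as well.

```python
import re
from typing import Optional

# Same pattern as the RewriteCond, written as a Python regex.
THE_REQUEST = re.compile(r"^[A-Z]{3,9} /([^.]*)index\.html HTTP/")


def redirect_target(request_line: str) -> Optional[str]:
    """Return the 301 target for a raw request line, or None if no redirect applies."""
    match = THE_REQUEST.match(request_line)
    # Group 1 is the directory part of the path, without the leading slash.
    return "/" + match.group(1) if match else None


for line in (
    "GET /index.html HTTP/1.1",
    "GET /products/index.html HTTP/1.1",
    "GET /products/ HTTP/1.1",
):
    print(line, "->", redirect_target(line))
```

The request for /products/ yields None, which is why there is no loop: the request line sent by the browser never contained index.html, even though the server ends up serving that file. Note that because of [^.]*, a directory name containing a dot (for example /v1.2/index.html) would not be redirected by this pattern.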

Justin