Safe Use of DirectoryIndex Directive

In a Virtual Account htaccess

         

Angonasec

8:07 am on Jul 12, 2005 (gmt 0)



I'm on a good virtual host running the latest Apache, and I'd appreciate your advice on how I've been using the DirectoryIndex directive in my root .htaccess file.

My concern is with how this affects our listing in SEs.

There's no index.html or index.php type file in the main www directory. Instead I named the default homepage 'keyword.htm'.

In my root .htaccess file I have the lines:

Options -Indexes
ErrorDocument 404 /404.html
DirectoryIndex keyword.htm

By default my host's server deals with a request for www.example.com/ (i.e. a directory specified but not a file) by searching for a file named 'index.html', then 'index.htm', etc., unless overridden by an .htaccess file (as in my case).

My understanding was that requests for www.example.com/ would be redirected to www.example.com/keyword.htm
And also that any SE bots looking for www.example.com/index.html would also be redirected to www.example.com/keyword.htm

Checking with my host, I discovered I was wrong: a bot looking for www.example.com/index.html gets the 404.html page.

The site has been setup like this for years, so why am I bothered now?

Because G and other SEs have our homepage listed as both www.example.com/ and www.example.com/keyword.htm

I assume this inadvertently invokes a measure of duplicate penalty. (Though 'both' pages do well in the serps.)

So before changing anything I'd like to know how to safely change our site so that the www.example.com/ listing is dropped/removed from the SEs, and only www.example.com/keyword.htm is listed.

I also have similar .htaccess files in my two subdomains containing:

ErrorDocument 404 /404.html
DirectoryIndex differentkeyword.htm

I assume they need to be changed too.

I've studied the very brief Apache doc on DirectoryIndex without getting an answer.

Sound safe advice please?

jdMorgan

3:19 pm on Jul 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> My understanding was that requests for www.example.com/ would be redirected

The DirectoryIndex directive does not invoke an external redirect. A redirect would mean telling the client browser to re-request the page at the 'keyword.htm' URL, which would be terribly inefficient because every page so redirected would need two client fetches. DirectoryIndex simply tells the server where to find the "index" file for a directory. The index file is served when no page is specified in the client's request, i.e. "GET /". If none of the files named by DirectoryIndex exists in the directory (the default name is index.html), then the server will actually produce an index, a list of the files in that directory, as long as Options Indexes is not disabled.
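As a rough illustration of the point above, the sketch below annotates the directives from the original post. The comments describe standard Apache behaviour as I understand it; the file name is simply the one used in this thread.

```apache
# Internal mapping only: a request for "/" is answered with the contents of
# keyword.htm, but no redirect is sent, so the client still sees the URL "/".
DirectoryIndex keyword.htm

# With Indexes switched off, a directory that has no matching index file
# is refused (403 Forbidden) instead of showing a generated file listing.
Options -Indexes
```

Because the mapping is internal, both "/" and "/keyword.htm" return the same content with a 200 status, which is how the two listings can coexist.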

Somehow, your 'keyword.htm' page got 'exposed' and that's a problem. You can fix it with mod_rewrite if that's what you want to do. If you want to cloak your index page so that search engines see 'keyword.htm' while visitors see '/' then that's going to be difficult if not impossible to do without an external redirect for each visitor who comes to your site after a search.

Jim

Angonasec

6:04 pm on Jul 12, 2005 (gmt 0)



Thanks Jim, I really appreciate your attention.

No I certainly don't want to do any form of cloaking.
Nothing dodgy at all. Horrors!

I do want my keyword.htm homepage exposed, that was the idea. What I don't want is the same page listed in the serps as example.com/ and example.com/keyword.htm

(And I don't want my directory list of files exposed)

Am I misusing the commands as follows to achieve this?

Options -Indexes
ErrorDocument 404 /404.html
DirectoryIndex keyword.htm

If so how do I get rid of the example.com/ listings in the SEs safely?

So that only the example.com/keyword.htm listings remain.
(i.e. no inadvertent duplicates.)

Your advice appreciated greatly.

zomega42

10:45 pm on Jul 12, 2005 (gmt 0)

10+ Year Member



As far as I know, there is no way to have example.com/ removed from the Google results. You could probably remove example.com/keyword.htm, however, just by changing all of your links to point to the domain name without the keyword. If you also have external links pointing to the keyword page, you could move your index to index.html and put a 301 redirect on keyword.htm to index.html.
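A minimal sketch of that last suggestion, assuming the home page content has already been moved to index.html and that mod_alias is available on the host (both are assumptions, not something confirmed in this thread):

```apache
# Assumed layout: the home page now lives in index.html.
DirectoryIndex index.html

# Permanently redirect the old keyword URL, and only that exact URL.
RedirectMatch 301 ^/keyword\.htm$ http://www.example.com/index.html
```

Pointing the redirect at http://www.example.com/ rather than at /index.html may be the safer variant, since it avoids trading one pair of duplicate URLs ("/" and "/keyword.htm") for another ("/" and "/index.html").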

jdMorgan

11:22 pm on Jul 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can 301 redirect "/" to /keyword.htm with the following, assuming that you already have other mod_rewrite code in your .htaccess file:

RewriteRule ^$ http://www.example.com/keyword.htm [R=301,L]

Google *should* then list your page as "/keyword.htm" after the next spider/update cycle. However, it sounds like maybe zomega42 has tried this, so I can't say if it will achieve your ultimate goal.
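For anyone without existing mod_rewrite code, a fuller sketch of the same rule might look like the following. It assumes mod_rewrite is enabled for the account and that the file sits in the site's root .htaccess.

```apache
# Required once per .htaccess file before any RewriteRule takes effect.
RewriteEngine On

# In .htaccess (per-directory) context the leading "/" is stripped before
# matching, so ^$ matches only a request for the directory itself ("/").
RewriteRule ^$ http://www.example.com/keyword.htm [R=301,L]
```

A request for /keyword.htm does not match ^$, so the redirect should not loop back on itself.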

Jim

Angonasec

2:22 am on Jul 13, 2005 (gmt 0)



Thanks Zomega42:
Perhaps a word of warning to others might be appropriate here, because I've read on WebmasterWorld of people wiping their sites from G accidentally by using the 'G removal tools'.

Don't use the G removal tool to get rid of example.com/ entries! Only specific file names.

Seems obvious, I know.

Jim:

> You can 301 redirect "/" to /keyword.htm with the following, assuming that you already have other mod_rewrite code in your .htaccess file:

> RewriteRule ^$ http://www.example.com/keyword.htm [R=301,L]

> Google *should* then list your page as "/keyword.htm" after the next spider/update cycle.

That sounds worth trying (yes, I do have mod_rewrite switched on and in use), but first I want to be sure I don't make things worse.

For instance, if I use that RewriteRule (which I understand to mean that all requests to my main domain that name no file are sent to keyword.htm), should I remove either of these other directives?

ErrorDocument 404 /404.html
DirectoryIndex keyword.htm

I'm wondering what would happen to a bot looking for "GET /"

Also, I have .htaccess files in my two subdomains, so would this be the correct type of rule to use in them?

RewriteRule ^$ [subdomain.example.com...] [R=301,L]

jdMorgan

5:40 pm on Jul 13, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A 'bot looking for "/" would be redirected to '/keyword.htm', just like a browser. If you code it so the behaviour is different, then that's cloaking. Cloaking is not necessarily dodgy; I use it to keep 'bots out of my e-mail contact forms, for example. But because we're discussing 'keyword-in-URL' techniques here, cloaking in this case could be seen as dodgy.

> Also, I have .htaccess files in my two subdomains, so would this be the correct type of rule to use in them?

Looks OK to me, but you should test and make sure the results are what you expect. The server headers checker [webmasterworld.com] in the WebmasterWorld Control Panel may come in handy for this.

Jim

Angonasec

6:29 pm on Jul 13, 2005 (gmt 0)



Thank you Jim, that seems to work fine.
I checked with the WebmasterWorld header tool.

You didn't reply to my query about leaving the following lines in too, so I've left them in, hoping they don't cause the Bots to see double:

Options -Indexes
ErrorDocument 404 /404.html
DirectoryIndex keyword.htm

I have several folders in my www directory, none of which has a file called index.htm, index.php, etc. As in my root directory, I use a keyword for each folder's main file.

What would be the appropriate mod_rewrite to do the same in my folders as in the root directory?

ie.

www.example.com/folder1/ calls 301'd to www.example.com/folder1/folderkeyword.htm

www.example.com/folder2/ calls 301'd to www.example.com/folder2/folder2keyword.htm

I don't have any .htaccess file in any folders, only in my root and subdomains.

We all appreciate your help Jim.

Colin
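The thread ends without a reply to the folder question. One possible approach, offered only as a sketch modelled on Jim's root rule, is to add one rule per folder to the root .htaccess file. The folder and file names are the ones from the post above; the rules have not been tested on this site.

```apache
RewriteEngine On

# Patterns in the root .htaccess are matched without the leading "/".
RewriteRule ^folder1/$ http://www.example.com/folder1/folderkeyword.htm [R=301,L]
RewriteRule ^folder2/$ http://www.example.com/folder2/folder2keyword.htm [R=301,L]
```

A request for /folder1 without the trailing slash would normally be redirected to /folder1/ by the server first and then caught by the rule above, so it is worth checking the resulting chain with a header checker before relying on it.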