The problem is that I do not use file extensions on the end of my content pages. For example,
http://www.example.com/animals/baby-foxes
is a page in the "animals" directory that discusses baby foxes. It is not "baby-foxes/index.html", "baby-foxes/index.php", or "baby-foxes/", but just "baby-foxes".
The problem is that some of the search engines have taken to adding a trailing slash to the address of these pages, so that they think there are two files:
http://www.example.com/animals/baby-foxes (the correct address)
http://www.example.com/animals/baby-foxes/ (not the correct address)
Both addresses will show the content, but I want to force both the search engines and people visiting the website to use only the correct address.
Thank you.
The basic problem, I suspect, is that you're using mod_negotiation [httpd.apache.org] and MultiViews to implement your extensionless filename support. While mod_negotiation/MultiViews is great for this kind of thing, as well as for matching client language, character-set, and compression preferences, its fundamental premise is to take a requested URL and find a 'best match' filename that corresponds (hopefully) to that URL.
So by definition, this leads to one file potentially being served for many, many URL variants, or as you put it, duplicate content.
The option addressed in the cited thread is to disable the MultiViews function and use mod_rewrite to support extensionless filenames instead. This may prove difficult or inefficient, depending on the number of extensionless filetypes used on your site. However, if only your *page* URLs are extensionless, the solution can be quite efficient, and even trivial.
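As a rough illustration of that approach, here is a minimal sketch. It assumes your page files are stored with a .php extension and that the code goes in the .htaccess file in your document root; both are assumptions on my part, since I don't know how your site is set up, so adjust to suit.

```apache
# Sketch only: assumes page files end in .php and that this code lives in
# the .htaccess file in the document root (both are assumptions).
Options -MultiViews
RewriteEngine on
#
# Internally rewrite an extensionless URL to the matching .php file,
# but only if that file actually exists
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^([^.]+)$ $1.php [L]
```

Because the pattern excludes any path containing a period, the rewritten .php path is not matched a second time, so the rule should not loop. With MultiViews off, a request with a trailing slash should no longer find a 'best match' file either, since "baby-foxes/.php" does not exist.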
Please take a look at that thread, and then post any questions or comments specific to your site back here.
Jim
Actually I use the Apache lookback feature.
For example,
http://www.example.com/widgets/look-at-this-widget
does not actually exist as a file. However, "widgets" does exist.
Apache 'falls back' to widgets. The file "widgets" parses out the URL, finds that I am seeking "look-at-this-widget", which is a title in the database, and gets this content to serve to the web page.
Unfortunately, some of the search engines are appending a trailing slash to all the URLs, apparently thinking they are directories.
so
http://www.example.com/widgets/look-at-this-widget
is also regarded by them as
http://www.example.com/widgets/look-at-this-widget/
Which is not true. How would I force them not to do this?
or whatever extension I wanted and then hacked off the extension when doing the parsing. This could have prevented the problem, but at the time I did not think it would matter. Now there are hundreds of pages indexed by the search engines and I would hate to have to start over.
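RewriteEngine on
#
# Remove trailing slash from URL if requested directory does not exist.
# The optional leading slash (/?) lets the pattern match in both httpd.conf
# and .htaccess, where the leading slash is stripped before matching.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^/?(.+)/$ http://www.example.com/$1 [R=301,L]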
If mod_rewrite is invoked *before* the lookback function, then this should work, though it might require some tweaking.
Jim