Forum Moderators: phranque

Why do incorrect URLs resolve properly?

Finding truncated URLs in log files

Liane

10:40 pm on Jul 19, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It was suggested by volatilegx in another thread that I should post here. I have no clue about server-side stuff ... but here goes.

All of a sudden, starting about 6 months ago, referrals have been showing up in my log files with incorrect and incomplete URLs. Is there an "easy fix" for someone with very little knowledge of how things work on the server side?

My pages all use absolute URLs, such as:

www.mysite.com/page.html

The problem is that scrapers and other less-than-desirable sites are listing my pages without the ".html" extension, and those pages (to my annoyance) still resolve properly!

The same goes for the home page, whose URL ends with .com ... it also resolves if you use .com/index.html.
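To see how often these non-canonical requests arrive, a short script can scan the access log for the two forms described above: extensionless page paths and "/index.html". This is a minimal, hypothetical sketch. It assumes Apache's Common or Combined Log Format and assumes the site's convention is that every page ends in ".html" and the home page is "/". The names `REQUEST_RE`, `suspicious_path` and `find_suspicious` are illustrative only.

```python
import re
from collections import Counter

# Matches the quoted request line inside a Common/Combined Log Format entry,
# e.g. "GET /page HTTP/1.1", and captures the requested path.
REQUEST_RE = re.compile(r'"[A-Z]{3,9} (?P<path>[^ "]+) HTTP/[0-9.]+"')


def suspicious_path(path):
    """Return True for paths that break the assumed canonical form:
    "/" for the home page and "*.html" for every other page."""
    path = path.split("?", 1)[0]  # ignore any query string
    if path == "/":
        return False
    if path == "/index.html":
        return True  # home page requested by its non-canonical name
    last_segment = path.rsplit("/", 1)[-1]
    # Directory paths ending in "/" and files with an extension are left
    # alone; a final segment with no "." (e.g. "/page") is flagged.
    return last_segment != "" and "." not in last_segment


def find_suspicious(lines):
    """Count suspicious request paths across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = REQUEST_RE.search(line)
        if m and suspicious_path(m.group("path")):
            counts[m.group("path")] += 1
    return counts
```

Against a real log, something like `find_suspicious(open("access_log")).most_common(20)` would list the worst offenders; "access_log" is a placeholder for your server's actual log path.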

I'd like to know whether I can stop a page from resolving unless the proper URL is used.

I'd also like to know why this problem suddenly started to appear, after my site has been on the internet for nearly 8 years.

I'd appreciate any and all advice.

[edited by: Liane at 10:42 pm (utc) on July 19, 2006]

abates

11:49 pm on Jul 19, 2006 (gmt 0)

10+ Year Member



According to the Apache content negotiation documentation:
[httpd.apache.org...]

It says that if you're using language negotiation, files can be accessed without the extension.

I tried duplicating the effect on my server (Apache 1.3) by adding "AddLanguage" directives to .htaccess, but couldn't get it to work.

incrediBILL

11:54 pm on Jul 19, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm assuming your site is something written in PHP?

Are all of the pages resolving to the correct page or just to your index page?

Liane

12:16 am on Jul 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm assuming your site is something written in PHP?
Are all of the pages resolving to the correct page or just to your index page?

No. All pages are straight HTML, and yes, they all resolve to the correct page.

[edited by: Liane at 12:16 am (utc) on July 20, 2006]

jdMorgan

2:53 am on Jul 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Put the following directive in the .htaccess file in your top-level web-accessible directory:

Options -MultiViews

That should stop the extensionless files from resolving. Please post again on the status of that fix, and on any remaining problems.
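Why does `/page` resolve at all? With MultiViews on, when the requested file does not exist, Apache's mod_negotiation looks in that directory for files named `page.*` and serves the best match, so `/page` quietly returns `page.html`. The Python function below is a deliberately simplified, hypothetical model of that lookup. Real negotiation ranks the variants against the client's Accept headers, while this sketch just takes the first match alphabetically. It shows how `Options -MultiViews` turns the extensionless request into a 404 while the real `.html` URL keeps working. `multiviews_lookup` is an illustrative name, not part of Apache.

```python
def multiviews_lookup(requested_name, directory_listing, multiviews=True):
    """Simplified model of Apache's MultiViews fallback.

    Returns the file name that would be served, or None for a 404.
    Real mod_negotiation collects every "requested_name.*" variant and
    ranks them by the client's Accept* headers; this sketch simply picks
    the first candidate in sorted order.
    """
    if requested_name in directory_listing:
        return requested_name  # exact file exists: no negotiation needed
    if not multiviews:
        return None  # Options -MultiViews: a missing file is a plain 404
    prefix = requested_name + "."
    candidates = sorted(f for f in directory_listing if f.startswith(prefix))
    return candidates[0] if candidates else None
```

For a directory holding `page.html`, `multiviews_lookup("page", {"page.html"})` returns `"page.html"`, but with `multiviews=False` it returns `None`, which matches the behaviour Liane reports after adding the directive.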

Jim

coopster

4:28 am on Jul 20, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I was thinking the same thing right off, jd. But even if Liane is on a shared server, do you think the hosting provider would turn on MultiViews for the entire server config? That's crazy ...

jdMorgan

4:32 am on Jul 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



On most shared hosting I've been on, MultiViews is enabled by default ... crazy, but true.

Jim

coopster

4:36 am on Jul 20, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Wow. Most shared hosts that I have been on don't even compile mod_negotiation into Apache. But, this is a whole different topic ;)

Liane

10:21 am on Jul 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



WestHost?

coopster

5:08 pm on Jul 20, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Turning off MultiViews as jdMorgan suggested didn't resolve the issue?

Liane

2:09 am on Jul 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes! That worked. Thank you very, very much! Now how do I stop /index.html from still resolving to my home page, when the home page is simply .com and there is no link to /index.html anywhere on the site?

jdMorgan

2:36 am on Jul 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




Options +FollowSymLinks
RewriteEngine on
#
# Redirect direct client requests for "/index.html" to "/"
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html[^\ ]*\ HTTP/
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]
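The RewriteCond is what keeps this rule from looping. THE_REQUEST holds the request line exactly as the client sent it, e.g. `GET /index.html HTTP/1.1`. When a visitor asks for "/", Apache serves index.html internally via DirectoryIndex, and that internal request would also match `^index\.html$`. Because THE_REQUEST still reads `GET / HTTP/1.1`, the condition fails and no redirect happens. The Python sketch below simply replays the same pattern against a few sample request lines to show which ones would get the 301. `should_redirect` is a hypothetical helper name, and Python's `re` is assumed to interpret this particular pattern the same way Apache does.

```python
import re

# Same pattern as the RewriteCond above. Python's re accepts "\ " as an
# escaped literal space, both inside and outside a character class.
THE_REQUEST_RE = re.compile(r"^[A-Z]{3,9}\ /index\.html[^\ ]*\ HTTP/")


def should_redirect(request_line):
    """True if the client's original request line asked for /index.html
    (optionally followed by a query string) and so should be sent to "/"."""
    return THE_REQUEST_RE.match(request_line) is not None
```

Only requests for the top-level /index.html are caught. A client request for "/" (served internally from index.html) and a request for /sub/index.html both pass through untouched.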

Jim

Liane

10:18 am on Jul 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks Jim,

Wow ... you are a font of knowledge on this stuff! Does that go in the same .htaccess file?

coopster

12:32 pm on Jul 21, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Yes, it can. You can just combine the two Options in your first line though:
Options +FollowSymLinks -MultiViews

Liane

9:30 am on Jul 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Perfect! For the first time in about 6 months, my log files look terrific! Thanks guys! :)