Yahoo is leaving the trailing '/' off of my URLs

mod_rewrite folders don't automatically redirect to '/'

         

roldar

9:35 pm on Jul 31, 2005 (gmt 0)

10+ Year Member



Is this a common occurrence for anybody else? Pages indexed by Yahoo are linking to my site without the trailing '/' at the end of folder URLs.

Usually this wouldn't be a problem, but since my directories are all fake mod_rewrite ones, the server returns a 404 if the trailing '/' isn't there.

I was able to fix this by adding duplicate rules that rewrite the unslashed versions to the slashed versions, but it seems ridiculous that I even had to do that. There are *no* links anywhere on my site to the unslashed versions, so the fact that Yahoo was able to find and index the pages makes it apparent that it is displaying incorrect URLs.

This only breaks pages that are virtual folders, while all other pages ending with .htm work fine.

Could the length of the URL be a factor? Some of mine are quite long, and maybe it wants to save space by leaving it off?

Is this a common occurrence, or am I just a special case?

encyclo

1:33 am on Aug 1, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yahoo has a habit of doing this: it also lists root pages as http://www.example.com - without the trailing slash.

When using mod_rewrite to create static-looking URLs, I much prefer creating URLs with a file extension (usually .htm or .html) to avoid any chance of confusion. You can also create URLs with neither an extension nor a trailing slash.
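
As a rough sketch of the extension approach, assuming a hypothetical /subdir/ URL space and a hypothetical page.php script (neither comes from the original poster's site), a rule in the document-root .htaccess could look like this:

```apache
RewriteEngine On

# subdir/widgets.html -> /page.php?page=widgets (internal rewrite).
# The name may contain neither a slash nor a dot, so there is no
# slashed/unslashed variant of the URL to worry about.
RewriteRule ^subdir/([^/.]+)\.html$ /page.php?page=$1 [L]
```

Because the URL ends in .html, a search engine has no trailing slash to drop.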

jdMorgan

3:54 pm on Aug 1, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> I was able to fix this by adding duplicate rules that rewrite unslashed version to slashed versions

You should not need duplicate rules. Just use the rule for the slashed version, and put a "?" after the slash in the RewriteRule pattern.

For example, the two rules


RewriteRule ^subdir/([^/]+)/$ /page.php?page=$1 [L]
RewriteRule ^subdir/(.+)$ /page.php?page=$1 [L]

can be handled with one:

RewriteRule ^subdir/([^/]+)/?$ /page.php?page=$1 [L]

The "?" makes the preceding slash optional.

Jim

Prolific

4:18 pm on Aug 1, 2005 (gmt 0)

10+ Year Member



One issue that may crop up with making the trailing slash optional is that people may link to the unslashed URL they followed from Yahoo. Google might then index the page under both URLs, and duplicate-content problems may follow if you don't redirect. Redirecting is the proper thing to do. I wouldn't make the slash optional.
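
A minimal sketch of the redirect approach, reusing the subdir/page.php example from the post above and assuming www.example.com as a placeholder host (the dot exclusion in the first pattern is also my assumption, so that real files such as .htm pages are left alone):

```apache
RewriteEngine On

# Unslashed virtual folder: send a 301 to the slashed URL, so only
# one form of the URL is ever served with content.
RewriteRule ^subdir/([^/.]+)$ http://www.example.com/subdir/$1/ [R=301,L]

# Canonical slashed URL: rewrite internally to the script.
RewriteRule ^subdir/([^/]+)/$ /page.php?page=$1 [L]
```

With this pair, a visitor or spider arriving at the unslashed URL is told the slashed one is the permanent address.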

abates

4:05 am on Aug 2, 2005 (gmt 0)

10+ Year Member



I wish Yahoo would stop doing this. It causes a completely unnecessary redirect every time Slurp or someone coming from Yahoo's search hits one of those de-slashed URLs. They shouldn't be removing bits of my URLs.

jdMorgan

4:16 am on Aug 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Prolific,

The "?" makes the trailing slash optional in the regular-expressions pattern to be matched, so that one rule matches the URL with or without the trailing slash, and the rewrite will occur. This eliminates the need for duplicate rules, as stated.

Jim

roldar

5:42 am on Aug 2, 2005 (gmt 0)

10+ Year Member



> The "?" makes the trailing slash optional in the regular-expressions pattern to be matched, so that one rule matches the URL with or without the trailing slash, and the rewrite will occur. This eliminates the need for duplicate rules, as stated.

But wouldn't that mean that if somebody linked to the unslashed version and somebody else linked to the slashed version, the search engines might think these are two separate pages with duplicate content?

Right now I do a 301 from unslashed to slashed. The slashed version is a rewrite to the .php file. This way, if a spider follows the unslashed URL it gets redirected to the slashed one, so it knows the original was wrong.

Here's what my .htaccess looks like:

RewriteRule ^(.*)/$ http://www.domain.com/index.php?folder=$1 [L]
RewriteRule ^(.*)/(.*\.html)$ http://www.domain.com/index.php?folder=$1&page=$2 [L]

RewriteRule ^(.*)$ http://www.domain.com/$1/ [R=301,L]

--------------

So if the URL is correct the first time, it works perfectly. If it's missing the slash and it doesn't end in .html, it gets sent through the last rule, which adds the slash and sends a 301.

Are there any problems with the method I am using?

jd01

6:08 am on Aug 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, there does not appear to be anything wrong with the way you are doing it, but there is some efficiency you can add to your rules:

RewriteRule ^(.*)/$ http://www.domain.com/index.php?folder=$1 [L]
RewriteRule ^(.*)/(.*\.html)$ http://www.domain.com/index.php?folder=$1&page=$2 [L]

RewriteRule ^(.*)$ http://www.domain.com/$1/ [R=301,L]

Negated character classes such as [^/]+ are much more efficient than the .* catch-all, because they give the regex engine far less backtracking to do, so the file would use considerably less processor resources by switching to:

One or more characters that are not a slash, followed by a slash at the end of the URL path:
RewriteRule ([^/]+)/$ http://www.domain.com/index.php?folder=$1 [L]

One or more characters that are not a slash, followed by a slash, followed by one or more characters that are not a dot, followed by .html at the end of the URL path:
RewriteRule ([^/]+)/([^.]+\.html)$ http://www.domain.com/index.php?folder=$1&page=$2 [L]

Any request that does not contain a dot, and does not end in a slash:
RewriteRule ([^.]+[^/])$ http://www.domain.com/$1/ [R=301,L]
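
One thing to watch: without a leading ^, these patterns can match just the tail of the requested path, so a request for a/b/ would pass only b as the folder. Here is a sketch of anchored versions, assuming single-level folders and local-path targets (both are my assumptions, not something stated earlier in the thread):

```apache
RewriteEngine On

# folder/ -> /index.php?folder=folder (internal rewrite)
RewriteRule ^([^/]+)/$ /index.php?folder=$1 [L]

# folder/page.html -> /index.php?folder=folder&page=page.html
RewriteRule ^([^/]+)/([^/.]+\.html)$ /index.php?folder=$1&page=$2 [L]

# No dot anywhere and no trailing slash: 301 to the slashed URL.
# index.php contains a dot, so rewritten requests are not redirected again.
RewriteRule ^([^.]*[^./])$ http://www.domain.com/$1/ [R=301,L]
```

Using a local path as the target keeps the first two rules unambiguously internal, so the visitor never sees index.php in the address bar.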

Justin