Page Indexed with double //

How to prevent

   
4:30 pm on Aug 3, 2007 (gmt 0)

10+ Year Member



My site's URLs look like this, e.g. www.mysite.com/fluff/

I recently noticed that Google has indexed www.mysite.com/fluff//

It also seems that, despite the number of inbound links pointing to www.mysite.com/fluff/, Google has chosen to rank www.mysite.com/fluff//

The page with two trailing slashes has been indexed and has PR. If I navigate to this link most of the images are broken. Looks really bad.

How do I stop Google indexing multiple forward slashes?

Thanks in advance

[edited by: Pass_the_Dutchie at 4:31 pm (utc) on Aug. 3, 2007]

4:48 pm on Aug 3, 2007 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Apache treats multiple slashes as single slashes.

You can 301 redirect double-slashed URLs to the correct URL using mod_rewrite in .htaccess:


RewriteCond %{REQUEST_URI} ^(.*)//+(.*)$
RewriteRule / http://www.example.com/%1/%2 [R=301,L]

or using RedirectMatch:

RedirectMatch 301 ^(.*)//+(.*)$ http://www.example.com/$1/$2

Either of these snippets will replace two or more contiguous slashes with a single slash.

Jim

6:49 pm on Aug 3, 2007 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



"How do I stop Google indexing multiple forward slashes?"

by locating the broken link that is causing it!
Try searching your HTML for "<a href=""></a>" or blank links.

Good luck on html searches for "//"!

10:20 am on Aug 5, 2007 (gmt 0)

10+ Year Member



Thanks for the feedback.

All internal links pointing to this page are correct. The page has PR5 and has lots of quality links pointing to it. However, one of these links, from a directory site with grey PR, points to the site with // at the end of the URL. Crazy! Why would Google prefer to index the page with the double slashes over all the internal links and several external links?

jdmorgan - I tried your suggestions but there seems to be a conflict with the current code.

Here is the entire code on the .htaccess file:
-----------------------------
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^example.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.htm\ HTTP/
RewriteRule ^(([^/]*/)*)index\.htm$ [c...] [R=301,L]

RewriteCond %{REQUEST_URI} ^(.*)//+(.*)$
RewriteRule / http://www.example.com/%1/%2 [R=301,L]

-----------------------------------------------

This now results in: http://www.example.com//fluff/

It puts a double slash in the middle of the URL.

The same thing happens if I use the RedirectMatch code.
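
For readers following along, the reported result can be reproduced outside Apache. The following is a minimal Python sketch, assuming that %{REQUEST_URI} always starts with a slash and that Python's re module treats this pattern the same way Apache's regex engine does; the function names first_attempt and collapse_slashes are made up for illustration. It suggests that %1 captures the leading slash of the path, so prepending "http://www.example.com/" yields a double slash right after the hostname.

```python
import re

def first_attempt(request_uri):
    """Mimic the condition ^(.*)//+(.*)$ with the target http://www.example.com/%1/%2."""
    match = re.search(r"^(.*)//+(.*)$", request_uri)
    if match is None:
        return None  # no run of slashes, so no redirect would be issued
    # group(1) plays the role of %1 and still contains the leading slash
    return "http://www.example.com/" + match.group(1) + "/" + match.group(2)

def collapse_slashes(request_uri):
    """Illustrative alternative: replace every run of slashes with one slash."""
    return "http://www.example.com" + re.sub(r"/{2,}", "/", request_uri)

print(first_attempt("/fluff//"))     # http://www.example.com//fluff/
print(collapse_slashes("/fluff//"))  # http://www.example.com/fluff/
```

The second function is only a statement of the intended result, not a drop-in mod_rewrite rule.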

Please could you let me know what I am doing wrong.

Many thanks in advance.

Dutchie

7:52 am on Aug 7, 2007 (gmt 0)

10+ Year Member



Hi,

Don't like to bump but I really need some help on this one. Pages from our sites are dropping like flies, and it may be due to a competitor linking to internal pages with // at the end of the URL, which is causing duplication issues in Google.

Please could someone look at the .htaccess source above and show me (and no doubt others) how to stop the indexing of multiple forward slashes in a URL.

Thanks for your reply.

D

1:31 pm on Aug 7, 2007 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Well try this. It's a bit of a hack, but it might work better... Also, your rules should be in order, from most-specific to least-specific:

Options +FollowSymLinks
RewriteEngine on
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.htm\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1 [R=301,L]
#
RewriteCond %{REQUEST_URI} ^//+(.*)$
RewriteRule ^/ http://www.example.com/%1 [R=301,L]
#
RewriteCond %{REQUEST_URI} ^/([^/]+)//+(.*)$
RewriteRule // http://www.example.com/%1/%2 [R=301,L]
#
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Jim
3:53 pm on Aug 7, 2007 (gmt 0)

10+ Year Member



Thanks Jim. This solution has solved the problem of the double slash.
4:27 pm on Aug 8, 2007 (gmt 0)

10+ Year Member



Hi me again,

I just noticed that this source works until you have:

www.example.com/fluffy/dice//

The above .htaccess does not deal with second tier double slashes. I tried amending your code but I could not get it to work.

Please advise.

Many thanks once again.

D

5:17 pm on Aug 8, 2007 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member




Options +FollowSymLinks
RewriteEngine on
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.htm\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1 [R=301,L]
#
RewriteCond %{REQUEST_URI} ^//+(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^(.*/)/+$
RewriteRule ^/ http://www.example.com/%1 [R=301,L]
#
RewriteCond %{REQUEST_URI} ^/([^/]+)//+(.*)$
RewriteRule // http://www.example.com/%1/%2 [R=301,L]
#
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Please take care to avoid double-posting your replies. I've had to remove one triple-post and two double-posts so far.

Thanks,
Jim

7:06 pm on Aug 8, 2007 (gmt 0)

10+ Year Member



Yea, sorry about the double posting, browser kept hanging.

with regard to
#
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
is there any difference between

^example\.com [NC] and ^example.com [NC]?

7:26 pm on Aug 8, 2007 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



An unescaped "." in a pattern is a regular-expressions token matching any single character.

RewriteCond %{HTTP_HOST} ^example\.com [NC]
The hostname must start with "example", followed by a literal period, followed by "com"; the match is case-insensitive.

RewriteCond %{HTTP_HOST} ^example.com [NC]
The hostname must start with "example", followed by any single character, followed by "com"; the match is case-insensitive.

In this code, it makes no practical difference. However, the period should be escaped as a 'best practice' so that the code is clear and correct, and to reinforce the habit so that you will escape periods when it is critically important to do so.
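
The difference between the two patterns can be checked quickly with any regex engine. This is a small Python sketch rather than Apache itself; re.IGNORECASE stands in for the [NC] flag, and the hostname "example-com.net" is a hypothetical value chosen only to show the any-character match.

```python
import re

# re.IGNORECASE stands in for mod_rewrite's [NC] flag
escaped = re.compile(r"^example\.com", re.IGNORECASE)
unescaped = re.compile(r"^example.com", re.IGNORECASE)

# "example-com.net" is a hypothetical hostname used only for illustration
for host in ("example.com", "EXAMPLE.COM", "example-com.net", "www.example.com"):
    print(host, bool(escaped.search(host)), bool(unescaped.search(host)))

# example.com      True  True
# EXAMPLE.COM      True  True
# example-com.net  False True
# www.example.com  False False
```

Only the third hostname separates the two patterns, which is why the escape matters as a habit more than it does for this particular rule.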

Jim

7:24 am on Aug 9, 2007 (gmt 0)

10+ Year Member



Reason I ask is because many other examples I have found on the net do not escape the period. I think I will choose the method of best practice :)

Back to the code......

I have changed the .htaccess file as suggested but I still can't seem to redirect the // on a second tier link, e.g. www.example.com/remove/slashes//

I copied it exactly as posted, and then tried using either of these on its own:

RewriteCond %{REQUEST_URI} ^//+(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^(.*/)/+$

No joy.

Am I missing something?

4:27 pm on Aug 9, 2007 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Jeez, this should not be so hard...

How about this:


RewriteCond %{REQUEST_URI} ^//+(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^/(([^/]+/)*)/+$
RewriteRule ^/ http://www.example.com/%1 [R=301,L]

Be sure that you're completely flushing your browser cache after making any change to the .htaccess file(s) on your server.

Jim

8:11 pm on Aug 9, 2007 (gmt 0)

10+ Year Member



From the previous post, it seems we're in .htaccess, so would I be right in thinking that:

rewriterule ^/ whatever

will only trip if the double slash is before the first directory?

Peter

2:57 am on Aug 10, 2007 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Well-spotted, Peter!

Focusing on the patterns as they changed and progressed, I missed that point. Thanks for the help!

So, with the start-anchor removed from the RewriteRule pattern, the simpler RewriteCond pattern from several posts previous should work:


RewriteCond %{REQUEST_URI} ^//+(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^(.*/)/+$
RewriteRule / http://www.example.com/%1 [R=301,L]
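
Peter's point can be illustrated with a rough model. The Python sketch below assumes, as a simplification, that a root-level .htaccess hands RewriteRule the requested path with its first slash stripped; per_directory_path is a made-up helper, and real Apache behavior may differ in details such as slash merging.

```python
import re

def per_directory_path(request_uri):
    """Assumed simplification: a root-level .htaccess drops the first slash."""
    return request_uri[1:] if request_uri.startswith("/") else request_uri

for uri in ("//fluff/", "/fluff//", "/fluffy/dice//"):
    local = per_directory_path(uri)
    anchored = re.search(r"^/", local) is not None   # RewriteRule ^/
    unanchored = re.search(r"/", local) is not None  # RewriteRule /
    print(uri, repr(local), anchored, unanchored)

# //fluff/       '/fluff/'       True  True
# /fluff//       'fluff//'       False True
# /fluffy/dice// 'fluffy/dice//' False True
```

Under that assumption the anchored pattern only fires when the extra slashes come before the first directory, while the unanchored pattern lets the RewriteCond lines decide.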

Jim

3:46 pm on Aug 10, 2007 (gmt 0)

10+ Year Member



I just updated the .htaccess file and the following code re-directs non-www to www, removes /index.htm, and removes multiple / slashes from both within and at the end of the URL :)

----------------------

Options +FollowSymLinks
RewriteEngine on
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.htm\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1 [R=301,L]
#
RewriteCond %{REQUEST_URI} ^//+(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^(.*/)/+$
RewriteRule / http://www.example.com/%1 [R=301,L]
#
RewriteCond %{REQUEST_URI} ^/([^/]+)//+(.*)$
RewriteRule // http://www.example.com/%1/%2 [R=301,L]
#
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

----------------------

IMO this is an important .htaccess addition, particularly for URLs that end in /

Jim, many thanks for your expertise and patience and Pete thanks for your heads up.

3:41 pm on Aug 24, 2007 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Are there any input URLs that could cause a double or treble redirect before the browser reaches the final target URL?

I assume a non-www named index file in a folder with double slash might do that?

9:39 pm on Aug 28, 2007 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Yes, it's possible. But combinations of errors in links should be relatively rare, and the code to handle all possibilities with a single redirect [webmasterworld.com] is horribly complicated due to a long-standing bug in Apache mod_rewrite, and is therefore difficult to maintain.

However, I would like to add some comments to this code:


Options +FollowSymLinks
RewriteEngine on
#
# Externally redirect direct client requests for "/index.htm" to "/" in
# canonical domain (This applies to /index pages in any directory)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.htm\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect to remove multiple contiguous slashes at beginning or end of URL
RewriteCond %{REQUEST_URI} ^//+(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^(.*/)/+$
RewriteRule / http://www.example.com/%1 [R=301,L]
#
# Externally redirect to remove multiple contiguous slashes embedded in URL
RewriteCond %{REQUEST_URI} ^/([^/]+)//+(.*)$
RewriteRule // http://www.example.com/%1/%2 [R=301,L]
#
# Externally redirect non-canonical domain requests to canonical domain.
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
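
As an illustration of what a single redirect would have to achieve for a URL that combines several of these errors, here is a Python sketch that computes the final target in one step. It is not the mod_rewrite solution referred to above; canonical_url is a hypothetical helper, and it assumes every request should end up on www.example.com.

```python
import re
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url, canonical_host="www.example.com"):
    """Return the URL the four rules are meant to converge on, in one step."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)         # collapse runs of slashes
    path = re.sub(r"(^|/)index\.htm$", r"\1", path)  # drop a trailing index.htm
    if not path:
        path = "/"
    # Assumption: the hostname is always replaced by the canonical one
    return urlunsplit((parts.scheme, canonical_host, path, parts.query, parts.fragment))

print(canonical_url("http://example.com/folder//index.htm"))
# http://www.example.com/folder/
```

With the rule set above, a request like this one may need more than one redirect, because each rule corrects only one kind of error.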

Jim
9:45 pm on Aug 28, 2007 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Very useful. A thread for the library methinks.
9:46 pm on Aug 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A side note: if your images aren't showing up, you should get into the habit of using absolute paths (/images/whatever.jpg) and not relative paths (../images/whatever.jpg or images/whatever.jpg).
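
A likely reason the images break on the double-slash URL is that the empty path segment between the two slashes counts as a directory level when a relative reference is resolved. The Python sketch below is a deliberately simplified resolver written for this illustration (resolve is a made-up name, and it ignores query strings and some edge cases of full URL resolution).

```python
def resolve(base_path, ref):
    """Resolve a reference against a base path, keeping empty segments."""
    if ref.startswith("/"):
        merged = ref  # root-relative reference: the base path is ignored
    else:
        merged = base_path[: base_path.rfind("/") + 1] + ref
    out = []
    for segment in merged.split("/")[1:]:
        if segment == "..":
            if out:
                out.pop()  # go up one level; an empty segment counts as a level
        elif segment != ".":
            out.append(segment)
    return "/" + "/".join(out)

print(resolve("/fluff/", "../images/whatever.jpg"))   # /images/whatever.jpg
print(resolve("/fluff//", "../images/whatever.jpg"))  # /fluff/images/whatever.jpg
print(resolve("/fluff//", "/images/whatever.jpg"))    # /images/whatever.jpg
```

The reference that starts with a slash resolves to the same file from either page, which is the habit recommended above.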