
Apache Web Server Forum

    
Page Indexed with double //
How to prevent
Pass the Dutchie

10+ Year Member



 
Msg#: 3412891 posted 4:30 pm on Aug 3, 2007 (gmt 0)

My site's URLs look like www.mysite.com/fluff/

I recently noticed that Google has indexed www.mysite.com/fluff//

It also seems that, despite the number of inbound links pointing to www.mysite.com/fluff/, Google has chosen to rank www.mysite.com/fluff//

The page with two trailing slashes has been indexed and has PR. If I navigate to this link most of the images are broken. Looks really bad.

How do I stop Google indexing multiple forward slashes?

Thanks in advance


 

jdMorgan

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3412891 posted 4:48 pm on Aug 3, 2007 (gmt 0)

Apache treats multiple slashes as single slashes.

You can 301 redirect double-slashed URLs to the correct URL using mod_rewrite in .htaccess:

RewriteCond %{REQUEST_URI} ^(.*)//+(.*)$
RewriteRule / http://www.example.com/%1/%2 [R=301,L]

or using RedirectMatch:

RedirectMatch 301 ^(.*)//+(.*)$ http://www.example.com/$1/$2

Either of these snippets will replace two or more contiguous slashes with a single slash.

Jim

wilderness

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 3412891 posted 6:49 pm on Aug 3, 2007 (gmt 0)

"How do I stop Google indexing multiple forward slashes?"

By locating the broken link that is causing it!
Try searching your HTML for "<a href=""></a>" or blank links.

Good luck on html searches for "//"!

Pass the Dutchie




 
Msg#: 3412891 posted 10:20 am on Aug 5, 2007 (gmt 0)

Thanks for the feedback.

All internal links pointing to this page are correct. The page has PR5 and lots of quality links pointing to it. However, one of these links, from a directory site with grey PR, points to the site with // at the end of the URL. Crazy! Why would Google prefer to index the page with the double slashes over all the internal links and several external links?

jdMorgan - I tried your suggestions, but there seems to be a conflict with the current code.

Here is the entire code in the .htaccess file:
-----------------------------
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^example.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.htm\ HTTP/
RewriteRule ^(([^/]*/)*)index\.htm$ [c...] [R=301,L]

RewriteCond %{REQUEST_URI} ^(.*)//+(.*)$
RewriteRule / http://www.example.com/%1/%2 [R=301,L]

-----------------------------------------------

This now results in: http://www.example.com//fluff/

It puts a double slash in the middle of the URL.

The same thing happens if I use the RedirectMatch code.

Please could you let me know what I am doing wrong.

Many thanks in advance.

Dutchie
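
A likely explanation for the //fluff/ result, offered as a sketch rather than a confirmed diagnosis: %{REQUEST_URI} always begins with a slash, so the first capture group (%1) does too, and the substitution http://www.example.com/%1/%2 places it directly after the slash that follows the hostname. The snippet below reproduces that arithmetic with Python's re module standing in for Apache's regex engine (an assumption); the helper name apply_rule is hypothetical.

```python
import re

# Pattern from the RewriteCond above; Python's re is assumed to behave
# like Apache's regex engine for this simple expression.
PATTERN = re.compile(r"^(.*)//+(.*)$")

def apply_rule(request_uri):
    """Mimic the substitution http://www.example.com/%1/%2 (hypothetical helper)."""
    match = PATTERN.match(request_uri)
    if match is None:
        return None  # condition fails, so no redirect is issued
    return "http://www.example.com/" + match.group(1) + "/" + match.group(2)

# REQUEST_URI starts with "/", so %1 is "/fluff" and the result gains "//".
print(apply_rule("/fluff//"))
```

In this model, a capture that excludes the leading slash, or a substitution without the slash before %1, would avoid the doubled slash after the hostname.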

Pass the Dutchie




 
Msg#: 3412891 posted 7:52 am on Aug 7, 2007 (gmt 0)

Hi,

I don't like to bump, but I really need some help on this one. Pages from our sites are dropping like flies, and it may be due to a competitor linking to internal pages with // at the end of the URL, which is causing duplication issues in Google.

Please could someone look at the .htaccess source above and show me (and no doubt others) how to stop the indexing of multiple forward slashes in a URL.

Thanks for your reply.

D

jdMorgan




 
Msg#: 3412891 posted 1:31 pm on Aug 7, 2007 (gmt 0)

Well, try this. It's a bit of a hack, but it might work better. Also, your rules should be in order, from most specific to least specific:

Options +FollowSymLinks
RewriteEngine on
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.htm\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1 [R=301,L]
#
RewriteCond %{REQUEST_URI} ^//+(.*)$
RewriteRule ^/ http://www.example.com/%1 [R=301,L]
#
RewriteCond %{REQUEST_URI} ^/([^/]+)//+(.*)$
RewriteRule // http://www.example.com/%1/%2 [R=301,L]
#
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Jim

Pass the Dutchie




 
Msg#: 3412891 posted 3:53 pm on Aug 7, 2007 (gmt 0)

Thanks Jim. This solution has solved the problem of the double slash.

Pass the Dutchie




 
Msg#: 3412891 posted 4:27 pm on Aug 8, 2007 (gmt 0)

Hi, me again.

I just noticed that this code works until you have:

www.example.com/fluffy/dice//

The above .htaccess does not deal with second-tier double slashes. I tried amending your code but I could not get it to work.

Please advise.

Many thanks once again.

D

jdMorgan




 
Msg#: 3412891 posted 5:17 pm on Aug 8, 2007 (gmt 0)


Options +FollowSymLinks
RewriteEngine on
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.htm\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1 [R=301,L]
#
RewriteCond %{REQUEST_URI} ^//+(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^(.*/)/+$
RewriteRule ^/ http://www.example.com/%1 [R=301,L]
#
RewriteCond %{REQUEST_URI} ^/([^/]+)//+(.*)$
RewriteRule // http://www.example.com/%1/%2 [R=301,L]
#
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Please take care to avoid double-posting your replies. I've had to remove one triple-post and two double-posts so far.

Thanks,
Jim

Pass the Dutchie




 
Msg#: 3412891 posted 7:06 pm on Aug 8, 2007 (gmt 0)

Yeah, sorry about the double posting; my browser kept hanging.

With regard to
#
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
is there any difference between

^example\.com [NC] and ^example.com [NC]?

jdMorgan




 
Msg#: 3412891 posted 7:26 pm on Aug 8, 2007 (gmt 0)

An unescaped "." in a pattern is a regular-expressions token matching any single character.

RewriteCond %{HTTP_HOST} ^example\.com [NC]
Hostname must start with "example", followed by a literal period, followed by "com", match is case-insensitive.

RewriteCond %{HTTP_HOST} ^example.com [NC]
Hostname must start with "example", followed by any single character, followed by "com", match is case-insensitive.

In this code, it makes no practical difference. However, the period should be escaped as a 'best practice' so that the code is clear and correct, and to reinforce the habit so that you will escape periods when it is critically important to do so.

Jim
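
The difference can be checked quickly outside Apache. The following is a minimal sketch that uses Python's re module as a stand-in for the regex engine behind RewriteCond (an assumption); the hostnames other than example.com are made up for illustration.

```python
import re

# [NC] in RewriteCond corresponds to case-insensitive matching.
escaped = re.compile(r"^example\.com", re.IGNORECASE)
unescaped = re.compile(r"^example.com", re.IGNORECASE)

hosts = ["example.com", "EXAMPLE.COM", "example-com.net", "examplexcom.org"]

for host in hosts:
    # bool(...) turns the match object (or None) into True/False.
    print(host, bool(escaped.match(host)), bool(unescaped.match(host)))
```

Both patterns accept the real hostname in either case; only the unescaped one also accepts hosts such as example-com.net, where some other character sits in the position of the period.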

Pass the Dutchie




 
Msg#: 3412891 posted 7:24 am on Aug 9, 2007 (gmt 0)

The reason I ask is that many other examples I have found on the net do not escape the period. I think I will choose the method of best practice :)

Back to the code...

I have changed the .htaccess file as suggested, but I still can't seem to redirect the // on a second-tier link, e.g. www.example.com/remove/slashes//

I have copied it as it is, and then I tried using either

RewriteCond %{REQUEST_URI} ^//+(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^(.*/)/+$

No joy.

Am I missing something?

jdMorgan




 
Msg#: 3412891 posted 4:27 pm on Aug 9, 2007 (gmt 0)

Jeez, this should not be so hard...

How about this:

RewriteCond %{REQUEST_URI} ^//+(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^/(([^/]+/)*)/+$
RewriteRule ^/ http://www.example.com/%1 [R=301,L]

Be sure that you're completely flushing your browser cache after making any change to the .htaccess file(s) on your server.

Jim

Peter

10+ Year Member



 
Msg#: 3412891 posted 8:11 pm on Aug 9, 2007 (gmt 0)

From the previous post, it seems we're in .htaccess, so would I be right in thinking that:

rewriterule ^/ whatever

will only trip if the double slash is before the first directory?

Peter
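
Peter's observation can be illustrated with a simplified model. It assumes, as the mod_rewrite documentation describes for per-directory context, that the directory prefix (here "/", for an .htaccess file in the document root) is stripped before the RewriteRule pattern is tested; slash merging and other server behaviour are ignored. The function name per_dir_path is hypothetical.

```python
import re

def per_dir_path(url_path, dir_prefix="/"):
    """Strip the per-directory prefix, as mod_rewrite does in .htaccess (simplified)."""
    if url_path.startswith(dir_prefix):
        return url_path[len(dir_prefix):]
    return url_path

for url_path in ["//fluff/", "/fluff//", "/fluffy/dice//"]:
    local = per_dir_path(url_path)
    anchored = bool(re.search(r"^/", local))    # RewriteRule ^/
    unanchored = bool(re.search(r"/", local))   # RewriteRule /
    print(url_path, repr(local), anchored, unanchored)
```

Under these assumptions, the anchored pattern only sees a leading slash when the extra slashes come before the first directory, while the unanchored pattern matches all three sample paths.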

jdMorgan




 
Msg#: 3412891 posted 2:57 am on Aug 10, 2007 (gmt 0)

Well-spotted, Peter!

Focusing on the patterns as they changed and progressed, I missed that point. Thanks for the help!

So, with the start-anchor removed from the RewriteRule pattern, the simpler RewriteCond pattern from several posts previous should work:

RewriteCond %{REQUEST_URI} ^//+(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^(.*/)/+$
RewriteRule / http://www.example.com/%1 [R=301,L]

Jim

Pass the Dutchie




 
Msg#: 3412891 posted 3:46 pm on Aug 10, 2007 (gmt 0)

I just updated the .htaccess file and the following code re-directs non-www to www, removes /index.htm, and removes multiple / slashes from both within and at the end of the URL :)

----------------------

Options +FollowSymLinks
RewriteEngine on
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.htm\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1 [R=301,L]
#
RewriteCond %{REQUEST_URI} ^//+(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^(.*/)/+$
RewriteRule / http://www.example.com/%1 [R=301,L]
#
RewriteCond %{REQUEST_URI} ^/([^/]+)//+(.*)$
RewriteRule // http://www.example.com/%1/%2 [R=301,L]
#
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

----------------------

IMO this is an important .htaccess addition, particularly for URLs that end in /

Jim, many thanks for your expertise and patience, and Pete, thanks for your heads-up.

g1smd

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 3412891 posted 3:41 pm on Aug 24, 2007 (gmt 0)

Are there any input URLs that could cause a double or treble redirect before the browser reaches the final target URL?

I assume a non-www named index file in a folder with double slash might do that?

jdMorgan




 
Msg#: 3412891 posted 9:39 pm on Aug 28, 2007 (gmt 0)

Yes, it's possible. But combinations of errors in links should be relatively rare, and the code to handle all possibilities with a single redirect [webmasterworld.com] is horribly complicated due to a long-standing bug in Apache mod_rewrite, and is therefore difficult to maintain.

However, I would like to add some comments to this code:

Options +FollowSymLinks
RewriteEngine on
#
# Externally redirect direct client requests for "/index.htm" to "/" in
# canonical domain (This applies to /index pages in any directory)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.htm\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect to remove multiple contiguous slashes at beginning or end of URL
RewriteCond %{REQUEST_URI} ^//+(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^(.*/)/+$
RewriteRule / http://www.example.com/%1 [R=301,L]
#
# Externally redirect to remove multiple contiguous slashes embedded in URL
RewriteCond %{REQUEST_URI} ^/([^/]+)//+(.*)$
RewriteRule // http://www.example.com/%1/%2 [R=301,L]
#
# Externally redirect non-canonical domain requests to canonical domain.
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Jim
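
To make the chaining question concrete, here is a rough model of the four commented rules, written in Python purely for illustration. It is a sketch under stated assumptions, not a reproduction of mod_rewrite: each rule is reduced to its RewriteCond pattern applied to the request path (or hostname), the RewriteRule patterns and any server-side slash handling are ignored, and the function names next_redirect and follow are hypothetical.

```python
import re

CANONICAL = "www.example.com"

def next_redirect(host, path):
    """Return the (host, path) a 301 would send the client to, or None.

    Simplified model: only the RewriteCond patterns are evaluated, in the
    order listed above; RewriteRule patterns and server internals are ignored.
    """
    # Rule 1: strip a trailing index.htm (THE_REQUEST reduced to the path).
    m = re.match(r"^/(([^/]+/)*)index\.htm$", path)
    if m:
        return CANONICAL, "/" + m.group(1)
    # Rule 2: multiple slashes at the beginning or end of the path.
    m = re.match(r"^//+(.*)$", path) or re.match(r"^(.*/)/+$", path)
    if m:
        return CANONICAL, "/" + m.group(1)
    # Rule 3: multiple slashes after the first path segment.
    m = re.match(r"^/([^/]+)//+(.*)$", path)
    if m:
        return CANONICAL, "/" + m.group(1) + "/" + m.group(2)
    # Rule 4: non-canonical hostname.
    if re.match(r"^example\.com", host, re.IGNORECASE):
        return CANONICAL, path
    return None

def follow(host, path, limit=10):
    """Follow redirects until none applies; return every URL visited."""
    hops = [host + path]
    for _ in range(limit):
        target = next_redirect(host, path)
        if target is None:
            break
        host, path = target
        hops.append(host + path)
    return hops

print(follow("example.com", "/fluff//index.htm"))
```

In this model the sample request needs two redirects: the embedded-slash rule fires first (and fixes the hostname as a side effect), then the index.htm rule. A real server may behave differently.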

g1smd




 
Msg#: 3412891 posted 9:45 pm on Aug 28, 2007 (gmt 0)

Very useful. A thread for the library, methinks.

carguy84

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 3412891 posted 9:46 pm on Aug 28, 2007 (gmt 0)

A side note: if your images aren't showing up, you should get into the habit of using absolute paths (/images/whatever.jpg) and not relative paths (../images/whatever.jpg or images/whatever.jpg).
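
The broken images reported at the start of the thread are consistent with how relative references resolve against a base path that ends in an extra slash. The sketch below hand-rolls a simplified version of the RFC 3986 merge and dot-segment removal steps; the function resolve is hypothetical, handles paths only, and skips edge cases. The image paths are the ones used in the post above.

```python
def resolve(base_path, reference):
    """Resolve a path-only reference against base_path (simplified RFC 3986)."""
    if reference.startswith("/"):
        return reference  # absolute path: the base path is ignored
    # Drop everything after the last "/" of the base, then append the reference.
    segments = base_path.split("/")[:-1] + reference.split("/")
    resolved = []
    for segment in segments:
        if segment == "..":
            if len(resolved) > 1:  # never pop the leading empty (root) segment
                resolved.pop()
        elif segment != ".":
            resolved.append(segment)  # empty segments (from "//") are kept
    return "/".join(resolved)

for base in ["/fluff/", "/fluff//"]:
    for ref in ["/images/whatever.jpg", "../images/whatever.jpg", "images/whatever.jpg"]:
        print(base, ref, "->", resolve(base, ref))
```

The root-relative path resolves identically for both bases, while "../images/whatever.jpg" lands in a different directory once the base gains an extra slash, because ".." then only removes the empty segment.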
