Forum Moderators: Ocean10000 & incrediBILL & phranque

Page Indexed with double //

How to prevent

     
4:30 pm on Aug 3, 2007 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 19, 2004
posts:374
votes: 0


My site's URLs look like this, e.g. www.mysite.com/fluff/

I recently noticed that Google has indexed www.mysite.com/fluff//

It also seems that, despite the number of inbound links pointing to www.mysite.com/fluff/, Google has chosen to rank www.mysite.com/fluff//

The page with two trailing slashes has been indexed and has PR. If I navigate to this link most of the images are broken. Looks really bad.

How do I stop Google indexing multiple forward slashes?

Thanks in advance

[edited by: Pass_the_Dutchie at 4:31 pm (utc) on Aug. 3, 2007]

4:48 pm on Aug 3, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Apache treats multiple slashes as single slashes.

You can 301 redirect double-slashed URLs to the correct URL using mod_rewrite in .htaccess:


RewriteCond %{REQUEST_URI} ^(.*)//+(.*)$
RewriteRule / http://www.example.com/%1/%2 [R=301,L]

or using RedirectMatch:

RedirectMatch 301 ^(.*)//+(.*)$ http://www.example.com/$1/$2

Either of these snippets will replace two or more contiguous slashes with a single slash.

Jim
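As a quick illustration of the result those snippets are aiming for (not of how Apache evaluates them), the normalization can be sketched in Python. The function name collapse_slashes is made up for this example.

```python
import re

def collapse_slashes(path: str) -> str:
    """Replace every run of two or more slashes in a URL path with one slash."""
    return re.sub(r"/{2,}", "/", path)

print(collapse_slashes("/fluff//"))           # /fluff/
print(collapse_slashes("//fluffy///dice//"))  # /fluffy/dice/
```

This only shows the target URL shape; whether a given rule set reaches it in a single redirect is a separate question.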

6:49 pm on Aug 3, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5408
votes: 2


"How do I stop Google indexing multiple forward slashes?"

By locating the broken link that is causing it!
Try searching your HTML for "<a href=""></a>" or other blank links.

Good luck on html searches for "//"!
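A plain text search for "//" is noisy because every "http://" matches. A minimal sketch of a more targeted check, assuming the pages are available as HTML strings: parse the anchors and flag only empty hrefs or hrefs whose path part contains a double slash. The class and function names here are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class LinkAuditor(HTMLParser):
    """Collect <a href> values that are blank or have '//' inside the path."""

    def __init__(self):
        super().__init__()
        self.suspect = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href is None:
            return  # anchor without an href value
        # urlsplit separates scheme and host, so "http://" is not flagged
        if href.strip() == "" or "//" in urlsplit(href).path:
            self.suspect.append(href)

def find_suspect_links(html: str) -> list:
    auditor = LinkAuditor()
    auditor.feed(html)
    return auditor.suspect

sample = (
    '<a href="">blank</a>'
    '<a href="/fluff/">ok</a>'
    '<a href="http://www.example.com/fluff//">double</a>'
    '<a href="//cdn.example.com/x">protocol-relative</a>'
)
print(find_suspect_links(sample))  # ['', 'http://www.example.com/fluff//']
```

This only covers links in your own pages. As the thread goes on to show, the bad link may sit on someone else's site.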

10:20 am on Aug 5, 2007 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 19, 2004
posts:374
votes: 0


Thanks for the feedback.

All internal links pointing to this page are correct. The page has PR5 and has lots of quality links pointing to it. However, one of these links, from a directory site with grey PR, points to the site with // at the end of the URL. Crazy! Why would Google prefer to index the page with the double slashes over all the internal links and several external links?

jdmorgan - I tried your suggestions, but there seems to be a conflict with the current code.

Here is the entire code on the .htaccess file:
-----------------------------
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^example.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.htm\ HTTP/
RewriteRule ^(([^/]*/)*)index\.htm$ [c...] [R=301,L]

RewriteCond %{REQUEST_URI} ^(.*)//+(.*)$
RewriteRule / http://www.example.com/%1/%2 [R=301,L]

-----------------------------------------------

This now results in : http://www.example.com//fluff/

It puts double slashes in the middle of the URL.

The same thing happens if I use the RedirectMatch code.

Please could you let me know what I am doing wrong.

Many thanks in advance.

Dutchie
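The //fluff/ result can be reproduced by tracing the first suggested pattern by hand. The sketch below assumes REQUEST_URI holds the path exactly as requested, and it uses Python's re module as a stand-in for Apache's regex engine (the two should agree for a pattern this simple). The helper name first_rule_target is made up.

```python
import re

# RewriteCond pattern from the first suggestion
COND = re.compile(r"^(.*)//+(.*)$")

def first_rule_target(request_uri: str):
    """Build the redirect target the way the substitution /%1/%2 would."""
    m = COND.match(request_uri)
    if m is None:
        return None
    return "http://www.example.com/" + m.group(1) + "/" + m.group(2)

print(COND.match("/fluff//").groups())  # ('/fluff', '')
print(first_rule_target("/fluff//"))    # http://www.example.com//fluff/
print(first_rule_target("//fluff/"))    # http://www.example.com//fluff/
```

The first group already starts with a slash, so the literal slash in the substitution doubles it. The new URL matches the condition again and produces itself, which would explain why the redirect never settles on /fluff/.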

7:52 am on Aug 7, 2007 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 19, 2004
posts:374
votes: 0


Hi,

I don't like to bump, but I really need some help on this one. Pages from our sites are dropping like flies, and it may be due to a competitor linking to internal pages with // at the end of the URL, which is causing duplication issues in Google.

Please could someone look at the .htaccess source above and show me (and no doubt others) how to stop the indexing of multiple forward slashes in a URL.

Thanks for your reply.

D

1:31 pm on Aug 7, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Well, try this. It's a bit of a hack, but it might work better... Also, your rules should be in order, from most-specific to least-specific:

Options +FollowSymLinks
RewriteEngine on
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.htm\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1 [R=301,L]
#
RewriteCond %{REQUEST_URI} ^//+(.*)$
RewriteRule ^/ http://www.example.com/%1 [R=301,L]
#
RewriteCond %{REQUEST_URI} ^/([^/]+)//+(.*)$
RewriteRule // http://www.example.com/%1/%2 [R=301,L]
#
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Jim

3:53 pm on Aug 7, 2007 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 19, 2004
posts:374
votes: 0


Thanks Jim. This solution has solved the problem of the double slash.

4:27 pm on Aug 8, 2007 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 19, 2004
posts:374
votes: 0


Hi, me again,

I just noticed that this source works until you have:

www.example.com/fluffy/dice//

The above .htaccess does not deal with second-tier double slashes. I tried amending your code, but I could not get it to work.

Please advise.

Many thanks once again.

D

5:17 pm on Aug 8, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0



Options +FollowSymLinks
RewriteEngine on
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.htm\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1 [R=301,L]
#
RewriteCond %{REQUEST_URI} ^//+(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^(.*/)/+$
RewriteRule ^/ http://www.example.com/%1 [R=301,L]
#
RewriteCond %{REQUEST_URI} ^/([^/]+)//+(.*)$
RewriteRule // http://www.example.com/%1/%2 [R=301,L]
#
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Please take care to avoid double-posting your replies. I've had to remove one triple-post and two double-posts so far.

Thanks,
Jim

7:06 pm on Aug 8, 2007 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 19, 2004
posts:374
votes: 0


Yea, sorry about the double posting, browser kept hanging.

With regard to
#
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
is there any difference between

^example\.com [NC] and ^example.com [NC]?

7:26 pm on Aug 8, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


An unescaped "." in a pattern is a regular-expressions token matching any single character.

RewriteCond %{HTTP_HOST} ^example\.com [NC]
Hostname must start with "example", followed by a literal period, followed by "com", match is case-insensitive.

RewriteCond %{HTTP_HOST} ^example.com [NC]
Hostname must start with "example", followed by any single character, followed by "com", match is case-insensitive.

In this code, it makes no practical difference. However, the period should be escaped as a 'best practice' so that the code is clear and correct, and to reinforce the habit so that you will escape periods when it is critically important to do so.

Jim
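The difference is easy to see with a few test strings. This sketch uses Python's re module in place of Apache's regex engine, with re.IGNORECASE standing in for [NC]; the hostnames other than example.com are made up for illustration.

```python
import re

escaped = re.compile(r"^example\.com", re.IGNORECASE)
unescaped = re.compile(r"^example.com", re.IGNORECASE)

for host in ["example.com", "EXAMPLE.COM", "example-com.net", "exampleXcom.org"]:
    print(host, bool(escaped.match(host)), bool(unescaped.match(host)))

# example.com True True
# EXAMPLE.COM True True
# example-com.net False True
# exampleXcom.org False True
```

Only the unescaped pattern accepts the last two hosts. On a server that answers for example.com and www.example.com alone, such hosts never arrive, which is why it makes no practical difference here.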

7:24 am on Aug 9, 2007 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 19, 2004
posts:374
votes: 0


The reason I ask is that many other examples I have found on the net do not escape the period. I think I will choose the method of best practice :)

Back to the code......

I have changed the .htaccess file as suggested, but I still can't seem to redirect the // on a second-tier link, e.g. www.example.com/remove/slashes//

I copied it as it is, and then I tried using either

RewriteCond %{REQUEST_URI} ^//+(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^(.*/)/+$

No joy.

Am I missing something?

4:27 pm on Aug 9, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Jeez, this should not be so hard...

How about this:


RewriteCond %{REQUEST_URI} ^//+(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^/(([^/]+/)*)/+$
RewriteRule ^/ http://www.example.com/%1 [R=301,L]

Be sure that you're completely flushing your browser cache after making any change to the .htaccess file(s) on your server.

Jim

8:11 pm on Aug 9, 2007 (gmt 0)

New User

10+ Year Member

joined:Jan 1, 2004
posts:33
votes: 0


From the previous post, it seems we're in .htaccess, so would I be right in thinking that:

rewriterule ^/ whatever

will only trip if the double slash is before the first directory?

Peter

2:57 am on Aug 10, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Well-spotted, Peter!

Focusing on the patterns as they changed and progressed, I missed that point. Thanks for the help!

So, with the start-anchor removed from the RewriteRule pattern, the simpler RewriteCond pattern from several posts previous should work:


RewriteCond %{REQUEST_URI} ^//+(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^(.*/)/+$
RewriteRule / http://www.example.com/%1 [R=301,L]

Jim
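Peter's point can be sketched as follows. In .htaccess, the RewriteRule pattern is matched against the path with the directory prefix removed. The sketch assumes a root-level .htaccess and models that as "drop one leading slash", which is a simplification; the helper names are hypothetical, and how a particular Apache version presents double-slashed paths to the rule may differ.

```python
import re

def htaccess_local_path(url_path: str) -> str:
    """Assumed simplification: the rule sees the path minus its leading slash."""
    return url_path[1:] if url_path.startswith("/") else url_path

def rule_matches(pattern: str, url_path: str) -> bool:
    """Would this RewriteRule pattern match the per-directory path?"""
    return re.search(pattern, htaccess_local_path(url_path)) is not None

for url_path in ["/fluffy/dice//", "//fluff/"]:
    print(url_path, rule_matches(r"^/", url_path), rule_matches(r"/", url_path))

# /fluffy/dice// False True
# //fluff/ True True
```

With the start anchor, the rule fires only when the extra slash comes before the first directory. Without it, any path containing a slash reaches the RewriteCond tests.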

3:46 pm on Aug 10, 2007 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 19, 2004
posts:374
votes: 0


I just updated the .htaccess file and the following code re-directs non-www to www, removes /index.htm, and removes multiple / slashes from both within and at the end of the URL :)

----------------------

Options +FollowSymLinks
RewriteEngine on
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.htm\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1 [R=301,L]
#
RewriteCond %{REQUEST_URI} ^//+(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^(.*/)/+$
RewriteRule / http://www.example.com/%1 [R=301,L]
#
RewriteCond %{REQUEST_URI} ^/([^/]+)//+(.*)$
RewriteRule // http://www.example.com/%1/%2 [R=301,L]
#
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

----------------------

IMO this is an important .htaccess addition, particularly for URLs that end in /

Jim, many thanks for your expertise and patience and Pete thanks for your heads up.

3:41 pm on Aug 24, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Are there any input URLs that could cause a double or treble redirect before the browser reaches the final target URL?

I assume a non-www named index file in a folder with double slash might do that?
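One way to explore this question is to trace the rule set posted on Aug 10 by hand. The sketch below is a rough simulation, not Apache itself. It assumes a root-level .htaccess that sees the path minus its leading slash, assumes REQUEST_URI holds the path exactly as requested, ignores query strings, and approximates the THE_REQUEST test with a check on the path. The function names are made up.

```python
import re
from urllib.parse import urlsplit

CANON = "http://www.example.com/"

def next_redirect(url):
    """Return the redirect target for one request, or None if no rule fires."""
    parts = urlsplit(url)
    host, uri = parts.netloc, parts.path
    local = uri[1:]  # assumption: per-directory path has no leading slash

    # Rule 1: /index.htm -> / (THE_REQUEST check approximated on the path)
    rule = re.search(r"^(([^/]+/)*)index\.htm$", local)
    if rule and re.search(r"^/([^/]+/)*index\.htm$", uri):
        return CANON + rule.group(1)

    # Rule 2: multiple slashes at the start or end of the path ([OR] conditions)
    if "/" in local:
        cond = re.search(r"^//+(.*)$", uri) or re.search(r"^(.*/)/+$", uri)
        if cond:
            return CANON + cond.group(1)

    # Rule 3: multiple slashes right after the first path segment
    if "//" in local:
        cond = re.search(r"^/([^/]+)//+(.*)$", uri)
        if cond:
            return CANON + cond.group(1) + "/" + cond.group(2)

    # Rule 4: non-www host -> www host
    if re.search(r"^example\.com", host, re.IGNORECASE):
        return CANON + local

    return None

def redirect_chain(url, limit=10):
    """Follow redirects until no rule fires (or the hop limit is reached)."""
    hops = []
    while len(hops) < limit:
        target = next_redirect(url)
        if target is None:
            break
        hops.append(target)
        url = target
    return hops

print(redirect_chain("http://example.com/fluff//index.htm"))
# ['http://www.example.com/fluff/index.htm', 'http://www.example.com/fluff/']
print(redirect_chain("http://www.example.com/fluff//"))
# ['http://www.example.com//fluff/', 'http://www.example.com/fluff/']
```

Under these assumptions the non-www index file in a double-slashed folder takes two hops. The second trace suggests that even a plain trailing double slash may pass through //fluff/ first, because the second condition captures the leading slash in %1. A real server may behave differently, so it is worth checking the response headers directly.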

9:39 pm on Aug 28, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Yes, it's possible. But combinations of errors in links should be relatively rare, and the code to handle all possibilities with a single redirect [webmasterworld.com] is horribly complicated due to a long-standing bug in Apache mod_rewrite, and is therefore difficult to maintain.

However, I would like to add some comments to this code:


Options +FollowSymLinks
RewriteEngine on
#
# Externally redirect direct client requests for "/index.htm" to "/" in
# canonical domain (This applies to /index pages in any directory)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.htm\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect to remove multiple contiguous slashes at beginning or end of URL
RewriteCond %{REQUEST_URI} ^//+(.*)$ [OR]
RewriteCond %{REQUEST_URI} ^(.*/)/+$
RewriteRule / http://www.example.com/%1 [R=301,L]
#
# Externally redirect to remove multiple contiguous slashes embedded in URL
RewriteCond %{REQUEST_URI} ^/([^/]+)//+(.*)$
RewriteRule // http://www.example.com/%1/%2 [R=301,L]
#
# Externally redirect non-canonical domain requests to canonical domain.
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Jim

9:45 pm on Aug 28, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Very useful. A thread for the library methinks.

9:46 pm on Aug 28, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 13, 2005
posts:1077
votes: 0


A side note, if your images aren't showing up, you should get into the habit of using absolute paths (/images/whatever.jpg) and not relative paths (../images/whatever.jpg or images/whatever.jpg).
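One plausible reason the images broke on the double-slashed URL: an empty path segment still counts as a segment when a browser resolves "../". The sketch below is a simplified, path-only resolver in the spirit of RFC 3986 dot-segment removal. It ignores scheme, host, and query, and the function name resolve is made up.

```python
def resolve(page_path: str, ref: str) -> str:
    """Resolve a path-only reference against a page path, keeping empty segments."""
    if ref.startswith("/"):
        merged = ref  # root-relative: the page path is ignored
    else:
        # keep everything up to and including the last slash of the page path
        merged = page_path[: page_path.rfind("/") + 1] + ref
    out = []
    for seg in merged.split("/"):
        if seg == "..":
            if len(out) > 1:
                out.pop()  # step up one segment, even if that segment is empty
        elif seg != ".":
            out.append(seg)
    return "/".join(out)

print(resolve("/fluff/", "../images/whatever.jpg"))   # /images/whatever.jpg
print(resolve("/fluff//", "../images/whatever.jpg"))  # /fluff/images/whatever.jpg
print(resolve("/fluff//", "/images/whatever.jpg"))    # /images/whatever.jpg
```

On the double-slashed page, "../" only steps over the empty segment, so the image request goes to /fluff/images/ instead of /images/. The root-relative form is unaffected.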