Forum Moderators: phranque


double slashes


wilderness

6:28 pm on Apr 12, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Could somebody provide and/or explain how to modify this pattern so that it catches double slashes regardless of which directory or sub-directory level they appear at?

^(/[^/]+/)/+(.*)$

I'd like to cover all of these possibilities, e.g.:
//directory/
/directory//sub-directory/
/directory/sub-directory//sub-sub-directory/
/directory/sub-directory/sub-sub-directory//sub-sub-sub/
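A quick way to see where the pattern falls short is to run it against each example. This is a minimal sketch using Python's `re` module as a stand-in for the PCRE engine Apache uses (the two behave the same on this simple syntax); the names `ORIGINAL`, `EXAMPLES` and `matches_original` are just for illustration.

```python
import re

# The pattern from the original post, unchanged.
ORIGINAL = re.compile(r"^(/[^/]+/)/+(.*)$")

# The example paths from the post.
EXAMPLES = [
    "//directory/",
    "/directory//sub-directory/",
    "/directory/sub-directory//sub-sub-directory/",
    "/directory/sub-directory/sub-sub-directory//sub-sub-sub/",
]

def matches_original(path: str) -> bool:
    """True if the original pattern matches the whole path."""
    return ORIGINAL.match(path) is not None

for path in EXAMPLES:
    print(f"{path:60} {'match' if matches_original(path) else 'no match'}")
```

In this sketch only the second path matches. The pattern allows exactly one directory before the extra slash, so deeper double slashes are missed, and `//directory/` fails too, because `[^/]+` needs at least one non-slash character straight after the first slash.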

wilderness

7:07 pm on Apr 12, 2012 (gmt 0)




^(/[^/]+/)/+(.*)$

variable
forward slash
followed by not a forward slash
one or more "followed by not a forward slash"
forward slash

match

This part I find confusing:
I'm not sure whether it matches the last forward slash or the entire group of the previous six lines.

variable
any character
end of line

I have to look up and interpret each of these characters without ever grasping any of it, and furthermore, I never retain a working knowledge of the syntax.

Andy Langton

7:21 pm on Apr 12, 2012 (gmt 0)




Matching a double slash anywhere in a string would normally be more straightforward:

.*//.*

Any character (.) that occurs zero or more times (*), followed by two forward slashes (//), followed by any character (.) that occurs zero or more times (*).
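As a small illustration (again with Python's `re` standing in for Apache's regex engine, and a hypothetical helper name `has_double_slash`), this pattern matches every one of the original examples, because `.*` can absorb any number of directories on either side of the `//`:

```python
import re

# Anything (.*), then two slashes, then anything (.*).
ANY_DOUBLE = re.compile(r".*//.*")

def has_double_slash(path: str) -> bool:
    # fullmatch requires the pattern to cover the whole string,
    # which the .* on each side allows wherever the // sits.
    return ANY_DOUBLE.fullmatch(path) is not None

for path in [
    "//directory/",
    "/directory//sub-directory/",
    "/directory/sub-directory//sub-sub-directory/",
    "/directory/sub-directory/sub-sub-directory//sub-sub-sub/",
    "/directory/sub-directory/",  # no double slash
]:
    print(path, has_double_slash(path))
```

For a plain yes/no test, `"//" in path` does the same job; the regex form only matters because mod_rewrite works in regular expressions.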

If this is intended for mod_rewrite in an htaccess file, then you would need to use parentheses to capture those "any character" patterns, ending up with something like this:

RewriteRule (.*)//(.*) http://www.example.com/$1/$2 [R=301,L]

This won't work in mod_rewrite, however, because (I believe) Apache already "cleans" the double slashes internally before the RewriteRule pattern sees the path. So instead, you need a condition that tests the requested URL:

RewriteCond %{REQUEST_URI} (.*)//(.*)
RewriteRule .* %1/%2 [R=301,L]
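To see what the two captures hand to the redirect, here is a sketch that mimics the Condition and the `%1/%2` target in Python. It assumes, as the rule does, that `%{REQUEST_URI}` still contains the double slash; `redirect_target` is a hypothetical name.

```python
import re

# Stand-in for: RewriteCond %{REQUEST_URI} (.*)//(.*)
COND = re.compile(r"(.*)//(.*)")

def redirect_target(request_uri: str):
    """Return the %1/%2 redirect path, or None if the condition fails."""
    m = COND.search(request_uri)
    if m is None:
        return None
    # %1 and %2 are the two captured groups; the rule puts one slash between them.
    return f"{m.group(1)}/{m.group(2)}"

print(redirect_target("/directory/sub-directory//sub-sub-directory/"))
print(redirect_target("//directory/"))
print(redirect_target("/directory/"))  # None: condition not met
```

Because the first `(.*)` is greedy, the split happens at the last `//` in the path, which matters when there is more than one run of slashes.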

I imagine this rule is fairly "expensive", since the condition is tested on every request, so there may be more efficient suggestions.

wilderness

7:29 pm on Apr 12, 2012 (gmt 0)




Thanks.

It's not a string.
These are directory structures with sub-, sub-sub- and sub-sub-sub-directories, all followed by html pages.

I have the line I provided in place, matching against the URI; however, it won't catch the double slashes preceding the third sub-directory, and likely NOT any double slashes preceding a fourth or fifth.

wilderness

7:35 pm on Apr 12, 2012 (gmt 0)




FWIW, these lines are in place only to stop Google from its "loop" of "chasing its own tail", which otherwise creates duplicate content.

All the formerly bad links in the pages were repaired a month ago. ggl keeps adding more and more slashes, while Bing and the other SEs don't even request them.

Andy Langton

7:54 pm on Apr 12, 2012 (gmt 0)




It's not a string.


Ah, that's just a terminology thing. A regular expression matches a text "string" - even if that string happens to be a URL ;)

In terms of the original pattern, I believe it stops matching beyond a certain depth because it allows exactly one directory before the double slash (you can ignore the parentheses in the original pattern for now):

^ (starts with)
/ (forward slash)
[^/]+ (not a forward slash, one or more times)
/ (a slash)
/+ (a slash, one or more times)

So, /this//subdirectory/ matches, but /this/subdirectory// doesn't.
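On the earlier question of what `/+` belongs to: the `+` applies only to the single slash immediately before it, not to the group that the `)` has just closed. Python's verbose mode can spell the same pattern out one token per line, which makes that visible; the test strings are the two from this post, and `ANNOTATED` is just an illustrative name.

```python
import re

# The original pattern, written token by token with re.VERBOSE
# (whitespace and # comments inside the pattern are ignored).
ANNOTATED = re.compile(
    r"""
    ^           # starts with
    (           # start of capture group 1 (becomes %1)
      /         # a forward slash
      [^/]+     # one or more characters that are not a slash (one directory name)
      /         # a forward slash
    )           # end of group 1
    /+          # one or more extra slashes (the ones being removed)
    (.*)        # capture group 2 (becomes %2): everything else
    $           # end of the string
    """,
    re.VERBOSE,
)

for path in ["/this//subdirectory/", "/this/subdirectory//"]:
    m = ANNOTATED.match(path)
    print(path, m.groups() if m else "no match")
```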

The added difficulty is that you will need to capture the text before and after the double slash to remove it, hence the suggestion that you simply put a wildcard at the start of the pattern:

(.*)//(.*)

This essentially replaces "not a forward slash, one or more times" with "any character, zero or more times", removing the need for a "normal" directory immediately before the double slash.

Phew, hope that makes some kind of sense - describing regular expressions is tricky!

lucy24

10:37 pm on Apr 12, 2012 (gmt 0)




.*//.*

I'm surprised g1 hasn't swung by to throw a fit about that leading .* yet :)

.* doesn't mean "non-slash", it means "anything in the world, so long as there are at least two slashes after it". So it would match

abc///////////////def

but only one slash (the last in that long run) would go away on each pass.
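A sketch of that behaviour, with Python's `re` in place of mod_rewrite and hypothetical helper names: each pass applies `(.*)//(.*)` → `$1/$2` once, as one redirect would, and a run of 15 slashes (comparable to the example above) needs 14 passes to come down to one.

```python
import re

DOUBLE = re.compile(r"(.*)//(.*)")

def one_pass(s: str) -> str:
    """Apply (.*)//(.*) -> $1/$2 once, as a single redirect would."""
    m = DOUBLE.search(s)
    return s if m is None else f"{m.group(1)}/{m.group(2)}"

def passes_to_clean(s: str) -> int:
    """Count the redirects needed until no '//' is left."""
    count = 0
    while "//" in s:
        s = one_pass(s)
        count += 1
    return count

messy = "abc" + "/" * 15 + "def"
print(one_pass(messy))         # abc + 14 slashes + def
print(passes_to_clean(messy))  # 14
```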

The icky messy part is the first one in the OP:

www.example.com//morestuff

because you can't be sure what will come through to your htaccess. (There have been long, exhausting discussions elsewhere in these forums about the presence or absence of the leading slash.)

You can write htaccess rules for a single occurrence of //+, but if you're getting requests with lots of multi-slashes, it's either php-script time or resign yourself to multiple Redirects. Insert quote from Mae West about the lesser of two evils.
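The script route can avoid the redirect chain by collapsing every run of slashes in one go. The core is a single substitution; it is shown here in Python rather than PHP purely for illustration, and how the script receives the path and sends the 301 is left out because that depends on the setup. `collapse_slashes` is a hypothetical name.

```python
import re

# Two or more consecutive slashes.
SLASH_RUN = re.compile(r"/{2,}")

def collapse_slashes(path: str) -> str:
    """Replace every run of slashes with a single slash, in one pass."""
    return SLASH_RUN.sub("/", path)

print(collapse_slashes("/directory//sub-directory///sub-sub-directory//page.html"))
```

It should be fed only the path, not a full URL, or it would also collapse the `//` after `http:`.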


Here's my entirely different version. I used .com in the Condition as a stand-in for whatever your domain name really ends in. You need it so you can avoid the one place where there are supposed to be two slashes (//), without having to spell out the whole tedious request.

RewriteCond %{THE_REQUEST} \.com/((?:[^/\ ]+/)*)/+((?:[^/\ ]+/)*[^/\ ]*)
RewriteRule (^/|//) http://www.example.com/%1%2 [R=301,L]
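Apache's documentation describes THE_REQUEST as the full request line, e.g. `GET /index.html HTTP/1.1`, so for ordinary browser requests it normally starts with the method and path rather than the host name. The sketch below therefore tries the capture groups from the Condition against a request line, swapping the `\.com/` anchor for the method and first slash (`^[A-Z]+\ /`); that swap is an assumption, not part of the rule as posted. Python's `re` stands in for PCRE, `target_for` is a hypothetical name, and the sketch ignores the RewriteRule's own `(^/|//)` pattern, which also has to match before the redirect fires.

```python
import re

# Capture groups from the Condition above. ASSUMPTION: the leading \.com/
# anchor is swapped for the method and first slash of the request line,
# since THE_REQUEST normally looks like "GET /path HTTP/1.1".
COND = re.compile(r"^[A-Z]+\ /((?:[^/\ ]+/)*)/+((?:[^/\ ]+/)*[^/\ ]*)")

def target_for(request_line: str):
    """Return http://www.example.com/%1%2, or None if the condition fails."""
    m = COND.search(request_line)
    if m is None:
        return None
    return "http://www.example.com/" + m.group(1) + m.group(2)

print(target_for("GET /directory/sub-directory//sub-sub-directory/page.html HTTP/1.1"))
print(target_for("GET //directory/ HTTP/1.1"))
print(target_for("GET /directory/page.html HTTP/1.1"))  # None: no double slash
```

In this sketch a second run of slashes in the same path is not handled: for `GET /a//b//c.html HTTP/1.1` the second group stops at the second `//`, so the target comes out as `http://www.example.com/a/b/` and `c.html` is dropped. That fits the post's point that htaccess rules handle a single occurrence of `//+`. With the anchor left as `\.com/`, none of these request lines match.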

If it turns out that everything is hitting your htaccess with leading slash, get rid of that side of the Rule and reduce it to // alone.

The extra leading slash, as in www.example.com//, is not a serious worry anyway, unless g### picks it up and starts yattering about Duplicate Content. Unlike multi-slashes deeper in the URL, it will simply be ignored and treated as a single slash. (I know this because, by pure coincidence, I recently had a request in this form and had to figure out why my log-wrangling routine missed 60-odd illustrations that it should have ignored.)

wilderness

4:37 pm on Apr 13, 2012 (gmt 0)




FWIW, having the following in place seems to have quickly stopped Google from requesting the multiple-slash URLs (even though it doesn't catch double slashes three or more directories deep).
The first two directories seem to have been enough of a signal, and Google is accepting the redirects.

RewriteCond %{REQUEST_URI} ^(/[^/]+/)/+(.*)$
RewriteRule ^. http://www.example.com%1%2 [R=301,L]

The original syntax (linked below) offered a second RewriteCond %{REQUEST_URI}, which I could never get to work as intended, despite multiple attempts.

Old thread and response from Jim [webmasterworld.com], from which I used the single line.


FWIW, this thread and Jim's reply also answer lucy's earlier question in another thread concerning {n,n}.


Many thanks to all for the help.

Don