Forum Moderators: phranque

301 Redirect Conflicting With phpBB3-SEO-Premod Rewrite


NocturneEyes

8:40 am on Jun 6, 2012 (gmt 0)

10+ Year Member



Hello All,
I installed phpBB3 and phpBB3-SEO-Premod last night, and it is producing 404 errors. It has pretty much been determined that the cause is that I recently converted my HTML site (almost 500 pages) over to PHP and into WordPress, and I therefore set up a 301 redirect in my htaccess file, which is as follows:

RewriteRule ^(.*).htm /$1.php [R=301,L]


The phpBB3-SEO-Premod renders .html extensions, so apparently my 301 redirect is making that impossible. So, is there any sort of special rewrite rule or anything you guys might know of that would allow phpBB3-SEO-Premod to do its thing?

Any help or suggestions you guys might have would be greatly appreciated.

Thanks.

~ Aedryan

g1smd

8:59 am on Jun 6, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Your pattern does not have an end anchor, so it matches any request that "begins" <anything>.htm and therefore also matches requests ending ".html" or ".htm<anything>".

You should add the end anchor so that the RegEx is not looking for optional characters after "htm".

Additionally, (.*) is absolutely the wrong pattern to use here. Never use (.*) at the beginning or in the middle of a pattern.

A wider question has to be why have the redirect at all? You could have kept on using the exact same URLs and extensions as before with the appropriate site configuration in place.

[edited by: g1smd at 9:04 am (utc) on Jun 6, 2012]

NocturneEyes

9:04 am on Jun 6, 2012 (gmt 0)

10+ Year Member



Thanks so much for your help and please pardon my ignorance, but I don't understand what you said. Could you please be more specific? I'm pretty new to the whole rewrite/redirect thing, as I'm just an old HTML web designer from way back, trying to get the hang of this whole CMS way of doing things... lol

NocturneEyes

9:11 am on Jun 6, 2012 (gmt 0)

10+ Year Member



PS You are right, I very well could have just configured WordPress initially to render .htm instead of .php and kept my URLs the same. At the time, I just figured .php would have given me more flexibility. But, I guess it doesn't really make much of a difference, as it turns out.

If you can give me a workable/understandable solution for this phpBB issue, I would be very grateful...

lucy24

4:44 pm on Jun 6, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



^(.*).htm


means "If there is a request for anything containing the sequence 'htm' with at least one preceding character, capture the beginning-- if any-- and reuse it."

blahblah.htm > (blahblah).htm > blahblah.php

BUT ALSO

blahblahhtm > (blahbla)hhtm > blahbla.php

blahblah.html > (blahblah).html > blahblah.php

or (google tried this on me recently)

blahblah.html1 > (blahblah).html1 > blahblah.php

blahblahtm > (blahbl)ahtm > blahbl.php

blahbla.htmhtm > (blahbla.ht)mhtm > blahbla.ht.php

or (read this slowly)

oughtmybrowsertodothis.jsp > (ou)ghtmybrowsertodothis.jsp > ou.php

And so on. You get the idea.

You've got two syntax problems that may be entirely separate from what you're really trying to do. Either one can interfere with testing whether your code works as intended.

#1 No closing anchor means that the sequence "htm" just has to occur somewhere in the request.

#2 Un-escaped period means that "htm" can be preceded by any character. If you want a literal period in the pattern, it has to be \.

So this is where you backtrack and figure out in English what you want to do. Start by establishing exactly what happens if you do nothing.
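
If you'd rather see those matches than take them on faith, they are easy to check outside Apache. This is only a rough sketch in Python, on the assumption that its re module treats a pattern this simple the same way the regex engine behind mod_rewrite does. The flawed_redirect() helper and the sample paths are made up for illustration.

```python
import re

# The rule from the first post: unescaped dot, no closing anchor.
FLAWED = re.compile(r"^(.*).htm")

def flawed_redirect(path):
    """Hypothetical helper: path the flawed rule would send a request to.

    Mimics RewriteRule replacing the whole path with "$1.php" whenever
    the pattern matches; returns None when the rule would not fire.
    """
    match = FLAWED.search(path)
    return match.group(1) + ".php" if match else None

# "forum/viewtopic.html" is a made-up example of a forum URL ending in .html.
for path in ["blahblah.htm", "blahblah.html", "blahblahtm",
             "forum/viewtopic.html", "index.php"]:
    print(path, "->", flawed_redirect(path))
```

Anything ending in .html gets caught along with the .htm pages, which would explain why the forum's .html URLs end up somewhere that doesn't exist.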

NocturneEyes

5:11 pm on Jun 6, 2012 (gmt 0)

10+ Year Member



Thanks so much for such a detailed reply, Lucy. I really appreciate that. I pretty much get what you're saying and I will definitely be trying to wrap my mind around all of it.

Can you please suggest a code other than:

RewriteRule ^(.*).htm /$1.php [R=301,L]


... that I should have in my htaccess?

Just so you understand its purpose, I just finished converting all of my website's .htm pages (almost 500 of them) over to WordPress and all of my extensions are now .php. The 301 redirect I have in my htaccess now is working, because a ton of my new URLs have already been indexed.

I would very much appreciate your thoughts on this.

Thanks again you guys!

~ Aedryan

lucy24

7:46 pm on Jun 6, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Your rule as written will work on correctly formed URLs-- but it may also work on malformed ones, as well as htm/html duplicates. You don't want that. And the (.*) makes the server do extra work.

To get an optimal rule you need to change three things:

--escape the period to \.

--include a closing anchor $

--By default, Regular Expressions will capture as much as they can, starting as soon as they can and going on as long as they can. So paradoxically the opening anchor is unnecessary, though in this case it won't do any harm. But you need to replace the (.*) with something that will make the server stop before it reaches the .htm so it doesn't have to backtrack:

blahblah.htm > (blahblah.htm) > Oh, oops, I'm supposed to stop capturing BEFORE ".htm" > (blahblah.ht)m > Nope, still not right > (blahblah.h)tm


... and so on.

It may all happen in nanoseconds, but it's work the server shouldn't have to do. It may help to understand that the server sees this stuff in one dimension. Your eyeballs, working in two dimensions, tell you there's an ".htm" coming up, but the server doesn't know until it gets there.

There are several thousand posts in this Forum detailing how to capture different numbers of directories. Luckily you don't have to worry about that. All you need is

[^.] meaning "capture everything that isn't a period". Note that since it is inside brackets, the . does not need to be escaped.

Since most URLs only contain one period (your domain name doesn't count), you are home free.

Now let's see you put it together ;)
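
A small illustration of the difference, sketched in Python rather than Apache and assuming the two regex engines agree on something this basic. Note that [^.] on its own matches a single character, so the sketch adds a + quantifier to let it run. The consumed() helper and the second sample path are invented for the example.

```python
import re

def consumed(pattern, path):
    """Return the text the pattern swallows from the start of path."""
    return re.match(pattern, path).group(0)

# Columns: path | what .* swallows | what [^.]+ swallows
for path in ["blahblah.htm", "old.page.htm"]:
    print(path, "|", consumed(r".*", path), "|", consumed(r"[^.]+", path))
```

The second path shows the catch behind "most URLs only contain one period": with two periods, the negated class stops at the first one.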

NocturneEyes

8:42 pm on Jun 6, 2012 (gmt 0)

10+ Year Member



Wow... Errr... Ummm... I feel really stupid! lol

How about:

RewriteRule [^.] /$1.php [R=301,L]
?

g1smd

9:23 pm on Jun 6, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



$1 will be empty. You didn't "capture" anything to put in it.

Even if you had, you'd have an infinite redirect loop because your pattern matches "a character", in fact "any character". The redirected-to URL will re-match the pattern and redirect again.

Your pattern will need to match "something" followed by "period" and "htm".

The rule target should contain the full protocol and domain as well as the path.

You should read a good tutorial on Regular Expressions. You can't guess this code.
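
Both points can be checked with a few lines of Python. Treat this as a sketch: it assumes Python's re module agrees with mod_rewrite's engine on a one-character pattern, and rule_fires() is a hypothetical stand-in for "would the rule run on this path". With $1 empty, the target /$1.php becomes /.php, which .htaccess sees as the path ".php".

```python
import re

attempt = re.compile(r"[^.]")

# No parentheses in the pattern, so there is no group for $1 to refer to.
print(attempt.groups)

def rule_fires(path):
    """Hypothetical check: does the pattern match anywhere in the path?"""
    return attempt.search(path) is not None

print(rule_fires("blahblah.htm"))  # the original request
print(rule_fires(".php"))          # the redirected-to path matches again
```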

NocturneEyes

10:18 pm on Jun 6, 2012 (gmt 0)

10+ Year Member



I'm definitely not a coder by any means. I wish I was! >8>( I know HTML and flash, some SSI and very little PHP. Other than that, I'm simply a hack... lol This time around, I don't have a choice, but to get more proficient at PHP. I do know SEO like nobody's business, though! >8>)

You guys sure love to toy with us nOOBs, don't you? lol (totally appreciate what you're doing, though!)

Let's see... How 'bout:

RewriteRule [^.]\.htm$.php [R=301,L]
?

lucy24

12:21 am on Jun 7, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ouch! Does that get you a 500 error or does it simply not run?

Or, oops, uh, was that final question mark meant as part of your post, not part of the RewriteRule? :)

Well, n/m, it's got an earlier booboo anyway. Apache mods tend to throw fits when there's the wrong number of spaces in the wrong places.

Look upon this picture

(.*).htm $1.php [R=301,L]

and on this

[^.]\.htm$.php [R=301,L]

and combine the correct bits of each, keeping in mind that your original version was not flat-out wrong, just flawed.

I started out here
[regular-expressions.info...]
and I think it took me a year to get to the end of the first page. But Regular Expressions do not leave much wiggle room, so almost any information source should work. ("Regular Expressions for Dummies" would be a very, very short book.)

NocturneEyes

2:03 am on Jun 7, 2012 (gmt 0)

10+ Year Member



If I were a bot, I would rather follow something like: (.)(.) But, anyway... >8>)=

*AHEM* Ummm... It has to be either ...

[^.].htm$.php [R=301,L] -OR- [^.]\htm$.php [R=301,L]

I'm still confused about where you said, "--escape the period to \."...

[^.].htm$.php [R=301,L] makes more sense to me, so imma go with that!

Thanks for the link, by the way. I took a quick look at it and bookmarked it. That will definitely become a nice resource!

lucy24

4:29 am on Jun 7, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"Escape" in RegEx-speak means put a \ in front of the character. It's necessary with anything that has a special meaning, to show that you mean the literal character. For example:

(blahblah) = captured group containing "blahblah"
\(blahblah\) = the literal string "(blahblah)" --for example in user-agents.

. = any one character
\. = a literal period or dot, accept no substitutions

Anyway, you seem to have misplaced your mod_rewrite [httpd.apache.org] syntax although it started out just right. It goes:

RewriteRule pattern-with-optional-captures-and-anchors target-optionally-reusing-captured-stuff [optional-flags]

Each of those spaces has semantic meaning: "That was one part of the rule, now we move on to the next part". So if you ever needed to make a rule about something containing literal spaces-- again, user agents are the likeliest-- you would have to "escape" the space itself. This is specific to mod_rewrite, not part of the RegEx standard.

The target of a redirect should always begin with the full protocol and domain. You can go merrily along for years ignoring this rule, especially if you are on shared hosting and you've asked them to redirect to your preferred sitename (with/without www.) before the request ever reaches your htaccess. But again, to be safe:

http://www.example.com/$1.php [R=301,L]

Yes, just to confuse you, the $ has two separate and unrelated meanings. In the pattern it's an ending anchor; in the target it represents a capture. You can have up to nine of 'em, though this would not be a pretty sight.


Urk. I just realized I may have said something that misled you.
[^.] meaning "capture everything that isn't a period"

By itself, [blahblah] with something inside of brackets means "capture one of whatever it is". To capture more than one, you still need some kind of quantifier:
* = zero or more, as many as you can gobble
+ = one or more, ditto, but there has to be at least one
{2} and similar forms = this exact number of items

Also: brackets don't mean "skip anything that doesn't fit". They mean "stop when you get to something that doesn't fit". A pattern you will see often is the one for capturing some number of directories:

([^/]+/)

So [^.]+ means "gobble away, but stop the moment you hit a period".
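
The quantifiers are easy to try out for yourself. Here is a minimal sketch in Python, assuming its re module behaves like mod_rewrite's engine for these basics. first_match() is a made-up helper and "dir/sub/page.htm" is a made-up path.

```python
import re

def first_match(pattern, text):
    """Return what the pattern matches at the start of text, or None."""
    m = re.match(pattern, text)
    return m.group(0) if m else None

path = "blahblah.htm"
print(first_match(r"[^.]", path))       # b         (exactly one character)
print(first_match(r"[^.]+", path))      # blahblah  (one or more, stops at the period)
print(first_match(r"[^.]{2}", path))    # bl        (exactly two)

# * is happy with zero characters, + insists on at least one.
print(repr(first_match(r"[^.]*", ".htm")))  # ''
print(first_match(r"[^.]+", ".htm"))        # None

# The directory pattern: everything up to and including the first slash.
print(first_match(r"([^/]+/)", "dir/sub/page.htm"))  # dir/
```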

NocturneEyes

2:24 pm on Jun 7, 2012 (gmt 0)

10+ Year Member



So, are you saying it should be:

RewriteRule http://www.example.com/$1.php [R=301,L]

lucy24

6:14 pm on Jun 7, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Almost there. Your target looks exactly the way it's supposed to, but you've misplaced the pattern. (Ask the cat. She probably hid it under the bed.)

NocturneEyes

6:33 pm on Jun 7, 2012 (gmt 0)

10+ Year Member



LOL! So, it has to be either:

RewriteRule http://www.example.com/[^.]$1.php [R=301,L]

-OR-

RewriteRule http://www.example.com/[^.].htm$1.php [R=301,L]

I'm thinking:

RewriteRule http://www.example.com/[^.]$1.php [R=301,L]

Just as a side note, it almost seems to me like it should be: [.^] instead of [^.], but I know that's not right. Just seems like that would tell it to "capture everything AFTER the period in the extension"...

g1smd

8:10 pm on Jun 7, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Inside [ ] the ^ means "NOT".

Examine recent threads with code for redirects (there are many). The overall format will be clear to see.

RewriteRule pattern target [flags]

lucy24

10:40 pm on Jun 7, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



:: whapping self upside of head with wet haddock ::

###. I keep leaving things out because I assume you know stuff that of course you don't.

[brackets] by themselves don't mean a capture. They just define a group:
[blah] = any of the letters b,l,a,h
[^blah] = anything other than the letters b,l,a,h (note that the caret has two unrelated meanings depending on whether it's used inside brackets or at the beginning of an expression)

To capture, you still need to put the brackets inside parentheses:

(blah) = capture one occurrence of "blah"
((blah)+) = same, with any number of blahs

([blah]) = capture any one letter in the set b,l,a,h
([blah]+) = capture as many b,l,a,h as you meet

([^blah]) = capture the first thing you meet that isn't b or l or a or h
([^blah]+) = same, but continue until you crash into one of the Forbidden Four

^[blah] = this rule (or Condition) will only apply if the text you're looking at starts with b or l or a or h
^[^blah] = same, but text has to start with anything other than the Forbidden Four

Is the head spinning yet?
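
If it is, a few lines of Python may help settle it. This is a sketch only, assuming Python's re module handles brackets and parentheses the same way for these simple cases. The variable names and the "xyzblah" sample are invented.

```python
import re

text = "blahblah.htm"

# Brackets alone match but capture nothing: groups() comes back empty.
no_capture = re.match(r"[blah]+", text)
print(no_capture.group(0), no_capture.groups())   # blahblah ()

# Parentheses around the brackets turn the match into a capture.
with_capture = re.match(r"([blah]+)", text)
print(with_capture.groups())                      # ('blahblah',)

# Nested parentheses: outer group holds the run, inner the last repeat.
nested = re.match(r"((blah)+)", text)
print(nested.groups())                            # ('blahblah', 'blah')

# Negated set: capture until one of b, l, a, h turns up.
negated = re.match(r"([^blah]+)", "xyzblah")
print(negated.group(1))                           # xyz

# Anchor plus negated set: text must NOT start with b, l, a or h.
anchored = re.match(r"^[^blah]", text)
print(anchored)                                   # None
```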

NocturneEyes

11:03 pm on Jun 7, 2012 (gmt 0)

10+ Year Member



RewriteRule http://www.example.com/[^.htm]$1.php [R=301,L]

*Ducks & tries to cover all of his vital organs*

NocturneEyes

11:06 pm on Jun 7, 2012 (gmt 0)

10+ Year Member



Errr... [^.vital organs]$...

g1smd

11:13 pm on Jun 7, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



del...

[edited by: g1smd at 11:14 pm (utc) on Jun 7, 2012]

g1smd

11:14 pm on Jun 7, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteRule pattern target [flags]


pattern
- the pattern to match the requested URL.

target
- the new URL you're redirecting to.

lucy24

6:10 am on Jun 8, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Can you remember back to your original post? Structurally you had it; it just needed fine-tuning.

RewriteRule ^(.*).htm /$1.php [R=301,L]


RewriteRule
--check

^(.*).htm
--"pattern", aka the part you want to replace
--flawed by
#1 superfluous opening anchor
#2 undesirable .* in non-final position
#3 non-escaped period and
#4 lack of closing anchor

/$1.php
--"target" aka what you're replacing the pattern with
--check, because this part never had anything wrong with it

[R=301,L]
--flags, as appropriate
--check, ditto

NocturneEyes

7:08 am on Jun 8, 2012 (gmt 0)

10+ Year Member



RewriteRule http://www.example.com/[^.].htm /$1.php [R=301,L]...

lucy24

7:16 am on Jun 8, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Magic Word #1: capture

Magic Word #2: escape

Oh, and oops. The protocol-plus-domain goes ONLY in the target. Omitting it from the target is iffy but not fatal; including it in the pattern is fatal. mod_rewrite thinks you are looking for

http://www.example.com/http://www.example.com/[^.].htm

Take your time. I gotta go clean the rat cage.
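
While I'm gone, here is a rough Python sketch of why the domain in the pattern is fatal. It assumes the rule lives in an .htaccess file at the site root, where the pattern is tested against the path alone, without the leading slash. seen_by_rule() is a hypothetical helper, and Python's re module stands in for the real engine.

```python
import re
from urllib.parse import urlsplit

def seen_by_rule(url):
    """Hypothetical helper: the part of the URL the pattern is tested against."""
    return urlsplit(url).path.lstrip("/")

path = seen_by_rule("http://www.example.com/blahblah.htm")
print(path)  # blahblah.htm

# A pattern that includes protocol and domain never finds them in the path.
with_domain = re.search(r"http://www.example.com/[^.].htm", path)
print(with_domain)  # None

# The same pattern without the domain at least matches (capture and escape aside).
without_domain = re.search(r"[^.].htm", path)
print(without_domain is not None)  # True
```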

NocturneEyes

3:14 pm on Jun 8, 2012 (gmt 0)

10+ Year Member



RewriteRule http://www.example.com/[^.]\.htm /$1.php [R=301,L]
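
For anyone following along, here is one way the hints above might be combined. It is a sketch under assumptions, not a tested configuration: the assembled RewriteRule in the comment is pieced together from the advice in this thread, it keeps the opening anchor (described earlier as harmless here), and the Python below only imitates how such a rule would treat a few made-up paths.

```python
import re

# Hypothetical assembled rule, pieced together from the hints in the thread:
#   RewriteRule ^([^.]+)\.htm$ http://www.example.com/$1.php [R=301,L]
PATTERN = re.compile(r"^([^.]+)\.htm$")
TARGET = "http://www.example.com/{}.php"

def redirect(path):
    """Return the redirect target for path, or None if the rule would not fire."""
    match = PATTERN.search(path)
    return TARGET.format(match.group(1)) if match else None

# Sample paths are invented for illustration.
for path in ["blahblah.htm", "dir/page.htm", "forum/topic.html", "blahblah.php"]:
    print(path, "->", redirect(path))
```

With the parentheses, the escaped period and the closing anchor in place, only paths ending in exactly ".htm" are redirected. Paths ending in ".html" fall through untouched, and the ".php" targets no longer match the pattern.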