This 37 message thread spans 2 pages.
|index.php redirect breaking 404 pages.|
I have recently set up an index.php redirect through .htaccess. The idea is to avoid the duplicate content issue that crops up when a site has both index.php and / (the homepage) getting indexed.
I used the technique listed here.
It works great too. The one issue is, it breaks the 404 pages.
So if a user types in or follows a link to www.example.com/dafjkadbfda, instead of serving the 404 page, the URL stays the same (in this case the broken one) and the server serves the index.php page.
This in turn opens another can of worms, in that all those broken URLs come up as duplicate content and duplicate meta data. So while this is somewhat SEO related, it does have to do with .htaccess. :) This has been an issue on many sites where I thought the .htaccess redirect worked. Thanks in advance.
|msg:4526235 [webmasterworld.com] mentioned the path part of the 3 URLs, but didn't clarify the requested hostname for the middle step. |
It's the yellow one halfway down the page ;)
Isn't the point that it's supposed to work cleanly with any hostname?
Oh, and I just realized:
RewriteRule ^index\.php(/(.*))?$ http://www.example.com/$1 [R=301,L]
index.php(/blahblahhere) >> http://www.example.com//blahblahhere
index.php(/) >> http://www.example.com//
Get rid of that slash in the target.
That would give
RewriteRule ^index\.php(/(.*))?$ http://www.example.com$1 [R=301,L]
in the posted code, with the target ending in .com$1.
I would use
RewriteRule ^index\.php(/(.*))?$ http://www.example.com/$2 [R=301,L]
here for clarity. This also means the target always correctly ends with a trailing slash when $2 is empty.
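To spell out the group numbering behind that suggestion, here is a small annotated sketch. The RewriteCond line is an assumption, not something shown in the thread: it limits the redirect to what the client literally requested, so an internal DirectoryIndex lookup for index.php does not trigger it again.

```apache
# Group numbering in ^index\.php(/(.*))?$
#   $1 = the optional "/..." part, including its leading slash
#   $2 = whatever follows that slash
#
# Requested path       $1       $2      Target built with /$2
# index.php            (empty)  (empty) http://www.example.com/
# index.php/           /        (empty) http://www.example.com/
# index.php/blah       /blah    blah    http://www.example.com/blah

RewriteEngine On

# Assumption: act only on the original client request line,
# e.g. "GET /index.php/blah HTTP/1.1".
RewriteCond %{THE_REQUEST} \s/index\.php[/?\s]
RewriteRule ^index\.php(/(.*))?$ http://www.example.com/$2 [R=301,L]
```

Using /$1 instead would produce the doubled slash noted above, and using bare $1 would leave the target without a trailing slash when only index.php is requested.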
Additionally, there's a typo in the non-www/www redirect. The rule
RewriteRule (.*) http://www.example.com/$2 [R=301,L]
should be
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
as the pattern has only one capture group, so $2 will always be empty.
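For context, a hostname redirect of this kind usually sits behind a condition on the Host header. The thread does not show that condition, so the RewriteCond below is an assumed, commonly used form rather than the poster's actual line.

```apache
RewriteEngine On

# Assumed condition: fire only when the requested hostname is
# anything other than www.example.com (case-insensitive).
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]

# (.*) is the only capture group, so the requested path is $1.
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

If the index.php rule is listed before this one and already points at the full www URL, a request for example.com/index.php should reach its final URL in a single redirect instead of two.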
I fixed the typo, thanks g1smd, and I've tried the changes you mentioned Lucy but nothing changes. For simplicity I removed both the index.php/ and www arguments completely leaving ONLY the one line rewriterule and that rule fails. The problem is with that, somehow. e.g.
RewriteRule ^some-old-url$ http://www.example.com/my-new-url [R=301,L]
visiting www.example.com/index.php/some-old-url results in a 404 error.
visiting www.example.com/anything-here/some-old-url ALSO fails to redirect.
visiting www.example.com/some-old-url results in a 301 redirect to the right page.
I thought that the rewriterule above was supposed to capture any url ending in some-old-url but when there is a directory in the url it doesn't? So right now my site has to redirect once to remove the index.php/, and only then does the first rule match and redirect a second time.
RewriteRule ^index\.php/some-old-url$ http://www.example.com/my-new-url [R=301,L]
works, so which catch-all should I use to account for the index.php/ prefix without opening other cans of worms?
I ordered an .htaccess book for myself from Amazon btw - I'm not sure if that will be a present or punishment :)
edit: the following works but can it be improved?
RewriteRule ^(.*/)?some-old-url$ http://www.example.com/my-new-url [R=301,L]
Or would it be more efficient to use
RewriteRule ^(index\.php/)?some-old-url$ http://www.example.com/my-new-url [R=301,L]
|I thought that the rewriterule above was supposed to capture any url ending in some-old-url |
Ah ha! You've got a beginning anchor. If you want to look only at the end of the URL, regardless of what comes before it, you have to leave off the anchor.
And don't bother about ^(.*/) because g1 will tear your head off and it isn't worth it :) Luckily there are approximately ten thousand earlier posts in this forum showing the correct way to capture the beginning of an URL if you need to save the part before your target text. Here maybe you don't, so just omitting the anchor is enough.
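To make the anchoring difference concrete, here is a sketch of the variants under discussion, using the some-old-url example. Only one would be active at a time, so the others are commented out. The fourth variant is not from the thread; it is an assumption about what "ending in some-old-url" is meant to cover.

```apache
# In .htaccess the path is matched without its leading slash.

# 1. Anchored at both ends: matches only "some-old-url".
#    "index.php/some-old-url" and "anything-here/some-old-url" do not match.
# RewriteRule ^some-old-url$ http://www.example.com/my-new-url [R=301,L]

# 2. No start anchor: matches any path ending in "some-old-url",
#    so "anything-here/some-old-url" matches, but so does "not-some-old-url".
# RewriteRule some-old-url$ http://www.example.com/my-new-url [R=301,L]

# 3. Start anchor kept, with an optional "index.php/" prefix and nothing else.
# RewriteRule ^(index\.php/)?some-old-url$ http://www.example.com/my-new-url [R=301,L]

# 4. Assumption, not from the thread: require a path-segment boundary,
#    so "dir/some-old-url" matches and "not-some-old-url" does not.
RewriteRule (^|/)some-old-url$ http://www.example.com/my-new-url [R=301,L]
```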
Avoid (.*) at the beginning of the pattern. The ^(.*) means read the entire URL all the way to the very end, after which the regex engine has to back off again to find the rest of the pattern. Replace it with
^(([^/]+/)*)index\.php
to match and capture optional folder levels.
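As a sketch of how the folder-level pattern might sit in a complete rule: the target and flags are borrowed from the rules earlier in the thread, while the end anchor and the RewriteCond are additions made here as assumptions.

```apache
# ([^/]+/)* matches zero or more "folder/" levels; the outer group
# captures all of them together as $1.
#
# Requested path       $1       Target
# index.php            (empty)  http://www.example.com/
# blog/index.php       blog/    http://www.example.com/blog/
# a/b/index.php        a/b/     http://www.example.com/a/b/

RewriteEngine On

# Assumption: only redirect when the client itself asked for index.php,
# not when the server resolved a folder request to it internally.
RewriteCond %{THE_REQUEST} \s/([^/]+/)*index\.php[?\s]
RewriteRule ^(([^/]+/)*)index\.php$ http://www.example.com/$1 [R=301,L]
```

Because [^/]+ cannot cross a slash, the engine walks the path one folder at a time instead of swallowing the whole URL and backing off.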
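RewriteRule ^thispage$ -- matches a request for example.com/thispage only.
If you want to match a request for example.com/index.php/thispage you will need to include the index.php/ prefix in the pattern, e.g. ^index\.php/thispage$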
^ means "begins with". It's good to include it, otherwise the rule might match URL requests that it should not.
Is AcceptPathInfo enabled or disabled on this site?
Thanks guys (and gals).
Oh and AcceptPathInfo is disabled.
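For reference, this is roughly what the disabled setting looks like as a directive. With it off, a request such as /index.php/some-old-url that no rewrite rule picks up is answered with a 404, which seems consistent with the behaviour reported earlier in the thread. Where the directive actually lives on this site is not stated, so placing it in .htaccess is an assumption.

```apache
# Assumption: set in .htaccess, which requires AllowOverride FileInfo.
#
# Off     = only paths that map to a literal existing file are accepted;
#           trailing path info such as /index.php/extra gets a 404
# On      = trailing path info after an existing file is accepted
# Default = left to the handler that serves the request
AcceptPathInfo Off
```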