Strip Everything After .html

         

oliversk

5:26 pm on Sep 10, 2011 (gmt 0)

10+ Year Member



Hi,

I'm new here. I've found some nice solutions on this site, but was unable to find one for my problem.

Maybe you have a rewrite rule that can strip everything after .html:

1. Rewrite mypost.html)anycharacters to mypost.html

My first try:
 RewriteRule ^(.*)\.html\"(.*)$ http://domain.com/$1.html? [R=301,L]


2. How would I rewrite missing...operating.html to missing-operating.html? I get a lot of problems when I try using quotes.

Wrong:
 Redirect permanent "/missing-...-operating. html" /missing-operating.html

Any help appreciated.

Thanks
Oliver

g1smd

5:30 pm on Sep 10, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



When you say "anything after .html", do you mean more stuff in the path, or do you mean an appended query string? They are each handled completely differently.

oliversk

5:33 pm on Sep 10, 2011 (gmt 0)

10+ Year Member



I already have a rule to strip query strings. I need one to strip stray characters that end up after .html in the path.

E.g.
.htmlssss
.html)
.html/thisisabacklink
.html width=www

oliversk

5:35 pm on Sep 10, 2011 (gmt 0)

10+ Year Member



Here's my rule to strip query strings. Very useful for WordPress + W3TC.


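# Only act when a query string is present
RewriteCond %{QUERY_STRING} .
# ...and it does not start with a parameter that WordPress or W3TC needs
RewriteCond %{QUERY_STRING} !^(s|p|cx|cat|tag|doing_wp_cron|page_id|w3tc_rewrite_test|w3tc_preload)=
# ...and the request is not for the admin area
RewriteCond %{REQUEST_URI} !wp-admin
# Redirect to the same path; the trailing "?" drops the query string
RewriteRule ^(.*)$ /$1? [R=301,L]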

oliversk

6:33 pm on Sep 10, 2011 (gmt 0)

10+ Year Member



Well, I guess you could do it like this:

RewriteRule ^(.*)\.html([\s]|\:|\))(.*)$ http://domain.com/$1.html? [R=301,L]


But then you would have to list every special character like ":, ), (, %, &,"

g1smd

6:37 pm on Sep 10, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Leading (.*) says "grab the entire string".

The parser then has to do a long series of "back off and retry" trial matches, giving back one character at a time until the rest of the pattern fits.

Use a different pattern.
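
As a rough illustration of the difference, here is a sketch comparing a greedy capture with a negated character class. It is only a sketch: the one-level path is an assumption, and the domain.com target is carried over from the rule quoted above.

```apache
# Greedy: (.*) swallows the whole path, e.g. "mypost.html)junk", and the
# engine must then back off character by character until "\.html" fits.
# RewriteRule ^(.*)\.html.+$ http://domain.com/$1.html [R=301,L]

# Negated class: [^/.]+ stops by itself at the first "." or "/",
# so "\.html" is tried at the right place straight away.
# Assumption: pages sit directly in the site root (no subdirectories).
RewriteRule ^([^/.]+)\.html.+$ http://domain.com/$1.html [R=301,L]
```

Because the pattern ends in .+ rather than .*, a clean request for mypost.html does not match and is left alone.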

lucy24

12:12 am on Sep 11, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



How would I rewrite missing...operating.html to missing-operating.html? I get a lot of problems when I try using quotes.

Wrong:
Redirect permanent "/missing-...-operating. html" /missing-operating.html

Was this a typo in your original post? You're working with mod_rewrite, not mod_alias, right?

:: pause for g1 to insert boilerplate about Dreadful Things that can happen if you use both in the same .htaccess ::

Do you really have huge numbers of people typing in blahblah.html{somemoregarbage}? Enough that you need to make a rule instead of just dumping a 404 on them? If this stuff is coming from links, you'll want to fix them.

It has to go something like

(([^/.]+/)*[^/.]+\.html).+ http://www.example.com/$1

meaning "capture everything through .html" followed by any old non-captured garbage, and then rewrite using only the captured part. Here it's .+ rather than .* because if there's nothing after the html you are good to go and don't need to change anything.
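
For reference, a minimal sketch of how that pattern might sit in a complete rule. It assumes an .htaccess file in the site root (so the path being matched has no leading slash) and a permanent redirect. The anchors and flags are assumptions added for the sketch, and www.example.com is a placeholder host.

```apache
RewriteEngine On

# $1 = every directory level plus the file name, up to and including ".html".
# The trailing .+ needs at least one extra character, so a clean
# "dir/mypost.html" request does not match and cannot loop.
RewriteRule ^(([^/.]+/)*[^/.]+\.html).+$ http://www.example.com/$1 [R=301,L]
```

With a rule along these lines, a request for dir/mypost.htmlssss should be redirected to dir/mypost.html. Any query string is passed through unchanged, so it still needs its own rule.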

But then you would have to list every special character like ":, ), (, %, &,"

No, you don't have to list them, because the Pattern describes the part you want to keep rather than the garbage that follows it. (Anything in the query string is a separate matter; the RewriteRule doesn't look at that.) So you put the Pattern in terms of the characters that are not allowed in the part that the rule is looking at. Here they are / (because each directory level is matched as its own unit) and . (because it will only occur in .html at the end).