Rewrite 410 not working

Old filenames return 404 when they should return 410 Gone

     
10:38 am on Sep 10, 2019 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3272
votes: 19


I have recently moved a site from ASP to Apache 2.4 on Linux. I have expanded and modified the site considerably, so the old file names no longer match and (naturally) return 404.

I have tried (and failed) to catch the old filenames and force a 410 on them. The filenames follow the general pattern index.htm, views.htm, maps-01.htm, views-t.htm, views-b01.htm. I have attempted to catch these using the rule below in .htaccess:

RewriteRule "/(index|maps|views)-?[a-z]?(\d\d)?\.htm" "-" [G]

Any thoughts on this, please?
11:13 am on Sept 10, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member penders is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2006
posts: 3147
votes: 3


In .htaccess, the URL-path matched by the RewriteRule pattern never starts with a slash, so you need to remove the slash prefix on the regex.

For example:

RewriteRule "(index|maps|views)-?[a-z]?(\d\d)?\.htm" "-" [G]


You should probably have some start/end anchors on the regex. The surrounding quotes are optional here (they are only needed if the regex contains unescaped spaces).

If the hyphen, whenever it is present, is always followed by something (eg. "-b01"), then I would probably make that whole trailing part optional rather than just the hyphen. Making only the hyphen optional is arguably matching too much. (?)
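
To illustrate the two points above (no leading slash in .htaccess, and why anchors help), here is a rough sketch in Python. Python's re module is only a stand-in for Apache's regex engine, which should behave the same for patterns this simple. The helper htaccess_path is hypothetical and assumes the .htaccess file sits in the document root.

```python
import re

def htaccess_path(url_path: str) -> str:
    """Hypothetical helper: mimic what a document-root .htaccess sees.

    mod_rewrite strips the per-directory prefix (here just "/") before
    the RewriteRule pattern is applied.
    """
    return url_path[1:] if url_path.startswith("/") else url_path

# The original pattern (leading slash) and the corrected one (no slash).
with_slash = re.compile(r"/(index|maps|views)-?[a-z]?(\d\d)?\.htm")
without_slash = re.compile(r"(index|maps|views)-?[a-z]?(\d\d)?\.htm")

candidate = htaccess_path("/views-b01.htm")

slash_hit = with_slash.search(candidate) is not None
no_slash_hit = without_slash.search(candidate) is not None

# Without anchors the pattern can also match in the middle of an unrelated
# name. "previews.html" is a made-up filename used only for illustration.
unanchored_hit = without_slash.search("previews.html") is not None

print(candidate, slash_hit, no_slash_hit, unanchored_hit)
```

In this sketch the slash-prefixed pattern fails on "views-b01.htm" while the slash-free one succeeds, and the unanchored pattern also catches "previews.html", which is the sort of over-matching that start/end anchors prevent.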
6:38 pm on Sept 10, 2019 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3272
votes: 19


Thanks for the reply, penders.

Point taken about the hyphen, etc., as well.

I have modified it to...

RewriteRule ^(index|maps|views)(-[a-z]?(\d\d))?\.htm "-" [G]

I will see what happens next. :)
9:29 pm on Sept 10, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member penders is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2006
posts: 3147
votes: 3



dstiles wrote:
RewriteRule ^(index|maps|views)(-[a-z]?(\d\d))?\.htm "-" [G]


You'll still need to make the digits (\d\d) optional in order to match "views-t.htm".

For example:


RewriteRule ^(index|maps|views)(-[a-z]?(\d\d)?)?\.htm "-" [G]
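
As a quick sanity check of that change, the two patterns can be run against the five filenames from the original post. This is only a sketch: Python's re module stands in for Apache's regex engine, and the variable names are made up for illustration.

```python
import re

# Filenames listed in the original post.
old_files = ["index.htm", "views.htm", "maps-01.htm", "views-t.htm", "views-b01.htm"]

# Pattern with the two digits required after the hyphen.
digits_required = re.compile(r"^(index|maps|views)(-[a-z]?(\d\d))?\.htm")
# Pattern with the two digits made optional.
digits_optional = re.compile(r"^(index|maps|views)(-[a-z]?(\d\d)?)?\.htm")

# Collect the filenames each pattern fails to match.
missed_before = [f for f in old_files if not digits_required.search(f)]
missed_after = [f for f in old_files if not digits_optional.search(f)]

print("missed with required digits:", missed_before)
print("missed with optional digits:", missed_after)
```

With the digits required, only "views-t.htm" slips through; with the digits optional, all five names match.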
11:35 pm on Sept 10, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11825
votes: 238


the double quotes are optional and unnecessary in this case.

i would also make the regular expression more specific with an end anchor:


RewriteRule ^(index|maps|views)(-[a-z]?(\d\d)?)?\.htm$ - [G]
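
A small sketch of what the end anchor changes, again using Python's re module as an approximation of Apache's matching. The "lookalike" names are invented for illustration and do not come from the site in question.

```python
import re

no_end_anchor = re.compile(r"^(index|maps|views)(-[a-z]?(\d\d)?)?\.htm")
end_anchor = re.compile(r"^(index|maps|views)(-[a-z]?(\d\d)?)?\.htm$")

# Filenames listed in the original post.
old_files = ["index.htm", "views.htm", "maps-01.htm", "views-t.htm", "views-b01.htm"]

# Made-up names that merely begin like the old files.
lookalikes = ["index.html", "views.htm.bak", "maps-01.htmx"]

# Without "$" the pattern accepts anything after ".htm".
loose_hits = [n for n in lookalikes if no_end_anchor.search(n)]
# With "$" the name must end exactly at ".htm".
strict_hits = [n for n in lookalikes if end_anchor.search(n)]
# The genuine old filenames should still all match the anchored pattern.
old_still_match = all(end_anchor.search(f) for f in old_files)

print(loose_hits, strict_hits, old_still_match)
```

Without the anchor, all three lookalikes would be served a 410; with it, none are, while the five old filenames still match.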
10:22 am on Sept 11, 2019 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3272
votes: 19


penders - thanks, well spotted! :)

phranque - Double quotes now removed. I deliberately avoided the end anchor as I've noticed some bad bots append querystrings to the end. Or would they be not included in the rule anyway? I see nothing to indicate that.
11:05 am on Sept 11, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member penders is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2006
posts: 3147
votes: 3


dstiles wrote: "...some bad bots append querystrings to the end. Or would they be not included in the rule anyway? I see nothing to indicate that."


Query strings are "not included in the rule anyway". The RewriteRule directive matches against the URL-path only, which notably excludes the query string, so the rule will match "any" query string by default.

Aside: In order to match a query string you would need a RewriteCond directive and match against the QUERY_STRING server variable.
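
The split between URL-path and query string can be sketched roughly as follows. This is an illustration in Python rather than Apache itself: split_request is a hypothetical helper assuming a document-root .htaccess, and the "junk=" query pattern is an invented example of the kind of expression a RewriteCond on QUERY_STRING might test.

```python
import re
from urllib.parse import urlsplit

# The anchored pattern from this thread, applied to the URL-path only.
rule_pattern = re.compile(r"^(index|maps|views)(-[a-z]?(\d\d)?)?\.htm$")

# Invented example of a pattern that would be checked against the
# query string separately (the RewriteCond side of things).
query_pattern = re.compile(r"(^|&)junk=")

def split_request(request_uri: str):
    """Hypothetical helper: return (path without leading slash, query string)."""
    parts = urlsplit(request_uri)
    path = parts.path[1:] if parts.path.startswith("/") else parts.path
    return path, parts.query

path, query = split_request("/views-t.htm?junk=1&more=2")

# The end anchor still matches because the query string is not part of the path.
rule_matches = rule_pattern.search(path) is not None
# The query string has to be tested on its own.
cond_matches = query_pattern.search(query) is not None

print(path, "|", query, "|", rule_matches, cond_matches)
```

The point of the sketch is that the appended query string never reaches the path pattern, so an end anchor on ".htm" does not stop requests with query strings from matching.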
12:18 pm on Sept 11, 2019 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11825
votes: 238


dstiles wrote: "phranque - Double quotes now removed. I deliberately avoided the end anchor as I've noticed some bad bots append querystrings to the end. Or would they be not included in the rule anyway?"

as penders mentioned, the query string isn't matched against the RewriteRule pattern; only the url path is.

... The Pattern will initially be matched against the part of the URL after the hostname and port, and before the query string (e.g. "/app1/index.html").

source: https://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewriterule
9:42 am on Sept 12, 2019 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3272
votes: 19


Many thanks, both. End anchor now added.

And thanks for solving my problem. Log says I am now pushing out 410 for the relevant pages. :)
 
