Forum Moderators: phranque

410 pages with certain parameters

         

joergnw10

9:28 am on Jan 7, 2019 (gmt 0)

10+ Year Member



Hi,
I have a number of pages for which I would like to return a 410. The URLs all include one or more of the following parameters:
cur_page
price-min
beds-min
baths-min
price-max

Here is an example URL:
http://www.example.com/index.php?cur_page=0&action=searchresults&price-min=000&price-max=200000&beds-min=1

From looking through posts on the forum I have so far come up with the following:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{QUERY_STRING} url=(cur_page|price-min|beds-min|baths-min|price-max)
RewriteRule ^$ - [G]

Unfortunately it does not work. Any idea what might be wrong or if I am even on the right path with this? Thanks for your help!

whitespace

10:54 am on Jan 7, 2019 (gmt 0)

10+ Year Member Top Contributors Of The Month



RewriteCond %{QUERY_STRING} url=(cur_page|price-min|beds-min|baths-min|price-max)


What is "url=" intended to match? (This does not appear in the example URL/query string you posted)

RewriteRule ^$ - [G]


(Assuming you are using .htaccess). The regex ^$ matches an empty URL-path, whereas your example URL contains "index.php".

If you have other directives in your config file then these could also be a factor, as the order could be important.

<IfModule mod_rewrite.c>


Aside: The IfModule wrapper is not required here.

joergnw10

3:44 pm on Jan 7, 2019 (gmt 0)

10+ Year Member



Thank you for your reply!
I must have misunderstood the thread I got the code from. I thought it would be an easy way to 410 everything with just one line.
I have now added separate directives like the following:
RewriteCond %{QUERY_STRING} \bcur_page\b [NC]
RewriteRule ^ - [G]

This seems to work ok.

lucy24

7:01 pm on Jan 7, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I thought it would be an easy way to 410 everything with just one line
Yes, you can do that. The question was just where the heck the "url" part came from.
RewriteCond %{QUERY_STRING} (cur_page|price-min|beds-min|baths-min|price-max)
without anchors should work just fine.

But wait! Are there other parameters containing "min" or "max" that you need to keep? If not, all you'd need is
RewriteCond %{QUERY_STRING} \b(cur_page|min|max)\b
taking advantage of the fact that - (hyphen) is a non-word character.
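
To illustrate what the shorter pattern does and does not catch, here is a minimal sketch. The sample query strings in the comments (minimum=5, sort=max) are made up for illustration and are not from the site in question; the RewriteRule line is the one already posted above.

```apache
# Sketch only: "min" and "max" are matched as whole words anywhere in the query string.
#   price-min=000  -> matches (the hyphen and "=" are non-word characters)
#   minimum=5      -> no match (no word boundary after "min")
#   sort=max       -> matches, which may or may not be wanted
RewriteCond %{QUERY_STRING} \b(cur_page|min|max)\b
RewriteRule ^ - [G]
```

If any parameter or value that is still in use could contain "min" or "max" as a whole word, the longer alternation is the safer choice.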

phranque

12:30 am on Jan 8, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



RewriteRule ^ - [G]


this rule will fire on every request (including images, for example), requiring the evaluation of the conditional.
if you specify a more restrictive pattern you can make things more efficient.

i would go with something like this:
RewriteCond %{QUERY_STRING} \b(cur_page|price-min|beds-min|baths-min|price-max)=
RewriteRule ^index\.php$ - [G]
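
For context, a minimal sketch of how these two lines might sit in a complete file, assuming the rules live in an .htaccess file in the document root. In per-directory context the leading slash is stripped before the pattern is tested, so the pattern starts with "index", not "/index".

```apache
RewriteEngine On

# The trailing "=" means only parameter names are matched, not values.
RewriteCond %{QUERY_STRING} \b(cur_page|price-min|beds-min|baths-min|price-max)=
# Only requests for index.php reach the condition; images and other files are skipped.
RewriteRule ^index\.php$ - [G]

# Assumption: if the same rule were placed in the server or virtual host
# configuration instead, the URL-path would keep its leading slash:
# RewriteRule ^/index\.php$ - [G]
```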

joergnw10

8:50 am on Jan 8, 2019 (gmt 0)

10+ Year Member



Thanks for all your replies and information!

Sorry about the confusion regarding the 'url=' in the code I used. I had another look at the thread I took it from, and it was actually part of the URL in the example that was used there. I had thought it was a placeholder or expression.

The suggestions work very well, thank you! I like
RewriteCond %{QUERY_STRING} \b(cur_page|min|max)\b
but I cannot rule out that some links in the back end of the CMS use the 'min' or 'max' parameters. I decided to go with the more restrictive rule from the last post and all seems to work great.
Thanks again!

joergnw10

10:53 am on Jan 9, 2019 (gmt 0)

10+ Year Member



Hi again,
Unfortunately I have come across another issue I need to fix with a 410 response:
There are loads of pages with two constant terms followed by a random number, like
http://www.example.com/constant1/constant2/2345.html
I have now put this into the .htaccess file:
RewriteRule constant1/constant2 - [G]

It seems to work ok, but I am wondering if this is safe / sufficient or if I need to add any extra characters?
Thanks!

phranque

11:39 am on Jan 9, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



RewriteRule constant1/constant2 - [G]

It seems to work ok, but I am wondering if this is safe / sufficient or if I need to add any extra characters?

it's generally most efficient to make the pattern as restrictive as possible:
RewriteRule ^constant1/constant2/[0-9]+\.html$ - [G]

[edited by: phranque at 10:11 pm (utc) on Jan 9, 2019]

joergnw10

3:03 pm on Jan 9, 2019 (gmt 0)

10+ Year Member



Thank you, but unfortunately it does not seem to work.
It returns a '200' response. (The URLs are all links to pages with an individual photo. For some reason the CMS never returns a '404', whether or not the URL or picture ever existed.)
Could that be why?

lucy24

9:23 pm on Jan 9, 2019 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you're using a CMS, make sure all your own rules are located before the CMS segment, which typically involves sending all requests for files-that-don't-physically-exist to index.php. You are right that server logs will always show a 200 response, because all it means is that the server has successfully rewritten to a file that does exist, namely index.php or whatever it may be. If it's a properly coded CMS, it will then send out a 404 response; you just won't see it in logs. (It took me a long time to wrap my brain around the fact that the response the server records is not necessarily the response the user receives.)

If the part you quote is the beginning of the URLpath, then put in an opening anchor:
RewriteRule ^constant1/constant2/ - [G]
If nothing fitting this pattern is still in use, you don't need a closing anchor or a longer pattern, because the server already has all the information it needs.
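
As a rough sketch of that ordering, assuming the CMS uses the common front-controller rules (file and directory checks followed by a rewrite to index.php; the actual CMS section may look different), the .htaccess file could be laid out like this:

```apache
RewriteEngine On

# Site-specific rules first: answer 410 Gone before the CMS ever sees the request.
# The [G] flag implies [L], so no later rule is applied to a matching request.
RewriteRule ^constant1/constant2/ - [G]

# CMS section last (assumed typical catch-all).
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . index.php [L]
```

If the 410 rule came after the CMS section, the request would already have been rewritten to index.php and the pattern would no longer match.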

phranque

1:05 am on Jan 10, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



but unfortunately it does not seem to work.

i see that i had a typo which i have edited to correct.

if lucy24's solution is sufficient, go with that:
RewriteRule ^constant1/constant2/ - [G]

if that isn't specific enough then use my version:
RewriteRule ^constant1/constant2/[0-9]+\.html$ - [G]
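
To make the difference between the two versions concrete, here is a short sketch. The sample paths in the comments other than 2345.html are hypothetical, and only one of the two rules would be used.

```apache
# Broad version: anything under constant1/constant2/ is gone.
#   constant1/constant2/2345.html      -> 410
#   constant1/constant2/abc.html       -> 410
#   constant1/constant2/sub/page.php   -> 410
RewriteRule ^constant1/constant2/ - [G]

# Narrow version: only all-digit names ending in .html are gone.
#   constant1/constant2/2345.html      -> 410
#   constant1/constant2/abc.html       -> not matched, passed on to later rules
# RewriteRule ^constant1/constant2/[0-9]+\.html$ - [G]
```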

joergnw10

8:38 am on Jan 10, 2019 (gmt 0)

10+ Year Member



Thank you both very much for your help!

Regarding the CMS: Google has loads of non-existent pages that are either indexed, reported as soft 404s, or at least listed as 'not indexed', so they must be returning a 200 response.
I do have the CMS code below my other rules, and I guess it might be the following rule from the CMS that deals with rewriting pages that do not exist:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . index.php [L]

But it is maybe best to leave it alone. Thanks to your help, I think most if not all non-existent pages should now be 'gone'!

phranque

9:05 am on Jan 10, 2019 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . index.php [L]

this is a typical catchall rewrite ruleset used by a cms.
these directives check whether the requested filename exists as a file or directory. if it does not, the request (anything other than "GET /") is internally rewritten to index.php

index.php must check that any url path rewritten to it is legitimately a canonical url.
otherwise the response should be a 301 or a 404/410.