


410 Gone for dynamic URLs

I need to return 410 Gone for URLs that shouldn't have been crawled

   
11:10 am on Nov 11, 2013 (gmt 0)



Hi,

I need to return 410 Gone for a bunch of URLs that shouldn't have been crawled in the first place and are now 404ing. We've fixed the error that caused them initially, but there are around 3000 that have been indexed. These are all dynamic and follow patterns such as:

http://www.example.com/product-tag/nameoftag/page/2/?filter_region=92
http://www.example.com/product-category/nameofcategory/?filter_product_cat=22,269

I would like to use .htaccess to return 410 Gone for these (as suggested by a kind person on the SEO forum), but I'm unsure how to write a regex that catches all of them. Any help would be massively appreciated.

Thanks,

Ria

12:58 pm on Nov 11, 2013 (gmt 0)

lucy24 (WebmasterWorld Senior Member)



You'll need more than two illustrations to make a RegEx pattern. I don't see any unifying theme, except for the parts you've obfuscated:
/product-
/nameof
?filter_
I don't suppose any of those are part of the real URLs.

If you can explain in English what the pattern is, we'll see about hammering out a RegEx.

Variables:
name of requested page(s)
name of parameter(s)
value or value range of parameter(s)

1:07 pm on Nov 11, 2013 (gmt 0)



Hi Lucy,

The two examples were representative of a bunch of similar ones - so in example one above, where the region filtered was 92, it could have been 58, or 24, or anything. Similarly, in the category example the category ID could be any number.

Essentially, I'd like any filter parameters to be removed from the index. These are primarily tags and categories - the tags could be regions, colours, varieties etc. The range of each parameter will be from 0 to no more than 500.

In the examples, I obfuscated the domain and the /nameoftag/ and /nameofcategory/ - the other variables were as they are in the actual URLs. Name of tag could be, for example, chardonnay, and the category could be organic.

Thanks for your help!

9:08 pm on Nov 11, 2013 (gmt 0)

phranque (WebmasterWorld Administrator)



welcome to WebmasterWorld, riatkstarley!


i would use a RewriteCond to catch all the QUERY_STRING values that start with or contain a 'filter_' variable and follow that with a RewriteRule using the G flag.
it's possible that might be too simple and would catch too much.
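A minimal sketch of that suggestion, assuming mod_rewrite is available and the rules go in the site's root .htaccess file. The condition `(^|&)filter_` matches a parameter whose name starts with `filter_`, whether it opens the query string or follows another parameter; the `-` substitution leaves the URL alone, and the `G` flag makes Apache answer 410 Gone.

```apache
RewriteEngine On

# Match a query string in which some parameter name begins with "filter_",
# either at the very start or right after another parameter's "&".
RewriteCond %{QUERY_STRING} (^|&)filter_

# "^" matches every request path; "-" means no substitution.
# [G] returns 410 Gone (and implies [L]).
RewriteRule ^ - [G]
```

Because the rule pattern `^` matches every request, any URL anywhere on the site that carries a `filter_` parameter gets the 410, which is exactly the "catch too much" risk.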

[edit]missing "or" in "start with or contain"[/edit]

[edited by: phranque at 4:08 am (utc) on Nov 12, 2013]

3:59 am on Nov 12, 2013 (gmt 0)

lucy24 (WebmasterWorld Senior Member)



How many different paths can carry the "filter_" parameter? Since the %{QUERY_STRING} part requires a Condition, you want to constrain the search as tightly as possible so the condition doesn't have to be evaluated on every single request. Both of your examples involve directory-index pages (either physical directory or URL made to look that way, doesn't matter). So at a minimum:

RewriteRule /$ et cetera


so you only evaluate the RewriteCond if the request was for a directory.
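Combining phranque's condition with that constraint might look like the sketch below, again assuming a root-level .htaccess. There the path that `RewriteRule` sees has no leading slash, but both example paths still end in `/`, so `/$` matches them. mod_rewrite tests a rule's pattern before its conditions, so the `%{QUERY_STRING}` check only runs for URLs ending in a slash. The value part `[\d,]+` is an assumption drawn from the two examples (`92` and `22,269`).

```apache
RewriteEngine On

# Only reached when the path ends in "/", e.g.
#   product-tag/nameoftag/page/2/
#   product-category/nameofcategory/
# Parameter names may contain underscores (filter_product_cat);
# values are digits, optionally comma-separated (22,269).
RewriteCond %{QUERY_STRING} (^|&)filter_[a-z_]+=[\d,]+
RewriteRule /$ - [G]
```

If every affected URL really begins with `product-tag/` or `product-category/`, a tighter pattern such as `^product-(tag|category)/.*/$` (an untested assumption based only on the two examples) would keep the condition from running for other directories as well.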

Do you want to discard all URLs that contain the "filter_blahblah" parameter, or do you want to discard the parameter and keep the rest of the query? If the latter, does "filter_" always come at the beginning of the query string? Can it be followed by other stuff? If yes to both, do any other parameter names begin with f?

Best case involves

^filter_[a-z_]+=[\d,]+&(more-stuff-here)


where more-stuff-here becomes %1 in a redirect. Worst case involves

(.*?|^)filter_[a-z_]+=[\d,]+($|&more-stuff-here)


with %1 and %2 in the redirect.
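If the goal is to keep the page and drop only the parameter, the best and worst cases above might translate into rules like these. This is a sketch under assumptions: the rules sit in a root-level .htaccess, parameter names use only lowercase letters and underscores, values are digits and commas, and a 301 redirect to the cleaned URL is what you want. `%{REQUEST_URI}` supplies the path without the query string. Use only the pair that matches how your query strings are actually built.

```apache
RewriteEngine On

# Best case: "filter_..." always comes first and is followed by "&" plus
# the rest of the query, which is captured as %1 and kept.
RewriteCond %{QUERY_STRING} ^filter_[a-z_]+=[\d,]+&(.+)$
RewriteRule /$ %{REQUEST_URI}?%1 [R=301,L]

# Worst case: "filter_..." can appear anywhere.
# %1 = everything before the parameter, %2 = everything after it.
# The "&" joining the parameter to its neighbour is consumed by the
# alternation, so no stray "&" is left in the new query string.
# If nothing remains, the bare "?" drops the query string entirely.
RewriteCond %{QUERY_STRING} ^(.*?)(?:^filter_[a-z_]+=[\d,]+&?|&filter_[a-z_]+=[\d,]+)(.*)$
RewriteRule /$ %{REQUEST_URI}?%1%2 [R=301,L]
```

Each redirect strips one `filter_` parameter, so a URL carrying two of them would be redirected twice before it comes out clean.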