http://www.example.com/blog/2007/06/
That would be valid in my case. However, the calendar seems to have no limits to how far back in time it will go (though I am working on stopping it from generating bad URLs in the first place). So, any tool that follows links deeply enough (e.g., Xenu or Google's bot) will find URLs that don't really exist. Unfortunately, the CMS will happily display a page anyway, even for something like this:
http://www.example.com/blog/1925/01/
I started this blog one year ago (certainly not in 1925!), so any URL that indicates a date prior to June 2007 should lead to a 404 error. After fooling around with this for a while, I finally got this to work:
RewriteRule ^blog/[1-2][0-9]{2}[0-6]/ /calculators/404/ [R=404,L]
At least it seems to redirect anything up to /blog/2006/whatever to my 404 page. It isn't perfect since it will still allow a few bad URLs (for early 2007).
Is this the best way to do this? Specifically, I want to know if there is a simpler regex that will catch /1900/ to /2006/ or anything that is before /2007/06/. Also, should I use a 410 instead of 404?
Thanks,
Tim
Instead, simply rewrite to a file in "/calculators" which does not exist, put the 404 error page in that same directory, and declare the ErrorDocument 404 to point to it.
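As a rough sketch of that setup, assuming the error page is saved as /calculators/404.html (that filename is just a placeholder; substitute your own):

```apache
# Serve this page for any 404. Use a local URL-path here, not a full
# http:// URL, or Apache will issue a redirect instead of returning a 404 status.
ErrorDocument 404 /calculators/404.html
```

Any rewrite target under /calculators that does not exist on disk should then produce a real 404 status along with your custom page.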
As far as the regex goes, I'd tend to be more restrictive:
# If year 2007 jan-may
RewriteCond $1 ^2007/0[1-5]$ [OR]
# or if NOT year 2007-2029
RewriteCond $1 !^20(0[7-9]|[12][0-9])
# Rewrite to a non-existent filepath to trigger a 404
RewriteRule ^blog/([0-9]{4}/[0-9]{2})/$ /calculators/nonexistent_path [L]
Jim
Tim