I have been using 301 redirects to cautiously rename 4 or 5 pages at a time, so as not to upset steadily growing traffic and PR.
Accordingly, my .htaccess is getting pretty large. I dropped some of the more obscure redirects, for pages that I feel would not be linked to, after one Google update. What would be a reasonable time period to keep the other redirects up?
A lot of this depends on your specific site and how people get to it: links, bookmarks, type-ins, search engines, and the ratios and numbers of those. As you noted, some pages have no backlinks external to the site, and others do. Use AV (AltaVista) and FAST to find all the backlinks. Look at your logs for requests for the relocated files. Then figure out how many requests are coming in (or might come in) for those relocated pages, and pick a number. How many 404s you can afford depends on your site.
So once the new URLs appear in all the search engines that are important to you, it's no longer a question of "how long to wait" but rather of "how many requests to 404."
HTH,
Jim
After that, I'd double-check just how much traffic is coming in on it. Like JD said, if you can afford the 404, then make your 404 page informative, so people can still find what they came for. A good site search box on the 404 page never hurt either.
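For what it's worth, here is a minimal sketch of wiring up a custom 404 page in .htaccess. The path /errors/not-found.html is just a made-up name for a page holding your explanation, a few links, and the search box.

```apache
# Hypothetical page: /errors/not-found.html is an assumed name, not from this thread.
# Use a local path rather than a full http:// URL; with a full URL Apache
# sends a redirect and the client never sees the 404 status.
ErrorDocument 404 /errors/not-found.html
```

Keeping it a local path means spiders still get a real 404 for the dropped URLs, which is presumably what you want once the redirects are gone.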
Yes, a very large .htaccess can delay the serving of files.
However, unless the site is bogged down by huge scripts, the disk access speed is probably the most significant time-waster. Since .htaccess will be read and processed for every HTML document, image, and script (i.e. for every request), it is likely to remain in the server's cache, which helps. So, up to a point, the time required to get files off disk swamps the processing time for .htaccess.
One thing you can do is to put your "high-runner" RewriteRules and other often-used directives first, and use the [L] flag on RewriteRules wherever you can. Rules which redirect common image requests - requests for spacer gifs or your logo, for example - should go first, and error handlers can probably go last - unless you get a lot of errors, that is. :o
The order is easy to determine using your log file analyzer, so you can put them in that order unless it interferes with other dependencies in your logical processing flow.
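As a rough sketch of that ordering (all file names here are invented for illustration, and the "most requested first" order is assumed to come from your own log analysis):

```apache
RewriteEngine On

# Assumed busiest redirect first: an old logo location hit on nearly every page view.
# [L] stops rule processing for this request as soon as the rule matches.
RewriteRule ^images/old-logo\.gif$ /img/logo.gif [R=301,L]

# Less frequently requested page renames follow, in descending order of hits.
RewriteRule ^widgets\.html$ /products/widgets.html [R=301,L]
RewriteRule ^about-us\.html$ /company/about.html [R=301,L]
```

In a per-directory .htaccess the pattern is matched against the path without its leading slash, which is why the patterns above start with ^images/ rather than ^/images/.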
Another thing to do is to take care to use RewriteRule's pattern-matching capability to its fullest, to avoid processing RewriteConds for every rule. RewriteConds are only evaluated if the RewriteRule pattern matches, so try to make the RewriteRule pattern as selective as you can. This behaviour is non-intuitive, because the RewriteConds are written above the rule they belong to; see the Apache mod_rewrite documentation for a more thorough discussion.
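To illustrate, here is a hedged sketch with made-up URLs (old-catalog/item.php and /catalog/item-N.html are assumptions, not anything from this thread). Both versions aim for the same redirect, but the second lets the rule pattern do the filtering:

```apache
RewriteEngine On

# Loose version (commented out): the pattern .* matches every request,
# so both conditions get evaluated for every request.
# RewriteCond %{REQUEST_URI} ^/old-catalog/item\.php$
# RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
# RewriteRule .* /catalog/item-%1.html? [R=301,L]

# Selective version: the pattern is tested first, and the RewriteCond is
# only evaluated for requests to old-catalog/item.php.
# %1 is the number captured by the RewriteCond; the trailing ? drops the
# old query string from the redirect target.
RewriteCond %{QUERY_STRING} ^id=([0-9]+)$
RewriteRule ^old-catalog/item\.php$ /catalog/item-%1.html? [R=301,L]
```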
Following on that, try to make each directive handle as many redirects or access restrictions as it can, to cut down on the total number of directives required to do the job.
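For example, if a batch of renamed pages follows a pattern, one mod_alias directive can stand in for a whole list. This is only a sketch; www.example.com and the /news/ to /articles/ move are assumed names:

```apache
# Instead of one line per renamed page, such as:
# Redirect 301 /news/jan.html http://www.example.com/articles/jan.html
# Redirect 301 /news/feb.html http://www.example.com/articles/feb.html

# a single pattern-based directive covers every page in the old directory:
RedirectMatch 301 ^/news/([a-z0-9-]+)\.html$ http://www.example.com/articles/$1.html
```

This only works when the old and new names share a regular structure; one-off renames still need their own lines.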
HTH,
Jim
But even so (and the reason I italicised 'expense') this is not generally considered a particularly expensive process (when compared with your average bloaty web page download time).
If it was me, I'd leave the 301 up there for as long as possible.
<side>Does anybody know whether HTTP/1.1 keep-alive is used for redirects?</side>
*though it goes without saying you should try and squeeze every ounce of performance out of every area you can;)
Use this to see if the pages have been requested (though I am not familiar with using 301s; I usually just do a redirect page for any 2nd level pages, or important pages).
Also, I assume you have already done a complete search over the site for anything linking to those pages, either with a text-searching tool such as Search and Replace (an invaluable tool for this sort of thing) or with something like Site Server or another web site crawler.