Forum Moderators: phranque
Large sites (national TV stations, newspapers, etc.) constantly evolve over time; naturally new content will come along, microsites will be created and then discarded, old content will die away, and so on. Surely then, over time, to preserve as much relevance as possible and to keep as much PageRank and link popularity as possible, the number of redirects on a large site has the potential to be huge? Surely httpd.conf and .htaccess files are potentially enormous on such sites?
But this must run into a conflict between preserving rankings via 301s and scalability, given that large chunks of redirect code are run for every request?
So, don't large sites just end up with very large httpd.conf and .htaccess files? And can they deal with any resulting scalability issues by just slapping some more servers in the web farm?
Thanks for any thoughts,
Ste
At a lower level, there are several ways to do "massive numbers" of redirects, and the use of hard-coded "Redirect" directives and RewriteRules is only efficient at small scale. A more efficient approach is to use RewriteMap to access a hashed URL lookup table, or to call a script that uses hashes as keys to look up old-to-new URL translations. In this way you avoid having to process the Redirect or RewriteRule directives linearly; the hashes let you grab the URL you need much more quickly, even if a second- or third-level hash lookup is required. An advantage of the script method is that the old-to-new URL translations can be stored in the same database used for all other page-related data, centralizing administration and maintenance.
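To make that concrete, here's a minimal sketch of the RewriteMap approach for httpd.conf or a vhost block. The file names, paths, and example URLs below are just placeholders, not anything from a real setup:

# Build the hashed map once from a plain-text "old-path new-URL" file,
# e.g. with the httxt2dbm utility that ships with Apache:
#   httxt2dbm -i /etc/apache2/redirects.txt -o /etc/apache2/redirects.map
# Example lines in redirects.txt (hypothetical paths):
#   /old-microsite/about.html   http://www.example.com/about-us/
#   /news/2009/story123.html    http://www.example.com/archive/story123/

RewriteEngine On
RewriteMap redirects "dbm:/etc/apache2/redirects.map"

# One hashed lookup per request: if the requested path has an entry in the
# map, issue a single 301 to the new URL; otherwise fall through untouched.
RewriteCond ${redirects:%{REQUEST_URI}} !=""
RewriteRule ^ ${redirects:%{REQUEST_URI}} [R=301,L]

The dbm lookup takes roughly the same time whether the map holds fifty entries or fifty thousand, which is the point about avoiding linear processing of long Redirect/RewriteRule lists. A prg: map (an external lookup script that Apache keeps running) can do the same job against whatever database already holds the rest of the page data.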
At an intermediate level, you should know that directives in httpd.conf are parsed into the server's internal configuration once, at server start or restart. They are therefore *much* more efficient than even identical code in .htaccess, which has to be re-read and re-parsed from text for each and every HTTP request.
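The practical consequence (paths and URLs again made up for illustration) is to keep the redirects in the vhost and switch off .htaccess processing for the docroot, so the rules are read once at restart instead of on every hit:

# In httpd.conf / the vhost: these are parsed once, at server (re)start
Redirect 301 /spring-microsite/ http://www.example.com/campaigns/spring/
Redirect 301 /old-promo/ http://www.example.com/current-promo/

# Turn off per-request .htaccess lookups for the document root entirely
<Directory "/var/www/example">
    AllowOverride None
</Directory>

With AllowOverride None, Apache doesn't even check the filesystem for .htaccess files on each request, which saves a few stat calls per hit on top of the parsing cost.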
One problem is that httpd.conf and RewriteMap are inaccessible to users of shared name-based virtual hosting. But since shared hosting isn't particularly scalable either, those who have the "too many redirects" problem are likely to have outgrown shared hosting as well.
The biggest factor in avoiding this problem in the first place is a good site and URL architecture, designed around content type, search-engine robot, caching, access-control, and maintenance considerations.
Sir Tim Berners-Lee's paper, "Cool URIs don't change" [w3.org], pretty much covered it all. As the inventor of the WWW, he's worth listening to.
Jim
It's not been unusual for me to have to modify 200-300 links because the site owners fail to include a redirect (a simple procedure) in their work schedule. SLOPPY WORK on their end.
BTW, for the longest time I failed to keep a record of these old links; today, however, the realization has finally slapped me in the face to save the old URLs and utilize archive.org for the quality items that had been removed. Too bad I didn't realize this some ten years ago.
Don