Forum Moderators: phranque

301's versus scalability

         

stevej444

4:13 pm on Jun 14, 2007 (gmt 0)

10+ Year Member



Hi all,
I have a question about 301'ing and scalability.

Large sites, e.g. national TV stations, newspapers, etc., constantly evolve over time; naturally, new content is going to come along, microsites will be created then discarded, old content will die away, etc. Surely then, over time, to preserve as much relevance, PageRank, and link popularity as possible, the number of redirects on a large site has the potential to be huge? Surely httpd.conf and .htaccess files are potentially enormous on such sites?

But this must bump into an SEO-preserves-ranking-via-301s versus scalability conflict, given that large chunks of redirect code are run for every request?

So, don't large sites just end up with very large httpd.conf and .htaccess files? And can they deal with any resulting scalability issues by just slapping some more servers in the web farm?

Thanks for any thoughts,
Ste

jdMorgan

6:26 pm on Jun 14, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



At the high level, many large sites simply "don't play along" with the idea that the Web should be a library and that pages (information) should have permanence. Actually, many smaller sites don't either; these forums are filled with "I totally redesigned my site and..." threads. So, many sites avoid the problem completely by simply allowing their content to disappear -- something that should give pause to those who consider future historians in a "totally-online world" where the Web has replaced books, magazines, newspapers, film, and almost all other "archival" media...

At a lower level, there are several ways to do "massive numbers" of redirects, and the use of hard-coded "Redirect" directives and RewriteRules is only efficient at the small scale. A more efficient approach is to use RewriteMap to access a hashed URL lookup table, or to call a script that uses hashes as keys to look up old-to-new URL translations. In this way, you avoid having to process the Redirect or RewriteRule directives in a linear fashion -- the hashes let you grab the URL you need much more quickly, even if a second- or third-level hash is required. An advantage of the script method is that old-to-new URL translations can be stored in the same database as used for all other page-related data, centralizing administration and maintenance.
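To make the script method a little more concrete, here is a minimal sketch in Python, assuming the script is attached to Apache as an external RewriteMap program (the "prg:" map type). Such a program receives one lookup key per line on stdin and answers with one line on stdout, using the string NULL when there is no match. The table contents and the names REDIRECTS, translate, and serve are made up for illustration; a real site would more likely load the table from its content database.

```python
import sys

# Hypothetical old-to-new URL table (illustrative entries only).
# A dict is a hash table, so each lookup takes roughly constant time
# no matter how many redirects are stored.
REDIRECTS = {
    "/news/2006/widgets.html": "/archive/widgets",
    "/microsite/summer": "/events/summer-2007",
}


def translate(old_path, table=REDIRECTS):
    """Return the new URL for old_path, or "NULL" if there is no mapping."""
    return table.get(old_path.strip(), "NULL")


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Answer one lookup per input line, flushing after every reply."""
    for line in stdin:
        stdout.write(translate(line) + "\n")
        stdout.flush()  # the server waits for each answer, so never buffer

# To run this as a map program, call serve() at the bottom of the script.
```

In httpd.conf this could be wired up with something like "RewriteMap redirects prg:/path/to/script" (the map name and path are placeholders) and then queried from a RewriteRule or RewriteCond as ${redirects:key}. However many entries the table holds, each request costs one hash lookup rather than a walk through every rule.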

At an intermediate level, you should know that directives in httpd.conf are parsed, and their regular expressions compiled, once at server start, after which they are applied from memory for every request. Therefore, they are *much* more efficient than even identical code in .htaccess, which has to be read and interpreted from text on-the-fly for each and every HTTP request.
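As a rough illustration of the startup-versus-per-request difference, here is a small Python analogy. It is not how Apache is implemented; the rule format and the names RULES_TEXT, parse_rules, redirect, handle_server_config, and handle_htaccess are all invented for the sketch. Both handlers return the same answers; the only difference is when the parsing work is done.

```python
import re

# Hypothetical rule text: one "pattern target" pair per line.
RULES_TEXT = r"""
^/old/(.+)$ /new/\1
^/microsite/?$ /events/
"""


def parse_rules(text):
    """Turn the rule text into a list of (compiled regex, target) pairs."""
    rules = []
    for line in text.strip().splitlines():
        pattern, target = line.split()
        rules.append((re.compile(pattern), target))
    return rules


def redirect(path, rules):
    """Scan the rules in order and return the first rewritten path, or None."""
    for regex, target in rules:
        if regex.search(path):
            return regex.sub(target, path)
    return None


# httpd.conf style: parse once at startup, reuse for every request.
STARTUP_RULES = parse_rules(RULES_TEXT)


def handle_server_config(path):
    return redirect(path, STARTUP_RULES)


# .htaccess style: parse the rule text again on every single request.
def handle_htaccess(path):
    return redirect(path, parse_rules(RULES_TEXT))
```

Note that redirect() still walks the rules one by one, so even the parse-once version slows down as the rule list grows, which is the linear-processing cost mentioned earlier.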

One problem is that httpd.conf and RewriteMap are inaccessible to users of shared name-based virtual hosting. But since shared hosting isn't particularly scalable either, those who have the 'too many redirects' problem are likely to have outgrown shared hosting anyway.

The major factor that really helps avoid running up against this problem is to have a good site and URL architecture based on content-type, SE robot, caching, access-control, and maintenance considerations.

Sir Tim Berners-Lee's paper, "Cool URIs don't change [w3.org]" pretty much covered it all. As the inventor of the WWW, he's worth listening to.

Jim

stevej444

9:38 am on Jun 15, 2007 (gmt 0)

10+ Year Member



Hi Jim,

Great post, informative and very helpful as ever.
Many thanks,

S

wilderness

3:57 pm on Jun 15, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Jim,
The widgets of my pages have outbound links to another widget site which leads the widget industry in source.

Although their website is database managed, very frequently pages are moved and renamed.
In the process, the old pages and their traffic just fall by the wayside without redirects.

It's not been unusual for me to modify 200-300 links because they fail to include a redirect (simple procedure) in their work schedule. SLOPPY WORK on their end.

BTW, for the longest time I failed to keep a record of these old links; however, today the realization has finally slapped me in the face to save the old URLs and utilize archive.org for the quality items that had been removed. Too bad I didn't realize this some ten years ago.

Don