I'm in the process of moving a site to a new domain, and what I thought would be an easy switchover is turning out to be more complicated than expected. The problem stems from the fact that the site at olddomain.org is a blog running on Blosxom, and the way I've set it up, it generates permanent links to entries that look like [olddomain.org...] .
The new domain runs on an entirely different setup (Drupal), so the new entries look like [newdomain.org...] .
I want to set up redirects that map each old entry to its specific counterpart, since most of our inbound links use those URLs, but the problem is the matching. I can work out which 'YYYY/MM/DD#filename' goes with which 'node', but I'm having a hard time configuring .htaccess to match URIs with ? and # characters in them.
So far I've tried something like:
redirectmatch 301 /index\.shtml?YYYY/MM/DD\#filename newdomainlocation
But that doesn't work (doesn't match). Should I instead try something with mod_rewrite? Can that still indicate a 301 redirect?
Thanks in advance.
zach
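Two things work against the RedirectMatch attempt above. First, RedirectMatch only tests the URL-path, so neither the query string nor the fragment is part of what it sees. Second, an unescaped "?" in a regex is a quantifier, not a literal question mark. The quick Python check below shows both; it assumes Python's `re` behaves like Apache's PCRE for these simple constructs, and the date and entry name are hypothetical fill-ins for the YYYY/MM/DD and filename placeholders.

```python
import re

# The RedirectMatch pattern from above, with hypothetical values filled in
# for the YYYY/MM/DD and filename placeholders.
pattern = r"/index\.shtml?2005/10/17\#filename"

# RedirectMatch only sees the URL-path; "?/2005/10/17" (query string) and
# "#filename" (fragment) are not part of it.
url_path = "/index.shtml"

# No match: the path contains no date at all.
print(re.search(pattern, url_path))

# "?" is a quantifier: "shtml?" means "shtm" plus an optional "l",
# so this pattern also accepts "/index.shtm".
print(re.fullmatch(r"/index\.shtml?", "/index.shtm") is not None)
```

Escaping the question mark as "\?" fixes the second problem but not the first, which is why query-string matching has to go through mod_rewrite and %{QUERY_STRING}.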
There's gotta be some way to test for this, though. Anything come to mind, or am I SOL?
EDIT: I think I can live with it. Since there isn't usually more than one entry per day, I can just match the dates and redirect to a "best guess" page.
As a follow-up, though: the approach I'm planning will produce what seems to me a rather large .htaccess file. Is it going to be a problem, or create a performance issue, if I have, say, 2500 RewriteRules?
You can get closest to the actual transmitted URL by using RewriteCond %{THE_REQUEST}. This variable contains the entire request line sent by the client (browser), including the method and the protocol, such as:
GET /index.shtml?/YYYY/MM/DD#filename HTTP/1.1
(This is what usually appears in standard server access logs in the Request field)
Anyway, you ought to be able to catch that "#" character by experimenting with the info above. And yes, 2500 lines is too many for a site that gets real traffic.
Jim
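One way to avoid thousands of per-date rules, offered here only as an assumption-laden sketch, is a single rule that captures the date with backreferences and hands it to one lookup page on the new site, roughly `RewriteCond %{QUERY_STRING} ^/(\d{4})/(\d{2})/(\d{2})/?$` followed by `RewriteRule ^index\.shtml$ http://www.newsite.org/bydate/%1/%2/%3? [R=301,L]`. The `/bydate/` handler is hypothetical; whether the Drupal site can host such a page is not established in this thread. The Python below mimics what that rule pair would do, so the pattern can be tested before it goes live.

```python
import re
from typing import Optional

# Mirrors the hypothetical RewriteCond on %{QUERY_STRING}.
QUERY_RE = re.compile(r"^/(\d{4})/(\d{2})/(\d{2})/?$")


def redirect_target(path: str, query: str) -> Optional[str]:
    """Return the 301 target for an old blog URL, or None if the rule skips it."""
    # The per-directory RewriteRule ^index\.shtml$ sees this path
    # without its leading slash; here we compare the full path.
    if path != "/index.shtml":
        return None
    m = QUERY_RE.match(query)
    if m is None:
        return None
    # These groups play the role of %1, %2, %3 in the RewriteRule.
    year, month, day = m.groups()
    # "/bydate/" is a hypothetical lookup page on the new site.
    return f"http://www.newsite.org/bydate/{year}/{month}/{day}"


print(redirect_target("/index.shtml", "/2005/10/17/"))
```

The design trade-off is that the date-to-node lookup moves out of .htaccess and into the new site, so the per-request rule list stays at one pair instead of growing with the archive.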
RewriteCond %{QUERY_STRING} filename65$
RewriteRule ^index\.shtml$ [newsite.org...] [R=301,L]
That would be nice, but no go. I hadn't actually thought to try that until just now, but again, I think the problem is that whatever comes after the # doesn't make it into QUERY_STRING at all.
jdMorgan - I couldn't get it to match THE_REQUEST either, but I checked my logs and found that my request fields look like "GET /index.shtml?/2005/10/17/ HTTP/1.1" when I know the actual request included the filename after a # (I know because it's my IP address in the log).
Is that something that might vary by server? I don't have much control over this one (outside of .htaccess, obviously) so I probably wouldn't be able to change much in the logging department.
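A candidate pattern for RewriteCond %{THE_REQUEST} can be checked against the request line from the log quoted above before it goes into .htaccess. The sketch below assumes Python's `re` agrees with Apache's regex engine on this pattern; in .htaccess itself, the literal spaces would need escaping as "\ ". The final optional group is there to capture a "#fragment" if one were ever present in the request line.

```python
import re

# Request line copied from the access log quoted above.
request_line = "GET /index.shtml?/2005/10/17/ HTTP/1.1"

# THE_REQUEST is "METHOD request-target PROTOCOL", separated by spaces.
method, target, protocol = request_line.split(" ")

# "\?" is a literal question mark, "/?" allows an optional trailing slash,
# and the last optional group would capture "#fragment" if it were sent.
pattern = re.compile(r"^GET /index\.shtml\?/(\d{4})/(\d{2})/(\d{2})/?(#\S*)? HTTP/")

m = pattern.match(request_line)
print(method, target, protocol)  # GET /index.shtml?/2005/10/17/ HTTP/1.1
print(m.groups())                # ('2005', '10', '17', None)
```

The date captures fine, but the fragment group comes back empty, which matches what the log shows: there is no "#filename" in the request line to catch.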
I'm afraid that, after seeing your log entry and running some tests myself, anything from the # sign onward is used only by the browser and never reaches the server.
The only use for the pound sign in a url that I have ever seen is for anchor linking within the same page.
The browser keeps it on its side; it isn't sent to the server as part of the request.
As RFC 3986 (section 3.5) puts it: "Fragment identifiers have a special role in information retrieval systems as the primary form of client-side indirect referencing, allowing an author to specifically identify aspects of an existing resource that are only indirectly provided by the resource owner."
Sorry, wish I could help you but this one is looking impossible.
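The reason the server never sees the entry name can be illustrated by building a request line from the full URL the way an HTTP client does: the request-target is the path plus the query string, and the fragment stays behind. This is a simplified sketch (it ignores headers, proxies, and encoding details) using only the standard `urllib.parse.urlsplit`; the URL follows the olddomain.org format described at the top of the thread, with a hypothetical date.

```python
from urllib.parse import urlsplit


def request_line(url: str, method: str = "GET") -> str:
    """Build the HTTP/1.1 request line a client would send for `url`.

    The fragment (everything after '#') is client-side only and is not
    part of the request-target, so it is deliberately left out here.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return f"{method} {target} HTTP/1.1"


url = "http://olddomain.org/index.shtml?/2005/10/17/#filename"
print(urlsplit(url).fragment)  # filename
print(request_line(url))       # GET /index.shtml?/2005/10/17/ HTTP/1.1
```

The output has the same shape as the access-log entry quoted earlier, which is why neither %{QUERY_STRING} nor %{THE_REQUEST} can ever contain "#filename".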
Since the entries on oldsite.org are organized by date, I can use that to get a pretty good idea of where the entry is on the new site. Most of the time, in fact, there's only one post on any given day, so that's easy:
RewriteCond %{QUERY_STRING} ^/2005/05/31$
RewriteRule ^index\.shtml$ http://www.newsite.org/node/393? [R=301,L]
(The trailing "?" keeps mod_rewrite from appending the old query string to the new URL.)
On dates when there's more than one entry, I can tell that the intended file is one of a list, so I redirect to a page on the new site that expects that list as a parameter:
RewriteCond %{QUERY_STRING} ^/2005/11/15$
RewriteRule ^index\.shtml$ http://www.newsite.org/node/disambig?options=/644/642/660/643 [R=301,L]
So that page, 'disambig', says "Sorry, couldn't find what you were looking for, but it might be one of these four: 644, 642, 660, 643. By the way, check out the new site!"
So it's a big workaround, but hopefully it'll only come up on a few dates.
Thanks for all the input.
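With hundreds of dates, writing these pairs by hand is error-prone, so a small script can generate them from a date-to-node table. The sketch below assumes such a table exists (only the two entries quoted above are real; the rest of the mapping is up to you). Two further assumptions: "/?" is added to the condition so both the "/2005/05/31" and "/2005/10/17/" forms seen in this thread match, and single-node targets end in "?" so mod_rewrite drops the old query string instead of appending it.

```python
from typing import Dict, List

# Date -> Drupal node IDs. Only these two entries come from the thread;
# a real table would be built from the blog archive.
entries: Dict[str, List[int]] = {
    "2005/05/31": [393],
    "2005/11/15": [644, 642, 660, 643],
}

BASE = "http://www.newsite.org/node"


def rules_for(date: str, nodes: List[int]) -> str:
    """Return one RewriteCond/RewriteRule pair for a single date."""
    # "/?" also accepts the trailing-slash form of the query string.
    cond = f"RewriteCond %{{QUERY_STRING}} ^/{date}/?$"
    if len(nodes) == 1:
        # Trailing "?" stops mod_rewrite from appending the old query string.
        target = f"{BASE}/{nodes[0]}?"
    else:
        # Several entries on one day: send the candidates to the disambig page.
        options = "".join(f"/{n}" for n in nodes)
        target = f"{BASE}/disambig?options={options}"
    return f"{cond}\nRewriteRule ^index\\.shtml$ {target} [R=301,L]"


print("\n".join(rules_for(d, n) for d, n in sorted(entries.items())))
```

Running it prints rule pairs in the same form as the hand-written ones above, ready to paste into .htaccess; the earlier caution about very long rule lists still applies to whatever it generates.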