RewriteRule ^x([0-9a-z]+)$ /index.php?p=$1 [NC,QSA]
So, as an example, if there’s an incoming URL request of:
xyz.com/x24
it should route the browser to the post that has that ID number. When this URL is manually entered into the browser, it works fine. The issue is that this particular line does not rewrite the URL.
I’m using this in the .htaccess file in the root directory of a single-user WordPress install. This line, and only this line, fails to work. It does not show any errors in the apache_error_log file. When a URL of that pattern is encountered, it throws a 404 error, displaying my custom 404.php page.
The reason I have a [0-9a-z] pattern is that I will eventually be encoding post IDs into Base 32 equivalents and then redirecting to a script that queries the DB to pull the proper post ID and rewrite the URL. But, I cannot even get this single line to work with a real post ID number. It just throws a 404.
I’m experiencing the same results on my dev site as I am on my live site. My test site is running Apache 2.0.63. My live site is a dedicated server running Apache with some version of 2.x.x as well.
Here’s the entire mod_rewrite section in the .htaccess file:
Options +FollowSymLinks
RewriteEngine On
RewriteBase /
RewriteRule ^x([0-9a-z]+)$ /index.php?p=$1 [NC,QSA]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
Am I doing something boneheaded that I just can’t see?
Requests for the index.php file itself should be excluded from the second rule using a negative-match RewriteCond for the same reason.
I suggest that you do not use the [NC] flag, as this invites duplicate-content issues due to various uppercase/lowercase URLs all resolving to the same content. If you do need to use the [NC] flag, then the script itself should 301-redirect all mis-cased URL requests to the properly-cased URLs.
However, neither of these problems seems 'fatal' -- did you completely flush (delete) your browser cache every time you uploaded new code and before testing it?
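As a rough sketch of the two suggestions above (not a confirmed fix for the problem in this thread), the rule set might be arranged like this. The [L] flag on the first rule and the exact form of the index.php exclusion are assumptions, not something quoted from the original posts.

```apache
Options +FollowSymLinks
RewriteEngine On
RewriteBase /

# Short-URL rule with [NC] dropped, so only lowercase "x" plus [0-9a-z] matches.
# [L] is an assumption here: it ends the current pass so the catch-all
# rule below is not applied to the freshly rewritten path.
RewriteRule ^x([0-9a-z]+)$ /index.php?p=$1 [QSA,L]

# WordPress catch-all, skipping requests that already point at index.php.
RewriteCond %{REQUEST_URI} !^/index\.php$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
```

If mixed-case short URLs must be accepted, [NC] would go back on the first rule and the script would need to 301-redirect mis-cased requests, as noted above.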
Jim
Thanks for the reply. Clearing the browser's cache makes no difference. It simply does not rewrite the URL. I've even cleared cache via the backend in my IDE.
This would make more sense if it happened only on one server, but it happens on both servers. One I configured, the other is a dedicated DH server.
I'm not totally clear on that statement, so to be specific: If this .htaccess code is in a WordPress subdirectory, then the requested URL will need to be example.com/path-to-wordpress-directory/x24. Otherwise, the .htaccess code in the WordPress directory will never get executed.
Also, disable MultiViews (using the Options directive) if you're not using content-negotiation, and disable AcceptPathInfo on Apache 2.x if you're not using that either.
If this trivial rule won't match, it's either because it's in the wrong location or something else is interfering with it. So that's why I recommended the above steps.
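For reference, disabling those two features in .htaccess could look like the lines below. This is a minimal sketch and assumes the server's AllowOverride setting permits Options and FileInfo overrides for this directory; if it does not, these lines will cause a server error instead.

```apache
# Turn off content negotiation (MultiViews) while keeping FollowSymLinks,
# which mod_rewrite needs for per-directory rules.
Options +FollowSymLinks -MultiViews

# Stop Apache 2.x from accepting extra path info after a real file name.
AcceptPathInfo Off
```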
Jim
Thanks again for your comments and suggestions. I guess my statement about my root environment was not clear.
The .htaccess file in question is installed in the site's root directory. That is also where WordPress is installed. I do not have a subdirectory install of WP. So, the site's root is the blog's root as well.
I disabled MultiViews and AcceptPathInfo but the same behavior remains--the RewriteRule is not fired.
What a pain. I've been massaging this single line of code for 3 days now. I guess my next step is to reinstall Apache. Something must be fouled up somewhere that is not obvious.
RewriteRule ^x([0-9a-z]+)$ /index\.php?p=$1 [NC,QSA,L]
RewriteRule ^x([0-9]+)$ /index\.php?p=$1 [NC,QSA,L]
RewriteRule ^x123$ /index\.php?p=123 [NC,QSA,L]
Also, if some other agent is interfering, the most likely problem it might cause is to add a trailing slash. So you can also try permutations such as:
RewriteRule ^x123/$ /index\.php?p=123 [NC,QSA,L]
RewriteRule ^x123/?$ /index\.php?p=123 [NC,QSA,L]
Also, if there are any other details you've left out -- for example, that perhaps these "x123"-type URLs used to be proxied or rewritten to some other path, or that the URLs aren't really as simple as "/x123", or that you've got other redirects enabled in some "control panel" or something that might affect requests for these URLs -- then those missing details might have something to do with the problem.
What else... Oh, make sure you're using a plain-text (ASCII/UTF-8) editor.
This is decidedly odd. Therefore, the problem is likely to be some simple 'gotcha'. I really doubt you'll need to do a reinstall, though, and would advise trying everything else first...
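One more diagnostic that may help on a server where the main configuration can be edited (such as the development machine) is mod_rewrite's own log. The sketch below applies to Apache 2.0 and 2.2 only; the log path is a placeholder, and these directives are not allowed in .htaccess. In Apache 2.4 they were removed in favor of the LogLevel directive with a rewrite trace level.

```apache
# httpd.conf or inside the relevant <VirtualHost> (Apache 2.0/2.2 only).
# The path is hypothetical; pick a location the server can write to.
RewriteLog "/path/to/rewrite.log"

# 0 disables logging; higher values are more verbose. Use only while debugging.
RewriteLogLevel 3
```

With this enabled, each request should show which patterns were tested against which localized URL-path, which makes it easier to see whether the rule is reached at all.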
Jim
Thanks for all the additional ideas. I tried them all and nothing worked. But, you gave me an idea. Instead of using the relative path, what would happen if I used the full URL path?
So, instead of using
RewriteRule ^x([0-9a-z]+)$ /index.php?p=$1 [QSA,L]
what would happen if I tried this
RewriteRule ^x([0-9a-z]+)$ http://xyz.test/index.php?p=$1 [QSA,L]
Using the full path to the resource worked! The RewriteRule fires and actually rewrites the URL. I could have sworn that I had tried that before, but I obviously lost track over the past several days.
You asked, "if there are any other details [I've] left out." Well, come to think of it, and this is probably what has been causing me such great consternation, I have VirtualHost containers set up in my Apache httpd.conf file as I'm developing and testing multiple hostname-based sites. Furthermore, my dedicated server also has VirtualHosts set up.
It still seems that the original RewriteRule directive should have worked. The VirtualHost containers appear to be properly configured and I have had no issue with them before. When using VirtualHosts, do you have to reference the full URL path? This seems unlikely as that would mean all the domains on shared hosting would be required to use full instead of relative URLs.
Again, thanks for all your advice, ideas, and encouragement. Please let me know if for some reason using the full path is not a great idea--or if there is another way to tackle this issue.
Is there something else in the httpd.conf that's making this rewrite require the use of the absolute path?
On my development server, ServerName is set to localhost:80. That makes sense. On a webhost's server, that would be set to whatever domain or IP address they used to uniquely identify that server.
Again, on my development server, my VirtualHost containers, along with my DNS server, successfully route incoming requests to the proper domain. I'm trying to figure out the commonalities between my development server and my dedicated production server that would cause this rewrite rule to require absolute paths.
Unfortunately, I cannot inspect my production server's Apache httpd.conf file as it is with DH and they do not allow root access. But there must be some default Apache configuration that is causing this to happen. It would be too much of a coincidence that I just happened to set up my development server with the same special configuration (or rather misconfiguration) that DH used.
You now have an external 302 redirect. That's something completely different, and very dangerous.
You wrote: "and actually rewrites the URL"
These rules do not 'make' new URLs. I am thinking you have this exactly backwards. URLs are defined in links. Those are the URLs that users see and use. A rewrite then connects that URL request to some internal server resource.
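To make the distinction concrete, here is a hedged side-by-side sketch of the two forms discussed in this thread. Per the mod_rewrite documentation, an absolute URL in the substitution becomes an external redirect unless its hostname matches the current host, in which case the scheme and host are stripped and it is handled as a local path. The [R] flag in variant (b) is added here only to make the redirect explicit; it was not in the rule as posted.

```apache
# (a) Internal rewrite: no redirect response is sent by this rule.
#     Apache maps the requested /x23 to index.php behind the scenes.
RewriteRule ^x([0-9a-z]+)$ /index.php?p=$1 [QSA,L]

# (b) External redirect: Apache answers with a Location header and the
#     browser makes a second request for /index.php?p=23.
#     With no status code given, R defaults to 302.
#     Shown commented out because only one of the two variants should be active.
# RewriteRule ^x([0-9a-z]+)$ http://xyz.test/index.php?p=$1 [R,QSA,L]
```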
Thank you for your comments. I do understand internal versus external server resources. If you are interested, read on as what I am actually interested in is providing a URN to a server resource, not a URL.
Jim-
Okay, the answers to your three questions. For purposes of this discussion, assume my domain is this:
xyz.test
1. URL typed to test code: xyz.test/x23
2. Filepath URL should resolve to: xyz.test/?p=23
3. DocumentRoot setting for this "domain": /Users/system_username/Sites/xyz
The “system_username” directory in the filepath above is obviously not the real username used in my setup.
To better understand the last answer, this is a development environment on a local Apple Mac network. All the test domains have an appropriate host record. The httpd.conf file has VirtualHost containers set up and I have a DNS server running on one of the machines. The dev environment works fine. There have been no issues with it in the past. I am not serving content outside of this network.
Now, what I am trying to do is being done by a number of other WordPress users. It works for them without issue. But, just in case you are not familiar with WordPress’ inner workings, let me give you a brief description of how WordPress handles blog post URIs (URLs).
Below, I’ll post a detailed description of what I’m trying to accomplish.
What I am ultimately after is coding my own, custom URL shortener. This one RewriteRule that we’ve been discussing in this thread is the first step toward that goal. I know that the RewriteRule I originally posted works for others, but it does not work on my dev site or dedicated server.
Coding a custom URL shortener is not complex. In fact, I have all the code finished. It’s working fine.
WordPress URLs
Some brief background in case you’re not familiar with WordPress.
In WordPress, posts are primarily identified in two different ways: via their unique Post ID number (which is created by an auto-incrementing primary key field in the wp_posts table of the MySQL DB), or via a permalink. The Site Admin can choose from a number of different permalink structures. So for this discussion, assume the permalink structure has been set up like this:
.../year/month/day/post_title/
Next, let’s assume that the Post ID for the test post in question is “23”. So, typing this URL in a browser
xyz.test/2009/12/20/this_is_a_test_post/
will take a user to the same resource as typing in this one
xyz.test/?p=23
As I said, these are the two primary ways to direct a user to a given resource on a WP blog. But, there are others. For example, this will work as well
xyz.test/?page_id=23
This demonstrates the infamous duplicate content issue over which some WordPress users fret. Depending on how a given WP install is configured, a single blog post may have 5 to 7 different URLs (sometimes even more) with which the same resource can be identified.
Users that are worried about this can use a robots.txt file, special plugins, or the new Google-proposed rel="canonical" link tag in the <head> section of their header file to help guide search engines to their preferred URL.
Now, on to the issue at hand.
Short URLs Using Post ID
Using a given post’s ID number is one way people create a quick, short URL in WordPress. This works for a number of people without issue.
xyz.test/x23
The “23” is the unique WP post ID number. So, when a user clicks that link, it should be rewritten to
xyz.test/?p=23
which is an actual URL to an actual internal resource. WordPress will then redirect this to its permalink URL
xyz.test/2009/12/20/this_is_a_test_post/
Short URLs and URNs
But, this exercise was supposed to be just a simple test for me. This is not what I’m ultimately after. The above URL shortening method is flawed for several reasons, which would take this thread too far off topic to go into here.
Instead, what I plan to do is to assign a unique Base 32 code to each post, associating it in a special DB table with its permalink URL. Thus, as an example, the shortened URL that is provided to the outside world
xyz.test/x2z
is actually a URN. The “2z” is a Base 32 representation of the unique ID number in the special table mentioned above. When a user clicks on this link, here’s the RewriteRule that should pass the URN to the PHP code
RewriteRule ^x([0-9a-z]+)$ shorten.php?code=$1 [QSA,L]
This will then allow the functions within shorten.php to grab the Base 32 encoded number and then query the special table to find the actual resource URL (permalink). It will then redirect the browser (301 redirect) to the actual resource.
Again, all the backend stuff I've finished coding. It works fine. Perhaps I am being foolish and should just cut right to the chase. But I'm concerned that since this one RewriteRule that we've been discussing is not working that it is a symptom of a possible server misconfiguration.
This rule works for others but not for me.
In order for your rule to have an effect, its pattern must be correct to match the localized URL-path seen by the rule, the code must be in a directory that will be traversed during the URL-to-filepath translation of the requested URL-path, and that traversal must actually take place. That is, it must not be bypassed because some other module or directive defeats it by running first and either invoking the content-handler (ending all module processing) or modifying the request-path so that it no longer traverses this code's .htaccess file's directory or matches your pattern.
As to your overall design, it should not be necessary to use a custom database and additional Base 32 identifiers if each post already has an ID number. All you need to do is to use that ID number to query the main WP database using a script. This can be done by rewriting all variant-URL-path requests to your script, invoking the content-handler (which means your script will have to generate the entire server response including HTTP headers -- 301 redirects, for example) or using mod_rewrite's RewriteMap function to give mod_rewrite the capability to query your database. The former solution requires a more-complex content-handler script, perhaps implemented as a 'wrapper' around WP's index.php script, while the latter requires that any Webmaster who wishes to use your solution have access to the server configuration (this is required to *define* RewriteMaps, but not to use them).
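A minimal sketch of the RewriteMap approach follows, under stated assumptions: the map name "shortcodes", the file path, and the use of a plain-text map are all hypothetical stand-ins chosen for illustration. A live database lookup would need a different map type (a program-type prg: map is one option), which is not shown here.

```apache
# --- Server config or <VirtualHost> (RewriteMap cannot be defined in .htaccess) ---
# Hypothetical plain-text map with one "code target-path" pair per line, e.g.
#   2z /2009/12/20/this_is_a_test_post/
RewriteMap shortcodes txt:/path/to/shortcodes.txt

# --- .htaccess in the site root ---
RewriteEngine On
# Act only when the code exists in the map; an unknown key looks up as an empty string.
RewriteCond ${shortcodes:$1} !^$
# Redirect the short URL to the mapped permalink path.
RewriteRule ^x([0-9a-z]+)$ ${shortcodes:$1} [R=301,L]
```

The split between the two sections mirrors the point above: defining the map needs server-level access, while the rule that uses it can live in .htaccess.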
Two final comments: To avoid further duplicate content, it will be necessary to check the 'post title' in all requested URLs which include it to make sure that it agrees with the ID number also given. If not, then the title is bogus, and you'll need to use the ID number to retrieve the proper post title and redirect to the correctly-titled URL (if you still intend to support any of those title-in-URL-format URLs).
Also, I should point out that many Webmasters would prefer to have the title in the URL for SEO purposes. So despite the previous comment, we'll proceed based on the stated goal of creating a URL-shortener, which may not be good from an SEO viewpoint.
Jim
As always, thanks for your comments. This morning I started to investigate WordPress' Canonical API (found in /wp-includes/canonical.php) to see if it might be doing exactly as you've stated above.
I'm not sure why this works for others. Perhaps this mentioned API has changed in the newest version of WP and this RewriteRule no longer works at all for anyone.
By the way, I deactivated the very few plugins I use to see if one of them might be causing the issue. That had no effect. Also, the .htaccess file that we've been discussing here is the highest-level .htaccess file in this site. So, it is either something in my server's config file or in the WP codebase.
The reason for using a Base 32 encode, and not directly referencing the unique ID number, is twofold: first, I can get smaller URLs, which is key on Twitter; second, if for some reason I have to rebuild my database from a backup, WordPress generates new ID numbers for all posts and pages. This is because it uses an auto-incremented field type. So, all those previous links would be dead.
As to your point about branding, you are correct about using post titles versus IDs or something else. But, in Twitter, using the post's title takes up too much space. Of course, the reason to use your own custom URL shortener is for branding as well. Third-party URL shortening services are a terrible disservice to your brand.
bit.ly/?
is not a branded URL (I used “?” just as an example so someone would not think it is a real bit.ly link), whereas
xyz.test/x23
preserves some semblance of your brand identity.
One additional spot to look is in your 'control panel', if you are blessed with a host that requires one... These cp scripts often write code into the config files and/or the top-level .htaccess file. Being "one-size-fits-all" code, it's generally pretty awful, usually very inefficiently-coded, and often uses patterns which are quite ambiguous. This latter factor occasionally results in unexpected rewriting.
There's got to be something there because there's no other reason why a rule using such a trivial pattern wouldn't get invoked.
Jim