Another mod rewrite issue

Automatic scripts failing to understand rewritten url

         

XGCommander

8:15 am on Mar 28, 2010 (gmt 0)



http://www.example.com/dev/article/final-fantasy-xiii-review-2
(rewritten URL)

http://www.example.com/dev/article.php?p=4668
(actual URL)

Mod Rewrite Rules:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /dev/article/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /dev/article.php [L]
</IfModule>


I use the following code in article.php to figure out what article is being pulled:

$uri = str_replace("/", "", substr($_SERVER["REQUEST_URI"], 13));
$xg_article = getAllArticleInfoByName($uri);


And if it matters, pages are cached, but that shouldn't matter because both the actual and rewritten urls are cached (separately).

If you go to the actual URL, the share tools (Digg, Reddit, Facebook, Twitter, etc.) work fine.

If you use the rewritten url none of them work. For some reason they aren't able to connect to the page properly to grab info (or verify the page even exists) through the rewritten url.

I'm not really sure what is wrong, but it is kind of irritating that I can't figure out a solution. I'm not too familiar with mod_rewrite so I was hoping someone could help me fix it or direct me to some resources that might be able to help me out.

Thanx in advance,
Commander

[edited by: jdMorgan at 3:06 pm (utc) on Mar 28, 2010]
[edit reason] Use example.com only. Please see TOS. [/edit]

XGCommander

12:57 pm on Mar 28, 2010 (gmt 0)



To expand on the problem a bit:

The problem is that the rewritten URL doesn't work outside of the website. If you go to the rewritten URL itself, it works fine, but if you try to submit the URL to something like Digg, Facebook, or Reddit, their scripts can't find the URL, so the webpage can't be submitted.

If you go to one of those sites and type in http://www.example.com/dev/article/final-fantasy-xiii-review-2 as the article URL, it gives an error:

This link does not appear to be a working link. Please check the URL and try again.

The only explanation I can think of is that the mod_rewrite code I currently have set up is actually doing a redirect, not a rewrite.

[edited by: jdMorgan at 3:08 pm (utc) on Mar 28, 2010]
[edit reason] Use example.com only. Please see TOS and Charter. [/edit]

jdMorgan

3:37 pm on Mar 28, 2010 (gmt 0)



Your terminology is incorrect, which makes this problem hard to understand.

There is some ambiguity because of your terms "Link," "actual URL," and "rewritten URL."

The on-page link defines the URL which the client will request. (example.com/dev/article/final-fantasy-xiii-review-2) This is the "actual URL" -- out on the Web.

The mod_rewrite code rewrites that requested URL to a script filepath (<DocumentRoot>/dev/article.php), provided two conditions hold: the client-requested URL does not directly resolve to an existing file or directory (prior to/without being rewritten), and it consists of one or more characters following the leading slash of the URL-path localized to this .htaccess file's directory. Note that the rule itself does not append "?p=4668"; the script works out which article was requested from the original REQUEST_URI. The result is a filepath, and not (or no longer) a URL.

This code clearly implements an internal rewrite, not a redirect. It is possible, however, that some other code or server function (e.g. mod_dir) may kick in subsequent to your rule's execution, and perform an external redirect in addition to this internal rewrite. If that is the case, then the result would be that the internal script filepath would be 'exposed' to clients as a URL, and would show up in users' browser address bars and in search results. However, this is not the problem that you are reporting.

The only problem I see here is that your PHP code extracts the URL-path using a hard-coded offset (13, the length of "/dev/article/"). This number may need to be adjusted if I surmise correctly that the "dev" path is only for testing, and that the path is shorter on the 'real' site. As an alternative, I would suggest that you use preg_match to 'scan' the REQUEST_URI until you find the end of the expected directory-path, and then extract the remainder of the path as the 'title' of the article, rather than using a fixed character-count.

After getting the code working, you should consider adding a redirect from the filepath (old dynamic URLs) to the new static URLs if directly-requested by HTTP clients. However, I also strongly suggest that you do not attempt to do this until the main problem is resolved, as it will only complicate efforts to resolve that main problem.

In fact, you might consider breaking this problem in half: redirect any request that matches a pattern very similar to the 'real one' to google.com in order to test the 'first part' of the mod_rewrite function, which is matching the correct REQUEST_URI. For example, you could modify the code to temporarily redirect requests for the intentionally-misspelled URL-path "/dev/artical/final-fantasy-xiii-review-2" to google.com, just as a 'sanity check' to be sure that mod_rewrite is functioning, that this .htaccess file is located in the correct directory-path, and that the RewriteRule pattern is correct for this location. In other words, use a 'divide and conquer' approach to debug one step at a time.
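
As a rough sketch of that sanity check, the temporary test rule might look something like the following. It assumes (and this is only an assumption, since the file's location has not been stated) that this .htaccess file lives in <DocumentRoot>/dev/. It would go ahead of the existing rule and be removed once the test is done.

```apache
# TEMPORARY sanity check: remove after testing.
# Assumption: this .htaccess file is in <DocumentRoot>/dev/, so the pattern
# is matched against the URL-path with the leading "/dev/" already stripped.
RewriteEngine On
# "artical" is deliberately misspelled so real article URLs are not affected.
RewriteRule ^artical/final-fantasy-xiii-review-2$ http://www.google.com/ [R=302,L]
```

If a request for /dev/artical/final-fantasy-xiii-review-2 lands on google.com, then mod_rewrite is active, this file is being read, and the pattern is anchored correctly for this directory. If nothing happens, the file is probably not in the directory assumed here, and the pattern prefix would need adjusting.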

You are using RewriteBase, which is rarely required (and often misused). This further complicates discussion, as it leaves the location of 'this' .htaccess file in doubt.
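
For illustration only, here is roughly what the rule set might look like without RewriteBase. The sketch assumes that the .htaccess file is in <DocumentRoot>/dev/, which the original post does not confirm, so the "article/" prefix in the pattern is a guess that follows from that assumption.

```apache
# Sketch only. Assumption: this .htaccess file is in <DocumentRoot>/dev/
<IfModule mod_rewrite.c>
RewriteEngine On
# Skip requests that resolve to an existing file or directory
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Rewrite /dev/article/<one or more characters> to the script.
# RewriteBase only affects relative substitutions; this substitution is a
# full URL-path starting with a slash, so no RewriteBase is needed.
RewriteRule ^article/. /dev/article.php [L]
</IfModule>
```

Because the pattern now names the "article/" prefix explicitly, the rule no longer depends on RewriteBase to say which directory it applies to.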

Be sure to physically delete your browser cache before testing any changes to your server-side .htaccess or script code. Otherwise, your browser may not send any request to your server, but instead, show you stale cached content and server responses.

Jim