Forum Moderators: phranque
http://www.example.com/listing-2710-mountain-road-pasadena-maryland-mls-aa7172242-12921.html
I'm trying to create SE friendly URLs for custom links which currently might look like this:
http://www.example.com/index.php?cur_page=1&pclass[]=1&pclass[]=1&County=ANNE+ARUNDEL
&city=ANNAPOLIS&ForSale=Y&ListPrice-min=675000&ListPrice-max=1000000&Type=Detached
&waterfront=Y&action=searchresults&sortby=ListPrice&sorttype=ASC
I'd like to rewrite this to look something like:
http://www.example.com/maryland/anne-arundel-county/annapolis/home-for-sale
-675000-1000000-detached-waterfront-cur_page1.html
So the word "maryland" is added as a directory, the word "county" is moved and becomes part of a directory, the city value is a directory; only the variable name is used for "ForSale" and "waterfront", and the rest write only their values. Then drop the "action=....." part and replace it with "cur_page1.html".
Is this possible?
[edited by: jdMorgan at 6:04 pm (utc) on Dec. 5, 2009]
[edit reason] example.com & side-scroll [/edit]
The two main functions of mod_rewrite are to 'connect' an incoming client HTTP request to a non-default file on your server, and to redirect client requests for one URL to another URL. The code that came with your "software" is very likely implementing the first of these functions.
The main point is that mod_rewrite works as a request is received from a client (e.g. browser or search engine robot), internally rewriting the request or externally redirecting the client before any content is served or any scripts are invoked. It is a "server input process" and not any kind of "output-page content modifier."
So that means that in order to "change the URL," you must edit your static pages (if any) and modify either your database, your script, or both, so that the links that appear on your pages are in the form that you want human users and search engines to "see" -- The appearance of a URL in a link on an HTML page *defines* that URL.
The next step is to "re-connect" those new pretty URLs with the script on your server that will actually produce the requested page.
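As a rough sketch only -- the directory layout and query-parameter names below are assumed from the example URLs in this thread and are untested against any real site -- that "re-connect" step might look something like this in .htaccess:

```apache
# Hypothetical sketch: internally rewrite the pretty URL back to the script.
# Segment layout and parameter names are assumed from the example URLs above.
RewriteRule ^maryland/([^/]+)-county/([^/]+)/home-for-sale-([0-9]+)-([0-9]+)-([a-z]+)-waterfront-cur_page([0-9]+)\.html$ /index.php?cur_page=$6&County=$1&city=$2&ForSale=Y&ListPrice-min=$3&ListPrice-max=$4&Type=$5&waterfront=Y&action=searchresults [L]
```

Note that the captured values ($1, $2, $5) reach the script in whatever character case appears in the requested URL, which is exactly why the case-conversion question matters.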
The entire process is described in this thread [webmasterworld.com] in our Apache Forum Library [webmasterworld.com].
However, before proceeding, you need to be aware that mod_rewrite is extremely inefficient at doing case-conversion (I should say "devastatingly, practically unusably inefficient" in order to make this point strongly enough). It's also practically impossible to convert the casing of a URL to anything other than all-lowercase or all-uppercase.
So before investing too much time in this project, try testing your script by directly typing in a dynamic query with the exact same character case as that which you wish to use in the parameter-values of your friendly/pretty URLs. Failing that, try typing in the query in all-lower- or all-uppercase. If it works, you may be able to do this using mod_rewrite. If not, then you're going to need to buy an off-the-shelf "SEF" plugin for your software, or to code a much more complex scripted solution.
Jim
Options +FollowSymLinks
RewriteEngine on
RewriteRule ^pclass%5B%5D/([^/]+)/([^/]+)/([^/]+)/([^/]+) /index.php?pclass%5B%5D=$1&County=$2&city$3&action [L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?pclass%5B%5D=([^&]+)&County=([^&]+)&city([^&]+)&action([^&]+)\ HTTP/
RewriteRule ^index\.php$ [marylandhomespro.com...] [R=301,L]
Any suggestions as to where I'm going wrong?
[edited by: LionMedia at 8:52 pm (utc) on Dec. 5, 2009]
On your pages, you must link to the URL http://www.example.com/maryland/anne-arundel-county
/annapolis/home-for-sale-675000-1000000-detached-waterfront-cur_page1.html
This URL is what a client -- a user or a search engine -- will see, and this is the URL that the client will request from your server.
Leaving aside the character-casing problem momentarily, in .htaccess, you will rewrite that URL, when requested by the client, to your script filepath at
/index.php?cur_page=1&pclass[]=1&pclass[]=1&County=ANNE+ARUNDEL&city=ANNAPOLIS
&ForSale=Y&ListPrice-min=675000&ListPrice-max=1000000&Type=Detached
&waterfront=Y&action=searchresults&sortby=ListPrice&sorttype=ASC
After that works, and only after, should you attempt the third step in the thread I cited (i.e. your second rule).
This subject isn't simple, and unfortunately, there's a steep learning curve. Misconceptions and misunderstandings are quite common.
The character-casing 'test' I proposed is to manually request
http://www.example.com/index.php?cur_page=1&pclass[]=1&pclass[]=1&County=anne+arundel
&city=annapolis&ForSale=Y&ListPrice-min=675000&ListPrice-max=1000000&Type=detached
&waterfront=Y&action=searchresults&sortby=ListPrice&sorttype=ASC
Note that the casing of the query values corresponds not to that used in the current dynamic URL, but to the pretty, search-engine-friendly URL you wish to use. We are testing whether your script will accept it -- or just throw an error because it requires an exact match.
If that works, then the only obvious obstacle is getting your RewriteRule to replace "-" with "+" where required. This will be slow and inefficient, but nowhere near as bad as doing case-conversion on the entire URL-path.
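One hedged sketch of that hyphen-to-plus substep (the "county" segment name is illustrative only, not from your site): a rule with the [N] flag can convert one hyphen per pass, restarting the ruleset until none remain in that segment, then hand off to the script:

```apache
# Hypothetical sketch: convert hyphens to '+' in a "county" URL segment,
# one hyphen per [N] pass, then rewrite to the script.
RewriteRule ^county/([^-/]*)-([^/]*)$ county/$1+$2 [N]
RewriteRule ^county/([^/]+)$ /index.php?County=$1&action=searchresults [L]
```

Each [N] pass restarts the ruleset from the top, which is exactly why this is slow -- and why the first pattern must be unable to match its own output, or the rule will loop.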
If it doesn't work -- i.e. your script is case-sensitive -- then you're going to need a scripted approach with database access (and possibly database modifications) to solve this problem.
Jim
http://www.example.com/maryland/anne-arundel-county/annapolis/home-for-sale
-675000-1000000-detached-waterfront-cur_page1.html
I'd also like to add that a URL like this is waaay too long and has waaay too many hyphens in it. I think you're trying waaay too hard to stuff keywords in here; and so will Google.
I'd likely not want anything more complicated than:

http://www.example.com/maryland/anne-arundel/annapolis/675000-1.html

but I could see that your site would still have a huge taxonomy, with most folders almost empty. If the individual properties are listed in multiple categories, I'd have pages like http://www.example.com/maryland/ listing counties, and pages like http://www.example.com/maryland/anne-arundel listing properties, but with the individual properties having URLs like http://www.example.com/675000-1.html -- but that's a whole other discussion.
So I studied this some more and followed the suggestions. I tested the dynamic link manually as Jim suggested, with the parameters in lower case, and it worked fine, so I guess there is no issue with case here.
I then completed the first step as follows:
# Enable mod_rewrite, start rewrite engine
Options +FollowSymLinks
RewriteEngine on
#
# Internally rewrite search engine friendly static URL to dynamic filepath and query
RewriteRule ^maryland-([^/]+)/([^/]+)/([^/]+)-([^/]+).html?$ /index.php?pclass[]=$1&County=$2&city=$3&action=searchresults&pclass[]=$4 [L]
This works when I test a static link like this:
http://www.example.com/maryland-1/howard/columbia-1
But when I tried the next step, I get a 500 server error
#
# Externally redirect client requests for old dynamic URLs to equivalent new static URLs
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?pclass[]=([^&]+)&County=([^&]+)&city=([^&]+)&action=searchresults&pclass[]=([^&]+)\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/maryland-%1/%2/%3-%4.html? [R=301,L]
I've read over the documentation and some posts but I'm still not sure where I'm going wrong.
The most likely problem in your second rule which could cause a 500 Error is that brackets "[]" are special characters in regular-expression patterns, and a character class is never empty when used for this regex function. So, in the parts of your pattern reading "pclass[]=", you will need to escape those brackets to make them literals. Use "pclass\[\]=" instead, and see if that helps. And note that this "pclass[]=" parameter appears twice in your query, which I can only call "odd," if not incorrect and potentially problematic.
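For what it's worth, a sketch of that condition with only the brackets escaped and nothing else changed would read:

```apache
# Sketch only: the same condition with "[]" escaped as literal characters.
# Caveat: some clients send the brackets URL-encoded as %5B%5D, in which
# case THE_REQUEST would contain that encoded form instead.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\?pclass\[\]=([^&]+)&County=([^&]+)&city=([^&]+)&action=searchresults&pclass\[\]=([^&]+)\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/maryland-%1/%2/%3-%4.html? [R=301,L]
```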
BTW, the first stop when getting a 500-Server Error is your raw server error log file -- It most often contains very good clues as to the cause of the problem.
Jim
So now my next issue is this: these search results are split across multiple pages, with Prev and Next links to page through them. The first static link now looks like this and works great:
http://www.example.com/maryland-1/howard/columbia.html
However, when I click Next, I get a 404 because the link now looks like this:
http://www.example.com/maryland-1/howard/index.php?cur_page=1&pclass[]=1&County=howard&city=columbia&action=searchresults&sortby=ListPrice&sorttype=ASC
I'm assuming I need additional rules to handle cur_page=1, cur_page=2, etc. It looks like everything up to the county parameter is kept, and the rest is the same type of dynamic string. So can my rewrite take the part starting with index.php? and, with a similar rule, rewrite everything from there to look something like
example.com/maryland-1/howard/columbia-cur_page_2.html
Yes, I think you can do what you propose. Hopefully, you won't run into the other bug-a-boo with pagination functions -- where the query string parameter order changes in several ways, and you have to cover all cases. This problem *can* be solved, but it can sometimes be a real pain in the hindquarters.
When making the friendly pagination URLs, keep them short; Don't include those underscores unless you really need them! (Most of us here on WebmasterWorld avoid underscores like the plague, because they are not treated as 'word breaks' by most search engines, and because they 'hide' beneath the standard on-page link-underline and can't be differentiated visually from spaces.) I suspect you could shorten "cur_page_2" all the way down to "pg2" without causing ambiguity problems in writing your rules.
When adding these new pagination-support rules, just remember: Put all external redirects first, in order from most-specific patterns and conditions to least-specific, followed by all internal rewrites, again in order from most- to least-specific.
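Applying that ordering to the pagination case, a sketch might look like this (the "pg" suffix is just the suggested short name, not anything your software already emits):

```apache
# Sketch: most-specific internal rewrite first (a paginated URL),
# then the less-specific un-paginated (page 1) URL.
RewriteRule ^maryland-([^/]+)/([^/]+)/([^/]+)-pg([0-9]+)\.html$ /index.php?pclass[]=$1&County=$2&city=$3&cur_page=$4&action=searchresults [L]
RewriteRule ^maryland-([^/]+)/([^/]+)/([^/]+)\.html$ /index.php?pclass[]=$1&County=$2&city=$3&cur_page=1&action=searchresults [L]
```

If the un-paginated rule came first, its `[^/]+` subpattern (which happily matches hyphens and digits) would capture "columbia-pg2" as the city value, and the page number would be lost.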
Jim
Assuming that I'm successful, do I understand correctly that I will also have to change the HTML in the software script files so that the SE friendly URLs are invoked in the browser before the request hits the server? Otherwise there will be duplicate content issues?
Thanks Jim!
The links on your pages *define* URLs. It matters not whether those URLs actually resolve to existing domains or to existing 'pages' on those domains -- The act of publishing a link is what 'creates' the URL, and it will get spidered by search engines whether it resolves to a 'file' or not, and whether you consider it to be 'optimized' or not.
It's often useful to think of URLs as independent entities, and quite necessary to think of them as something utterly different from "files." While this distinction may only become clear when working with mod_rewrite or similar functions that change the default server URL-to-filename mapping, it is true even if you do not use mod_rewrite or similar. After all, the primary function of an HTTP server is to map requested URLs in the form "http://www.example.com/main-dir/foo.php" to server filepaths in the form required by that server's operating system -- say, "C:\Program Files\apache\var\users\my-site\public\html\main-dir\foo.php"
Even in the absence of any Webmaster-specified URL-rewriting, it's clear that these are two very-different "location-specification" methods; Only the very-last bit of those two strings has anything in common.
And that in fact is why URLs were invented: So that agents on the Web (and Webmasters creating links to other sites) don't have to know the internal file-structure of each and every server that they want to request resources from, and so that server administrators and Webmasters would be free to re-locate directories, re-arrange the entire internal server file structure, change the server software, and even change the entire operating system (e.g. change Windows to Linux or vice-versa) if necessary.
Jim
Rewriterule ^gites-in-(.*)-from-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults®ion=$1&price-min=$2&price-max=$3&date1=$4&date2=$5&sleeps-min=$6&pool=$7&cur_page=$8 [L]
Rewriterule ^properties-in-(.*)-from-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&dept=$1&price-min=$2&price-max=$3&date1=$4&date2=$5&sleeps-min=$6&pool=$7&cur_page=$8 [L]
Rewriterule ^rentals-in-(.*)-from-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&city=$1&price-min=$2&price-max=$3&date1=$4&date2=$5&sleeps-min=$6&pool=$7&cur_page=$8 [L]
Rewriterule ^gites-in-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults®ion=$1&price-max=$2&date1=$3&date2=$4&sleeps-min=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^gites-in-(.*)-from-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults®ion=$1&price-min=$2&date1=$3&date2=$4&sleeps-min=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^gites-in-(.*)-from-(.*)-pricemax-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults®ion=$1&price-min=$2&price-max=$3&date2=$4&sleeps-min=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^gites-in-(.*)-from-(.*)-pricemax-(.*)-dates-start-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults®ion=$1&price-min=$2&price-max=$3&date1=$4&sleeps-min=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^gites-in-(.*)-from-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults®ion=$1&price-min=$2&price-max=$3&date1=$4&date2=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^gites-in-(.*)-from-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults®ion=$1&price-min=$2&price-max=$3&date1=$4&date2=$5&sleeps-min=$6&cur_page=$7 [L]
Rewriterule ^properties-in-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&dept=$1&price-max=$2&date1=$3&date2=$4&sleeps-min=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^properties-in-(.*)-from-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&dept=$1&price-min=$2&date1=$3&date2=$4&sleeps-min=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^rentals-in-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&city=$1&price-max=$2&date1=$3&date2=$4&sleeps-min=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^rentals-in-(.*)-from-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&city=$1&price-min=$2&date1=$3&date2=$4&sleeps-min=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^rentals-in-(.*)-from-(.*)-pricemax-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&city=$1&price-min=$2&price-max=$3&date2=$4&sleeps-min=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^rentals-in-(.*)-from-(.*)-pricemax-(.*)-dates-start-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&city=$1&price-min=$2&price-max=$3&date1=$4&sleeps-min=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^rentals-in-(.*)-from-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&city=$1&price-min=$2&price-max=$3&date1=$4&date2=$5&pool=$6&cur_page=$7 [L]
Rewriterule ^rentals-in-(.*)-from-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&city=$1&price-min=$2&price-max=$3&date1=$4&date2=$5&sleeps-min=$6&cur_page=$7 [L]
Rewriterule ^rentals-upto-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&price-max=$1&date1=$2&date2=$3&sleeps-min=$4&pool=$5&cur_page=$6 [L]
Rewriterule ^rentals-from-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&price-min=$1&date1=$2&date2=$3&sleeps-min=$4&pool=$5&cur_page=$6 [L]
Rewriterule ^rentals-from-(.*)-pricemax-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&price-min=$1&price-max=$2&date2=$3&sleeps-min=$4&pool=$5&cur_page=$6 [L]
Rewriterule ^rentals-from-(.*)-pricemax-(.*)-dates-start-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&price-min=$1&price-max=$2&date1=$3&sleeps-min=$4&pool=$5&cur_page=$6 [L]
Rewriterule ^rentals-from-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&price-min=$1&price-max=$2&date1=$3&date2=$4&pool=$5&cur_page=$6 [L]
Rewriterule ^rentals-from-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults&price-min=$1&price-max=$2&date1=$3&date2=$4&sleeps-min=$5&cur_page=$6 [L]
My question is... would it be reasonable to think I could modify these rules, based on what I've learned here, to make them more efficient and safe? Do you see any other issues that might damage rankings? One thing I noticed is there is no 301 redirect. The documentation says it does this, so I need to find out how this is handled.
This would be a great solution if it's safe for the site.
Here is one of your rules as posted:

Rewriterule ^gites-in-(.*)-from-(.*)-pricemax-(.*)-dates-start-(.*)-dates-end-(.*)-sleeping-(.*)-with-swimming-(.*)-page([0-9]*)\.html index.php?action=addon_sefTiger_searchresults®ion=$1&price-min=$2&price-max=$3&date1=$4&date2=$5&sleeps-min=$6&pool=$7&cur_page=$8 [L]

And here it is again with more-specific subpatterns and an end-anchor added:

Rewriterule ^gites-in-([^\-]+)-from-([^\-]+)-pricemax-([^\-]+)-dates-start-([^\-]+)-dates-end-([^\-]+)-sleeping-([^\-]+)-with-swimming-([^\-]+)-page([0-9]*)\.htm[b]l$[/b] index.php?action=addon_sefTiger_searchresults®ion=$1&price-min=$2&price-max=$3&date1=$4&date2=$5&sleeps-min=$6&pool=$7&cur_page=$8 [L]

However, there's a caveat: The highest back-reference number allowed in mod_rewrite is $9, so this could become problematic if more than one field in this example URL needed to be allowed to contain one or more hyphens. In that case, the rule would need to be broken into two or more 'steps', each allowing up to nine back-references. The code can get ugly pretty fast in that case, and it can be more complex than it might sound, but it's do-able.
Note also that none of the patterns in those rules appears to be end-anchored. This only exacerbates the inefficiency of their code. I added the bolded end-anchor in my modified example.
Just taking a guess, I'd say the simple pattern modifications shown would speed up the rule processing by a factor of at least 2000 -- probably more, but I don't feel like counting the characters and subpatterns and working out the required factorials...
I would suspect that the 301 redirects are being done in plug-in script itself. Look for code that outputs a response Status of 301 and a Location header.
One more thing: It appears that this code was copied at some point through an "HTML transcription" that has 'broken' the patterns and URLs for "&region" and turned them into "<registered-trademark-symbol>ion" -- that is, "®ion". Just beware of this if you're copying the code to modify it.
Jim
Tracey
The "multiple-(.*) pattern" problem is probably the leading cause of otherwise-unnecessary server upgrades. I can't begin to imagine how many folks are forced to upgrade to high-end dedicated servers simply because they never realized how inefficient that multiple-(.*) technique can be.
Of course, if one has never written code intended to parse character strings, then the problem isn't obvious. But it requires a geometrically-exploding number of matching attempts as the number of "(.*)" subpatterns grows, and the number of operations required within each matching attempt increases with the length of the input string from the current matching position to the end. Replace the "(.*)" patterns with "([^\-]+)" or "([^/]+)" as needed, and the matching engine then only has to make one single left-to-right matching attempt. Often a stunningly-huge difference in performance results -- I know personally of one case where the server took over five seconds to serve a request with "(.*)" patterns (in hundreds of rules). When the subpatterns were 'corrected,' the server response time dropped to 'negligible.'
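To make the contrast concrete with a rule shaped like the ones in this thread (a sketch only, trimmed to three fields):

```apache
# Backtracking-heavy: the engine must try many different ways of splitting
# the input among the (.*) groups before it can succeed or fail.
RewriteRule ^rentals-from-(.*)-pricemax-(.*)-dates-start-(.*)\.html$ /index.php?price-min=$1&price-max=$2&date1=$3 [L]
# Linear: each group stops at the first hyphen, so there is exactly one
# possible split and one left-to-right pass.
RewriteRule ^rentals-from-([^-]+)-pricemax-([^-]+)-dates-start-([^-]+)\.html$ /index.php?price-min=$1&price-max=$2&date1=$3 [L]
```

The trade-off, as noted earlier, is that `[^-]+` forbids hyphens inside the captured values themselves, so any field that legitimately contains hyphens needs a different delimiter or a multi-step rule.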
But WordPress, Joomla, and many plug-in vendors have plodded on, releasing "recommended" code like that. Luckily, it seems that some of them are catching on. After you've successfully tested your modified rules, you should inform the plug-in maker of your improvements (and ask for a heavy discount or even a freebie on the plug-in in exchange). :)
Jim