Forum Moderators: phranque
This rewrite rule:
RewriteRule ^([0-9a-z-]+)-([a-z]+)-([0-9]+)-([0-9]+)-([0-9]+)-([0-9]+)-([a-z]+)-([0-9]+).html$ multimedia.php?description=$1&action=$2&city=$3&cid=$4&pid=$5&album=$6&idm=$7&start=$8 [NC,L]
works fine for:
http://www.example.com/hotels-apartaments-hostels-united-kingdom-cat-0-13-0-0-en-0.html
So, I can't understand how Apache knows where the first variable ends. Is Apache reading rewrite rules from right to left?
I hope that you can understand what I mean.
As I said, this rule works fine and I don't want to change it. I opened this thread just to make sure that there is nothing wrong with this rewrite rule and I can leave this rewrite rule as it is.
[edited by: jdMorgan at 12:52 pm (utc) on Sep. 29, 2009]
[edit reason] example.com [/edit]
It will initially match the entire request-URL with the first subpattern, then fail to get a match on the second subpattern. So it will 'back off' one character and try again, fail, back off and fail again, etc. until it gets a match on both the first and second subpatterns. But then it will fail on the third subpattern, so it will again start the back-off-and-retry, one character at a time, until it can match the first through third subpatterns. But then it will fail on the fourth subpattern... I trust you can see how this continues until all subpatterns are matched, and that you also realize how horribly inefficient it is...
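You can watch this ambiguity resolve itself with any backtracking regex engine. Here is a small sketch in Python (the pattern is the one from the rule, with the dot escaped) run against the path from the working example URL; the greedy first group initially swallows far too much, and the engine backs off until every group can match:

```python
import re

# The same pattern the RewriteRule uses (dot escaped for clarity).
pattern = (r'^([0-9a-z-]+)-([a-z]+)-([0-9]+)-([0-9]+)'
           r'-([0-9]+)-([0-9]+)-([a-z]+)-([0-9]+)\.html$')
path = 'hotels-apartaments-hostels-united-kingdom-cat-0-13-0-0-en-0.html'

m = re.match(pattern, path)
# After all the back-off-and-retry work, the engine settles on:
print(m.groups())
# → ('hotels-apartaments-hostels-united-kingdom', 'cat', '0', '13', '0', '0', 'en', '0')
```

The match succeeds only because "cat" is the sole run of letters that leaves six more hyphen-delimited fields to its right, which is exactly why the engine has to try so many splits before finding it.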
For best performance and to avoid an early server upgrade, you should avoid using any character as your parameter-delimiter that might also appear in the parameters themselves.
I would not recommend leaving this situation as-is.
You should also escape the period preceding 'html' with a backslash, as otherwise it means "match any single character" and is therefore ambiguous. Use "\.html"
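A quick illustration of why the escape matters, sketched in Python with a deliberately short pattern and a made-up path: an unescaped dot lets a URL with no real ".html" extension through, while the escaped form rejects it.

```python
import re

path = 'page-5xhtml'   # hypothetical path with no literal ".html" extension

# Unescaped, the dot matches ANY character, so the 'x' satisfies it:
assert re.match(r'^([a-z]+)-([0-9]+).html$', path)

# Escaped, only a literal period is accepted, and the match fails:
assert not re.match(r'^([a-z]+)-([0-9]+)\.html$', path)
```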
Jim
Thank you.
RewriteRule ^multimedia/([0-9a-z-]+)/([0-9a-z-]+)/([a-z]+)-([0-9]+)-([0-9]+)-([0-9]+)-([0-9]+)-([a-z]+)-([0-9]+)\.html$ multimedia.php?description=$1&country=$2&action=$3&city=$4&cid=$5&pid=$6&album=$7&idm=$8&start=$9 [NC,L]
[mydomain.com...]
That looks much better. The following depends on your exact application, but you might be able to shorten and speed it up a bit by matching anything except a slash, followed by a slash, e.g.
RewriteRule ^multimedia/([^/]+)/([^/]+)/([a-z]+)-([0-9]+)-([0-9]+)-([0-9]+)-([0-9]+)-([a-z]+)-([0-9]+)\.html$ multimedia.php?description=$1&country=$2&action=$3&city=$4&cid=$5&pid=$6&album=$7&idm=$8&start=$9 [NC,L]
The character class [^/] means:
^ = "not", when it is the first character inside the brackets
/ = the character that should not be matched
so [^/]+ matches one or more of any character except a slash.
If you do not need to check that the input is all letters/numbers, you might be a bit faster using [^-] in the hyphenated section, but it really depends on your exact application and the ambiguity you are willing to allow for URL input.
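The trade-off is easy to see in a sketch (Python, with made-up sample strings): the negated class accepts anything up to the delimiter, while the strict class rejects characters outside its list.

```python
import re

# [^/]+ accepts anything up to the next slash, including characters
# the stricter class would reject:
assert re.fullmatch(r'[^/]+', 'united.kingdom%20')
assert not re.fullmatch(r'[0-9a-z-]+', 'united.kingdom%20')

# Both stop at a slash, which is what keeps the path segments apart:
assert not re.fullmatch(r'[^/]+', 'united/kingdom')
```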
With .htaccess, unlike PHP, you don't have to worry much about what it receives: either the pattern matches or it doesn't. Since you are probably checking all information passed to PHP within the PHP itself, there's no need for the 'double check'. If you are not checking all variables passed into PHP, then I highly recommend you do, because that's where a security issue is most likely to arise if there is one.
You cannot 'break into' a site through mod_rewrite like you can with a 'scripting language'.
Expanding on the "security" aspect a little, I should point out that if the above is true, then you have a "competitive security problem."
Assuming that I'm the Webmaster for your most serious competitor, what is to stop me from scattering a bunch of links around the web to "www.your-domain.com/sleazy-rat-infested-hotels-apartaments-hostels-united-kingdom-cat-0-13-0-0-en-0.html" and "www.your-domain.com/lice-infestations-at-hotels-apartaments-hostels-united-kingdom-cat-0-13-0-0-en-0.html" with apt descriptive text in the links, making these show up as valid links to your site in search results?
You must pass and check *all* of the parameters against your database. If the description and country do not *exactly* match (character-for-character) the expected values given the action, city, cid, pid, album, idm, and start values (as applicable), then pull the correct values from your database, and force a 301-Moved Permanently redirect to the correct URL.
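A minimal sketch of that check, in Python for brevity (the real site would do this in PHP). The CANONICAL table, the parameter names, and the URL layout are all hypothetical stand-ins for the database lookup:

```python
# Hypothetical stand-in for the database: numeric ids -> canonical text parts.
CANONICAL = {
    # (city, cid, pid, album, idm, start) -> (description, country)
    ('0', '13', '0', '0', 'en', '0'): ('hotels-apartaments-hostels', 'united-kingdom'),
}

def check(description, country, action, city, cid, pid, album, idm, start):
    """Return None to serve the page, a 301 target path, or '404'."""
    expected = CANONICAL.get((city, cid, pid, album, idm, start))
    if expected is None:
        return '404'                      # unknown ids: send a real 404
    exp_desc, exp_country = expected
    if (description, country) == (exp_desc, exp_country):
        return None                       # exact character-for-character match
    # The descriptive parts were altered: 301 to the canonical URL.
    return '/multimedia/%s/%s/%s-%s-%s-%s-%s-%s-%s.html' % (
        exp_desc, exp_country, action, city, cid, pid, album, idm, start)
```

With this in place, a request for "sleazy-rat-infested-hotels/united-kingdom/cat-0-13-0-0-en-0.html" would come back as a 301 to the canonical description, so the malicious link never resolves as real content.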
That handles malicious linking. You may also wish to apply some level of "intelligent analysis" to requested URLs which do not resolve, keying off the most-important combinations of parameters to try to find a match if the URL isn't quite correct. For example, it's quite common to get requests for URLs which are followed by periods, commas, quote marks, spaces, exclamation points, and the HTML tag closing character ">" due to faulty links or faulty auto-linking code in forums, blogs, etc. And of course, there are always typos and common spelling errors if a link is manually entered.
As a simple example, you might get a request for www.example.com/hotels-apartaments-hostels-united-kingdom-cat-0-13-0-0-en-0.html." because a forum auto-linked a URL, wrongly including the period at the end of a sentence.
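One way to sketch that "intelligent analysis" for the trailing-punctuation case (hypothetical helper name, Python for brevity): trim the junk characters off the end of the requested path and see whether what remains resolves.

```python
import re

def strip_link_debris(path):
    # Trim trailing punctuation that faulty auto-linkers tack on:
    # periods, commas, quotes, exclamation points, '>' and whitespace.
    return re.sub(r'[.,!>"\'\s]+$', '', path)

bad = 'hotels-apartaments-hostels-united-kingdom-cat-0-13-0-0-en-0.html."'
print(strip_link_debris(bad))
# → hotels-apartaments-hostels-united-kingdom-cat-0-13-0-0-en-0.html
```

If the cleaned path matches a known URL, a 301 redirect to it recovers the visitor (and the link credit) that the faulty auto-linker would otherwise have lost.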
Sometimes, the "security hole" isn't where you most expect it to be...
Jim
Some time ago I was thinking about this problem. I checked many websites, including large and serious ones. Many of them have descriptive texts in URLs which can be changed to any text, and the website still displays the same content, because these descriptions are not checked against the database. So I decided then that it's not a problem at all. But you are right, competitors may create a bunch of links which will show the same content.
Now I am going to change my scripts so that the PHP script checks all the parameters. If that is not possible (some texts come from language files, not the database), I will remove the descriptions from all URLs.
Is it a big plus in the eyes of search engines to have some descriptive text in URLs if I have Titles and Descriptions in the <head>?
I have checked my website, and it's not possible to open any content URL that is followed by additional characters like "html¦" or "html....". The browser displays the 404 page.
All other parameters (except 2 descriptions) are checked by my php scripts.
If it is possible to put the link on one of your pages, then it is possible to check that link-text...
> Is it a big plus in the eyes of search engines to have some descriptive text in URLs if I have Titles and Descriptions in the <head>?
Yes, the text in the URL is important as both a ranking factor and as an 'eyeball-catcher' in the search results -- remember that words matching the search terms will be bolded in the search results.
I *would not* remove the descriptive text from your links -- Find a way to check it.
Jim
This assumes that the server is properly configured. It is easy to make a mistake when declaring custom error documents that results in a 302-Found response, and this happens quite frequently to Webmasters who don't read the ErrorDocument directive documentation carefully and/or don't test their error responses with a server-headers checker. It's the kind of mistake where not spending three minutes reading (or not understanding what was read) can cost millions of dollars and hundreds of jobs, and unfortunately, it happens fairly often.
Jim
[games.yahoo.com...]
[games.yahoo.com...]
[games.yahoo.com...]
[games.yahoo.com...]
About 404 pages: my question was not precise. I wanted to ask about the error pages which scripts return.
For example:
[domain.com...]
So, if id #5 is deleted, the script will show some info saying that this id doesn't exist. It will not be a real 404 page.
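That kind of "soft 404" can be turned into a real one by setting the response status before the error text is sent. A minimal sketch (hypothetical data and function names, Python for brevity; in PHP the equivalent would be a header() call before any output):

```python
RECORDS = {1: 'first album', 2: 'second album'}   # stand-in for the database

def handle(album_id):
    """Return (status line, body) the way a web framework would."""
    body = RECORDS.get(album_id)
    if body is None:
        # A "this id doesn't exist" page served with status 200 looks like
        # real content to search engines; sending a genuine 404 status lets
        # the URL drop out of the index instead.
        return ('404 Not Found', 'Sorry, that id does not exist.')
    return ('200 OK', body)
```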
Yes, well when your pages out-rank the Yahoo Games pages in search, then you won't need to worry about wasted ranking factors due to duplicate-content problems. How many hundreds of PR8, PR9, and PR10 inbound links do you have? Got any to spare? :)
Sometimes, looking at 'big Web sites' is not an appropriate thing to do, unless your site is also a 'big Web site'...
Jim