I've been mulling and reading and mulling and reading and mulling again over this one for the best part of an afternoon.
All my in-site links are set up (with the help of Apache RewriteRule) as "clean" URLs, but I've run into a need to go "back" to a query-string add-on. I've managed to compile and display an alphabetical index of all historical terms appearing in my site, and I've arranged it so that the clicked term is highlighted once the page it links to has loaded. But I don't want to "hard-link" the terms, for fear of creating too much duplicate content. What I mean is, if I link to /path/to/the/page/termToHighlight.htm, the SE would think it's a unique page.
Yet when I try a RewriteRule (simplified):
^/(.*)/(.*)/(.*)\.htm\?TackQuery=(.*)$ $1.php?query1=$2&query2=$3&TackQuery=$4
...it ignores the query after the .htm
Does anyone know of any way around this? I'm stuck.
Thanks in advance for any help at all.
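If this query string is already appended to the requested URL in its correct form, you can use the [QSA] flag, which appends the original query string to the rewritten one:
RewriteRule ^/(.*)/(.*)/(.*)\.htm$ $1.php?query1=$2&query2=$3 [QSA,L]
Otherwise, test and capture the value with a RewriteCond and insert it yourself. A RewriteRule pattern is matched only against the URL-path, never against the query string, which is why your rule ignores everything after the .htm:
RewriteCond %{QUERY_STRING} ^TackQuery=(.+)$
RewriteRule ^/(.*)/(.*)/(.*)\.htm$ $1.php?query1=$2&query2=$3&TackQuery=%1 [L]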
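To see why the original rule never matched, it can help to imitate the split Apache performs before a RewriteRule runs. The sketch below is Python, not Apache configuration: `urllib.parse` and `re` stand in for mod_rewrite, and the request URL and the `rewrite_with_qsa` helper are made up for illustration. It only approximates what the [QSA] rule does.

```python
import re
from urllib.parse import urlsplit

# Hypothetical request, invented for this example.
request = "/glossary/en/castles.htm?TackQuery=portcullis"

parts = urlsplit(request)
path = parts.path            # what a RewriteRule pattern gets to see
query_string = parts.query   # what %{QUERY_STRING} holds

# The original attempt: the pattern expects "?TackQuery=" inside the path.
original_attempt = re.compile(r"^/(.*)/(.*)/(.*)\.htm\?TackQuery=(.*)$")
# A path-only pattern, as in the rules suggested in this thread.
path_only = re.compile(r"^/([^/]+)/([^/]+)/([^.]+)\.htm$")


def rewrite_with_qsa(path, query_string):
    """Rough imitation of a [QSA] rewrite: build the new query string,
    then append the original one after it."""
    m = path_only.match(path)
    if m is None:
        return None
    target = f"{m.group(1)}.php?query1={m.group(2)}&query2={m.group(3)}"
    if query_string:
        target += "&" + query_string
    return target


attempt_matches = original_attempt.match(path) is not None
rewritten = rewrite_with_qsa(path, query_string)

print(attempt_matches)  # False: the path never contains the "?..." part
print(rewritten)        # glossary.php?query1=en&query2=castles&TackQuery=portcullis
```

The point of the sketch is only the separation: the pattern is tested against the path, and the query string has to be carried over separately, either by [QSA] or by a RewriteCond on %{QUERY_STRING}.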
BTW, I strongly suggest that you do not use ".*" as a pattern when it can be avoided. It is highly inefficient when used in patterns containing multiple sub-patterns. This is because ".*" is both promiscuous and greedy: it will match any character, and as many characters as possible.
For example, in your rule, the regex parser will have to make multiple passes to get a match. On the first pass, it will attempt to match the entire requested URL into the first "(.*)", making $1 equal to the entire URL. After that fails, it will try to match all but the last character in the URL to your pattern. That also will fail, as will many more attempts, until finally the parser has worked its way backwards from the end of the requested URL to the second slash. This will make the first "(.*)" match. But then the entire process will repeat for the second and third sub-patterns.
Therefore, I would strongly suggest re-coding the above rule as:
RewriteCond %{QUERY_STRING} ^TackQuery=(.+)$
RewriteRule ^/([^/]+)/([^/]+)/([^.]+)\.htm$ $1.php?query1=$2&query2=$3&TackQuery=%1 [L]
In some cases, the use of ".*" is unavoidable. But avoid it whenever possible.
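The difference between the two patterns can be tried out away from the server. This is a small sketch in Python, whose `re` module is used here as a stand-in for Apache's regex engine (an assumption that holds for patterns this simple); the paths and the `captures` helper are invented for illustration. It does not measure speed. It only shows what each pattern captures.

```python
import re

greedy = re.compile(r"^/(.*)/(.*)/(.*)\.htm$")
strict = re.compile(r"^/([^/]+)/([^/]+)/([^.]+)\.htm$")


def captures(pattern, path):
    """Return the captured groups, or None if the pattern does not match."""
    m = pattern.match(path)
    return m.groups() if m else None


# Hypothetical paths, made up for this example.
expected = "/glossary/en/castles.htm"
deeper = "/glossary/en/medieval/castles.htm"
empty_segment = "/glossary//castles.htm"

for path in (expected, deeper, empty_segment):
    print(path)
    print("  greedy:", captures(greedy, path))
    print("  strict:", captures(strict, path))
```

On the expected three-part path both patterns capture the same values. On the deeper path the greedy pattern puts "glossary/en" into $1, while the negated-class pattern keeps $1 as "glossary" and lets the extra segment fall into $3 (because `[^.]+` still allows slashes). The path with an empty segment is accepted by the greedy pattern and rejected by the strict one.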
Jim
I will run to apply right away - then to read everything around and about the solution you gave me. So much to learn...
Thanks again, take care,
Josefu.
[added] noted noted about the ".*" - that was but a remnant from a last desperate measure-only for the representation of the content of the "tackQuery". Everything before is strict ([enfr]{2})[a-zA-Z][^/]etc but thanks for your concern. I'm quite precise in my rewrites today and that may also be thanks to something you wrote here. You contribute quite a lot! Thanks again again : )[/added]
[added added]Up and working, all index terms are now highlighted in the page that contains them. Thanks! [/added added]
Unfortunately all the above seems to have been for nothing - Google has listed every entry in my "alphabetical index" page as a unique URL, even though it's quite clear that each of those URLs has a query string added to it. I now have around 50 Google listings pointing to a single text page - should I fear a "duplicate content" penalty? Is there some way I can keep Google from reading the query string?
Perhaps this is a subject for the Google board. Search time.
[added] Search seems to be down? [/added]
The thing is - do you think that, through the above mod_rewrite, the "added query" was somehow seen as a hard-coded URL? No, wait, that's not possible.
The thing is, I've got around 60 links from one page pointing to one other page - each with a different query string - and I'm not sure how Google will see this.
Any - er, reassurance would be welcome.
Are the pages indexed uniquely? E.g., when you click on a link, do you go to the correct content, or do they all go to the same content?
Do you want to redirect the dynamic pages to unique static URLs?
Not exactly sure I understand what you're trying to accomplish...
Justin
BTW anything is possible when it comes to a SE, that's why this little game is so fun... What was ok yesterday will get you on the blacklist today.
What I've done is use mod_rewrite to rewrite a hard-coded "html" URL into a PHP URL with queries. Quite typical, this. But in addition, on links to that same page from another "alphabetical index" page, I appended the word to be highlighted to the end of the URL as a query, instead of "hard-coding" it, too, into the URL. All this was to avoid ending up with reams of hard-coded links leading to the same content, which would make a ripe case for a duplicate-content penalty.
[added]
instead of doing this: /path/to/page/wordtohighlight.htm
I did this : /path/to/page.htm?highlight=wordtohighlight
[/added]
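The structural difference between the two link styles can be sketched in a few lines of Python. The index terms below are invented for illustration, and the sketch says nothing about how any search engine actually treats such URLs. It only counts distinct paths.

```python
from urllib.parse import urlsplit

# Hypothetical index terms, made up for this example.
terms = ["abbey", "bailey", "portcullis"]

# Style 1: the term is hard-coded into the path.
hard_coded = [f"/path/to/page/{t}.htm" for t in terms]
# Style 2: the term is appended as a query string.
with_query = [f"/path/to/page.htm?highlight={t}" for t in terms]

distinct_paths_hard = {urlsplit(u).path for u in hard_coded}
distinct_paths_query = {urlsplit(u).path for u in with_query}

print(len(distinct_paths_hard))   # 3: one path per term
print(len(distinct_paths_query))  # 1: a single path, the terms differ only in the query
```

With the query-string style every link shares one path, but the full URLs are still distinct strings, which may be why each one can still show up as its own listing.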
Since this morning (had to leave) I've had time for little more than a search for "htm?id=", but this turned up a fair number of URLs like mine. I think that Google sees the query as a query - at least I'm hoping so until I know better.