Appending a query AFTER .htm

from the "php" forum

         

Josefu

7:23 am on Apr 20, 2005 (gmt 0)

10+ Year Member



I originally posted this in the PHP forum, but perhaps it is better here - at least I think so now : )

I've been mulling and reading and mulling and reading and mulling again over this one for the best part of an afternoon.
All my in-site links are set up (with the help of Apache RewriteRule) as "clean" URLs, but I've run into a need to deviate "back" to a query string add-on. I've managed to compile and display an alphabetical index of all historical terms appearing in my site, and I've arranged it so that the term clicked upon will be highlighted once the page it's linked to is loaded. I don't want to "hard-link" the terms, though, for fear of creating too much duplicate content. What I mean is, if I use /path/to/the/page/termToHighlight.htm, the SE would think it's a unique page.

Yet when I try a RewriteRule (simplified):
^/(.*)/(.*)/(.*)\.htm\?TackQuery=(.*)$ $1.php?query1=$2&query2=$3&TackQuery=$4

...it ignores the query after the .htm

Does anyone know of any way around this? I'm stuck.

Thanks in advance for any help at all.

jdMorgan

5:03 pm on Apr 20, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's not seeing TackQuery because a RewriteRule pattern is never matched against the query string.

If this query string is already appended to the requested URL in its correct form, you can use:


RewriteRule ^/(.*)/(.*)/(.*)\.htm$ $1.php?query1=$2&query2=$3 [QSA,L]

and if not, then use:

RewriteCond %{QUERY_STRING} ^TackQuery=(.+)$
RewriteRule ^/(.*)/(.*)/(.*)\.htm$ $1.php?query1=$2&query2=$3&TackQuery=%1 [L]

By way of explanation, mod_rewrite does not treat the query string as part of the URL-path; rather, it is data appended to the URL to be passed to the resource located there. For this reason, it is not visible to RewriteRule, and is not included in the server's %{REQUEST_URI} variable.
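
To see the split in action, here is a minimal Python sketch (Python rather than Apache, purely for illustration). The request path and the term "castle" are invented, and `urlsplit` only stands in for the way the server separates the path from the query string before any RewriteRule pattern is tried.

```python
import re
from urllib.parse import urlsplit

# Hypothetical request, invented for illustration only.
requested = "/en/history/page.htm?TackQuery=castle"

parts = urlsplit(requested)
path_seen_by_rule = parts.path   # roughly what a RewriteRule pattern is tested against
query_string = parts.query       # roughly what %{QUERY_STRING} would hold

# The pattern from the original question expects "?TackQuery=" inside the path.
original = re.compile(r"^/(.*)/(.*)/(.*)\.htm\?TackQuery=(.*)$")
original_matches = original.match(path_seen_by_rule) is not None

print(path_seen_by_rule)
print(query_string)
print(original_matches)
```

Because the "?TackQuery=..." part never reaches the pattern, the original rule cannot match, which is why the query string has to be picked up with RewriteCond or passed through with QSA.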

BTW, I strongly suggest that you do not use ".*" as a pattern when it can be avoided. It is highly inefficient when used in patterns containing multiple sub-patterns. This is because ".*" is both promiscuous and greedy; it will match as many characters as possible.

For example, in your rule, the regex parser will have to make multiple passes to get a match. On the first pass, it will attempt to match the entire requested URL into the first "(.*)", making $1 equal to the entire URL. After that fails, it will try to match all but the last character in the URL to your pattern. That also will fail, as will many more attempts, until finally, the parser has worked its way backwards from the end of the requested URL to the second slash. This will make the first "(.*)" match. But then the entire process will repeat for the second and third subpatterns.
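
The greedy behaviour can be observed with a short Python sketch. It assumes Python's `re` module treats this simple pattern the same way the server's regex engine does, and the path (with one directory level more than the rule expects) is invented for illustration.

```python
import re

# The pattern from the question, without the query-string part.
greedy = re.compile(r"^/(.*)/(.*)/(.*)\.htm$")

# Hypothetical path with four segments instead of three.
path = "/en/history/castles/keep.htm"

match = greedy.match(path)
groups = match.groups()

# The first "(.*)" keeps as much as it can, including a slash.
print(groups)
```

The first group ends up holding "en/history" rather than "en": the engine only gives characters back when the rest of the pattern cannot otherwise match.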

Therefore, I would strongly suggest re-coding the above rule as:


RewriteCond %{QUERY_STRING} ^TackQuery=(.+)$
RewriteRule ^/([^/]+)/([^/]+)/([^.]+)\.htm$ $1.php?query1=$2&query2=$3&TackQuery=%1 [L]

in order to speed things up. In each case, the pattern will match one or more characters up to the next delimiter of "/" or "." in the URL. In this way, we proceed from left to right through the requested URL in a single pass, and avoid the repeated backtracking needed to match a string with multiple ".*" patterns in it. Each subpattern is "better" because it is looking for a specific "stop" character.

In some cases, the use of ".*" is unavoidable. But avoid it whenever possible.
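
As a rough illustration of how the RewriteCond/RewriteRule pair fits together, the following Python sketch imitates it. This is not how Apache is implemented; the function name `rewrite` and the sample paths are hypothetical, and only the two regular expressions are taken from the rule above.

```python
import re

# Stand-ins for the RewriteCond and RewriteRule patterns quoted above.
COND = re.compile(r"^TackQuery=(.+)$")
RULE = re.compile(r"^/([^/]+)/([^/]+)/([^.]+)\.htm$")


def rewrite(path, query_string):
    """Return the rewritten target, or None when the rule would not apply."""
    # mod_rewrite tests the rule pattern first, then the conditions.
    rule_match = RULE.match(path)
    if rule_match is None:
        return None
    cond_match = COND.match(query_string)
    if cond_match is None:
        return None
    script, query1, query2 = rule_match.groups()  # $1, $2, $3
    tack = cond_match.group(1)                     # %1
    return f"{script}.php?query1={query1}&query2={query2}&TackQuery={tack}"


print(rewrite("/en/history/page.htm", "TackQuery=castle"))
print(rewrite("/en/history/page.htm", ""))
```

The first call yields the rewritten target with the captured term appended; the second returns None because the condition on the query string fails, so the request would be left alone by this rule.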

Jim

Josefu

5:23 pm on Apr 20, 2005 (gmt 0)

10+ Year Member



(tipping hat) You've saved me a lot of trouble yet once again - thanks a million! If ever you're in my neck of the woods you've got more than a beer on me : )

I will run to apply right away - then to read everything around and about the solution you gave me. So much to learn...

Thanks again, take care,

Josefu.

[added] Noted, noted about the ".*" - that was but a remnant from a last desperate measure, used only to represent the content of the "TackQuery". Everything before it is strict ([enfr]{2})[a-zA-Z][^/]etc, but thanks for your concern. I'm quite precise in my rewrites today, and that may also be thanks to something you wrote here. You contribute quite a lot! Thanks again again : )[/added]

[added added]Up and working, all index terms are now highlighted in the page that contains them. Thanks! [/added added]

Josefu

9:34 am on Apr 26, 2005 (gmt 0)

10+ Year Member



Backfired!

Unfortunately all the above seems to have been for nothing - Google has listed every entry in my "alphabetical index" page as a unique URL - yet the fact that each URL has a query string added to it is quite clear. I now have around 50 Google listings all pointing to one and the same text page - should I fear a "duplicate content" penalty? Is there some way I can keep Google from reading the query string?

Perhaps this is a subject for the Google board. Search time.

[added] Search seems to be down? [/added]

Josefu

6:47 am on Apr 28, 2005 (gmt 0)

10+ Year Member



I haven't been able to find a solution for the problem - if problem there is. What's more, I'm not sure where to post my question - but no need to bother google news for now.

The thing is - do you think that, through the above mod_rewrite, the "added query" somehow was seen as a hard-coded URL? No, wait, that's not possible.

The thing is, I've got around 60 links from one page pointing to one other - albeit each with a different query string - and I'm not sure how Google will see this.

Any - er, reassurance would be welcome.

jd01

9:06 am on Apr 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm not sure what you're saying...

Are the pages indexed uniquely? E.g., when you click on a link, do you go to the correct content, or do they all go to the same content?

Do you want to redirect the dynamic pages to unique static URLs?

Not exactly sure I understand what you're trying to accomplish...

Justin

BTW anything is possible when it comes to a SE, that's why this little game is so fun... What was ok yesterday will get you on the blacklist today.

Josefu

12:23 pm on Apr 28, 2005 (gmt 0)

10+ Year Member



Thanks for your reply.

What I've done is use mod_rewrite to rewrite a hard-coded "html" URL into a PHP URL with queries. Quite typical, this. But in addition to the above, on links to that same page from another "alphabetical index" page, I appended the word to be highlighted to the end of the URL as a query, instead of "hard coding" it, too, into the URL. All this was to avoid the situation where I'd have reams of hard-coded links leading to the same content - making a ripe case for a duplicate-content penalty.

[added]

instead of doing this: /path/to/page/wordtohighlight.htm
I did this : /path/to/page.htm?highlight=wordtohighlight

[/added]

Since this morning (had to leave) I had little more time than to do a search for "htm?id=" - but this turned up a fair number of URLs like mine. I think that Google sees the query as a query - at least I'm hoping so until I know better.