
Apache Web Server Forum

    
Dynamic URL Rewrite - RewriteCond %{QUERY_STRING}
RewriteCond QUERY_STRING for index.php?page=somepage
hideaky




msg:4316756
 8:02 am on May 24, 2011 (gmt 0)

Hi all!

Could someone please help me write a RewriteRule for the query string "?page=otherpage"?

I tried this:
RewriteCond %{QUERY_STRING} page=index
RewriteRule ^index.php$ http://www.domain.com/$1? [R=301]

It works for index.php, so the url path will only show http://www.domain.com/ instead of http://www.domain.com/index.php?page=index


But what if the condition is RewriteCond %{QUERY_STRING} page=otherpage?
How do I write the RewriteRule for that RewriteCond?

The current path is http://www.domain.com/index.php?page=otherpage
I want it to be read as: http://www.domain.com/otherpage


Thanks in advance!

hideaky

 

jdMorgan




msg:4318874
 1:17 am on May 28, 2011 (gmt 0)


RewriteCond %{QUERY_STRING} ^([^&]*&)*page=([^&]+)
RewriteRule ^index\.php$ http://www.domain.com/%2? [R=301,L]

This assumes that your pages are no longer dynamically-generated and no longer link to other pages by using the "page=" query string, so you're trying to get rid of the old URLs listed in search results.

If this is not the case, see the thread "Changing Dynamic URLs to Static URLs" in our Apache Forum Library. That (very-detailed) thread describes the three-step process needed to use "static-looking" URLs on a dynamically-generated site.
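
For readers who have not seen that thread, a setup of that kind commonly looks something like the sketch below. It is not taken from the Library thread. The page-name character class and the use of www.domain.com are assumptions for illustration only. The redirect tests THE_REQUEST rather than QUERY_STRING so that only URLs actually requested by a client are redirected, and it uses %2 because the page value is captured by the second set of parentheses in the RewriteCond.

```apache
RewriteEngine On

# External redirect: a client asked directly for the old dynamic URL, so send
# it to the static-looking one. THE_REQUEST holds the raw request line
# (for example "GET /index.php?page=otherpage HTTP/1.1") and is not changed
# by the internal rewrite below, which prevents a redirect loop.
# %1 would be the optional leading "name=value&" group; %2 is the page value.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/index\.php\?([^&\s]*&)*page=([^&\s]+)
RewriteRule ^index\.php$ http://www.domain.com/%2? [R=301,L]

# Internal rewrite: map the static-looking URL back to the script that
# generates the page. The character class is an assumption about page names;
# it contains no dot, so index.php itself is never rewritten.
RewriteRule ^([a-z0-9_-]+)$ /index.php?page=$1 [L]
```

Rules like these only help if the links on the pages themselves already point at the static-looking URLs; otherwise every click goes through a redirect.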

Jim

digiweb




msg:4342064
 6:48 am on Jul 21, 2011 (gmt 0)

I have a similar problem. I am trying to get rid of a specific way of generating pages that is no longer in use but is still indexed in search results.

One of these query strings for example is -

index.php?option=com_dtregister&controller=validate&task=uniqueUser&no_html=1

Another is
index.php?option=com_dtregister&Itemid=3


Following your example I got this to work

RewriteCond %{QUERY_STRING} ^([^&]*&)*option=com_dtregister
RewriteRule ^index.php$ http://www.example.com/%1? [R=301,L]


But what I'd really like to do is apply [G] so it returns a 410. At least I think that's what I want. I want Google to know that it should forget anything with com_dtregister.

Do you think [G] is the right approach, and if so, how would that work in this case?

lucy24




msg:4342069
 7:19 am on Jul 21, 2011 (gmt 0)

Is there any possibility of humans following these outdated links, or is this strictly for search engines? For humans, the 410 is definitely a last resort; you'd really like them to land somewhere. If the page still exists in query-free form, just chop off the query string and let people land on the "base" page.

But for search engines, if it's a choice between returning a 410 and redirecting a bunch of different urls to the same place, go with the, er, Gone. In mod_rewrite it is set up exactly like [F]:

RewriteRule {any old stuff here} - [G]

Since you're applying it to everything that has a particular query string, you don't need to have anything in particular in the rule. Maybe \.php$ to be tidy about it.

In the %{QUERY_STRING} part, if you want to exclude everything that contains the com_dtregister element, and you're not capturing the rest, you don't need the
^([^&]*&)*option=
part. (Hm. Seems to be a rash of unwanted smileys lately. One way to keep them out is to wrap everything in "code" tags ;)) Just write out the part of the query string that you do want to look for.
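
Put together, the advice above might look like the following in an .htaccess file. This is a minimal sketch, assuming mod_rewrite is enabled and that every URL carrying option=com_dtregister should be retired. The parameter-boundary grouping is an addition for illustration, not something from the posts above.

```apache
RewriteEngine On

# Match "option=com_dtregister" as a whole parameter wherever it appears in
# the query string. (^|&) and (&|$) keep it from matching inside a longer
# parameter name or value.
RewriteCond %{QUERY_STRING} (^|&)option=com_dtregister(&|$)
# "-" means "leave the URL alone"; [G] answers 410 Gone and stops processing.
RewriteRule \.php$ - [G]
```

On a site that already has rewrite rules (a CMS front controller, for example), a block like this would normally need to sit before those rules so that it is evaluated first.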

digiweb




msg:4342092
 9:09 am on Jul 21, 2011 (gmt 0)

No one needs these pages. com_dtregister didn't work. We moved on to another program. I want google to stop carrying around these old links. They're diluting our keyword density, too.

I'm lost on the syntax. Would it be possible to provide a real-world syntax-correct example?

thank you

g1smd




msg:4342281
 4:26 pm on Jul 21, 2011 (gmt 0)

Is that a Joomla site?

Make sure you have updated to the very latest "htaccess.txt" file from the newest Joomla install set.

The new file is compatible with all older versions of Joomla but contains important code changes.

digiweb




msg:4342289
 4:38 pm on Jul 21, 2011 (gmt 0)

joomla 1.5.23
Just can't get the syntax.

lucy24




msg:4342385
 7:02 pm on Jul 21, 2011 (gmt 0)

"com_dtregister didn't work"

Did you leave off the anchors? The two elements
^(.*)stuffhere(.*)$
and
stuffhere
are functionally the same in most situations, but the short version generally runs faster and cleaner.

digiweb




msg:4342423
 7:53 pm on Jul 21, 2011 (gmt 0)

I'm sorry, but without the full syntax as you'd put it in an .htaccess file, I don't know how to follow what you've posted. Syntax is a bear.

lucy24




msg:4342496
 10:46 pm on Jul 21, 2011 (gmt 0)

"Syntax is a bear."

And there you have htaccess in a nutshell. Full version: You said "com_dtregister didn't work". Did your complete line say

RewriteCond %{QUERY_STRING} ^com_dtregister

or did it say

RewriteCond %{QUERY_STRING} com_dtregister

The first version would only work for query strings that begin with "com_dtregister"-- in other words, never. The second version will work for anything that contains "com_dtregister".

If you are using [G] you don't rewrite at all, you just put in a - (hyphen) [httpd.apache.org] meaning "don't change the original":

RewriteRule .+ - [G]

Did someone else just ask this identical question? What's the apache page doing so near the top of my browser history?

digiweb




msg:4342507
 11:36 pm on Jul 21, 2011 (gmt 0)

I had:
RewriteCond %{QUERY_STRING} ^([^&]*&)*com_dtregister=([^&]+)
RewriteRule ^index.php$ [domain.com...] [R=301,L]

Then to make it a [G] I changed it to:

RewriteCond %{QUERY_STRING} ^([^&]*&)*com_dtregister=([^&]+)
RewriteRule - [G]

So I believe you're saying I could do this:
(and yes anything with com_dtregister is to be wiped out)

RewriteCond %{QUERY_STRING} com_dtregister
RewriteRule - [G]
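
One caution on the form above: RewriteRule expects a pattern, then a substitution, then optional flags. With only two arguments, mod_rewrite would most likely read "-" as the pattern and "[G]" as the substitution, which is not the intent. A sketch of the same idea with the pattern restored, assuming only index.php requests need to be caught:

```apache
# Condition: the query string contains "com_dtregister" anywhere.
RewriteCond %{QUERY_STRING} com_dtregister
# Pattern (^index\.php$), substitution (- = no change), flags ([G] = 410 Gone).
RewriteRule ^index\.php$ - [G]
```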


Which is better to clean cruft out of Google Cache? 301 or 410?

I was worried the 301 was leaving the links but creating a massive duplicate content at the target (home page). So if I gave Google a 410 eventually it would let go of the idea that the links are ever going to be meaningful.

Does this sound reasonable?

g1smd




msg:4342515
 12:20 am on Jul 22, 2011 (gmt 0)

Both 301 and 410 remove the URL from the index.

301 seamlessly takes the visitor someplace else so you retain the traffic. 410 shuts the visitor out and encourages a bounce unless you put enticing links on your 410 error page.
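
A custom 410 page of that kind can be configured with the core ErrorDocument directive. The sketch below assumes a hypothetical static page at /gone.html that carries the links you want visitors to see.

```apache
# Serve a custom page for every 410 Gone response.
# /gone.html is a hypothetical file name; any local URL path works.
ErrorDocument 410 /gone.html
```

Using a local path rather than a full http:// URL matters here, because a full URL would make Apache send a redirect instead of the 410 status.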

lucy24




msg:4342516
 12:21 am on Jul 22, 2011 (gmt 0)

"Which is better to clean cruft out of Google Cache? 301 or 410?"

If I knew the answer to that, I would be rich :)

"I was worried the 301 was leaving the links but creating a massive duplicate content at the target (home page). So if I gave Google a 410 eventually it would let go of the idea that the links are ever going to be meaningful.

Does this sound reasonable?"

That's assuming for the sake of discussion that "sounds reasonable" and "what google would do" are the same thing. A 410 is better than a 404, and a 301 is better than a 302. No matter what you do, google will periodically visit the old links just to see if maybe you decide in 2023 to reactivate them. But in the long term, 410 is best.

Edit after seeing intervening post: I think you said earlier that this is all about search engines, not human visitors. It's always easier when you only have to think about one or the other :)

lucy24




msg:4342609
 7:51 am on Jul 22, 2011 (gmt 0)

Oops, time ran out for editing. Turns out there may be an entirely different alternative solution if your main concern is g###. That was a rhetorical statement. If you can bring yourself to sign up for Google Webmaster Tools, there's a section on "URL parameters" under "site configuration". (Found it while looking for something else, naturally.)
"Only use this feature if you feel confident about how parameters work for your site. Telling Googlebot to exclude URLs with certain parameters could result in large numbers of your pages disappearing from our index."

So if you want "large numbers of your pages disappearing from our index" this would seem to be just the ticket :)

g1smd




msg:4342623
 8:39 am on Jul 22, 2011 (gmt 0)

While that solution works, it is only good for Google. Configuring the site correctly is better. The fix will then work for all visitors and bots.

digiweb




msg:4342671
 11:51 am on Jul 22, 2011 (gmt 0)

Great discussion. Thanks again.
