Forum Moderators: phranque

Redirect to remove question mark appended to URL

Showing as duplicate in WMT, but fix kills ads on my site!

AndyA

5:31 pm on Jun 24, 2010 (gmt 0)

10+ Year Member



Google Webmaster Tools (WMT) showed a duplicate page title on my site when I checked this morning. It listed the correct link on my site as:

http://example.com/pageonmysite.html

The duplicate was listed as:

http://example.com/pageonmysite.html?referrer=anothersiteexample.com.au

After searching WebmasterWorld, I found this code:

RewriteCond %{THE_REQUEST} \?[^\ ]*\ HTTP/
RewriteRule (.*) http://example.com/$1? [R=301,L]

That worked beautifully to do a 301 redirect to the correct URL, but it also prevented my OpenX ads from appearing!

I've searched, and cannot find how to do this while still allowing OpenX to display. Any help would be appreciated; this .htaccess stuff confuses me to no end.

jdMorgan

5:46 pm on Jun 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, you have to define "What query strings are bad?" (bogus-listing queries) and "What query strings are good?" (OpenX) before you can start coding...

Whichever of these is easier to define as a pattern that regular expressions can recognize is what you should use: the "bad" ones as a positive-match pattern to invoke the redirect, or the "good" ones as a negative-match pattern to inhibit it.
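
For illustration only, a positive-match version might look something like the sketch below. It assumes the unwanted queries are the ones carrying a "referrer=" parameter (as in the WMT listing above), that the .htaccess file is in the document root, and that the canonical host is example.com; the real pattern has to come from your own list of bad queries.

```apache
# Sketch only: 301-redirect when the query string contains a "referrer="
# parameter (the assumed "bad" pattern, matched positively). Any other
# query string, such as the ones OpenX uses, is left alone.
RewriteEngine On
RewriteCond %{QUERY_STRING} (^|&)referrer=
# The trailing "?" on the target discards the original query string
RewriteRule ^(.*)$ http://example.com/$1? [R=301,L]
```

Testing %{QUERY_STRING} is simpler, but it also sees query strings added by other internal rewrites, whereas %{THE_REQUEST} holds only the original request line sent by the client.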

Note that an HTML page would not normally accept a query string, so that may be a good 'shortcut' to a solution. But only you know if removing all queries from all .html URLs would affect your site negatively.
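
A minimal sketch of that shortcut, assuming none of your .html pages ever needs a query string, the .htaccess file is in the document root, and the canonical host is example.com. Only requests for URLs ending in .html are affected; anything else keeps its query string.

```apache
# Sketch only: 301-redirect any client request for a .html URL that
# carries a query string to the same URL with the query removed.
RewriteEngine On
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^?#\ ]*\.html\?[^\ ]*\ HTTP/
# The trailing "?" on the target discards the original query string
RewriteRule ^(.*\.html)$ http://example.com/$1? [R=301,L]
```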

Thoroughly define requirements and specifications first; code later.

Jim

AndyA

5:59 pm on Jun 24, 2010 (gmt 0)

10+ Year Member



Jim:

Thanks for the response. I understand what you're saying, but I have no idea how to make it happen. I'm not a coder, not at all.

I'm guessing it would be easiest to just allow the OpenX, and prevent all others?

jdMorgan

6:05 pm on Jun 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Don't know -- I have no idea whatsoever what an OpenX query string might look like.

You need to get very specific here. What queries do you want to accept, on what kinds of pages, and/or what queries do you want to reject, and what kinds of pages don't use queries at all?

The solution is based on the character strings in the URLs and the querystrings appended to those URLs.

It has nothing to do with coding, but rather with list-making and text-pattern-recognition.

Only you are in a position to provide this information, and the quality of that information will determine success; you will get precisely what you ask for, and that may or may not be what you need -- that outcome is up to you...

Not meaning to be pedantic, but we cannot 'see' your server, and we're not mind-readers here... :)

Make lists and compare them.

Jim

AndyA

6:21 pm on Jun 24, 2010 (gmt 0)

10+ Year Member



Jim:

This is what I've used that seems to be working:

RewriteCond %{THE_REQUEST} \?[^\ ]*\ HTTP/
RewriteCond $1 !^openxfile/
RewriteRule (.*) http://example.com/$1? [R=301,L]

The ads are appearing, and the URL with the ? in it is now redirected with a 301 to the correct URL, which should make Google happy.

As far as I can recall, the only thing I have on my site that uses a ? in the URL is OpenX. I had a forum, but it's basically shut down now. I guess when I upgrade and reopen the forum, I'll have to make an exception for that file as well in the code.
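
When the forum comes back, the exception could be extended with an alternation in the existing negative-match condition. In this sketch "forum/" is only a placeholder for wherever the forum actually lives:

```apache
# Sketch only: same rules as above, but requests whose path starts with
# openxfile/ or forum/ (placeholder directory name) are not redirected.
RewriteCond %{THE_REQUEST} \?[^\ ]*\ HTTP/
RewriteCond $1 !^(openxfile|forum)/
RewriteRule (.*) http://example.com/$1? [R=301,L]
```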

I read about some mobile phones adding a question mark to all URLs, but it seems like this code should address that as well, and remove the question mark and redirect to the correct URL without the question mark.

I hope this takes care of it. I'll keep an eye on it for the next few days to make sure I haven't created another problem by doing this. Thanks for your help.

g1smd

7:18 pm on Jun 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If the path "/openx" is what defines the ads, then you have indeed solved the problem.

That was all that was asked: to define what it is in the URL that makes it an ad URL, as opposed to some other unwanted URL.

jdMorgan

10:51 pm on Jun 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Let's do make it a bit more efficient and specific, though...

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^?#\ ]*)\?[^\ ]*\ HTTP/
RewriteCond $1 !^openxfile/
RewriteRule ^(.*)$ http://example.com/$1? [R=301,L]

AndyA

11:33 am on Jun 25, 2010 (gmt 0)

10+ Year Member



Jim:

Does that revised code need to have NC (no case) somewhere?

Also, why would you recommend being more specific? Isn't it better to have a "wider net"? I'm not challenging you, as you've been very helpful to me in the past, but I'd like to know as this Apache coding just confuses me.

Thanks again.

jdMorgan

4:23 pm on Jun 25, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If it needed [NC] I would have included it... :)

You may be inquiring about the "[A-Z]+" subpattern in the RewriteCond. This subpattern matches the HTTP method requested by the client, which is always uppercase. If it is not, then the server will already have rejected the request with a 400 Bad Request response before mod_rewrite can even be invoked. So there is no need to waste bytes including or processing an [NC] flag here.

'Specific' as I used it above refers to "what matches what and where."

The RewriteCond pattern is improved to be very specific about the format of a valid request, so that it does not, for example, waste time looking for a question mark before matching the required HTTP method and leading slash. The RewriteRule pattern has been anchored at the suggestion of another member here who has stated that because of the way the regular-expressions matching engine is coded, it will be processed faster that way.
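
For reference, here is the same rule set with comments added, describing what each part of the patterns is meant to match. The sample request line uses the example URL from the start of the thread; the directives themselves are unchanged.

```apache
# jdMorgan's rules from above, comments added; directives unchanged.
# %{THE_REQUEST} is the raw request line sent by the client, e.g.
#   GET /pageonmysite.html?referrer=anothersiteexample.com.au HTTP/1.1
#   ^[A-Z]+\     the HTTP method and the space after it
#   /([^?#\ ]*)  the leading slash and the URL-path up to the "?"
#   \?[^\ ]*     a literal "?" and the query string (which may be empty)
#   \ HTTP/      the space before the protocol version
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^?#\ ]*)\?[^\ ]*\ HTTP/
# $1 is the RewriteRule's (.*) capture (the path without its leading
# slash in .htaccess); requests under openxfile/ are not redirected
RewriteCond $1 !^openxfile/
# Anchored pattern; the trailing "?" drops the query from the target URL
RewriteRule ^(.*)$ http://example.com/$1? [R=301,L]
```

Because "[^\ ]*" can match nothing, a request ending in a bare "?" is also redirected.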

This stuff can indeed be confusing. In addition to learning the mod_rewrite directives, one must also learn regular expressions. But it doesn't end there... You also have to take that knowledge, apply it to what you know about the HTTP protocol and server variable values, and figure out what effect any given rule will have on your site's function, its URL-to-filename mappings, and the resultant 'view' that search engines take of your site to index and rank it... One little typo or omission, small logic error, or rule out of order can have big (and sometimes disastrous) effects. It's a wider-reaching subject area than it may first appear.

To stay out of trouble, always bear in mind that this is server configuration code, and that 'Good enough' often isn't... Those who do only a cursory test and think they're done are frequently the ones who come back here later looking both to correct the earlier defective code and to find ways to repair the damage already done to search rankings by the often-quite-subtle flaws in that earlier code.

Jim

AndyA

1:28 pm on Jun 26, 2010 (gmt 0)

10+ Year Member



Jim:

Thanks for your help, I'll update the code to make it more specific to avoid any unintentional consequences. That's why I try to avoid altering server configuration, as I don't understand what I'm doing and what at first seems like a simple fix often isn't. Of course, it would help if Google would figure out that links on another domain that are coded in this manner with a question mark AREN'T ON MY SITE! And therefore, aren't duplicate page titles!

Seems obvious enough to me, but Google doesn't seem to get it.

g1smd

6:36 pm on Jun 26, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If your server returns a "200 OK" HTTP status code and some content for some URL request, then your server is saying that the URL is OK.

If the URL is one that should not be OK, then you have to configure the server so that those requests are served with a status code that says that the request is not OK.

Whether that's 410, 404, 403, 401, or 301 depends on the exact situation.
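
As a rough sketch of how some of those responses can be produced with mod_rewrite (all paths here are hypothetical examples, not from this thread): a 301 sends the client to the correct URL, the [G] flag returns 410 Gone, and the [F] flag returns 403 Forbidden. A 404 is what Apache already returns when no matching file exists, and a 401 comes from authentication directives rather than from mod_rewrite.

```apache
# Sketches only; the paths below are hypothetical.
RewriteEngine On
# 301 Moved Permanently: the URL has a correct equivalent, so redirect to it
RewriteRule ^old-page\.html$ http://example.com/new-page.html [R=301,L]
# 410 Gone: the URL was removed for good ([G] implies [L])
RewriteRule ^removed-page\.html$ - [G]
# 403 Forbidden: the URL must not be served to clients ([F] implies [L])
RewriteRule ^private/ - [F]
```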