Redirect extension

         

darkyl

8:56 pm on Dec 21, 2008 (gmt 0)

10+ Year Member



We had a short-lived problem on one site running a CMS, and for a while all pages had their URLs transformed like this:

from:
www.example.com/page.htm
to:
www.example.com/page.htm?Itemid=0

Unfortunately, Google has indexed those pages, and now we are trying to redirect URLs ending in .htm?Itemid=0 to .htm

This is what we're trying:

RewriteRule (.*).htm?Itemid=0$ /$1.htm [R=301,L]

but nothing happens; the redirect does not take place.

We've made several other attempts without success.
Any ideas?

darkyl

9:06 pm on Dec 21, 2008 (gmt 0)

10+ Year Member



Also tried this:
RewriteRule (.*)\.htm?itemid=0$ http://www.example.com/$1.htm [R=301,L]

In both attempts we get no errors, but nothing happens.

g1smd

9:18 pm on Dec 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You need to test the %{QUERY_STRING} part with a separate RewriteCond, as the RewriteRule pattern is matched against the URL-path only and cannot see the query string.

This stuff comes up almost every day. There are several open threads in the last few days with the code you need, as well as examples linked to from the sticky thread at the top of the page. This redirect is only one of several that may be needed to fix all the canonical issues you may encounter.

You need to be very clear which server paths are valid, and which external URLs you want your users to be able to access your content through. You'll use rewrites to connect URLs to server paths, and redirects from non-canonical URLs to the canonical formats to ensure that Google only "sees" one URL for each "page" of content, and that all other alternative URLs issue a redirect.
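
As a minimal sketch of the RewriteCond approach for the URLs in this thread, assuming the rules sit in a .htaccess file in the document root (so the pattern sees "page.htm" without a leading slash) and that "Itemid=0" is the entire query string:

```apache
RewriteEngine On

# RewriteRule matches the URL-path only, so the query string
# is tested separately in a RewriteCond.
RewriteCond %{QUERY_STRING} ^Itemid=0$
# The trailing "?" on the target clears the query string,
# so the redirect goes to the bare .htm URL.
RewriteRule ^(.*)\.htm$ http://www.example.com/$1.htm? [R=301,L]
```

This also shows why a pattern such as (.*).htm?Itemid=0$ can never match: "Itemid=0" is not part of the URL-path, and the unescaped "?" is a regex quantifier that only makes the preceding "m" optional. On Apache 2.4 and later, the QSD flag can be used instead of the trailing question mark.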

darkyl

9:34 pm on Dec 21, 2008 (gmt 0)

10+ Year Member



Thanks for your input g1.

Following your advice I came up with:

RewriteCond %{QUERY_STRING} Itemid=
RewriteRule (.*) http://www.example.com/$1? [R=301]

but that doesn't seem to do anything.

I'll keep posting my attempts, thx anyway.

g1smd

10:04 pm on Dec 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That should work.

Do add the [L] flag though.

Flush your browser cache before testing.

jdMorgan

10:21 pm on Dec 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Be aware that if the CMS package includes rewrites from the 'friendly' URLs to the query-string filepaths, you'll need to check that the requests for URLs with query-strings are coming directly from the client, and not resulting from that internal rewrite. This requires that you test THE_REQUEST instead of QUERY_STRING:

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /page\.htm\?Itemid=[^&\ ]*\ HTTP/

Jim

darkyl

12:00 am on Dec 22, 2008 (gmt 0)

10+ Year Member



Thanks both for your help.

I tried testing using THE_REQUEST but nothing happens.

I think what jdMorgan said is true: the requests for urls with ?itemid=0 probably come from another internal rewrite, which I can't locate right now.

Any advice on how to proceed?

darkyl

12:24 am on Dec 22, 2008 (gmt 0)

10+ Year Member



-EDIT-

After rewriting the above redirect to add the L flag, it now works correctly; all redirects now work.

Thanks a lot for your help.

darkyl

10:49 am on Dec 24, 2008 (gmt 0)

10+ Year Member



Hello again.

As I said in my previous post, the redirects seem to work just fine.

However, Google Webmaster Tools is listing the pages with redirects in the "not followed" list, and the error is "empty redirect".

So somehow, while the redirect works in browsers, Google can't follow the redirects correctly, and I think it will not drop those pages from the index.

Any idea why this is happening?

g1smd

1:00 pm on Dec 24, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Use the Live HTTP Headers extension for Firefox to see exactly what is being returned in the HTTP Header.

jdMorgan

1:39 pm on Dec 24, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In case it wasn't clear, here is an explicit example of what I meant by preventing internal rewrites from being redirected:

# Strip Itemid query strings from all client-requested URL-paths
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^?]*\?Itemid=[^&\ ]*\ HTTP/
RewriteRule (.*) http://www.example.com/$1? [R=301,L]

This redirect will be invoked for all URL-paths, but only if there is a query string consisting only of "Itemid=<anything>" appended to the URL-path, and only if that URL+querystring was directly requested by an HTTP client.

URLs which match the query string description after being internally rewritten by other code will not be redirected, because those other internal rewrites cannot change the value of "THE_REQUEST" as received from the HTTP client. This prevents an 'infinite' rewrite-redirect loop.
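
To make the loop concrete, here is a sketch that places the redirect next to an internal rewrite of the kind a CMS might install. The internal rule and its /index.php target are hypothetical, since the CMS's real rules were not posted in this thread, and .htaccess context is assumed:

```apache
RewriteEngine On

# External redirect: fires only when the client itself
# requested a URL with ?Itemid=<anything>.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^?]*\?Itemid=[^&\ ]*\ HTTP/
RewriteRule (.*) http://www.example.com/$1? [R=301,L]

# HYPOTHETICAL internal rewrite (not taken from the CMS in this thread):
# maps the friendly URL to a script and appends Itemid internally.
RewriteRule ^(.*)\.htm$ /index.php?page=$1&Itemid=0 [L]
```

After the internal rewrite, QUERY_STRING contains "Itemid=0", but THE_REQUEST still holds the original request line (for example "GET /page.htm HTTP/1.1"), so the redirect rule does not fire when the rules are processed again. A condition that tested QUERY_STRING for "Itemid=" would match at that point and redirect the rewritten request.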

Jim

darkyl

4:03 pm on Dec 24, 2008 (gmt 0)

10+ Year Member



Thanks again for your help jd.

Now I understand what you were saying...

I modified it again and it works; hopefully Google will be able to follow the redirect now.

thx again

jdMorgan

4:16 pm on Dec 24, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Don't leave that up to "hopefully" or chance: Install Firefox, Live HTTP Headers, and either the User-agent Switcher, Webmaster toolbar, or Prefbar add-ons, and then test your site's redirects using a valid Googlebot User-agent string. If you don't see a single 301 redirect pointing straight to the proper URL, then you still have more work to do.

Jim

darkyl

4:45 pm on Dec 24, 2008 (gmt 0)

10+ Year Member



As you suggested, I installed Live HTTP Headers and the User-agent Switcher, and imported the newer Googlebot 2.1 user-agent string.

Testing using HTTP headers, after the GET for:
www.example.com/page?Itemid=0

I see an HTTP/1.x 301 Moved Permanently and the location is right, so everything seems to be working.

Thanks again, what I've learned here is very valuable to me.