htaccess Query String redirect / shorten URL

problem getting it to work :)

         

kidcobra

4:17 pm on Apr 8, 2010 (gmt 0)

10+ Year Member



I am using recordset paging, and am trying to fix the URLs of the pages that are created.

Currently, they appear like this:

http://example.com/sections/auctionitems/example_past_auction_items.php?pageNum_rsaucites=31&totalRows_rsaucites=1550

Each time a new row is added to the database, the URL of every page changes, because the total row count at the end of the URL changes. It turns out the URL works just fine without this last part, &totalRows_rsaucites=1550 (basically everything after the 31 in the example, which is in effect the page number). So my plan was to redirect to that URL (to itself, basically), strip off the rest of the query string, and in effect create static URLs.

I have tried it many different ways, and here is the frustrating current status of my efforts, which of course does not work. Any guidance would be greatly appreciated:

Options +FollowSymLinks
RewriteEngine on
RewriteCond %{QUERY_STRING} ^pageNum_rsaucites=(.*)&totalRows_rsaucites=1550
RewriteRule ^index\.php/sections/auctionitems/example_past_auction_items\.php$ http://example.com/sections/auctionitems/example_past_auction_items.php?pageNum_rsaucites=%1? [R=301,L]

jdMorgan

4:50 pm on Apr 8, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You cannot "fix" this problem with mod_rewrite, because mod_rewrite is only invoked *after* the bad link has been clicked or has been indexed by search engines.

In the former case, you invoke a client redirect, which terminates the current HTTP request and causes the client browser to make a second HTTP request to your server, using the URL provided in the preceding redirect response. In this case, both requests get logged, your access log filesize doubles, and your 'stats' become bloated/polluted (and possibly useless) because every 'page' is fetched twice.

In the latter case, the search engines don't think much of a site that publishes incorrect URLs and then forces a redirect every time one is requested. It is an indicator of "low quality" because they know all about the problems in the preceding paragraph. Google, for example, has announced that they will now consider page-load time as a direct ranking factor, and you are essentially doubling yours...

The correct place to fix the problem is at its root: either fix the code (I assume it's a plug-in for a blog or forum) or replace it with something that generates correct links.

After doing so, you can use redirects to speed up the 'clean-up' of previously-indexed "bad" URLs from search listings if you wish, but the on-page links must be corrected first.

Jim

g1smd

4:58 pm on Apr 8, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You need a redirect which sends requests for the old URL to the new URL. It needs to do this without reference to the value of total rows in the original URL request.

RewriteEngine On
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /sections/auctionitems/example_past_auction_items\.php\?pageNum_rsaucites=([^&]+)(&[^&\ ]+)+\ HTTP/
RewriteRule ^sections/auctionitems/example_past_auction_items\.php$ http://example.com/sections/auctionitems/example_past_auction_items.php?pageNum_rsaucites=%1? [R=301,L]


This strips all parameters after pageNum_rsaucites=<value>


You do this along with an edit to the script to stop it publishing unwanted parameters in the links on your pages.


At this point you should also be designing out all the unnecessary items that currently appear in the URL.

Do you need
/sections/
/auctionitems/
/example_past_auction_items
.php
pageNum_
rsaucites

at all?

I think not.

That URL could be a LOT shorter and without parameters, even as short as
example.com/past/123
or somesuch.
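
As a rough sketch of how such a short URL could be served: assuming the short form is /past/<page number>, that page numbers are digits only, and that the rule lives in the root .htaccess file (none of which is settled in this thread), an internal rewrite might look like this:

```apache
RewriteEngine On
# Hypothetical short URL: /past/123
# Internally map it to the existing script; this is a rewrite, not a redirect,
# so the short URL stays in the browser's address bar.
RewriteRule ^past/([0-9]+)$ /sections/auctionitems/example_past_auction_items.php?pageNum_rsaucites=$1 [L]
```

The rewrite only maps incoming requests; the paging script still has to publish the short form in its on-page links for visitors and search engines to see it.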

[edited by: g1smd at 5:14 pm (utc) on Apr 8, 2010]

kidcobra

5:06 pm on Apr 8, 2010 (gmt 0)

10+ Year Member



Hi Jim. Thanks for the fast reply. The code is Dreamweaver recordset paging code, which I used without knowing the issues it would cause (how do you build a website... buy Dreamweaver, find the on button of the computer, open a book, go to page one, and get the answer to the first question... Where do you get a blank web page so you can figure out what to do with it? :).

I tried to fix the code first, but could not get it to work if I took out the rows part, since it apparently needs that to figure out the number of pages based on how many records you want per page. But once the pages are in place, they do work without it, though I understand your point. Of course, not knowing enough at the time to consider most of what is in your reply, I thought a 301 would take care of the SEO/search thing.

Anyway, thanks for your thoughts and I will head back to square one, wiser for the effort. Greg

kidcobra

5:54 pm on Apr 8, 2010 (gmt 0)

10+ Year Member



Hi g1. Thanks for the answer. On the easy part, you are of course totally right... I set up a ridiculous site structure going in, not knowing about anything. Making it worse, some of the folders and files have a couple of capital letters... I know. Anyway, I have been slowly making changes, but our search rankings are good enough that I have not wanted to make wholesale changes all at once. I need to get everything closer to root, and your advice in that regard is extremely well taken. About the redirect: I tried it, and it returns a 500 error for any request for anything in the auctionitems folder, where I have it. Looking at your code, I am guessing it should be added to my existing root htaccess file, but I didn't want to shut the entire site down on a guess, so I figured I'd ask first. Please let me know, and thanks again for your help. I appreciate it. Greg

kidcobra

9:50 pm on Apr 8, 2010 (gmt 0)

10+ Year Member



Hi g1smd. The code works with one minor adjustment. The ? on the end of the new (substitute) short URL, which I think is there to stop the rest of the original string being written where we want it stopped, was causing a %3F to be appended to the end of the substitute URL in the address bar. I found that 3F is the hexadecimal ASCII code for a ?, so I tried it without that ? on the chance it didn't need to be there, and all three extra characters disappeared from the end of the substitute URLs, so it's exactly as hoped. I also tested it in web-sniffer, and it shows a good 301 to the substitute URL.

I know I have to deal with the folder structure and figure out how to get the Dreamweaver paging code to output the URL I'm redirecting to, but I think this redirect is better than leaving the entire situation as is, for the following reason. Every time we add to the database, the (old) URL changes, but the page title, meta, and contents (except on the last page) stay the same. That was causing duplicate-content issues with search engines: for example, pages were indexed up to 10 times with different URLs but the same title, meta, and contents, because the search engines could not keep up with the changes as fast as the URLs were changing! This was the original cause of my consternation after leaving it alone for the year and a half it has been like that.

And Jim, you are right, it's a load time and quality issue now with the redirect (I went to Apache and read up about the duplicate page requests after getting your comment) until I can figure out the paging URL output to bring it into line. But imagine how low quality it was before the redirect! At least now we have fixed URLs and, within a couple of weeks at the latest, each page will only be indexed once (the current average is over 6 indexed URLs per page title). And instead of G, Y, and B crawling these pages constantly to keep up (in vain) with the changing URLs, our search strength will be spent on them crawling relevant parts of the site instead of this mayhem.

Many thanks and my best regards to both of you. I appreciate your help and advice, and I have a full plate to deal with as a result! Greg

g1smd

12:25 am on Apr 9, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The trailing question mark was a typo.

It's also in your original code, and I didn't spot it.

If I had spotted it, I would have removed it. Do remove it! :)
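
For reference, a minimal sketch of the rule with the trailing question mark removed, assuming it sits in the root .htaccess file. The escaped dot in the RewriteCond pattern is an extra tidy-up, not something the thread required:

```apache
RewriteEngine On
# Match only direct client requests that carry at least one extra parameter
# after pageNum_rsaucites (for example &totalRows_rsaucites=1550).
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /sections/auctionitems/example_past_auction_items\.php\?pageNum_rsaucites=([^&]+)(&[^&\ ]+)+\ HTTP/
# Redirect to the same script, keeping only the page number.
# No trailing "?" on the target: the target already has its own query string,
# so a second "?" would be sent literally and show up as %3F.
RewriteRule ^sections/auctionitems/example_past_auction_items\.php$ http://example.com/sections/auctionitems/example_past_auction_items.php?pageNum_rsaucites=%1 [R=301,L]
```

Because the condition requires at least one extra parameter after pageNum_rsaucites, the redirected URL no longer matches it, so the rule should not loop.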

kidcobra

1:45 am on Apr 9, 2010 (gmt 0)

10+ Year Member



" The trailing question mark was a typo. It's also in your original code ..... "

Heck, my entire block of code was one big typo! :)

Seriously, thanks again for the help.

Greg