

drop the query string entirely from urls

this one seems complicated to me

         

maxed_out

10:29 pm on Aug 17, 2014 (gmt 0)

10+ Year Member



Hello Guys/Gals,

OK, so I have this website I am moving. It has 18k products that all have a query string on the end. Also, no canonical URLs were used... so fun, right?

My URLs look like:

domain.com/SOME-PRODUCT-URL?sc=99&category=9999999


The 9s stand for random categories; they change. There may also be more or fewer than 2 digits in the first value and more or fewer than 7 digits in the second.

So, to make this work, all I need is the URL before the query string (drop everything from the '?' onwards).

To make this:

domain.com/SOME-PRODUCT-URL


I can't figure this out for the life of me.

Thanks for your time.

penders

11:19 pm on Aug 17, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm curious as to what you tried, but anyway... using mod_rewrite in .htaccess:

RewriteEngine On
RewriteCond %{QUERY_STRING} ^sc=\d{1,2}&category=\d{1,7}$
RewriteRule (.*) http://domain.com/$1? [R=301,L]


The pattern in the RewriteRule does not match against the query string, and the ? at the end of the substitution removes the original query string (preventing it from being appended).

As you can probably tell in the RewriteCond directive, this matches 1 or 2 digits in the "sc" parameter value and 1 to 7 digits in the "category" parameter value.
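To make the behaviour of this rule concrete, here is a small Python sketch that mimics it. It is not Apache; it just applies the same regular expression and the same "drop the query" step. `redirect_target` and its `host` default are illustrative names. Because the condition is anchored at both ends and uses \d{1,2} and \d{1,7}, a request with three digits in sc, eight digits in category, or the two parameters in the other order would not be redirected.

```python
import re
from typing import Optional

# Same test as the RewriteCond: "sc" then "category", 1-2 and 1-7 digits,
# anchored at both ends so nothing else may appear in the query string.
QUERY_PATTERN = re.compile(r"^sc=\d{1,2}&category=\d{1,7}$")


def redirect_target(path: str, query: str,
                    host: str = "http://domain.com") -> Optional[str]:
    """Return the 301 target, or None when the rule would not fire.

    `path` is the per-directory path as .htaccess sees it (no leading
    slash), i.e. what (.*) captures into $1.
    """
    if QUERY_PATTERN.search(query) is None:
        return None  # condition fails: request passes through untouched
    # The trailing "?" in the substitution empties the query string,
    # so the new URL is just the host plus the captured path.
    return f"{host}/{path}"


print(redirect_target("SOME-PRODUCT-URL", "sc=99&category=9999999"))
# http://domain.com/SOME-PRODUCT-URL
print(redirect_target("SOME-PRODUCT-URL", "sc=123&category=9999999"))
# None
```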

If there is some pattern to SOME-PRODUCT-URL, then it should be included in the RewriteRule pattern so that not every request has to be processed.

maxed_out

11:25 pm on Aug 17, 2014 (gmt 0)

10+ Year Member



Hello penders,

Unfortunately, there is no pattern, and only the 18k products have this query string (some idiot used it to build the breadcrumb on the site). No other URLs have this query string. There are about 400k categories too, also named in all kinds of ways.

So will this rewrite condition run against all URLs, or just the ones with query strings? I ask because I would like to know if there is going to be a performance penalty on the site. The site gets about 6k visitors a day.

lucy24

7:00 am on Aug 18, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Are you saying that there are some URLs where you need to keep the query string? Otherwise, all you need to do is the following (paraphrasing, because blahblah we don't do it for you blahblah boilerplate blahblah Forums rule blahblah):

RewriteCond {a query string exists}
RewriteRule {capture path part of requested page URL and 301 redirect to same without query string}


Or are you saying that the URLs with query string only occur in certain directories? If so, include that part in the body of the rule, like

RewriteRule ^((thisdir|otherdir|thirddir)blahblah.php) etcetera


No matter what you do, express the body of the rule so it only matches requests for pages (details will depend on what your URLs look like). Non-page requests such as images or stylesheets never need this redirect, so make sure the server doesn't waste time evaluating conditions for them.

mod_rewrite goes by "Two steps forward, one back": the conditions (such as the existence of a query string) are ONLY evaluated if the body of the rule potentially matches.
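A rough Python model of that evaluation order, assuming the extensionless page pattern ^([^.]+)$ (suggested later in this thread) and a single "query string exists" condition. `apply_rule` and `has_query` are hypothetical helpers; the only point is that the condition function is never called when the pattern fails.

```python
import re
from typing import Callable, List, Optional

calls: List[str] = []  # records which condition checks actually ran


def has_query(query: str) -> Callable[[], bool]:
    """Hypothetical condition: 'a query string exists'."""
    def check() -> bool:
        calls.append(query)
        return query != ""
    return check


def apply_rule(path: str, pattern: str,
               conditions: List[Callable[[], bool]],
               build_target: Callable[[str], str]) -> Optional[str]:
    # Two steps forward: the rule's pattern is matched against the path first.
    m = re.search(pattern, path)
    if m is None:
        return None  # the conditions are never evaluated
    # One step back: only now are the conditions checked, in order.
    if not all(check() for check in conditions):
        return None
    return build_target(m.group(1))


# An image request: the pattern fails, so has_query() never runs.
apply_rule("images/logo.png", r"^([^.]+)$", [has_query("v=2")], lambda p: "/" + p)
# A page request with a query string: the pattern matches, then the condition runs.
result = apply_rule("SOME-PRODUCT-URL", r"^([^.]+)$", [has_query("sc=99")], lambda p: "/" + p)
print(calls, result)  # ['sc=99'] /SOME-PRODUCT-URL
```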

maxed_out

7:09 am on Aug 18, 2014 (gmt 0)

10+ Year Member



Hello lucy24,

The URLs only occur as in the example; no URLs need to keep the query string.

Unfortunately, those URLs are pages, not directories, and there are 18k different ones with no pattern to match.

incrediBILL

7:43 am on Aug 18, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Been there, done this a few times; hopefully I've covered all the possible issues below.

The URLs only occur as in the example, no URLs need to have the query string.


domain.com/SOME-PRODUCT-URL?sc=99&category=9999999


If that's your current URL, you need that query string (sc and category) to pass the right information to your ecommerce software.

If you want the query string dropped, you must build a database of URLs that translate to your current URLs.

The simpler method using the existing URLs would be to make them:
domain.com/SOME-PRODUCT-URL/99/9999999/
which are like Amazon URLs, and which get translated internally to this for the software:
domain.com/SOME-PRODUCT-URL?sc=99&category=9999999
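A minimal Python sketch of that translation, assuming the public URL always has exactly the shape /SLUG/SC/CATEGORY/ with numeric SC and CATEGORY values. In Apache this would be an internal rewrite rather than a redirect; `to_internal` is a hypothetical helper that only illustrates the mapping.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Assumed pretty-URL shape: /SLUG/SC/CATEGORY/ (trailing slash optional).
PRETTY = re.compile(r"^/([^/]+)/(\d+)/(\d+)/?$")


def to_internal(path: str) -> Optional[str]:
    """Translate the public path into the URL the cart software expects."""
    m = PRETTY.match(path)
    if m is None:
        return None  # not a product URL in this format
    slug, sc, category = m.groups()
    # urlencode keeps the dict's insertion order: sc first, then category.
    return f"/{slug}?" + urlencode({"sc": sc, "category": category})


print(to_internal("/SOME-PRODUCT-URL/99/9999999/"))
# /SOME-PRODUCT-URL?sc=99&category=9999999
```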

If you insist on removing all parameters across the board, then simply build a complete sitemap of your existing site and use that sitemap as a mapping file from old path to new path. This could be accomplished with a simple script.
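A minimal Python sketch of such a script, assuming a standard XML sitemap (sitemaps.org namespace) of the existing site. `new_path_for` is a hypothetical placeholder that just lowercases the old path; you would replace it with whatever rule produces the new URLs. Query strings on the old URLs are dropped when building the keys.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict
from urllib.parse import urlsplit

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def new_path_for(old_path: str) -> str:
    # HYPOTHETICAL placeholder: replace with the new software's URL rule.
    return old_path.lower()


def build_mapping(sitemap_xml: str) -> Dict[str, str]:
    """Return {old_path: new_path} for every <loc> in the sitemap."""
    root = ET.fromstring(sitemap_xml)
    mapping = {}
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        old_path = urlsplit(loc.text.strip()).path  # drops ?query and #fragment
        mapping[old_path] = new_path_for(old_path)
    return mapping


def write_csv(mapping: Dict[str, str]) -> str:
    """Serialise the mapping as a two-column CSV mapping file."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["old_path", "new_path"])
    writer.writerows(sorted(mapping.items()))
    return buf.getvalue()
```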

I would keep it all in a SQL database, process the path using PHP or whatever your site is written in, route all requests to the index page, and re-route from there.

You really don't want to do this in Apache, as you would need a rewrite condition and rewrite rule for every single product and category.

Do you really want to write 18K RewriteRules for products and 400K for categories in your .htaccess file?

It's really best handled in the script, not .htaccess, unless you use a RewriteMap database. That requires SSH access to install, because it can't be done in .htaccess (RewriteMap is only allowed in the main server or virtual host config), and hardly anyone knows how to use it. I'm one of those few, and I don't recommend it whatsoever.

REVERSE THE PATH IN YOUR PRODUCT PAGE?

I'll bet the algorithm your product database uses to create the product URL can be reverse-engineered, so a simple search for that product in the database will get all the information you need for your product page.

Since you're going to redo your URLs, why not just make them all:
http://www.example.com/p/SOME-PRODUCT-URL
Then this becomes much simpler: you just route all requests under /p/ to your product page, then do a database lookup for SOME-PRODUCT-URL and display the proper page.
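A rough Python sketch of that front-controller idea: every request lands at one entry point, which looks up the slug after /p/ (products) or /d/ (categories) in a database. The in-memory SQLite tables, their columns and the `route` function are hypothetical stand-ins for the real cart's schema and code, which would more likely be PHP.

```python
import sqlite3
from typing import Optional, Tuple

# URL prefix -> table. Table names come from this fixed dict, never from the URL.
PREFIX_TABLES = {"/p/": "products", "/d/": "categories"}


def make_demo_db() -> sqlite3.Connection:
    """HYPOTHETICAL schema and rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    for table in PREFIX_TABLES.values():
        conn.execute(f"CREATE TABLE {table} (slug TEXT PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO products VALUES (?, ?)", ("SOME-PRODUCT-URL", "Some product"))
    conn.execute("INSERT INTO categories VALUES (?, ?)", ("SOME-CATEGORY-URL", "Some category"))
    return conn


def route(conn: sqlite3.Connection, path: str) -> Optional[Tuple[str, str]]:
    """Dispatch a request path by prefix and look up its slug."""
    for prefix, table in PREFIX_TABLES.items():
        if path.startswith(prefix):
            slug = path[len(prefix):].strip("/")
            row = conn.execute(
                f"SELECT name FROM {table} WHERE slug = ?", (slug,)
            ).fetchone()
            return (table, row[0]) if row else None  # None -> serve a 404
    return None  # not a product or category URL
```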

Likewise, for categories do:
http://www.example.com/d/SOME-CATEGORY-URL

In the interim, you'll want to make these changes, leave the old URLs working, and add a canonical link to all your product and category pages specifying the new URL format (which needs to work first). Google will then transition all the previous URLs to the new URLs.
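For illustration, a small Python helper that builds the canonical link element for the /p/ and /d/ URL formats suggested above. `canonical_tag` and its parameters are hypothetical; the idea is that every variant of a product URL, with or without the old query string, emits the same canonical href.

```python
from html import escape


def canonical_tag(base: str, slug: str, kind: str = "p") -> str:
    """Return a canonical <link> for the new /p/ (product) or /d/ (category) URL.

    The output depends only on the slug, so every old URL variant that
    reaches the page points search engines at the same address.
    """
    url = f"{base.rstrip('/')}/{kind}/{slug}"
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


print(canonical_tag("http://www.example.com", "SOME-PRODUCT-URL"))
# <link rel="canonical" href="http://www.example.com/p/SOME-PRODUCT-URL">
```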

FYI, you do know that Google treats each URL with different parameters as a different page? So if a page currently ranks with parameters, it won't rank without them, as they're not the same page. Fixing this will require a 301 redirect and a canonical tag, and it will take a lot of time: weeks, maybe months, for the volume you have.

Also, if you have random categories per product and you aren't currently using a canonical tag per product, then each category/product combination is a unique page, which would create complete chaos in Google. Not 18K products, but possibly 400K products at a minimum, or a lot more!

If you haven't done canonicals yet, just doing that will improve your ranking as it merges all those random pages together.

This will not be pleasant, but it's like a band-aid over a scab: you just gotta rip it off and let nature take its course.

Hope this helps, and I'd recommend seeking PHP and Google SEO help as well, since Apache is hardly the issue with your site.

maxed_out

8:14 am on Aug 18, 2014 (gmt 0)

10+ Year Member



'Bill, you are right that I need time to make it work properly. Unfortunately, time and money are two things this project is lacking.

I have to do the 301 redirects, and this change happens in 3 days. I wish it were 6 months, but people dragged their butts... now I am in the middle of this.

The old cart used the query string; Magento does not. Unfortunately, I am lacking in the programming department too.

Thank you for your advice, everyone.

lucy24

3:30 pm on Aug 18, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Now, wait, backtrack. Your initial post implied that all you need to do is remove the query string, leaving the rest of the URL intact. Are you now saying you need to capture the query string and incorporate it into the path somehow?

Removing a query is trivial. Capturing-and-reusing can range from straightforward to unspeakable, as detailed by incrediBILL above.

domain.com/SOME-PRODUCT-URL


If your page URLs are extensionless and never contain literal periods, express the "pattern" side of the rule as

^([^.]+)$

That will immediately constrain the rule to page requests. And then the Condition checks for the presence of a query string. You don't need to capture the query or consider its exact content, but you have to make sure it exists, or there will be an infinite loop.
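A Python sketch that models this rule, to show why the query-string check matters. It assumes per-directory paths without a leading slash, as .htaccess sees them, and that the redirect target carries no query string. `redirect` and `follow` are illustrative helpers: with the check, a product request needs exactly one redirect; without it, every response redirects to the same URL again, and the loop only stops at the assumed `max_hops` limit.

```python
import re
from typing import Optional

PAGE_PATTERN = re.compile(r"^([^.]+)$")  # extensionless pages only


def redirect(path: str, query: str, require_query: bool = True) -> Optional[str]:
    """Return the redirect target (path only, query dropped) or None."""
    m = PAGE_PATTERN.search(path)
    if m is None:
        return None  # e.g. logo.png or style.css: rule skipped entirely
    if require_query and query == "":
        return None  # the condition: only fire when a query string exists
    return m.group(1)


def follow(path: str, query: str, require_query: bool = True,
           max_hops: int = 5) -> int:
    """Count redirects a browser would follow; reaching max_hops means a loop."""
    hops = 0
    while hops < max_hops:
        target = redirect(path, query, require_query)
        if target is None:
            break
        path, query = target, ""  # the follow-up request has no query string
        hops += 1
    return hops


print(follow("SOME-PRODUCT-URL", "sc=99&category=9999999"))  # 1
print(follow("SOME-PRODUCT-URL", "sc=99&category=9999999", require_query=False))  # 5
```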

incrediBILL

4:36 pm on Aug 18, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yeah, I think I misread it last night.

It appears they're moving from old software to new software, so my original post, which assumed they were only switching to SEO friendly URLs, was way off base.

This should be a really straightforward rule in Apache: if the query string exists, simply redirect without the query string.

I'd make sure that "sc" or "category" is in the query string so nothing else gets mucked up.
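A Python sketch of that safeguard: the regular expression `(^|&)(sc|category)=` matches only when `sc` or `category` appears as a complete parameter name, in any position, so an unrelated parameter such as a hypothetical `misc=1` is left alone. The same expression could serve as the pattern of a `RewriteCond %{QUERY_STRING}` line; `should_strip` is an illustrative name.

```python
import re

# "sc=" or "category=" at the start of the query or right after an "&",
# so a parameter that merely ends in "sc" (e.g. "misc=") does not match.
HAS_OLD_PARAMS = re.compile(r"(^|&)(sc|category)=")


def should_strip(query: str) -> bool:
    """True when the query string carries one of the old cart's parameters."""
    return HAS_OLD_PARAMS.search(query) is not None
```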

Hope they aren't off building a massive RewriteMap today before checking back to this forum LOL

maxed_out

7:25 pm on Aug 18, 2014 (gmt 0)

10+ Year Member



Hello guys,

No, I needed to get some sleep. I just read both your posts.

I thought I was really clear in my original post; I try to be. Sorry for causing confusion. I know how it is going back and forth with someone who has no idea what they want.

Google never indexed the products at all, I do not know why, but never has. Only the SEO text on the page and that is all.

During the move from NetSuite to Magento, I took this opportunity to make some changes, i.e. everything gets SEO friendly URLs.

The only reason I need to save those product URLs is that they have been linked to directly, and someone added this query string with JavaScript.

So, to reiterate, I only need to drop the query string from the URLs that have it. I like the idea of targeting 'sc' and 'category', but I do not know how to do it.

I also want to do it with as little impact as possible on the site's performance. I know, it kinda sounds like I want my cake, your cake and the neighbour's too...

We get 6k visitors a day to this site, so some changes will make it slower, but if I have no other option, well, I guess I would have to do it.

lucy24

8:09 pm on Aug 18, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Are there ever, at all, anywhere, any query strings that you need to preserve? If not, the Condition is simply

RewriteCond %{QUERY_STRING} .


(don't overlook the final dot!) meaning "there exists a query string", where unanchored . means "any character". The redirect then simply captures the entire request and strips off the query, as penders said near the beginning of the thread.
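A Python sketch of the two halves described here: `query_exists` applies the same unanchored `.` test, and `without_query` rebuilds a URL with its query removed, which is what the trailing `?` in penders' substitution achieves in Apache. Both function names are illustrative.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def query_exists(query: str) -> bool:
    # Unanchored "." matches any single character, so the search succeeds
    # for any non-empty query string and fails only for an empty one.
    return re.search(".", query) is not None


def without_query(url: str) -> str:
    """Return the same URL with its query string removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))


print(without_query("http://domain.com/SOME-PRODUCT-URL?sc=99&category=9999999"))
# http://domain.com/SOME-PRODUCT-URL
```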

maxed_out

8:12 pm on Aug 18, 2014 (gmt 0)

10+ Year Member



Thank you lucy24.

You guys are much more helpful than Stack Overflow.


Also, a big thank you to everyone else; no one's contribution was overlooked or left unconsidered.

Now, off to see if you guys have a Magento forum here...

Thanks again

maxed_out

8:23 pm on Aug 18, 2014 (gmt 0)

10+ Year Member



@lucy24,

While testing it, I found that the query string stays visible in the address bar, so people will still be able to copy and paste it.

Is there any way to drop it so it isn't visible (preferably without JS)?

I know, I am a bit of a pain in the butt. Sorry about that. Just trying to think further ahead to, say, next year's spring cleaning.

penders

8:48 pm on Aug 18, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Paste your code that you are testing... if the query string is still visible then either you are not redirecting, or you are inadvertently copying the query string into the substitution.
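A simplified Python model of how mod_rewrite builds a redirect target, to illustrate the second cause mentioned here. It assumes no QSA flag and covers four cases: a substitution ending in `?` drops the original query; a substitution with its own `?query` replaces it; the QSD flag (Apache 2.4 and later) discards it; otherwise the original query is carried over unchanged. `location` is a hypothetical helper, not an Apache API.

```python
def location(substitution: str, original_query: str, qsd: bool = False) -> str:
    """Simplified model of the URL a redirect rule sends the browser to."""
    if substitution.endswith("?"):
        return substitution[:-1]  # empty new query: original dropped
    if "?" in substitution or qsd or not original_query:
        return substitution  # new query replaces the old, or it is discarded
    return f"{substitution}?{original_query}"  # default: query carried over


print(location("http://domain.com/SOME-PRODUCT-URL", "sc=99&category=9999999"))
# http://domain.com/SOME-PRODUCT-URL?sc=99&category=9999999
print(location("http://domain.com/SOME-PRODUCT-URL?", "sc=99&category=9999999"))
# http://domain.com/SOME-PRODUCT-URL
```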

...everything gets SEO friendly URLs.


Possibly just a minor terminology quibble, but these are really "user" friendly URLs. They have little SEO benefit, other than perhaps helping click-through rates from the SERPs (because they look more inviting to users, being more recognisable), but they won't necessarily help you rank any higher (although there is possibly less chance of URLs breaking).

maxed_out

9:16 pm on Aug 18, 2014 (gmt 0)

10+ Year Member



Hello again penders,

I am using what lucy24 wrote:

RewriteCond %{QUERY_STRING} .

I will explain a little bit about what happened here...

NetSuite is horrible at ecommerce. They have antiquated notions.

So on top of us paying them too much, they have no idea why our conversion rates went through the floor.

We started looking at it more closely and noticed that there were so many bad things happening at once. While we are ranked in the top 3 for every keyword we have targeted, we do not convert that traffic. Our cart abandonment went up, and people say they are having problems with checkout.

Now, in an effort to save some of that 7k/q that we spend on NetSuite, we are moving to Magento. Our URLs will make sense... see this example of supposedly SEO friendly URLs from NetSuite:

Category:
ourdomain.com/1975-Pontiac-Grand-Prix

Product in that category:
ourdomain.com/1975-Pontiac-Grand-Prix-Repair-Manual?sc=99&category=999999

Canonical:
NONE - DUPES GALORE

Our new structure is better:

Category
ourdomain.com/search-by-model/pontiac/1975/grand-prix

Product
ourdomain.com/search-by-model/pontiac/1975/grand-prix/1975-pontiac-grand-prix-repair-manual

Canonical:
ourdomain.com/1975-pontiac-grand-prix-repair-manual



That is a nice example of the better URLs that are possible, but there are more with _99 and whatnot on the end. Those have been sorted out with redirects in the old directory structure, so hopefully, over time, the performance hit on Apache will go down.

There is so much that is wrong with that software, but we are finally running away.

Does this help clear it up? or have I made it more confusing?

penders

9:42 pm on Aug 18, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I am using what lucy24 wrote:
RewriteCond %{QUERY_STRING} .


Errm, but that's just part of it, where's the rest?

Product in that category:
ourdomain.com/1975-Pontiac-Grand-Prix-Repair-Manual?sc=99&category=999999


Silly question (and this may have been covered above), but if you simply remove the query string from this URL, is the URL still valid?

but there are more with _99 and what not on the end. those have been sorted with redirects in the old directory structure


You already have other redirects in .htaccess? mod_rewrite or mod_alias? If your .htaccess isn't too large, then it may be best to post it in its entirety.

maxed_out

9:47 pm on Aug 18, 2014 (gmt 0)

10+ Year Member



My .htaccess has been moved to another location and only affects traffic to the old URLs, so I am OK with its size at 6k+ lines.

Yes, the URL will be perfectly valid, and is so on the current version of the site.

Check your messages; PM sent.