Forum Moderators: open
I cant seem to figure out the best way to do this...
Here we go..
I have a table full with title, description, etc....
I have a sort on my webpages (a,b,c,d, etc..) when say 'a' is clicked, it generates the page with the sql query:
SELECT * FROM table WHERE title LIKE 'a%'
this works fine for me, HOWEVER, it poses a problem when sorting titles like: 'The Example' and 'The Betterexample'
is there an efficient way to filter out 'The', so that when clicking on e 'The Example' will be in the results and when clicking on t 'The Example' is NOT in the results?
I was thinking of adding another field to the table with (0,1) values so that I know if the title has The in the beginning, but I know this is VERY inefficient. Please let me know of your thoughts, as I am still a amateur when it comes to complex MySQL queries
Thanks,
Alex
SELECT * FROM table WHERE title LIKE 'a%' or title like 'The a%' or title like 'An a%' or title like 'A a%';
or you could use regexps,
SELECT * FROM table WHERE title regexp '^(theŠanŠa)*\s*a.*';
or
SELECT * FROM table WHERE title regexp '^\s*[A-Za-z0-9]*\s*a.*';
(Both of the previous are for example only and may need tweaking, untested.)
^ in that context means "string begins with", \s* means zero or more spaces, [A-Za-z0-9]* means zero or more of any letter or number, .* means zero or more of any character. Zero or more makes the pattern optional, so it will still match on items just beginning with A.
So with the right regexp, it can begin with "anything" - the a, an, I (Am the Eggman), etc., but may give unexpected results. You'd have to play around with it, but it may be better than the first one, which is specifically The, a, an, which would require more maintenance if more beginning articles are discovered.
If it's just "the," and will never change, the first one would probably be best.
A second approach is in your programming, strip off any leading articles from your term on input when you cleanse the variables. Then you can just use "like 'a%'.
Otherwise you have to do text transforms on the query, which means you're building a temporary table every time you run this query.
SELECT *, REPLACE(LOWER(`title`), 'the ', '') AS title_sans_the FROM `widget` ORDER BY title_sans_the
And I have a feeling you'd eventually want to also remove other words like "a," "an," etc. which cause this type of query to get unwieldy very quickly:
SELECT *, REPLACE(REPLACE(REPLACE(LOWER(`title`), 'a ', ''), 'an ', ''), 'the ', '') AS title_sans_the FROM `widget` ORDER BY title_sans_the
Better to just do this string manipulation once into a new column:
UPDATE `widget` SET `title_sort` = REPLACE(REPLACE(REPLACE(LOWER(`title`), 'a ', ''), 'an ', ''), 'the ', '');
Something like this....
SELECT * FROM table WHERE replace(lower(title), 'the', '') LIKE 'a%'
You may have to check how replace works and change the syntax. It may be something like this
SELECT * FROM table WHERE lower(title).replace('the', '') LIKE 'a%'
The first option LIKE 'The a%' looks like it would be the best choice for my data, Im not worried about other articles because they are very rare.
One issue is that, when using this method, selecting sort 'a' would include 'The album' but I also need to exclude 'The album' from the results of sort 't'.
Any ideas?
Thanks again ;)
One issue is that, when using this method, selecting sort 'a' would include 'The album' but I also need to exclude 'The album' from the results of sort 't'.
See if you can do a replace in the WHERE clause, this may replace "the" before it does the eval for matching records.
SELECT * FROM table WHERE replace(lower(title), 'the', '') LIKE 'a%'
I have no problem to add a colm for sort_title, just my db is pretty large as it is, but it wouldnt be hard to select * and update new colm with a sort friendly title.
let me know of your thoughts
alex