Forum Moderators: coopster

Message Too Old, No Replies

Definite and indefinite articles --the, a, an in beginning of titles

         

Ann_G

9:09 pm on Mar 27, 2006 (gmt 0)

10+ Year Member



Is there a way to input the definite and indefinite articles -- the, a, an -- into MYSQL using PHP so that when I retrieve a title from my table, I can sort it alphabetically based on the first main word. For example, when the title is The ABC Story, I want to be able to input it as such but when I retrieve it, I want it to come up under A, and not as currently, under T.

Thankful for any help.

coopster

6:15 pm on Mar 28, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Hmmm. I suppose you could create a second column in the table that might store titles just for this purpose, for example:

+------------------+--------------------+
¦ originalTitle ¦ sortingTitle ¦
+------------------+--------------------+
¦ The First Book ¦ First Book (The) ¦
¦ A Second Book ¦ Second Book (A) ¦
¦ Third Book ¦ Third Book ¦
¦ An Optional Book ¦ Optional Book (An) ¦
+------------------+--------------------+

Or, if you wanted to do it in a query there are a number of ways to approach this, that's for sure. One might be to ORDER BY with TRIM:
SELECT 
title
FROM mytable
ORDER BY TRIM(LEADING 'the ' FROM LOWER(title))
;

You could probably UNION these together to accommodate your "A" and "An" articles.

JollyK

7:10 pm on Mar 28, 2006 (gmt 0)

10+ Year Member



Thanks to a post by a killroy waaaaaay back here (http://www.webmasterworld.com/forum88/5172.htm), I came up with this:

select IF(title REGEXP 'A +¦An +¦The +',TRIM(LEADING SUBSTRING_INDEX(title,' ',1) from title),title) as mytitle,title from test order by mytitle;

Here's what that gives you. The actual titles are on the right here, the one with the A,An,The stripped on the left. You would display "title", as "mytitle" is just used to order things.


+-------------------------+----------------------------+
¦ mytitle ¦ title ¦
+-------------------------+----------------------------+
¦ Big Cat ¦ The Big Cat ¦
¦ Earthly Story ¦ An Earthly Story ¦
¦ Greatest Show on Earth ¦ The Greatest Show on Earth ¦
¦ sly dog ¦ A sly dog ¦
¦ That Darn Cat ¦ That Darn Cat ¦
+-------------------------+----------------------------+

In this case, the original data looks like this:


+----------------------------+
¦ title ¦
+----------------------------+
¦ A sly dog ¦
¦ The Big Cat ¦
¦ An Earthly Story ¦
¦ The Greatest Show on Earth ¦
¦ That Darn Cat ¦
+----------------------------+

You can google for mysql regexp to find how to tweak this to look for a/an/the case-insensitively. (Thanks, killroy, btw.)

JK

coopster

8:02 pm on Mar 28, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



There you go, that's another working option, well, almost. I was going to mention a regex too, but thought perhaps a UNION might be faster. Let me explain the 'almost' here ...

The only issue you are going to have with that is you will need to trim that TRIM once more to get rid of the leading space. Add one more entry to your table such as 'Darn Cat' and you will see what I mean. The blank spaces are being left on the front end of that title.

JollyK

8:47 pm on Mar 28, 2006 (gmt 0)

10+ Year Member



Sorry, coopster, you're absolutely right. This should work:

select TRIM(LEADING ' ' from (IF(title REGEXP 'A +¦An +¦The +',TRIM(LEADING SUBSTRING_INDEX(title,' ',1) from title),title))) as mytitle,title from MyTable order by mytitle;

You end up with


+------------------------+----------------------------+
¦ mytitle ¦ title ¦
+------------------------+----------------------------+
¦ Big Cat ¦ The Big Cat ¦
¦ Big Cat ¦ Big Cat ¦
¦ Earthly Story ¦ An Earthly Story ¦
¦ Greatest Show on Earth ¦ The Greatest Show on Earth ¦
¦ sly dog ¦ A sly dog ¦
¦ That Darn Cat ¦ That Darn Cat ¦
+------------------------+----------------------------+

The only problem with that is that whichever title was entered first will show up first. (Notice how "The Big Cat" comes before "Big Cat.") There may be a way to use concat to tack the result of the regexp back onto the end, but I haven't really explored that.

Good catch, thanks!

(And GRR! the "pre" tags don't preserve my spacing in that table.)

JK

Ann_G

10:47 pm on Mar 28, 2006 (gmt 0)

10+ Year Member



Thanks for the responses. I haven't had time to try anything yet but I'll let you know how it goes.

JollyK

11:19 pm on Mar 28, 2006 (gmt 0)

10+ Year Member



WHOA!

Forgot to mention: those things in the regexp that look like ¦ are changed by the Webmaster World software into something similar, but not the same, so if you copy/paste that statement, make sure to change them to real ¦ characters. (On my keyboard, it's a "capital backslash.")

JK

Ann_G

11:06 pm on Mar 31, 2006 (gmt 0)

10+ Year Member



Thank you very much!

Sorry it took me so long to respond. But I have now tested this:

select TRIM(LEADING ' ' from (IF(title REGEXP 'A +¦An +¦The +',TRIM(LEADING SUBSTRING_INDEX(title,' ',1) from title),title))) as mytitle,title from MyTable order by mytitle;

and it works perfectly. All my titles are now in alphabetic order according to the first real word.

I don't know what I would do without this forum.

Ann_G

9:19 pm on Apr 2, 2006 (gmt 0)

10+ Year Member



I spoke too soon. After entering additional titles the list started doing strange things (see below).

I can't figure out why when I use this query the word Kafka is placed after N and also the title "Eat Pray Love" ends up in the wrong spot. Everything else is put in alphabetic order as I want it. I have been fiddling around with the query but can't seem to get it right. Any ideas is greatly appreciated?

select TRIM(LEADING ' ' from (IF(title REGEXP 'A +¦An +¦The +',TRIM(LEADING SUBSTRING_INDEX(title,' ',1) from title),title))) as mytitle,title from MyTable order by mytitle;

The Curious Incident of the Dog in the Night-Time
The Kentucky Derby
My life in the middle ages : a survivor's tale
Mountain Beyond Mountain
Never Let Me Go
Kafka on Shore
Kafka On the Shore
Eat Pray Love: one woman's search for everything across Italy
Seven Sisters
Slow Man
Sun and Shadow
A Sunday at the Pool in Kigali
Sunflower

JollyK

2:05 am on Apr 3, 2006 (gmt 0)

10+ Year Member



Hahaha! Okay, yes, it's a weirdie, but it's happening because we didn't specify that the regular expression ONLY be at the beginning of the title. So, "Kafka On The Shore" finds the "The" and takes off the first word of the title (which in this case, isn't "The" but "Kafka," leaving "On The Shore," and since "On" comes after "N..." well, you get the idea. Not sure about "Kafka on Shore" but the one with "The" is obvious.

So, we need to tell it that we only care if the article is at the beginning of the title.

Change this:

REGEXP 'A +¦An +¦The +',

to this:

REGEXP '^A +¦^An +¦^The +',

(Basically, add ^ before A, An, and The. That tells it to only look at the beginning.)

Sorry, common mistake. :-)

JK

Ann_G

2:56 pm on Apr 3, 2006 (gmt 0)

10+ Year Member



The ^ corrected the problem. Thank you for all your help. You really made my life a bit easier.