Complex Regex Question

Automatic linking in CMS

         

killroy

10:06 pm on Oct 15, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have a little CMS I wrote myself to simplify certain things, like crosslinking pages and the like.

I have a list of page titles which include words and other characters. I want to turn every title in a certain list into a link, EXCEPT the title of the current page, which I simply want to highlight.

So I have each title in the form:
word1.word2.word3.word4

The dots can stand for anything.

I have my list of titles like this:

title1|title2|title3

and so on in best regex fashion.

Let's call that TITLELIST.

My first version was simply

(TITLELIST) -> <a href="/$1">$1</a>

with a bit before and after to exclude certain tags and so on.

Then I added:

(?!CURRENTTITLE)(TITLELIST) -> <a href="/$1">$1</a>

to exclude the current title. This also helps to NOT link part of a title like "word1.word2" when another document has the title "word1".
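A minimal sketch of that second version in Python, to make the behaviour concrete. The titles, the current title and the function name link_titles are made up for illustration, and Python spells the $1 backreference as \1. The negative lookahead refuses any match that starts where the current title starts, so the "word1" at the front of the current title is left alone.

```python
import re

# Hypothetical titles, chosen only for illustration.
TITLES = ["word1", "word1-word2", "word3"]
CURRENT_TITLE = "word1-word2"

# TITLELIST: every title escaped and joined with the alternation operator.
title_list = "|".join(re.escape(t) for t in TITLES)

# (?!CURRENTTITLE)(TITLELIST)
pattern = re.compile("(?!" + re.escape(CURRENT_TITLE) + ")(" + title_list + ")")


def link_titles(text):
    # \1 is the first capture group, i.e. the matched title.
    return pattern.sub(r'<a href="/\1">\1</a>', text)


print(link_titles("word1 and word1-word2 and word3"))
# <a href="/word1">word1</a> and word1-word2 and <a href="/word3">word3</a>
```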

The problem I have is the case where the current document is "word1.word2.word3" and

killroy

11:48 pm on Oct 15, 2003 (gmt 0)


bumpe-dee-bump... shouldn't have posted late at night....

claus

12:08 am on Oct 16, 2003 (gmt 0)


Sorry, I don't understand the question - could you elaborate a little bit?

Not sure I know the answer, but I'll need to know the question first...or perhaps I'm just too tired myself ;)

MonkeeSage

12:18 am on Oct 16, 2003 (gmt 0)


I'm confused (but that is nothing new)...can you post a bit of the actual code?

Jordan

[edit:] echo ... echoecho [/edit]

killroy

9:35 am on Oct 16, 2003 (gmt 0)


Ok, I got a CMS, i.e. I have unformatted text that I run through a few regexes to produce the finished html.

I have a few pages, each page has a title like this:

1. Title1
2. Title2
3. Title3a-Title3b
4. Title4-Title2

As you can see, each title is made up of one or more words, separated by spaces or dashes. Some of these words can be used in more than one title, of course. These titles are often descriptive phrases.

Now, I want to run a regex over the document, which will take all titles and put tags around them, let's say a <strong> tag.

I would use this:

replace
"(Title1|Title2|Title3a-Title3b|Title4-Title2)"
with
"<strong>$1</strong>"

Works well enough.

Now, I want to use <strong> ONLY for the titles of the other pages, not the one that I'm converting. The title of the current page I want to enclose in "<em>" tags.
So I use this:

Current Page: Title2

replace
"(?!Title2)(Title1|Title2|Title3a-Title3b|Title4-Title2)"
with
"<strong>$1</strong>"

and then after:

"(Title2)"
with
"<em>$1</em>"

That also works well enough. The problems start when the current title INCLUDES another title as part of it.

So if the current page is "Title4-Title2" we get:

replace
"(?!Title4-Title2)(Title1|Title2|Title3a-Title3b|Title4-Title2)"
with
"<strong>$1</strong>"

Except now, an occurrence of
"Title4-Title2"
will be turned into
"Title4-<strong>Title2</strong>"

Which is not good.
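The failure can be reproduced with a short Python sketch (the function name mark_other_titles is made up for illustration). The lookahead only rejects a match that starts at the same position as the current title. The engine then moves forward one character at a time and finds "Title2" starting in the middle of "Title4-Title2", where the lookahead no longer objects.

```python
import re

TITLES = ["Title1", "Title2", "Title3a-Title3b", "Title4-Title2"]
CURRENT = "Title4-Title2"

alternation = "|".join(re.escape(t) for t in TITLES)
pattern = re.compile("(?!" + re.escape(CURRENT) + ")(" + alternation + ")")


def mark_other_titles(text):
    # Wrap every matched title in <strong>; \1 is the matched title.
    return pattern.sub(r"<strong>\1</strong>", text)


print(mark_other_titles("See Title4-Title2 here"))
# See Title4-<strong>Title2</strong> here
```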

In fact, in my case it's worse, because I'm not just making them <strong> but actually turning them into links, so I get a link inside a part of a phrase.

So let's say I have a page on "Suede Shoe" and another one on "Blue Suede Shoe Cleaner"

then on the page about the cleaner the sentence:
Our Blue Suede Shoe Cleaner is the best.

INSTEAD of getting the desired:
Our <em>Blue Suede Shoe Cleaner</em> is the best.

I actually get:
Our Blue <strong>Suede Shoe</strong> Cleaner is the best.

Which is not what I wanted...
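One possible way around this, sketched in Python under some assumptions: drop the lookahead, match ALL titles (the current one included) in a single pass, and decide per match which tag to use. The function name mark_titles and the callback are made up for illustration, and this is a swapped-in technique, not the two-step replace described above. Because a regex replace never rescans text it has already consumed, a short title sitting inside a longer matched title is not tagged on its own.

```python
import re

# Page titles from the example above.
TITLES = ["Suede Shoe", "Blue Suede Shoe Cleaner"]


def mark_titles(text, current_title, titles=TITLES):
    # Longest titles first, so that among alternatives starting at the same
    # position the longer title is tried before any shorter prefix of it.
    ordered = sorted(titles, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered))

    def wrap(match):
        title = match.group(0)
        # <em> for the page being converted, <strong> for all other pages.
        tag = "em" if title == current_title else "strong"
        return "<{0}>{1}</{0}>".format(tag, title)

    # Single pass: text consumed by one match is not scanned again.
    return pattern.sub(wrap, text)


print(mark_titles("Our Blue Suede Shoe Cleaner is the best.",
                  "Blue Suede Shoe Cleaner"))
# Our <em>Blue Suede Shoe Cleaner</em> is the best.
```

On the "Suede Shoe" page the same sentence would come out with the whole cleaner title in <strong>, again without a nested tag.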

Is that clearer?

Thanks.

SN

killroy

4:50 pm on Oct 16, 2003 (gmt 0)


Oh come on guys, this must be a common problem with CMSs!
Can't somebody help me out?

SN

MonkeeSage

10:19 pm on Oct 16, 2003 (gmt 0)


Maybe try...

^(Title1|Title2|Title3a-Title3b|Title4-Title2)$

and

^(Title2)$
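A quick Python check of what the anchors do, as a sketch (the function name mark_whole_string is made up for illustration). With ^ and $ the pattern only matches when the whole string is exactly one title, so a title in the middle of a sentence is not touched at all.

```python
import re

anchored = re.compile(r"^(Title1|Title2|Title3a-Title3b|Title4-Title2)$")


def mark_whole_string(text):
    # Only replaces when the entire text is one of the titles.
    return anchored.sub(r"<strong>\1</strong>", text)


print(mark_whole_string("Title4-Title2"))    # <strong>Title4-Title2</strong>
print(mark_whole_string("See Title2 here"))  # See Title2 here (unchanged)
```

So this would only help if each title sat alone on its own line (with re.MULTILINE in Python), which is not the case for running text.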

Jordan

claus

12:56 am on Oct 17, 2003 (gmt 0)


^(Title1|[^Title4-]Title2|Title3a-Title3b|Title4-Title2)$

- or:

^(Title1|Title3a-Title3b|Title4-Title2)|[^Title4-]Title2$

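...might do the trick for this case, with a caveat: [^this] is a negated character class, so it matches ONE character that is not t, h, i or s. It does not mean "don't match the string this".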
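A small Python comparison, as a sketch, of the bracket form against a negative lookbehind, which is the construct that actually says "not preceded by this string". [^Title4-] is a character class: it consumes one character that is none of T, i, t, l, e, 4 or -. The lookbehind variant is a swapped-in alternative and only covers the case where the shorter title sits at the END of the longer one.

```python
import re

char_class = re.compile(r"[^Title4-]Title2")
lookbehind = re.compile(r"(?<!Title4-)Title2")

# The class needs a character in front and swallows it into the match.
print(char_class.search("Title2"))                # None
print(char_class.search("See Title2").group(0))   # ' Title2' (with the space)
# Any title ending in a dash before Title2 is rejected, not only "Title4-".
print(char_class.search("Part-Title2"))           # None

# The lookbehind checks the preceding string without consuming anything.
print(lookbehind.search("Title2").group(0))       # Title2
print(lookbehind.search("Part-Title2").group(0))  # Title2
print(lookbehind.search("Title4-Title2"))         # None
```

Python requires the lookbehind to have a fixed width, which a literal such as "Title4-" satisfies.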