Forum Library, Charter, Moderators: bill & werty

RSS, ATOM, and Related Technologies Forum

    
Regex To Add And Delete Lines
doubleJ




msg:4349154
 7:31 pm on Aug 8, 2011 (gmt 0)

Hello...
Is there a way, using a find/replace program and regular expressions, that I can turn this sample text...

<item>
<title>A Fighting Spirit - Pt. 01</title>
<subtitle>Fight The Good Fight Of Faith</subtitle>
<author>Keith Moore</author>
<itunes:author>Keith Moore</itunes:author>
<enclosure url="http://www.flcmedia.org/product/0302-AFightingSpirit-01-FightTheGoodFightOfFaith.mp3" length="" type="audio/mp3"/>
<pubDate>Tuesday, Sep 08, 1992 07:00</pubDate>
</item>

into this...

<item>
<title>01 - Fight The Good Fight Of Faith</title>
<dc:creator>Keith Moore</dc:creator>
<enclosure url="http://www.flcmedia.org/product/0302-AFightingSpirit-01-FightTheGoodFightOfFaith.mp3" length="23992870" type="audio/mpeg" />
<guid>http://www.flcmedia.org/product/0302-AFightingSpirit-01-FightTheGoodFightOfFaith.mp3</guid>
<pubDate>Tue, 08 Sep, 1992 19:00:00 CST</pubDate>
</item>

I'm not sure if regular expressions are powerful enough to change the date information, but I would assume that I can use the enclosure tag to create the guid (replace <enclosure url="*" length="" type="audio/mp3"/> with <enclosure url="*" length="" type="audio/mp3"/>{}e<guid>*</guid>).
Unfortunately, that isn't working. I've been reading about regex, but I'm just not getting it, for this purpose (replace this and this, but nothing between). Honestly, I haven't been getting a lot of it.
Hehehe...
I would also assume that you can delete any line that starts with such and such, but I haven't figured that out, either.
Thanks for the help.
JJ
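On the date question: a regex can reorder the pieces, but shortening "Tuesday" to "Tue" needs a lookup, so a date parser is the simpler tool. A minimal sketch in Python, assuming every source date follows the "Tuesday, Sep 08, 1992 07:00" layout from the sample. The trailing " CST" is an assumption taken from the target text, and the sketch keeps the 07:00 from the source because the post does not say why the target shows 19:00. The comma after the month matches the target above; the usual RSS date layout has no comma there.

```python
from datetime import datetime

# Source date exactly as it appears in the sample <pubDate>.
source = "Tuesday, Sep 08, 1992 07:00"

# %A = full weekday name, %b = abbreviated month, %d = zero-padded day.
parsed = datetime.strptime(source, "%A, %b %d, %Y %H:%M")

# Rebuild in the target layout; the time zone label is appended as plain
# text because the source string carries no zone information (assumption).
converted = parsed.strftime("%a, %d %b, %Y %H:%M:%S") + " CST"

print(converted)
```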

 

bill




msg:4367645
 6:38 am on Sep 27, 2011 (gmt 0)

Sorry, we don't appear to have many Regex gurus here in our RSS forum. Did you have any luck with this?

lucy24




msg:4367685
 9:05 am on Sep 27, 2011 (gmt 0)

we don't appear to have many Regex gurus here in our RSS forum

Especially when the subject header leads you to expect a question about adjusting the number of \n line breaks, which you can do standing on your head :(

doubleJ




msg:4367798
 2:27 pm on Sep 27, 2011 (gmt 0)


Sorry, we don't appear to have many Regex gurus here in our RSS forum. Did you have any luck with this?

Kind of...
I went to the regex irc channel and they were able to shed some light.
I wasn't able to use the program "replacetext". It does do regex, but it wasn't working with the code that they were giving me.
I don't even remember which program that I ended up using (I tried a bunch of text editors).
I do recall that I had to open all the files within the program; I couldn't just select the files and batch it. Once all the files were open, I was able to batch within the open files.
This is what I ended up doing...

Find:

<enclosure url="(.*?)" length="23992870" type="audio/mpeg" />

Replace with:

<enclosure url="\1" length="23992870" type="audio/mpeg" />\r\n\t\t\t<guid>\1</guid>

Keeps the enclosure line and adds a second line, a guid, holding the URL captured by (.*?)
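For anyone without an editor that handles this, the same find/replace can be sketched in Python with re.sub. This is only an illustration of the technique, not the tool used in the thread; the variable names are made up, and the sample line is the enclosure from the first post.

```python
import re

line = ('<enclosure url="http://www.flcmedia.org/product/'
        '0302-AFightingSpirit-01-FightTheGoodFightOfFaith.mp3" '
        'length="23992870" type="audio/mpeg" />')

# (.*?) lazily captures everything between url=" and the next quote.
find = re.compile(r'<enclosure url="(.*?)" length="23992870" type="audio/mpeg" />')

# \1 is the captured URL, used twice: once to rebuild the enclosure line,
# once inside the new guid. \r\n\t\t\t starts a new indented line.
replace = (r'<enclosure url="\1" length="23992870" type="audio/mpeg" />'
           r'\r\n\t\t\t<guid>\1</guid>')

result = find.sub(replace, line)
print(result)
```

Because the length value is hard-coded in the find, only enclosures with that exact length match. Swapping it for [^"]* would match any length, at the cost of needing a second group to carry the value over.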

Find:

([a-zA-Z]{3})(\s*)(\d+)

Replace with:

\3\2\1

Converts <pubDate>Sun, Aug 07, 2011 11:00:00 CST</pubDate> to <pubDate>Sun, 07 Aug, 2011 11:00:00 CST</pubDate>

Find:

<url>(.*?)</url>

Replace with: (nothing)
Deletes <url>whatever</url>
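That find removes the element but leaves its line break behind as an empty line. Deleting every line that starts with a given tag, as asked in the first post, takes a start-of-line anchor. A rough Python sketch, with a made-up sample and the tag names url and subtitle chosen only for illustration:

```python
import re

# Hypothetical sample; the <url> line is invented for the demonstration.
sample = """<item>
<title>01 - Fight The Good Fight Of Faith</title>
<url>whatever</url>
<subtitle>Fight The Good Fight Of Faith</subtitle>
</item>
"""

# Same find as in the post: the element goes, its line break stays.
element_only = re.sub(r"<url>(.*?)</url>", "", sample)

# With re.MULTILINE, ^ matches at the start of every line.
# [^\n]* eats the rest of the line and \n? removes the line break as well.
line_delete = re.compile(r"^[ \t]*<(?:url|subtitle)>[^\n]*\n?", re.MULTILINE)
whole_lines = line_delete.sub("", sample)

print(whole_lines)
```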

I will say that there was one problem with the code I used.
"Marriage Enrichment 2010" ended up as "Marriage Enrichm2010 ent".
The pattern matches any three letters followed by zero or more whitespace characters followed by one or more digits, so "ent 2010" was swapped just like a date.
I was doing it for the pubDate but it applied to the whole document. I had to go through and manually change the errors as I found them.
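One way to avoid the manual cleanup is to run the swap only on text inside pubDate elements. A sketch in Python, assuming each pubDate sits on a single line; the helper name fix_pubdate is hypothetical.

```python
import re

# The swap pattern from the post: three letters, optional whitespace, digits.
swap = re.compile(r"([a-zA-Z]{3})(\s*)(\d+)")

title = "<title>Marriage Enrichment 2010</title>"
pub = "<pubDate>Sun, Aug 07, 2011 11:00:00 CST</pubDate>"

# Applied to everything, the pattern also hits "ent 2010" in the title.
broken = swap.sub(r"\3\2\1", title)

# Capture the opening tag, the contents, and the closing tag separately.
pubdate = re.compile(r"(<pubDate>)(.*?)(</pubDate>)")

def fix_pubdate(match):
    # Swap only the first month/day pair inside the element.
    inner = swap.sub(r"\3\2\1", match.group(2), count=1)
    return match.group(1) + inner + match.group(3)

doc = title + "\n" + pub
fixed = pubdate.sub(fix_pubdate, doc)

print(broken)
print(fixed)
```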


Especially when the subject header leads you to expect a question about adjusting the number of \n line breaks, which you can do standing on your head :(

I don't remember typing the word "break" or the code "\n" in the subject, anywhere.
JJ

lucy24




msg:4368040
 12:10 am on Sep 28, 2011 (gmt 0)

No, but you said "add and delete lines". Incidentally, I was intrigued to see \r\n in your quoted code, because I thought SubEthaEdit was the only program that had to do this. It's bilingual, Mac-to-Windows, so it can't simply camouflage \r\n as \n when you're preparing a text that has to have Windows-style line endings.

doubleJ




msg:4395448
 4:26 am on Dec 8, 2011 (gmt 0)

I don't even remember which program that I ended up using (I tried a bunch of text editors).

I just found the program, again.
It's called EditPad Lite 7.
I'm not sure what kind of regex it uses, but it's the only program that I tried that would work with the code posted above.
Also, I figured out how to do the date conversion that I was asking about, without messing other things up.
Find:

(\s{1})([a-zA-Z]{3})(\s{1})(\d{2})(\s{1})

That will only find...
space letterletterletter space numbernumber space
Replace with:

\5\4\3\2\1

That will replace it as...
space numbernumber space letterletterletter space
There's no chance of "Enrichm2010 ent", since it specifically requires the correct spacing, lettering, and numbering (I can't think of any logical problems, at least).
JJ

Edit...
Actually, that code was used for a slightly different format (no comma after the numbernumber).
The principle is the same, though.
JJ
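A quick check of the stricter pattern in Python, assuming the first group is meant to be wrapped in parentheses like the other four (the replacement refers to five groups). The sample strings are made up for the test. The date is swapped and the "Marriage Enrichment 2010" title is left alone, but a title with a three-letter word followed by a two-digit number between spaces would still be swapped, so limiting the replace to pubDate lines is the safer route.

```python
import re

# space, three letters, space, two digits, space (five groups)
strict = re.compile(r"(\s{1})([a-zA-Z]{3})(\s{1})(\d{2})(\s{1})")

def swap(text):
    # Reversing the five groups puts the digits before the letters.
    return strict.sub(r"\5\4\3\2\1", text)

date = swap("<pubDate>Sun, Aug 07 2011 11:00:00 CST</pubDate>")
title = swap("<title>Marriage Enrichment 2010</title>")
edge = swap("<title>The Top 10 Tips</title>")  # hypothetical title that still matches

print(date)
print(title)
print(edge)
```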

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved