Linux, Unix, and *nix like Operating Systems Forum

    
SED replace double spaces
wesg




msg:4200713
 5:51 am on Sep 12, 2010 (gmt 0)

I have a .CSV file I need to clean up with bash. The CSV file was built using OCR, so each table row shows up as a single cell with the values separated by xx spaces. I want to replace each run of up to xx spaces with a single comma to start a new cell.

I've tried using
sed "s/[\s]{2,}//g"

but that doesn't seem to do the job. I replaced \s with just a " " character but that didn't fly either.

Suggestions?

 

phranque




msg:4200735
 9:40 am on Sep 12, 2010 (gmt 0)

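sed uses basic regular expressions (BRE) by default, which don't include perl-style shorthand classes such as \s. (GNU sed accepts \s as an extension, but per POSIX a backslash inside a bracket expression such as [\s] is a literal character.) POSIX classes such as [[:space:]] are available.
in addition, BREs require that the quantifier braces be escaped with backslashes: \{2,\}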
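
To illustrate the escaped-brace form, here is a minimal sketch that reads from standard input, so nothing is overwritten. The function name and the sample row are made up for the example; the real file may need a different pattern (for instance if the OCR output contains tabs).

```shell
# Hypothetical helper: turn every run of two or more spaces into one comma.
# BRE syntax: the interval braces are escaped as \{2,\}.
spaces_to_commas() {
    sed 's/ \{2,\}/,/g'
}

# Made-up sample row: single spaces inside a cell are left alone.
printf 'John Smith    42   New York\n' | spaces_to_commas
# prints: John Smith,42,New York
```

To run it on a file, something like `spaces_to_commas < input.csv > output.csv` (hypothetical file names) keeps the original untouched.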

wheel




msg:4202980
 8:49 pm on Sep 16, 2010 (gmt 0)

Try this. No need to make a backup - guaranteed to work first time! Heh, or make a backup.

perl -i -p -e 's/\ \ /\ /g'

SteveWh




msg:4203151
 7:46 am on Sep 17, 2010 (gmt 0)

I think one definite error is that you're not replacing the spaces with a comma. That would be:


sed "s/\s{2,}/,/g"


But I believe \s will match any whitespace (tabs, too), so this would be more specific:


sed "s/ {2,}/,/g"


I suspect that should work, but if there is any problem with the {2,} you could even eliminate that with this


sed "s/  +/,/g"


What is after the s/ is TWO spaces and a + sign, meaning 1 space followed by at least one more space.
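
One caveat on the plus sign: it is only a quantifier in extended regular expressions, so the two-spaces-and-a-plus pattern needs sed's -E switch (accepted by GNU and BSD sed; older GNU sed spells it -r). A small sketch with made-up input, where the function name is just for illustration:

```shell
# Hypothetical helper using ERE (-E): two spaces followed by "+" means
# one space, then one or more further spaces.
collapse_ere() {
    sed -E 's/  +/,/g'
}

printf 'a  b      c d\n' | collapse_ere
# prints: a,b,c d

# Without -E the "+" is a literal plus sign in BRE, so nothing matches:
printf 'a  b\n' | sed 's/  +/,/g'
# prints: a  b
```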

phranque




msg:4203156
 8:28 am on Sep 17, 2010 (gmt 0)

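those first two examples won't work as written because sed uses BRE by default: the braces have to be escaped as \{2,\}, or extended regular expressions switched on with -E (-r in older GNU sed). the same goes for the + in the third example, which plain BRE treats as a literal plus sign.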

combining all of the above, try this:
perl -p -i.bak -e 's/\s\s+/,/g;' filename.txt

this will save a backup as filename.txt.bak and then loop through filename.txt, replacing two or more consecutive whitespace characters with a single comma. note that \s also matches tabs and line endings, not just spaces.
i'm guessing \s\s+ is a more efficient regexp than \s{2,} but they both work.
