

Linux, Unix, and *nix like Operating Systems Forum

    
SED replace double spaces
wesg

5+ Year Member



 
Msg#: 4200711 posted 5:51 am on Sep 12, 2010 (gmt 0)

I have a .CSV file I need to clean up with bash. The file was built using OCR, so each table row ends up in a single cell, with the columns separated by runs of spaces. I want to replace each run of two or more spaces with a single comma so that each column starts its own cell.

I've tried using
sed "s/[\s]{2,}//g"

but that doesn't seem to do the job. I replaced \s with just a " " character but that didn't fly either.

Suggestions?

 

phranque

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4200711 posted 9:40 am on Sep 12, 2010 (gmt 0)

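sed uses basic regular expressions (BREs), which don't include Perl-style shorthands like \s (GNU sed accepts \s as an extension), and inside a bracket expression the backslash loses its special meaning, so [\s] doesn't mean "whitespace". use the POSIX class [[:space:]] or a literal space instead.
in addition, BREs require that the quantifier braces be escaped with backslashes: \{2,\} rather than {2,}.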
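A minimal sketch of the BRE approach described above, assuming a POSIX-compatible sed. The function name to_csv_bre and the sample row are made up for illustration; note that [[:space:]] also matches tabs.

```shell
#!/bin/sh
# Hypothetical helper: turn each run of 2+ whitespace characters into one comma.
# POSIX BRE: use the [[:space:]] class instead of \s, and escape the interval
# braces as \{2,\}. [[:space:]] also matches tabs.
to_csv_bre() {
  sed 's/[[:space:]]\{2,\}/,/g'
}

# Made-up OCR-style row; single spaces inside a column are left alone.
printf '%s\n' 'Name    Qty   Price' | to_csv_bre
```

On a real file the same expression would be used as sed 's/[[:space:]]\{2,\}/,/g' filename.csv > cleaned.csv (both filenames are placeholders).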

wheel

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 4200711 posted 8:49 pm on Sep 16, 2010 (gmt 0)

Try this. No need to make a backup - guaranteed to work first time! Heh, or make a backup.

perl -i -p -e 's/  +/,/g' filename.csv

SteveWh

5+ Year Member



 
Msg#: 4200711 posted 7:46 am on Sep 17, 2010 (gmt 0)

I think one definite error is that you're not replacing the spaces with a comma. That would be:


sed "s/\s{2,}/,/g"


But I believe \s will capture any whitespace (tabs, too), so this would be more specific


sed "s/ {2,}/,/g"


I suspect that should work, but if there is any problem with the {2,} you could even eliminate that with this


sed "s/  +/,/g"


What is after the s/ is TWO spaces and a + sign, meaning 1 space followed by at least one more space.
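The patterns above are written in extended-regex (ERE) syntax. As a sketch, assuming a sed that accepts -E (GNU and BSD sed both do; older GNU releases document it as -r), here they are with that switch turned on. The two function names are hypothetical. Because both patterns use a literal space rather than \s, runs of tabs are left untouched.

```shell
#!/bin/sh
# SteveWh's ERE patterns with sed -E, so {2,} and + act as quantifiers.
# Hypothetical helper names; both use a literal space, so tabs are not touched.
spaces_to_comma_interval() { sed -E 's/ {2,}/,/g'; }
spaces_to_comma_plus() { sed -E 's/  +/,/g'; }   # two spaces, then +

printf '%s\n' 'Name    Qty   Price' | spaces_to_comma_interval
printf '%s\n' 'Name    Qty   Price' | spaces_to_comma_plus
```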

phranque

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 4200711 posted 8:28 am on Sep 17, 2010 (gmt 0)

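those examples won't work as written, because sed uses basic regular expressions (BREs) by default, not extended ones (EREs). in a BRE, {2,} has to be written \{2,\}, and + is just a literal plus sign (GNU sed accepts \+ as an extension), so the third one looks for spaces followed by an actual '+'. alternatively, switch sed to EREs with -E.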

combining all of the above, try this:
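perl -l -p -i.bak -e 's/\s\s+/,/g;' filename.txt

this will create a backup (filename.txt.bak) and then loop through filename.txt, replacing each run of two or more consecutive whitespace characters with a single comma. the -l switch strips the newline from each line before the substitution and adds it back on output; without it, \s also matches the newline, so a row ending in a space would have that space plus its newline turned into a comma, joining it to the next row.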
i'm guessing \s\s+ is a more efficient regexp than \s{2,} but they both work.



All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved