SED replace double spaces

   
5:51 am on Sep 12, 2010 (gmt 0)

5+ Year Member



I have a .CSV file I need to clean up with bash. The file was built using OCR, so each table row shows up as a single cell, with the columns separated by runs of xx spaces. I want to replace each run of up to xx spaces with a single comma to start a new cell.

I've tried using
sed "s/[\s]{2,}//g"


but that doesn't seem to do the job. I replaced \s with just a " " character but that didn't fly either.

Suggestions?
9:40 am on Sep 12, 2010 (gmt 0)

phranque (WebmasterWorld Administrator)



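sed uses basic regular expressions (BREs) by default, and POSIX BREs don't define perl-style shorthand classes such as \s; the portable spelling is the bracket class [[:space:]].
in addition, BREs require that the quantifier braces be escaped with backslashes, i.e. \{2,\} rather than {2,}.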
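
As a rough sketch of what the escaped-brace form could look like (the sample row and the function names `to_csv_bre` and `to_csv_bre_ws` are made up for illustration, not taken from the original file):

```shell
# Hypothetical sample row: columns separated by runs of spaces.
line='Widget A     12    3.50'

# BRE: a literal space followed by \{2,\} means "two or more spaces".
to_csv_bre() {
  sed 's/ \{2,\}/,/g'
}

# Same idea, but treating tabs and other whitespace as separators too.
to_csv_bre_ws() {
  sed 's/[[:space:]]\{2,\}/,/g'
}

printf '%s\n' "$line" | to_csv_bre
```

The single space inside "Widget A" is left alone because only runs of two or more spaces are replaced.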
8:49 pm on Sep 16, 2010 (gmt 0)

wheel (WebmasterWorld Senior Member)



Try this. No need to make a backup - guaranteed to work first time! Heh, or make a backup.

perl -i -p -e 's/\ \ /\ /g'
7:46 am on Sep 17, 2010 (gmt 0)

5+ Year Member



I think one definite error is that you're not replacing the spaces with a comma. That would be:


sed "s/\s{2,}/,/g"


But I believe \s will capture any whitespace (tabs, too), so this would be more specific


sed "s/ {2,}/,/g"


I suspect that should work, but if there is any problem with the {2,} you could even eliminate that with this


sed "s/  +/,/g"


What is after the s/ is TWO spaces and a + sign, meaning 1 space followed by at least one more space.
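
For what it's worth, an unescaped + (or {2,}) only acts as a quantifier in extended regular expressions. Here is a minimal sketch, assuming a sed that accepts the -E switch (current GNU and BSD sed do; older GNU versions spell it -r). The sample row and the function name `to_csv_ere` are made up for illustration:

```shell
# -E switches sed to extended regular expressions, where + is a quantifier.
# The pattern is one space followed by one or more spaces.
to_csv_ere() {
  sed -E 's/  +/,/g'
}

# Hypothetical sample row.
printf '%s\n' 'Widget A     12    3.50' | to_csv_ere
```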
8:28 am on Sep 17, 2010 (gmt 0)

phranque (WebmasterWorld Administrator)



those examples won't work as written because sed defaults to BRE rather than ERE: the braces would need backslashes, and an unescaped + is just a literal plus sign.

combining all of the above, try this:
perl -p -i.bak -e 's/\s\s+/,/g;' filename.txt

this will create a backup and then loop through filename.txt, replacing two or more consecutive whitespace characters with a single comma.
i'm guessing \s\s+ is a more efficient regexp than \s{2,} but they both work.