HTML Forum

    
Can Regex replace spaces with dashes?
Meancode




msg:4164260
 9:34 am on Jul 4, 2010 (gmt 0)

The Problem:

I have hundreds of KML files, and in them I need to rewrite thousands of IMG SRC references, replacing the spaces with dashes. This is what I am given:

<img src="Profile\Akron OH 44303 - NORTHEASTERN EDUCATION (3)_chart.bmp" width="457" height="305" />

I need to turn this into this:

<img src="Profile/Akron-OH-44303---NORTHEASTERN-EDUCATION-(3)_chart.bmp" width="457" height="305" />

The IMG SRC always starts with Profile\ and the tag always ends with width="457" height="305" />.

I really hope this can be automated in some way. Let me know!

 

Frank_Rizzo




msg:4164267
 10:20 am on Jul 4, 2010 (gmt 0)

If this is for just one or two files this can be done easily with something as simple as Notepad.

Notepad can be used to search for space and replace with - but you need to ensure that only the spaces between the src=" " are changed.

Try this on a test file.

1. Open file in Notepad
2. Click on Edit, Replace
3. In the Find what: box enter Profile\
4. In the Replace with: box enter Profile/
5. Click Replace All

Next we need to protect the normal spaces so they won't get dashed:

6. Replace img src WITH img#src
7. Replace " width with "#width (note the one blank space after the ")
8. Replace " height with "#height (note the one blank space after the ")
9. Replace " / with "#/ (note the one blank space after the ")

Now we can replace all other spaces with -

10. Replace a single blank space with - (note the one blank space in the Find what: box)

Now put all the other protected spaces back:

11. Replace img#src WITH img src
12. Replace "#width with " width
13. Replace "#height with " height
14. Replace "#/ with " /
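The same protect, replace, restore sequence can be sketched in Python to see what it does to the sample line from the first post. This is only an illustration: the function name `dash_spaces_notepad_style` is made up here, and `#` is assumed not to appear in those positions anywhere else in the file.

```python
def dash_spaces_notepad_style(text: str) -> str:
    """Apply the find-and-replace steps above, in order, to a string."""
    text = text.replace("Profile\\", "Profile/")  # steps 3-5
    # Steps 6-9: protect the spaces that must survive ('#' is the placeholder).
    protected = [
        ("img src", "img#src"),
        ('" width', '"#width'),
        ('" height', '"#height'),
        ('" /', '"#/'),
    ]
    for plain, marked in protected:
        text = text.replace(plain, marked)
    text = text.replace(" ", "-")  # step 10: dash every remaining space
    for plain, marked in protected:  # steps 11-14: restore protected spaces
        text = text.replace(marked, plain)
    return text


sample = (
    '<img src="Profile\\Akron OH 44303 - NORTHEASTERN EDUCATION (3)_chart.bmp"'
    ' width="457" height="305" />'
)
print(dash_spaces_notepad_style(sample))
```

Note that step 10 dashes every unprotected space in the text, not only those inside the src value, so any other content in the file (names, descriptions) would be changed as well.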

If you have dozens and dozens of files, it would be easier to write a PHP or Perl script to read the files one by one and perform a regex on qualifying lines.
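A minimal sketch of such a script, written in Python rather than PHP or Perl. It assumes every target attribute looks like src="Profile\..." with no embedded double quotes; the names `SRC_ATTR` and `fix_file` are hypothetical, and the .bak backup is a precaution added here, not something the thread specifies.

```python
import re
import shutil
from pathlib import Path

# Assumption: every attribute to fix starts with src="Profile\ and ends at the next quote.
SRC_ATTR = re.compile(r'src="Profile\\[^"]*"')


def fix_file(path: Path) -> int:
    """Dash the spaces inside matching src attributes; return the number of lines changed."""
    shutil.copy2(path, path.with_name(path.name + ".bak"))  # keep a backup copy
    changed = 0
    out_lines = []
    # newline="" preserves the file's original line endings on read and write.
    with path.open(encoding="utf-8", newline="") as handle:
        for line in handle:
            new_line = line
            if 'src="Profile\\' in line:  # only qualifying lines get the regex
                new_line = SRC_ATTR.sub(
                    lambda m: m.group(0).replace(" ", "-"), line
                )
            changed += new_line != line
            out_lines.append(new_line)
    with path.open("w", encoding="utf-8", newline="") as handle:
        handle.writelines(out_lines)
    return changed
```

Calling `fix_file` on each path returned by something like `Path("kml").glob("*.kml")` would cover a whole folder; the folder name is only an example. Spaces outside the matched attribute are left alone.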

Meancode




msg:4164281
 11:49 am on Jul 4, 2010 (gmt 0)

It is a lot more complicated than that. It is hundreds of files, and I only need to replace the spaces inside the IMG SRC in a specific place in the KML, in the Placemark tag. These files are sometimes 250,000 lines long, with, say, 1,200 IMG SRC tags that I need to fix inside each of them.

And I don't know how to write such a script :(

birdbrain




msg:4164298
 1:24 pm on Jul 4, 2010 (gmt 0)

Hi there Meancode,

try this script in the head section of your documents...


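<script type="text/javascript">

// Swap each encoded space (%20) in every image URL for a dash.
function init() {
  var images = document.getElementsByTagName('img');
  for (var c = 0; c < images.length; c++) {
    images[c].src = images[c].src.replace(/%20/g, '-');
  }
}

// Run once the page has loaded; attachEvent covers older Internet Explorer.
if (window.addEventListener) {
  window.addEventListener('load', init, false);
} else if (window.attachEvent) {
  window.attachEvent('onload', init);
}

</script>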

birdbrain

SteveWh




msg:4164314
 2:12 pm on Jul 4, 2010 (gmt 0)

To do the replacements in your source code in multiple files, the most flexible tool is most likely the sed program, of which a particularly good version is called ssed (Super Sed). Although originally for Linux, there's a very good version of ssed on the web for Windows, too.

If you're new to it, it will probably take many hours to get used to it, do experiments, and set up your search-and-replace. If that gets daunting, think of the many more hours you're saving and the skill you're acquiring at using an extremely versatile and useful program. In addition, the regex skills you learn for ssed will transfer to Perl, should you become interested in that.

If sed/ssed isn't of interest, you didn't say what regex tool(s) you have available. FrontPage (and probably Expression Web), and the Visual C++ Express IDE have regex search-and-replace. Probably Dreamweaver, too, though I've never used it. But for complex regex, nothing comes close to the Perl Compatible Regular Expressions used by ssed, Perl, etc.

RonPK




msg:4164440
 9:49 pm on Jul 4, 2010 (gmt 0)

The required regexp might get pretty complex, with lookaheads and lookbehinds, non-greedy searches and other nifty details. I'm no regexp guru, so I'd probably use a callback function to create a two-tier operation:
1. isolate the image tag(s) with a regexp, send it to the callback function
2. in the callback do the replacements and return the results.

If you're familiar with a scripting language like PHP it shouldn't be too hard...
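The two-tier idea might look roughly like this in Python, where `re.sub` accepts a function as the replacement. It is a sketch under assumptions: the tag shape is taken from the first post, the names `IMG_TAG`, `fix_tag` and `fix_text` are invented for illustration, and the backslash-to-slash change follows the before and after example in that post.

```python
import re

# Tier 1: isolate each target tag (fixed shape assumed from the first post).
IMG_TAG = re.compile(r'<img src="Profile\\[^"]*" width="457" height="305" />')


def fix_tag(match: re.Match) -> str:
    """Tier 2 (the callback): rewrite only the quoted src value of one tag."""
    tag = match.group(0)
    # Split at the first two double quotes: text before, src value, text after.
    head, value, tail = tag.split('"', 2)
    value = value.replace("\\", "/").replace(" ", "-")
    return f'{head}"{value}"{tail}'


def fix_text(text: str) -> str:
    """Run the callback on every isolated tag; everything else is untouched."""
    return IMG_TAG.sub(fix_tag, text)


line = (
    '<b>Akron OH</b><img src="Profile\\Akron OH 44303 - NORTHEASTERN EDUCATION'
    ' (3)_chart.bmp" width="457" height="305" />'
)
print(fix_text(line))
```

Because the outer pattern only hands complete tags to the callback, the inner replacements can stay as plain string operations, with no lookaheads or lookbehinds needed.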

Meancode




msg:4164476
 12:54 am on Jul 5, 2010 (gmt 0)

Hey all,

birdbrain, that is a great idea. Sadly, this is not for a web page but for a KML file that will be re-archived back into a KMZ. But %20 would work, and I even did a test. But I am running the files through a Photoshop action anyway, so I have it output with dashes.

SteveWh, I am somewhat familiar with sed, in that I know what it does and that I can use it on my Mac. But I simply don't have the time to learn how to use it, and I am a beginner when it comes to regex.

I consider this a closed case. I ended up challenging a complete stranger on Twitter who professed his love for regex. He wrote a small Perl script and it works great.

Here it is:

#!/usr/bin/perl
use strict;
use warnings;

my $filename = $ARGV[0];

{
    # Process only the named file (@ARGV is the list that <> reads from).
    local @ARGV = ($filename);
    # Start "in-place" mode. Remove this to dump output to the console instead.
    local $^I = '.bak';

    while (my $line = <>) {
        # This IS a kml file: each <img src> probably won't have a line to
        # itself, and one line may hold several. Rewrite every match; the
        # text before, between and after the tags is left untouched.
        $line =~ s{(<img )(src="Profile\\[^"]+")( width="457" height="305" />)}{
            my ($pre, $str, $post) = ($1, $2, $3);
            $str =~ s/ /-/g;
            $pre . $str . $post;
        }ge;
        print $line;
    }
}

SteveWh




msg:4164500
 2:42 am on Jul 5, 2010 (gmt 0)

Thank you for posting the script.

To a perl expert, it's probably straightforward, but it was only a couple of weeks ago that I had to learn enough to write my first perl script to do something ssed couldn't do.

It took a half hour or more to write 8 lines, but it was amazing what those lines could do. They did a somewhat complicated transformation that processed a 24MB file into a 48MB file, not something I would have attempted by manual editing in any text editor.

Meancode




msg:4165964
 5:55 pm on Jul 7, 2010 (gmt 0)

Yeah, it works great. I have modified the regex on line 30 since then, but that's about it.

$str =~ s/[^a-zA-Z0-9-]//g;
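For readers unsure what that character class does, here is a small Python counterpart. It deletes every character that is not a letter, a digit or a dash. The function name `keep_alnum_and_dash` is hypothetical, and the thread does not say which part of the string the modified line was applied to.

```python
import re


def keep_alnum_and_dash(text: str) -> str:
    """Python counterpart of the Perl substitution s/[^a-zA-Z0-9-]//g."""
    return re.sub(r"[^a-zA-Z0-9-]", "", text)


print(keep_alnum_and_dash("Akron-OH-44303---NORTHEASTERN-EDUCATION-(3)_chart.bmp"))
```

Applied to a whole file name, this also removes the parentheses, the underscore and the dot before the extension, so it would have to be limited to the part of the name where that is wanted.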
