homepage Welcome to WebmasterWorld Guest from 54.204.249.184
register, free tools, login, search, pro membership, help, library, announcements, recent posts, open posts,
Become a Pro Member
Home / Forums Index / Code, Content, and Presentation / Perl Server Side CGI Scripting
Forum Library, Charter, Moderators: coopster & jatar k & phranque

Perl Server Side CGI Scripting Forum

This 40 message thread spans 2 pages: 40 ( [1] 2 > >     
Need script to separate numbers according to space
The result will be used as a table in PHP/MySQL
StevenHu




msg:3746396
 3:30 pm on Sep 17, 2008 (gmt 0)
I have two columns in my Excel file. Left column is a part number. Right column contains all the categories that part number belongs to (such as decal, car body, tools, etc.) The categories are separated by a space. The list contains over 3000 part numbers.

I need to make a separate line for each category so each row will consist of one part number and one category. This will then be output as a table in PHP/MySQL. The parts will be called from the table according to the category. It will result in a parts lookup on a web site.

Example:

part number categories
21030 3005 3006 3007

Question: What is the Perl script for me to change the above to the following:

21030 3005
21030 3006
21030 3007

I know Perl won't work on Excel files. Should I export it as CSV or Tab separated? I know nothing about Perl, unfortunately, but I do remember that it is good at the above sort of thing, and want to save myself a week or more of effort.

Would PHP be a better tool for this job?

Regards,
Steve

 

Dabrowski




msg:3746433
 4:10 pm on Sep 17, 2008 (gmt 0)

It should be very easy to do this, are you wanting to run it from a command line?

A CSV would be preferable, easy to export from Excel, and import again if you need to.

To make it output though we'll need to know your database structure, and how/where you want it storing.

rocknbil




msg:3746447
 4:21 pm on Sep 17, 2008 (gmt 0)

Welcome aboard StevenHu! PHP wouldn't be a "better tool for the job," but if you don't know perl, PHP might be easier for you to pick up.

I do this stuff all the time - to answer one q, clients will export their data in a delimited format of some sort, I throw together a script that reads it line by line and populates the DB. Perl or PHP can do this equally (as can any other dynamic language.) I prefer to have them export the data with a specific delimiter - not tab or comma, or quote qualified - for example, delimited by the pipe character:

field1¦field2¦field3¦field4


open (FILE, "exported_file.csv");
while (chomp($line = <FILE>)) {
@fields = split (/\¦/,$line);
@cats = split(/\s+/,$fields[2]); #this is your categories
$fields[2] = $cats[0]; # replace $fields[2] with the first one
$select = qq¦insert into db (field1,field2,field2,field4) values¦;
$select .= qq¦('$fields[0]','$fields','$fields[2]','$fields[3]');¦;
$sth = $dbh->prepare("$select;");
$rv = $sth->execute or die("Cannot insert data");
$sth->finish;
}
close(FILE);

[1][edited by: phranque at 7:59 pm (utc) on Sep. 19, 2008]
[edit reason] disabled smileys ;) [/edit]

StevenHu




msg:3746483
 5:02 pm on Sep 17, 2008 (gmt 0)

Does your script result in a table with only two fields? The part number will need to be duplicated for each category. Some part numbers fit only one category, other parts may fit up to 10 differnt categories. Is your script limiting the number of categories to 4? There are over 200 categories these 3000 parts will fit into.

field1: field2:
partno1 > categorya
partno1 > categoryb
partno1 > categoryc
partno2 > categoryc
partno3 > categoryt
partno3 > categoryr
etc.

Thanks!
Steve

Dabrowski




msg:3747040
 12:54 pm on Sep 18, 2008 (gmt 0)

I prefer to have them export the data with a specific delimiter - not tab or comma, or quote qualified

Always wise if the content may possibly contain one of those characters. I think you're safe in this case if you're just using numbers.

Steve to write you an accurate script you really need to tell us how your input file is formatted, and how you want it storing in the database. I'm assuming here that you already have a database of course?

StevenHu




msg:3747142
 2:46 pm on Sep 18, 2008 (gmt 0)

OK, right now it is in Excel format, two columns. The final table will be just two columns, PartNo and CatNo. That is the full table (which does not exist yet). I can export from Excel as CSV and via a program create an SQL table out of it, no problem. But it will result in this:

PartNo ¦ CatNo
21030 ¦ 3005a.#*$!x 3006g.#*$!x z3007.2345

Instead of this:

PartNo ¦ CatNo
21030 ¦ 3005a.#*$!x
21030 ¦ 3006g.#*$!x
21030 ¦ z3007.2345

Each category number will be ten digits in length, containing any mixture of letters and/or numbers and/or a period, separated by a space. Of course, in Excel, I can do a search/replace to replace the space with a pipe.

Thanks,
Steve

Dabrowski




msg:3747202
 3:31 pm on Sep 18, 2008 (gmt 0)

ok, try this:

[perl]my $parts;
foreach( @lines_in_csv) {
my ( $part, $category) = split( /,/);
push( @{ $parts -> {$part}}, $cat);
}

foreach my $partno ( keys( %$parts)) {
my $categories = join( ",", @{ $parts -> {$partno}});

## Now you can do something with $partno and $categories
## $partno is your original part number
## $categories is now a , separated list of all your categories
}
[/perl]

You could easily use rocknbil's code to push this into a db.

StevenHu




msg:3747215
 3:45 pm on Sep 18, 2008 (gmt 0)

I'm puzzled. Why are you creating a comma-separated list of my categories, when they should one category per field?

PartNo ¦ CatNo
21030 ¦ 3005a.#*$!x
21030 ¦ 3006g.#*$!x
21030 ¦ z3007.2345

Thanks,
Steve

Dabrowski




msg:3747283
 5:08 pm on Sep 18, 2008 (gmt 0)

Maybe cos you didn't explain clearly, I got your in and out mixed up.

right now it is in Excel format, two columns. The final table will be just two columns

ok, try this:
[perl]my $parts;
foreach( @lines_in_csv) {
my ( $part, $categories) = split( /,/);
$parts -> {$part} = split( /\s/, $categories);
}

foreach my $partno ( keys( %$parts)) {
foreach my $category ( @{ $parts -> {$partno}}) {

## Now you can do something with $partno and $category
## $partno is your original part number
## $category will change each time singularly while $partno stays the same
}
}[/perl]

Very similar, I've modified the input loop so it still creates a list of categories, and the output loop so it'll give you each one separately.

Yes, I know this could all be done in one go but I like to simplify or it all gets too brackety! :D

StevenHu




msg:3747308
 5:32 pm on Sep 18, 2008 (gmt 0)

Wow, so simple!

perl_diver




msg:3747436
 8:37 pm on Sep 18, 2008 (gmt 0)

this to me does not look like it will work:



my $parts;
foreach( @lines_in_csv) {
my ( $part, $categories) = split( /,/);
$parts -> {$part} = split( /\s/, $categories);
}

$parts->{$part} will equal the last value from the file for each $part. It will not be an array of all the categories associated with each part. Dabrowskis code posted earlier will work if the push assignment is corrected, $cat below should be $category:



my $parts;
foreach( @lines_in_csv) {
my ( $part, $category) = split( /,/);
push( @{ $parts -> {$part}}, $cat);
}


StevenHu




msg:3747453
 9:20 pm on Sep 18, 2008 (gmt 0)

OK, here are the steps. Let me know if I missed something.

1. Export the Excel file as "exported_file.csv" in CSV format.
2. Open an HTML page and put the PHP tags on it.
3. Between the PHP tags, put:

my $parts;
foreach( @lines_in_csv) {
my ( $part, $category) = split( /,/);
push( @{ $parts -> {$part}}, $cat);
}

foreach my $partno ( keys( %$parts)) {
foreach my $category ( @{ $parts -> {$partno}}) {

}
}

4. I notice that it makes no mention of the file to open. The above script is all I need to have two columns of figures, one per field? This would save me a ton of time!

Thanks!
Steve

perl_diver




msg:3747487
 10:06 pm on Sep 18, 2008 (gmt 0)

if your document is a php document you should use php. If you don't know how, ask on a php forum.

Dabrowski




msg:3747815
 11:33 am on Sep 19, 2008 (gmt 0)

*perl_diver*, thanks for pointing out my mistake. $parts -> {$part} was actually being set to the number of elements in the returned array. I forgot the extra brackets!

*StevenHu*, this code works, I have tested it, but as a standalone, not as a CGI script. *Perl_diver* is correct, it is not PHP. It cannot run inline with html. Ideally (I think) for your application you'd want an HTML upload form, so you could upload your CSV to the script, then the script would add it to the database.

[perl]use strict;

open( CSV, "part_info.csv") ¦¦ die "Open file failed!\n";
my @lines_in_csv = <CSV>;
close( CSV);

my $parts;
foreach( @lines_in_csv) {
my ( $part, $categories) = split( /,/);
$parts -> {$part} = [ split( /\s/, $categories)];
}

foreach my $partno ( sort keys( %$parts)) {
foreach my $category ( sort @{ $parts -> {$partno}}) {

## Now you can do something with $partno and $category
## $partno is your original part number
## $category will change each time singularly while $partno stays the same

print "$partno - $category\n";
}
}[/perl]

I've added a bit at the top that reads the CSV file called part_info.csv.

I have added a sorting command, so the output is in logical order, instead of Perl's 'how I feel like it' way. Also added a print line so you can see the output.

My test CSV contained:
part1,cat1 cat2 cat3
part2,cat4
part3,cat2 cat4 cat5

And the output is:
part1 - cat1
part1 - cat2
part1 - cat3
part2 - cat4
part3 - cat2
part3 - cat4
part3 - cat5

Please tell me I got it the right way around this time!

If you run it from a command prompt you'll see that it works.

StevenHu




msg:3747898
 2:33 pm on Sep 19, 2008 (gmt 0)

You got it right!

So all it need to do is output the CSV file from Excel and save it as part_info.csv. Then in Window's cmd.exe I type the above script and that's how it executes? I'll try this soon!

Thanks,
Steve H

StevenHu




msg:3747900
 2:34 pm on Sep 19, 2008 (gmt 0)

No, it doesn't work in cmd.exe. Which program do I use to execute the script?

Thanks,
Steve

StevenHu




msg:3747915
 2:56 pm on Sep 19, 2008 (gmt 0)

For some reason my posts' letters are turning into garbage. For instance, j#*$!xx should be j with 5 X's next to it. Keep that in mind as you read the following:

OK, I downloaded ActivePerl and installed it and ran the script. I get the following error for each line:

Bareword found where operator expected at part_info.csv line 3108, near "6005j#*$!xx" <Missing operator before j#*$!xx?

6005j#*$!xx is a valid category code. Every category code is 10 characters long (no spaces) and is composed of letters and/or digits and/or a period.

Thanks,
Steve

[edited by: StevenHu at 2:59 pm (utc) on Sep. 19, 2008]

Dabrowski




msg:3748000
 4:03 pm on Sep 19, 2008 (gmt 0)

Bareword found where operator expected at part_info.csv line 3108, near "6005j#*$!xx" <Missing operator before j#*$!xx?

The thing that worries me about that is that it says part_info.csv, that should be the name of the script. Make sure your script is called something like part_info.pl.

Secondly it says line 3108, when the script only has about 25 lines.

You need to have 2 separate files, one .pl file which contains the script above, and one .csv with your data in.

The script will work for any characters in your codes EXCEPT a comma or a space.

StevenHu




msg:3748040
 4:38 pm on Sep 19, 2008 (gmt 0)

I have two files: categories.pl and part_info.csv.

When I tried it again, I get the error,

C:\Documents and Settings\shusting>perl categories.pl
Unrecognized character \xA6 in column 29 at categories.pl line 3.

That character is the first pipe symbol. Strange. (I had copied and pasted the script into Windows Notepad.)

I then typed over the symbol with my own keyboard and ran the script again. It printed the result to the screen. Looking better! Unfortunately, I could see only the last part of the 3000 plus lines.

I notice that the script removed the trailing zeroes (see following). 6015.318 should be 6015.31800 Can the script keep the zeroes?
==============================

LRP39255 - 6015.318
LRP39292 - 6015.318
LRP39294 - 6015.318
LRP39310 - 6015.31201
LRP39310 - 6015.31202
LRP39321 - 6015.318
LRP39390 - 6015.318
LRP39410 - 6015.31201
LRP39410 - 6015.31202
LRP39411 - 6015.31201

==============================
Occasionally there are gaps. The gaps appear to be the spaces between, or in front of, or behind, the categories. I can delete those lines.
==============================

SP59L -
SP59L -
SP59L - 7000.#*$!#*$!
SP59M -
SP59M -
SP59M - 7000.#*$!#*$!
SP59XL -
SP59XL -
SP59XL - 7000.#*$!#*$!
SP59XXL -
SP59XXL -
SP59XXL - 7000.#*$!#*$!

==============================

I'm looking through the Perl manual and can't find the script to write the results to a CSV file rather than to the screen. Can you help? It looks like we are almost there!

Thanks,
Steve

Dabrowski




msg:3748061
 4:55 pm on Sep 19, 2008 (gmt 0)

"That character is the first pipe symbol. Strange."

Yeah sorry, forgot to mention. Despite the fact that this is a coders website, they still cannot yet display one of the most common symbols properly. You will always have to replace this in any copied code.

"I notice that the script removed the trailing zeroes"

Nope, Excel removed the training zeros. You'll have to faff with it and get it to output all fields as text. It thinks it's a number, which it technically is I guess. You could set it to display to 5d.p. that would work too.

"Occasionally there are gaps"

My bad. This is caused when you have more than one space separating your categories, probably where they've been deleted at some point, or just not entered consistently.

Change this line:
[perl] $parts -> {$part} = [ split( /\s+/, $categories)]; [/perl]

I've added a + character in the regex, that should sort it.

"I'm looking through the Perl manual and can't find the script to write the results to a CSV file rather than to the screen"

The manual will only show you how to use commands, not how to put them together. You're expected to work that bit out for yourself!

ok, change the output loop like this to output to a new csv...

[perl]open( NEWCSV, ">new_part_info.csv") ¦¦ die "File open failed\n";
foreach my $partno ( sort keys( %$parts)) {
foreach my $category ( sort @{ $parts -> {$partno}}) {

print "$partno - $category\n";
print NEWCSV "$partno,$category\n";
}
}

close( NEWCSV);[/perl]

That should work, but rememeber to change the filename if you need to, and to change those pipes!

You can see, apart from adding a line to open a new file, all you need to do is add another print line, but telling it to send it to the new file instead of the screen.

[edited by: Dabrowski at 4:56 pm (utc) on Sep. 19, 2008]

Dabrowski




msg:3748074
 5:13 pm on Sep 19, 2008 (gmt 0)

As an afterthought, change this line more! It'll help if there are leading or trailing spaces around the line, or the comma.

[perl]$parts -> { chomp( $part)} = [ split( /\s+/, chomp( $categories))]; [/perl]

perl_diver




msg:3748103
 5:53 pm on Sep 19, 2008 (gmt 0)

Dabrowski,

On Windows the file extension is not important. You can name a perl file anything you want as long as you invoke it with perl:

c:\>perl myfile.foo

This still looks like it will not work:



my $parts;
foreach( @lines_in_csv) {
my ( $part, $categories) = split( /,/);
$parts -> {$part} = [ split( /\s/, $categories)];
}

You want to use push() to populate $parts->{$part} with all the values from the file. Your code will still redefine $parts->{$part} with a new value for each line in the file and the result will be that the last value in the file will be retained.



my $parts;
foreach( @lines_in_csv) {
my ( $part, $categories) = split( /,/);
push @{$parts->{$part}},$categories;
}

Or have I missed something by not reading all the posts in the thread?

On a side note, I can't beleive that this forum still inserts those silly double-pipes into messages to replace single pipes. Why can't that be fixed? Even if the files this forum uses are pipe delimited it should not be a problem.

Dabrowski




msg:3748157
 6:57 pm on Sep 19, 2008 (gmt 0)

You can name a perl file anything you want as long as you invoke it with perl

You're right of course, but this is a CGI forum and after all, assuming he's using IIS here, it will only execute a .pl file (yes I realise we're not using this in a CGI context). Plus if he's a beginner he should learn the right way.

This still looks like it will not work

ok, I think you made the same mistake I did, and got the input/output streams mixed up.

He has this:
partno,cat1 cat2 cat3

He wants this:
partno,cat1
partno,cat2
partno,cat3

You see the first time I was mistaken, I used push as I thought each category was on a different line.

So using the second method, first I split the line into part number, and categories as a string, then split the categories string into an array.

It really does work, honest! ;)

perl_diver




msg:3748280
 9:34 pm on Sep 19, 2008 (gmt 0)

ahh..... I did make a mistake. Sorry about that, thanks for pointing it out diplomatically. :)

But this all seems academic if he is actually using PHP.

phranque




msg:3748287
 10:08 pm on Sep 19, 2008 (gmt 0)

welcome to WebmasterWorld [webmasterworld.com], Steve!

For some reason my posts' letters are turning into garbage. For instance, j#*$!xx should be j with 5 X's next to it.

that is the forum spam/swear filter that is preventing you from posting a triple-X.

That character is the first pipe symbol. Strange. (I had copied and pasted the script into Windows Notepad.)

[webmasterworld.com...]

It printed the result to the screen. Looking better! Unfortunately, I could see only the last part of the 3000 plus lines.
...
I'm looking through the Perl manual and can't find the script to write the results to a CSV file rather than to the screen.

there are two ways you can do this.
in your current script, the "print" statement is outputting to "standard output", which can be redirected to a named file in the command line using the "greater than" (>) symbol.
or you can "open" a file handle to a named file and pass the file handle to the print statement as suggested by Dabrowski.

StevenHu




msg:3748302
 10:44 pm on Sep 19, 2008 (gmt 0)

I took an online class in Perl a while back. It was too much information too fast for me to pick up. However, I learned enough to recognize that this job called for Perl!

Thanks for helping me shave a week off of this work!

Steve

Dabrowski




msg:3749042
 1:52 pm on Sep 21, 2008 (gmt 0)

But this all seems academic if he is actually using PHP

That was mentioned, but I think in the end an offline script to do some dog work was what he really required.

Saved having to hire a temp! ;)

StevenHu




msg:3749526
 2:31 pm on Sep 22, 2008 (gmt 0)

I needed Perl to work on the CSV output from Excel. From there, I'll take the Perl output and create a MySQL table that will be accessed through PHP. I hope this explains how PHP enters in.

Steve

StevenHu




msg:3749697
 5:44 pm on Sep 22, 2008 (gmt 0)

I just tried the script and got error messages:

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

Global symbol "$parts" requires explicit package name at categories.pl line 4.
Global symbol "$parts" requires explicit package name at categories.pl line 5.
Global symbol "@lines_in_csv" requires explicit package name at categories.pl li
ne 15.
Execution of categories.pl aborted due to compilation errors.

Here is the script:

use strict;

open( NEWCSV, ">new_part_info.csv") ¦¦ die "File open failed\n";
foreach my $partno ( sort keys( %$parts)) {
foreach my $category ( sort @{ $parts -> {$partno}}) {

print "$partno - $category\n";
print NEWCSV "$partno,$category\n";
}
}

close( NEWCSV);

my $parts;
foreach( @lines_in_csv) {
my ( $part, $categories) = split( /,/);
$parts -> { chomp( $part)} = [ split( /\s+/, chomp( $categories))];
}

foreach my $partno ( sort keys( %$parts)) {
foreach my $category ( sort @{ $parts -> {$partno}}) {

## Now you can do something with $partno and $category
## $partno is your original part number
## $category will change each time singularly while $partno stays the same

print "$partno - $category\n";
}
}

Why is it acting differently now?

Thanks,
Steve

Dabrowski




msg:3749745
 7:08 pm on Sep 22, 2008 (gmt 0)

I really don't know how you've managed it, but all the parts seem to be jumbled up, and the part that reads your CSV file is gone.

Here's the full script again, with some additional commentary...

[perl]use strict;

## This bit reads in your CSV file.
open( CSV, "part_info.csv") ¦¦ die "Open file failed!\n";
my @lines_in_csv = <CSV>;
close( CSV);

## This bit reads all your part details, and makes a list of the categories
my $parts;
foreach( @lines_in_csv) {
my ( $part, $categories) = split( /,/);
$parts -> { chomp( $part)} = [ split( /\s+/, chomp( $categories))];
}

## This bit opens a new csv file
open( NEWCSV, ">new_part_info.csv") ¦¦ die "File open failed\n";
foreach my $partno ( sort keys( %$parts)) {
foreach my $category ( sort @{ $parts -> {$partno}}) {

## Here, we have $partno and $category, one by one
## So we output once to screen, and once into new file
print "$partno - $category\n";
print NEWCSV "$partno,$category\n";
}
}

close( NEWCSV);[/perl]

Why don't you tell us how your MySQL table is arranged? This script can just as easily create your table for you too?

This 40 message thread spans 2 pages: 40 ( [1] 2 > >
Global Options:
 top home search open messages active posts  
 

Home / Forums Index / Code, Content, and Presentation / Perl Server Side CGI Scripting
rss feed

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Home ¦ Free Tools ¦ Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About ¦ Library ¦ Newsletter
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved