Forum Moderators: coopster

Message Too Old, No Replies

custom text parser to import into mysql

text parser mysql php

         

vxa136

8:22 pm on Mar 7, 2007 (gmt 0)

10+ Year Member



I have a text file that I need to parse into a mysql db. I've not coded in regxp as yet, so I'm hoping someone can help. The text file is pretty well formatted but I need the php code to parse it into a mysql db.

The text file is as follows :

DATE//1st line not needed
@!$#@#$# //2nd line not needed
#@@#$@#$ //3rd line not needed
#@$@#$@# //4th line not needed
//line break
Record "n" of "total"-->INTO COLUMN 1
//line break
DN: Database Name
VALUE-->INTO COLUMN 2
TI: Title
VALUE-->INTO COLUMN 3
AU: Author
VALUE(1);VALUE(2);...;VALUE(N)-->INTO COLUMNS 4,5...uptil N
AF: Affiliation
VALUE(1);VALUE(2);...;VALUE(N)-->INTO COLUMNS 5,6...uptil N
SO: Source
VALUE--> INTO COLUMN 6 (if AU & AF are 1 value each)
DE: Descriptors
VALUE-->INTO COLUMN 7 (")
PY: Publication Year
VALUE-->INTO COLUMN 8 (")
//line break
Record "n+1" of "total"
.
.
.
.
.
.
.
.
.
.
.
.

(Note: The above format is EXACTLY how the file exists)

Initially I used editplus to clean the text file using search and replace and successfuly converted it into a csv file (delimited by ~) and imported it onto the mysql db and it looks fine. I need to do the search & replace process in php though and I needed a jumpstart. :)

Thanks

/\\/

mcibor

11:41 am on Mar 9, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The first three colums don't require regex at all,

I have only one question: what do you mean by columns 4,5...N?
and then for the second time: columns 5,6...N?

How many rows do you want to input to db?

Could you provide me with two exemplary files with outputs, that you would want to have in db?

Reagards
Michal

vxa136

2:48 am on Mar 14, 2007 (gmt 0)

10+ Year Member



Michal,

GIVEN FILE FORMAT:

Thu Feb 22 15:19:55 EST 2007
CSA
Database: PsycINFO
Query: SO=(Journal of Marketing)

Record 1 of 486

DN: Database Name
PsycINFO
TI: Title
The editor's corner
AU: Author
Lincoln, Douglas J (1)
AF: Affiliation
(1)Boise State University, ID, US
SO: Source
Journal of Marketing Education. Vol 27(3), Dec 2005, pp. 211
DE: Descriptors
*Education; *Marketing; *Teaching; Career Development; Critical
Thinking; Debates; Learning; Management; Organizational Structure;
Professional Development; Satisfaction; Technology
PY: Publication Year
2005

Record 2 of 486

DN: Database Name
PsycINFO
TI: Title
Introducing Professional and Career Development Skills in the
Marketing Curriculum
AU: Author
Kelley, Craig A (1); Bridges, Claudia (1)
AF: Affiliation
(1)Department of Management, California State University, Sacramento,
Sacramento, CA, US
SO: Source
Journal of Marketing Education. Vol 27(3), Dec 2005, pp. 212-218
DE: Descriptors
*Career Development; *Curriculum; *Marketing; *Professional
Development; Curriculum Development
PY: Publication Year
2005

(CONTINUES TILL RECORD # 486)

REQUIRED FILE FORMAT :

1~PsycINFO~The editor's corner~Lincoln, Douglas J(1)~(1)Boise State University, ID, US~Journal of Marketing Education. Vol 27(3), Dec 2005, pp. 211~*Education; *Marketing; *Teaching; Career Development; Critical Thinking; Debates; Learning; Management; Organizational Structure; Professional Development; Satisfaction; Technology~2005

2~PsycINFO~Introducing Professional and Career Development Skills in the Marketing Curriculum~Kelley, Craig A(1)~Bridges, Claudia(1)~(1)Department of Management, California State University, Sacramento, Sacramento, CA, US~Journal of Marketing Education. Vol 27(3), Dec 2005, pp. 212-218~*Career Development; *Curriculum; *Marketing; *Professional Development; Curriculum Development~2005

(CONTINUES TILL RECORD# 486)
what i meant by the columns 4,5...N is that if there are more than one author for a paper then I would need to store each author in a separate column.

similar for affiliations...

-/\ \/

mcibor

10:30 pm on Mar 20, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK. What I would do is a mixture of few techniques:

<?php
//read file with file [php.net] function

$lines = file('file_or_url_to_read.txt');

$processing = 0; // a temp variable to show us what line are you processing (of the record, not file)
$matches = '';//initialization of matches variable
foreach ($lines as $line_num => $line) {
if(!$processing) if(preg_match('@(^Record )([0-9]+)( of [0-9]+$)@'), $line, ) $processing = 1;
if(!$processing) continue;

//here comes the check part
//but please let me continue this tomorrow ok?

sorry for inconvenience
Michal

mcibor

11:28 am on Mar 21, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Once again me :)

<?php
$lines = file('file_or_url_to_read.txt');//read file
//$lines = explode("\n", $str);//use this if you read file as a string

$final_arr = array();
$processing = 0; // a temp variable to show us what line are you processing (of the record, not file)
$matches = '';//initialization of matches variable
$record = '';
$title_end = 0;
$author_end = 0;
$aff_end = 0;
$source_end = 0;
$desc_end = 0;

foreach ($lines as $line_num => $line) {

if(!$processing) if(preg_match('@(^Record )([0-9]+)( of [0-9]+)@', $line, $matches)) $processing = 1;
if(!$processing) continue;

//here comes the check part
$processing++;
echo $processing;
switch($processing) {
case 2://this is first row
$record = $matches[2];
$final_arr[$record]['record'] = $record;
break;
case 5:
$final_arr[$record]['dn'] = $line;
default:
break;
}
if($processing < 7) continue;

if(!$title_end){
//you've got database name now and number of record
//now is title:
if(!preg_match('@^(AU: Author)@', $line)){ $final_arr[$record]['title'] .= $line; continue;}
else {$title_end = 1; continue;}
}
if(!$author_end){
//now is author:
if(!preg_match('@^(AF: Affiliation)@', $line)) { $final_arr[$record]['author'] .= $line; continue;}
else {$author_end = 1; continue;}
}
if(!$aff_end){
//now is author:
if(!preg_match('@^(SO: Source)@', $line)){ $final_arr[$record]['affiliation'] .= $line; continue;}
else {$aff_end = 1; continue;}
}
if(!$source_end){
//now is author:
if(!preg_match('@^(DE: Descriptors)@', $line)){ $final_arr[$record]['source'] .= $line; continue;}
else {$source_end = 1; continue;}
}
if(!$desc_end){
//now is author:
if(!preg_match('@^(PY: Publication Year)@', $line)){ $final_arr[$record]['description'] .= $line; continue;}
else {$desc_end = 1; continue;}
}
$final_arr[$record]['publication'] = $line;
$processing = 0; // reset all temp variables
$matches = '';
$record = '';
$title_end = 0;
$author_end = 0;
$aff_end = 0;
$source_end = 0;
$desc_end = 0;
}

print_r($final_arr);
?>

I tested it and I think the result should satisfy you.

Regards
Michal

PS. I'm so good :D

vxa136

10:37 am on Mar 24, 2007 (gmt 0)

10+ Year Member



thanks a ton michal,

it really gave me some perspective....and oh yeah ure pretty darn good :)

I ran the code and it gives out a very satisfactory array. Now I would like to know whether this array can be imported into a mysql database such that each variable in the array ([record],[title]..so on..) can be put into columns..(would appreciate complete sql code..for this one..major noobing here :)]

Alternatively...i did some looking around and came across some places where the array can be converted directly into xml data (which is the desired result at the end)....

would buy a beer in the event i meet a 'michal' :P...

thanks

mcibor

7:29 am on Mar 26, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Array to xml:
[phpclasses.org...]
[phpclasses.org...] - uses php5

To mysql:


foreach($final_arr as $row) {
$secure = array_map('mysql_real_escape_string', $row);
$sql = "INSERT INTO `table` ('dn', 'ti', 'au', 'af', 'so', 'de', 'py')
VALUES('{$secure['dn']}', '{$secure['title']}', '{$secure['author']}',
'{$secure['affiliation']}', '{$secure['source']}', '{$secure['description']}',
'{$secure['publication']}')";
mysql_query($sql) or die("Error: ".mysql_error());
}

This should put the data into db.
Regards
Michal

PS. Regarding the beer I'm in Poland now, but am travelling a lot, so no problem for me ;)

[edited by: coopster at 5:23 pm (utc) on Mar. 26, 2007]
[edit reason] removed link to non-authoritative site [/edit]