Forum Moderators: coopster


remove html chars from csv file


phparion

12:56 pm on Sep 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi,

I am trying to load a CSV file into a database, but most of its values contain HTML characters and tags. Because of this, values are not inserted properly and some fields are left blank: characters such as ', : and ; occur in the column values and break the INSERT command.

I am using the LOAD DATA INFILE command to load this 50 MB CSV file, and I don't think there is a way to escape HTML characters with LOAD DATA INFILE (if there is, please let me know). So now I am thinking of writing a script that explodes the CSV file and then uses the htmlentities or htmlspecialchars functions to convert the HTML characters to their entity form, so that they don't break the INSERT command during the import.

The script would then automatically load the converted CSV into the database.

Can anyone help me write the portion of the code that goes through the CSV file column by column and converts all HTML characters to their entity form?

Thanks in advance.

P.S. Or is there any way to convert or escape HTML characters with the LOAD DATA INFILE command?

coopster

2:11 pm on Sep 22, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Yes, the LOAD DATA INFILE Syntax [dev.mysql.com] has an optional ESCAPED BY clause you could incorporate.

phparion

2:32 pm on Sep 22, 2006 (gmt 0)




I have read the manual again and again, but I still don't understand how to tell LOAD DATA INFILE ... ESCAPED BY ... to escape HTML characters like ',;:<br><font face=.....><h1> and so on.

Could you please write a sample command that does something like

LOAD DATA INFILE ..... ESCAPED BY (all html characters)

thanks

coopster

2:42 pm on Sep 22, 2006 (gmt 0)




There are examples on that page.
LOAD DATA INFILE '/tmp/test.txt' INTO TABLE test 
FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'
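
For a comma-separated file, a variation on that example might look like the sketch below. It assumes the fields are separated by commas, are optionally wrapped in double quotes, and that lines end with \n; the path /tmp/data.csv and the table name products are hypothetical placeholders, not details from this thread. Note that ESCAPED BY names only the single character used inside the file to escape delimiters, so it cannot take a list of HTML characters.

```sql
-- Sketch only: '/tmp/data.csv' and the table `products` are hypothetical.
LOAD DATA INFILE '/tmp/data.csv'
INTO TABLE products
FIELDS TERMINATED BY ','            -- columns are separated by commas
       OPTIONALLY ENCLOSED BY '"'   -- values may be wrapped in double quotes
       ESCAPED BY '\\'              -- one escape character, here a backslash
LINES TERMINATED BY '\n';           -- one record per line
```

With the values enclosed in double quotes, commas, colons, semicolons and HTML tags inside a field are read as ordinary data. A double quote inside a quoted value still has to be doubled or preceded by the escape character in the file itself.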