I've got a little problem and would appreciate any help from you guys please. I hope this makes sense!
I have a MySQL database which stores details of my clients' properties. The data in the database comes from an online form which my clients complete. The form data goes into the database okay, as I have viewed the content and it is just how I want it.
Now I have written a CGI script that queries the database for a particular home (i.e. home_id=1) and then displays the home details on a page on my website.
Unfortunately my script is not displaying the database content correctly. It displays the content but not in the correct format.
My problem is that in the database a "Property Description" is made up of a blob of text which has for example 3 paragraphs of text in it. My script queries the database and the "Property Description" text is stored in the variable $propDesc. The script then goes on to the HTML bit which displays the content of that variable $propDesc on the website page. Unfortunately the text is now all on one line with no paragraphs - so the 3 paragraphs from the database are now all together with no newlines. For some reason it is not taking any notice of the newlines which are in the database content. Obviously something is going wrong at the parse stage.
If it helps, my CGI script has use DBI and use CGI at the top, so those are the modules I am using.
This is probably basic stuff to you experts out there and I hope you can help me or at least direct me to somewhere that covers this.
If you need to know anything else please let me know.
Thanks in advance
Sarah
HTML is not sensitive to 'whitespace' characters such as tabs, spaces, newlines, and carriage returns. If you are just separating paragraphs with newlines, then quite likely the HTML source that the browser sees has the text separated, but the browser mushes it all together: any run of consecutive whitespace characters is rendered as a single space.
Try looking at the source code for the page you see in your browser. The location of the 'View Source' menu option varies by browser, so I can't tell you exactly where to look on yours. If the paragraph breaks are there in the source but not on the rendered page, this is your problem.
There are three ways I can think of to deal with this - the right way and two cheating ways ;)
The right way is to wrap your paragraphs in HTML paragraph tags, e.g. "<p>my first paragraph text here, maybe broken over several lines</p><p>my second paragraph text here.</p>". You can do this either before you insert the text into the database, if the database will never be used for anything other than generating HTML, or between the time that you dig it out of the database and the time you print it to the document.
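A minimal sketch of the "right way", done between reading the description out of the database and printing it. It assumes paragraphs are separated by one or more blank lines, with either Windows (\r\n) or Unix (\n) line endings. The names paragraphs_to_html and escape_html are hypothetical helpers for illustration, and the escaping step is an addition so that a stray < or & typed by a client cannot break the page.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML, so text typed in by a
# client cannot break the page or inject markup.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Turn plain text into HTML paragraphs. One or more blank lines separate
# paragraphs; single line breaks inside a paragraph are left for the
# browser to fold into spaces.
sub paragraphs_to_html {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;    # normalise Windows (\r\n) and old Mac (\r) endings
    my @paras;
    for my $piece (split /\n[ \t]*\n/, $text) {
        (my $para = $piece) =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @paras, $para if length $para;        # skip empty pieces from extra blank lines
    }
    return join "\n", map { '<p>' . escape_html($_) . '</p>' } @paras;
}

# Example: a description as it might come back from the database.
my $desc = "This home is located in Surrey.\r\n\r\nIt has 4 bedrooms.\r\n\r\nIt is close to local shops.";
print paragraphs_to_html($desc), "\n";
```

Splitting on blank lines and joining the pieces keeps the logic easy to follow, at the cost of building an intermediate list.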
The easy way to cheat would be to wrap the whole thing in a <pre></pre>, so that whitespace would suddenly matter in that region of your document. Really, there aren't a whole lot of times when this is good markup, since it doesn't describe the data inside the tag very well.
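For comparison, a sketch of the <pre> cheat with a hypothetical as_preformatted helper. The HTML-escaping is an assumption added here: <pre> only preserves whitespace, it does not stop the browser from reading < as the start of a tag.

```perl
use strict;
use warnings;

# Minimal HTML escaping: <pre> keeps whitespace, but the browser would
# still treat '<' as the start of a tag.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# The "easy cheat": keep the text exactly as typed and let <pre> preserve
# every newline and space (at the cost of a monospaced font, and long
# lines that do not wrap).
sub as_preformatted {
    my ($text) = @_;
    return '<pre>' . escape_html($text) . '</pre>';
}

print as_preformatted("Line one.\n\nLine <two>."), "\n";
```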
The harder way to cheat would be to replace every newline in your text with "<br />". It still needs almost as much text processing as the right way, only without the nice structural markup.
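A sketch of the <br /> cheat, assuming any of the three line-ending conventions may turn up in the stored text; newlines_to_br is a hypothetical name. Escaping is left out here to keep the example short, so in a real page the text should be HTML-escaped first.

```perl
use strict;
use warnings;

# The "harder cheat": replace every line break (\r\n, \n or a lone \r)
# with an XHTML-style <br />, keeping a newline after it so the generated
# HTML source stays readable. \r\n is tried first so a Windows line ending
# becomes one <br />, not two.
sub newlines_to_br {
    my ($text) = @_;
    $text =~ s/\r\n|\r|\n/<br \/>\n/g;
    return $text;
}

print newlines_to_br("First paragraph.\r\n\r\nSecond paragraph."), "\n";
```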
Thanks for your help. Script is written using Perl.
I can understand what you mean about the whitespace etc. I use MySqlFront as a front end to view the contents of my database, and when I make an amendment to any of the text in the database and view the SQL statement that it generates, it will say for example: INSERT INTO homes (prop_desc) VALUES ("This home is located in Surrey. \nIt has 4 bedrooms.\n It is also close to local shops.");
So I have assumed it is including the newline characters. But the problem is that when I extract that text from the database, assign it to a variable and then print that variable, the newlines don't show.
If I look at the source of my HTML page it has no <p> or <br> tags where they should be. Which is what you were asking me to check.
What I do know works is this: the initial form that my clients complete with their home details is sent to me in an email using sendmail, and the email that I receive shows the text with its paragraphs. But if I display the form contents on the screen, it loses the newlines.
Does this make sense?
I can't really expect my clients to type <P> and <br> tags into their text when they complete the online form as many of them will probably be novice users and wouldn't understand what I mean! :-)
Thanks.
Sarah
E-mail is usually not HTML, and therefore the newlines will be displayed as new lines.
I would suggest that you handle this by processing the text that goes on the web page between the time you extract it from the database and the time you print it to the page. If the description text is being stored in $desc, you might do something along the lines of:
$desc =~ s/(^|\n\n)(.*?)(\n\n|$)/<p>$2<\/p>$3/sg;
to wrap each paragraph in <p> and </p> tags.
Note that I have not tested that regular expression, so it comes with no guarantees. Especially since my first tries at regular expressions nearly always do something comically different from what I intended. Andreas Freidrich is a much better Perl programmer than I, and will quite likely be 'round to set me straight on this one eventually :) One of the great things about this place is that whatever you're doing, there's someone around who knows it really well and is willing to help you learn.
Speaking of which, welcome!
I don't remember whether Perl changes its interpretation of that escap[e] sequence based on the platform on which it is running or not.
Depends on the context it is being used in. See perlport - Writing portable Perl [perldoc.com].
Running the following code under Windows gives you:
E:\>perl -e "printf '%lx', ord(qq{\n})"
a
That is 0x0A, or \012, the traditional *nix newline. No change there. When reading a text file, however, Perl on Windows translates 0x0D 0x0A (\015\012, \r\n) to the logical \n.
However, this might not prove to be as helpful as it may look prima facie. If your server runs on Linux but the browser that was used to enter the data was running Windows, then Perl's magic won't work.
To be on the safe side, I'd always check for both carriage return and newline.
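One way to "check for both" is to normalise every line ending to a single LF before doing anything else with the text. A minimal sketch with a hypothetical normalise_newlines helper; it writes CR and LF as \x0d and \x0a so the result does not depend on what \n means on the platform running the script.

```perl
use strict;
use warnings;

# Convert Windows (CR LF) and old Mac OS (lone CR) line endings to a plain
# LF, so later regular expressions only have to handle one kind of newline.
sub normalise_newlines {
    my ($text) = @_;
    $text =~ s/\x0d\x0a?/\x0a/g;    # CR followed by an optional LF becomes LF
    return $text;
}

my $from_windows = "Para one.\x0d\x0a\x0d\x0aPara two.";
my $clean = normalise_newlines($from_windows);
# tr/// with an empty replacement list just counts, leaving $clean unchanged.
printf "%d CR characters left\n", ($clean =~ tr/\x0d//);
```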
dingman's regular expression
That problem aside, the RE (regular expression) posted by dingman is not quite right. Suppose we had the following string:
AC rocks!\n\n
AC rocks!\n\n
AC rocks!\n\n
I have said it thrice: What I tell you three times is true.
where \n is a logical newline (i.e. either \012, \015 or \015\012 depending on the actual platform).
It matches at the beginning of the string or at two newlines, then as few characters as possible, then two newlines or the end of the string. This is then replaced by "<p>string</p>$3". Our string now looks like this:
<p>AC rocks!</p>\n\n
AC rocks!\n\n
AC rocks!\n\n
I have said it thrice: What I tell you three times is true.
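Since the g modifier is used, we start matching again where the last match left off. This is right before the second line. There, however, is neither \n\n nor the beginning of the string, so the RE engine moves on: the \n\n after the second line becomes the opening delimiter of the next match, and because $1 is not part of the replacement, those two newlines are dropped: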
<p>AC rocks!</p>\n\n
AC rocks!<p>AC rocks!</p>\n\n
I have said it thrice: What I tell you three times is true.
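The walkthrough can be checked by running dingman's substitution (with real vertical bars) on the sample string. In this sketch the newlines in the result are printed as a visible \n so the output lines up with the illustration; the second and third paragraphs run together and the last one is never wrapped.

```perl
use strict;
use warnings;

# dingman's substitution applied to the three-times-true sample string,
# with \n\n between paragraphs.
my $desc = "AC rocks!\n\nAC rocks!\n\nAC rocks!\n\n"
         . "I have said it thrice: What I tell you three times is true.";

(my $result = $desc) =~ s/(^|\n\n)(.*?)(\n\n|$)/<p>$2<\/p>$3/sg;

# Show newlines as a visible \n so the result can be compared with the
# illustration above.
(my $shown = $result) =~ s/\n/\\n/g;
print "$shown\n";
```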
The problem here is that the closing \n\n of one match has already been consumed, so it is not "pushed back on the stack of things to match" and cannot serve as the opening \n\n of the next match. The solution is not to match the "closing" \n\n at all, but to use a zero-width positive look-ahead assertion:
My suggested solution
$desc =~ s{
    # parens are for grouping only
    (?:
        # match either at start
        ^|
        # or two or more \r\n (client was Windows)
        (?:\x0d\x0a){2,}|
        # or two or more \n (client was *nix)
        \x0a{2,}|
        # or two or more \r (client was MacOS9-)
        \x0d{2,}
    )
    # capture the paragraph
    (.*?)
    # look ahead for either end of string or two or more \n
    # or two or more \r\n, but don't match what we see!
    (?=
        # parens are for grouping only
        (?:
            (?:\x0d\x0a){2,}|\x0d{2,}|\x0a{2,}|$
        )
    )
    # replace with
}{<p>$1</p>}sgx;
This works even on strings which have more than two newlines between paragraphs.
AC rocks!\n\n\n\n
AC rocks!\n\n
AC rocks!\n\n
I have said it thrice: What I tell you three times is true.
I haven't had the chance to actually test this code but believe it to work as described. (I made up the output on Windows ;) It should be right, however.)
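A quick way to try the look-ahead version on the sample string, which has four newlines in the first gap. This sketch condenses Andreas's expression onto fewer lines, makes the look-ahead's inner group non-capturing and puts the replacement on one line; the sub name wrap_paragraphs is hypothetical.

```perl
use strict;
use warnings;

# Andreas's look-ahead substitution, wrapped in a sub so it can be tried
# on sample strings.
sub wrap_paragraphs {
    my ($desc) = @_;
    $desc =~ s{
        (?: ^ | (?:\x0d\x0a){2,} | \x0a{2,} | \x0d{2,} )  # start, or a run of blank lines
        (.*?)                                             # the paragraph itself
        (?= (?:\x0d\x0a){2,} | \x0d{2,} | \x0a{2,} |$)    # look ahead only, nothing consumed
    }{<p>$1</p>}sgx;
    return $desc;
}

my $sample = "AC rocks!\n\n\n\nAC rocks!\n\nAC rocks!\n\n"
           . "I have said it thrice: What I tell you three times is true.";
print wrap_paragraphs($sample), "\n";
```

On the sample string it should print <p>AC rocks!</p><p>AC rocks!</p><p>AC rocks!</p><p>I have said it thrice: What I tell you three times is true.</p>, all four paragraphs wrapped, since the blank-line runs are consumed only as the opening delimiter of each match.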
Andreas Fr[ie]drich is a much better Perl programmer than I, and will quite likely be 'round to set me straight on this one eventually
Andreas
Thanks for all the help up to now. I must admit to being even more confused than when I started! Sorry.
I actually did a whole semester on Perl programming at Uni but that was quite a while ago. I probably shouldn't admit to that. But as I haven't had much use of it since leaving uni I am now extremely rusty.
I've tried to suss out both suggestions and neither works for me. So I have got my Perl programming book out of storage and am having another look at it, now that I have been steered towards pattern matching and substitution.
The HTML page that I am producing is only for my use and I am using Windows so only the bit about Windows and not Mac etc would apply.
Because I am new to these forums I don't always understand the abbreviations used in explanations. :-) So I have tried what Andreas has suggested but I just get a blank screen in my webpage. I tried Dingman's suggestion and my page displayed but no paragraphs.
Incidentally the variable $desc may have more than 3 paragraphs of text as it depends on how many paras my clients write in. Some write only one long para and others write loads!
Thanks.
Sarah
[edited by: SarahG at 1:00 am (utc) on Oct. 28, 2002]
No offense intended. It was more a tongue in cheek comment and I don't know the smiley for that one!
Sarah
So I have tried what Andreas has suggested but I just get a blank screen in my webpage.
Did you remove the leading dots and replace the broken pipe characters with the vertical bar character?
Incidentally the variable $desc may have more than 3 paragraphs of text
That's what the g modifier is for. The regular expression will work for as many paragraphs as the string contains.
BTW I installed Perl on my Windows box and the RE worked just fine.
If you are still struggling with this it would be helpful if you could post the relevant sections of your script.
Andreas
Just logged on today and saw your message! Funnily enough, I have now sorted it with a very simple bit of code which does exactly what I need for the time being. It probably isn't a catch-all like you suggested, but it will do for now.
The code I used is
$desc=~ s/\r\n/<p>/g;
I noticed when I used MySQLFront to view my blobs that it has a \r\n rather than \n\n for the new paragraphs so I adjusted the code accordingly.
I tested it late last night for quite a few of my properties and it worked exactly as I wanted it to.
Thanks for your help though.
Great forum and friendly welcomes!
not XHTML compliant - However, you need to be aware that omitting the closing p tag won't work for XHTML documents.
paragraph defined by two newlines - \r\n is just one logical newline on Windows. Dingman and I silently assumed that a paragraph is separated from the next one by the user hitting Enter twice:
They hunted till darkness came on, but they found Not a button, or feather, or mark, By which they could tell that they stood on the ground Where the Baker had met with the Snark.\r\n
\r\n
In the midst of the word he was trying to say, In the midst of his laughter and glee, He had softly and suddenly vanished away--- For the Snark *was* a Boojum, you see.
The above are only two paragraphs. It would look like this using your RE:
They hunted till darkness came on, but they found Not a button, or feather, or mark, By which they could tell that they stood on the ground Where the Baker had met with the Snark.<p>
<p>
In the midst of the word he was trying to say, In the midst of his laughter and glee, He had softly and suddenly vanished away--- For the Snark *was* a Boojum, you see.
This code will look OK in browsers, since the W3C has this to say about empty P elements [w3.org]:
User agents should ignore empty P elements.
However, they also discourage authors from using empty P elements.
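To see the empty P element appear, Sarah's one-line substitution can be run on a shortened version of the two-paragraph example. A minimal sketch: the sample text is cut down from the quotation above, and the counting line is only a convenience.

```perl
use strict;
use warnings;

# Sarah's substitution: every \r\n becomes <p>. The blank line between two
# paragraphs is two \r\n pairs, so it turns into <p><p>, i.e. an empty P
# element followed by the real one.
my $desc = "Where the Baker had met with the Snark.\r\n\r\n"
         . "For the Snark *was* a Boojum, you see.";
(my $html = $desc) =~ s/\r\n/<p>/g;
print "$html\n";

# Count the back-to-back tags to make the empty element easy to spot.
my $empty = () = $html =~ /<p><p>/g;
print "empty P elements: $empty\n";
```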
paragraph defined by just one newline - If you want to start a new paragraph for each logical newline (\r\n or rather \015\012 on Windows) as your solution does, then you would need to remove the {2,} parts from the regular expression I posted.
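A sketch of that single-newline variant, i.e. Andreas's expression with the {2,} quantifiers removed (the sub name wrap_lines is hypothetical). Each line break now starts a new, properly closed paragraph; a blank line still yields an empty <p></p>, just as it does with Sarah's version.

```perl
use strict;
use warnings;

# Andreas's expression with the {2,} parts removed: every single line break
# (\r\n, \n or \r) starts a new paragraph, and each one gets its </p>.
sub wrap_lines {
    my ($desc) = @_;
    $desc =~ s{
        (?: ^ | \x0d\x0a | \x0a | \x0d )    # start, or one line break
        (.*?)                               # the paragraph text
        (?= \x0d\x0a | \x0d | \x0a |$)      # next break or end, not consumed
    }{<p>$1</p>}sgx;
    return $desc;
}

print wrap_lines("First line.\r\nSecond line."), "\n";
```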
Andreas