Perl Server Side CGI Scripting Forum

Is Tight Code Best?
Does Perl Tokenize Efficiently (So I don't have to?)

 7:46 pm on Sep 11, 2006 (gmt 0)

A lot of times I feel lazy or stupid when I write a Perl script because I don't "solve for n" and reduce the code to some Einstein-like, regular-expression-driven one-liner hash.

Other times, I do reduce my code down to some brilliant one-liner -- and then go back and find that changing one character in one function now makes 37 other things not work.

I watch for just plain wasted cycles (reading in a loop to EOF when you only needed to read 10 records and quit), but I like my code to be easy to modify, so I write a lot of things in a verbose, "non-dependent" manner (e.g., creating a new variable to hold some value even though a global Perl variable may already hold it).

I have read a bit about how Perl parses and tokenizes scripts (and also about "Tidy"), and I am wondering how efficiently the interpreter works, that is, how much it matters how you write a Perl script...

(three print commands coded individually)

print "This is line 1.<br>\n";
print "This is line 2.<br>\n";
print "This is line 3.<br>\n";

(three lines reduced to one print command)

# below contains lines 1 through 3
# please be careful if you edit this line
print "This is line 1.<br>\nThis is line 2.<br>\nThis is line 3.<br>\n";

In some [parser/interpreter/compiler] tokenization schemes I've seen, tokens are optimized by the parser, so there may be no run-time difference at all (e.g., if, in the example above, the parser concatenated the three consecutive print statements into a single print token and one string)...
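Rather than guessing what the parser does, perl can show you: the core B::Deparse module rebuilds Perl source from the compiled op tree. The sketch below (sub and variable names are illustrative, not from the thread) deparses both versions above and counts the print calls that survive compilation. On the perls I'm familiar with, the three prints stay three separate print operations; perl does fold constant expressions such as "a" . "b" at compile time, but it does not merge separate statements.

```perl
use strict;
use warnings;
use B::Deparse;

# Three separate print statements, as in the first example.
my $three_prints = sub {
    print "This is line 1.<br>\n";
    print "This is line 2.<br>\n";
    print "This is line 3.<br>\n";
};

# One print statement with a single combined string.
my $one_print = sub {
    print "This is line 1.<br>\nThis is line 2.<br>\nThis is line 3.<br>\n";
};

# Rebuild Perl source from each sub's compiled op tree.
my $deparser  = B::Deparse->new;
my $three_src = $deparser->coderef2text($three_prints);
my $one_src   = $deparser->coderef2text($one_print);

# Count the print calls that are still present after compilation.
sub count_prints {
    my ($src) = @_;
    my $count = () = $src =~ /\bprint\b/g;
    return $count;
}

print "three-print version:\n$three_src\n";
print "one-print version:\n$one_src\n";
printf "print calls after compilation: %d vs %d\n",
    count_prints($three_src), count_prints($one_src);
```

The same trick works from the command line with `perl -MO=Deparse yourscript.pl`.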

What is better:

A). tight code with no comments and minimized white-space to save every last bit and cycle.

B). long-hand legible code with lots of comments and white-space used for code formatting.

C). some happy medium



 8:06 pm on Sep 11, 2006 (gmt 0)

It depends on how the script is used and if others will have to maintain it. But in general I would say 'B' is best but 'C' sounds OK too. 'A' sounds like a bad way to code as a habit.


 8:13 pm on Sep 11, 2006 (gmt 0)

In the vast majority of cases, programmer efficiency is much more important than code efficiency.

A sounds like "premature optimization", unless you have done measurements showing that your tight code actually runs faster than the readable version.

B should be the norm.

If measurements have shown a specific part of B not to be efficient enough, then you can optimize that part (and only that part), which may give you a useful definition of C.


 9:50 pm on Sep 11, 2006 (gmt 0)

I read a little about tokenizing Perl. One thing I read suggests the parser may optimize multiple statements that share the same token+data format into more compact tokenized code:

"So the job of yylex each time it's called is to figure out what the next token is, set yylval with any additional data appropriate for the token, and return the token".

- [foo.be...]

This is the kind of stuff I would like to hear from somebody who has benchmarked and optimized Perl and can give a ballpark figure for the efficiency gains (if any) of writing leaner code.


 10:32 pm on Sep 11, 2006 (gmt 0)

Ballparks aren't useful information in this context. You can't optimize your code based on what other people measured with their (completely different) code. If you have a program where you think it doesn't run fast enough, profile it, and then you know where your bottlenecks are. Only with this very specific information can you take any useful countermeasures.

In all other cases: If it ain't broken, don't fix it.


 10:44 pm on Sep 11, 2006 (gmt 0)

Use the Benchmark module to test code with. It's a core module, so it should already be installed.
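A minimal Benchmark sketch for the print question, assuming output is sent to the null device (File::Spec->devnull) so the terminal isn't flooded. cmpthese has to be imported explicitly, and a negative count means "run each variant for at least that many CPU seconds". The three sub names are hypothetical.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Spec;

# Discard output so only the cost of the print calls is measured.
open my $null, '>', File::Spec->devnull or die "Cannot open null device: $!";

# Each variant writes the same three lines to the given filehandle.
sub three_prints {
    my ($fh) = @_;
    print {$fh} "This is line 1.<br>\n";
    print {$fh} "This is line 2.<br>\n";
    print {$fh} "This is line 3.<br>\n";
}

sub one_print {
    my ($fh) = @_;
    print {$fh} "This is line 1.<br>\nThis is line 2.<br>\nThis is line 3.<br>\n";
}

sub heredoc_print {
    my ($fh) = @_;
    print {$fh} <<"END";
This is line 1.<br>
This is line 2.<br>
This is line 3.<br>
END
}

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    three_prints  => sub { three_prints($null) },
    one_print     => sub { one_print($null) },
    heredoc_print => sub { heredoc_print($null) },
});
```

The table shows relative rates, so run it on your own machine and perl version rather than relying on anyone else's numbers; in a real CGI script that also does I/O or database work, a difference here may well be lost in the noise.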

In general this type of construct:

print "This is line 1.<br>\n";
print "This is line 2.<br>\n";
print "This is line 3.<br>\n";

is better written as:

print qq~This is line 1.<br>
This is line 2.<br>
This is line 3.<br>
~;

which to me is less confusing than your original intention:

print "This is line 1.<br>\nThis is line 2.<br>\nThis is line 3.<br>\n";
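To confirm that the qq~ form and the one-line "\n" form are the same string (so the choice between them is purely about readability), a short check is enough. This sketch assumes the source file uses plain Unix line endings; with CRLF line endings the literal line breaks in the qq~ version could include "\r".

```perl
use strict;
use warnings;

# Multi-line qq~...~ form: the real line breaks in the source become "\n".
my $qq_version = qq~This is line 1.<br>
This is line 2.<br>
This is line 3.<br>
~;

# One-line form with explicit "\n" escapes.
my $escaped_version = "This is line 1.<br>\nThis is line 2.<br>\nThis is line 3.<br>\n";

# Both literals yield exactly the same string value.
my $verdict = ($qq_version eq $escaped_version) ? "identical" : "different";
print "$verdict\n";
```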


 5:28 am on Sep 12, 2006 (gmt 0)

which to me is less confusing than your original intention:

print "This is line 1.<br>\nThis is line 2.<br>\nThis is line 3.<br>\n";


"original intention"?... I used two examples of admittedly overcomplicated code and asked a multiple choice question about coding "style".

I was not looking for "how to most efficiently code multi-line string output". (Right answer maybe, but wrong question).

But at least we can come back to part of the question with: Do qq constructs tokenize differently than one or more consecutive "print" statements?

I only wonder this in the context of the overall Perl code->parser->interpreter scheme of things and weighted towards (possible) future human maintenance of code in general.

Benchmark is useful for answering 1/3 to 1/2 of the question. It can't show how much harder it may be working on code that's too tight to read or modify.

I am not asking for "one right answer", just other programmers' opinions.


 6:13 am on Sep 12, 2006 (gmt 0)

Benchmark can help you determine if perl tokenizes code efficiently. Like in your thought about three print commands versus one print command. The rest of your question is asking for personal opinions, which seem to be in agreement that 'B' is best followed by 'C'.

Larry Wall has some suggested guidelines about perl coding style which you can read here:


as well as other suggestions for keeping code readable and understandable.

I don't know how much impact keeping code readable versus keeping code tight has on the interpreter. But judging by some modules written by some of the better perl programmers they don't seem too concerned with lots of comments and white space in the code.

You might get a better answer on the perl monks site to your question:



 7:17 am on Sep 12, 2006 (gmt 0)

Lexpixel, I'm not sure if you're using the term "tokenizing" correctly. Tokenizing is just the very first step that the interpreter must do when processing your code: Splitting the program text into individual tokens. Obviously, less text will make that step faster. But then, until your program can actually run, the tokens still need to be parsed for syntax and semantics. And those two steps might actually be more efficient with code that is spelled out more explicitly, while very compact statements may cause additional overhead.

And as soon as your program does any kind of non-trivial processing, the tokenizing and parsing steps quickly become irrelevant, because actual processing time will be much higher in comparison.
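One way to see this split is to time compilation and execution separately. This sketch builds a hypothetical sub from a source string padded with 1,000 comment lines, compiles it with a string eval (which covers tokenizing, parsing and building the op tree), then times a loop that does some real work. The absolute numbers will differ by machine and perl version; only the comparison between the two is of interest.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical source text: a small sub padded with 1000 comment lines.
my $source = "sub {\n"
           . ("    # just a comment\n" x 1000)
           . "    my (\$n) = \@_;\n"
           . "    my \$total = 0;\n"
           . "    \$total += \$_ for 1 .. \$n;\n"
           . "    return \$total;\n"
           . "}\n";

# Time tokenizing + parsing + compiling (the string eval builds the sub).
my $t0 = time;
my $summer = eval $source or die "compile failed: $@";
my $compile_secs = time - $t0;

# Time actually running the sub on a non-trivial amount of work.
$t0 = time;
my $result = $summer->(2_000_000);
my $run_secs = time - $t0;

printf "compile: %.6f s, run: %.6f s, result: %s\n",
    $compile_secs, $run_secs, $result;
```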

Maybe this makes it easier to understand why measurements only make sense for specific code examples and specific use cases. There are simply too many variables involved, and even the same code may produce different results depending on how it is used.


 6:41 pm on Sep 12, 2006 (gmt 0)

"...parsed for syntax and semantics. And those two steps might actually be more efficient with code that is spelled out more explicitly, while very compact statements may cause additional overhead.."


Bird, that is part of what I was getting at too. In other languages I have seen interpreters and/or compilers slow drastically or even choke on long strings. Others may run out of memory during lexical analysis of huge strings (I believe I read that Perl can use all available memory).

I asked about comments because I have seen compilers that store the programmer's comments in the binary (open an .EXE in a binary/hex editor and look at the ASCII, usually in the right-hand pane of most editors). Since Perl is interpreted (as opposed to compiled), the comments would only exist in memory at run time...
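In Perl, comments are skipped by the tokenizer and never become part of the compiled program; the main script is compiled once at startup and then run as an op tree. A quick way to see this is to deparse a heavily commented sub and an uncommented one with B::Deparse (both subs are made up for the example): the compiled code comes out identical.

```perl
use strict;
use warnings;
use B::Deparse;

# The same sub, once heavily commented and once with no comments at all.
my $commented = sub {
    # Take the first argument...
    my ($n) = @_;    # ...the input number
    # ...and add one to it.
    return $n + 1;   # the result
};

my $bare = sub { my ($n) = @_; return $n + 1 };

# B::Deparse rebuilds source from the compiled op tree; the comments were
# discarded by the tokenizer, so they cannot appear in the output.
my $deparser       = B::Deparse->new;
my $from_commented = $deparser->coderef2text($commented);
my $from_bare      = $deparser->coderef2text($bare);

my $verdict = ($from_commented eq $from_bare) ? "same compiled code" : "different";
print "$verdict\n";
print "$from_commented\n";
```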

The Perl Style Guide link is great. Although Larry says, "Use here documents instead of repeated print() statements.", he does not give any reason ---- (but he is big daddy, so he don't have to)...
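One practical reason for that advice is that a here document is a single string, printed with a single call, and its layout in the source matches the output. A minimal sketch with a hypothetical page_html() helper; the quoted terminator <<"END_HTML" interpolates variables like a double-quoted string, while <<'END_HTML' would not.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the whole block as one string.
sub page_html {
    my ($title, $count) = @_;
    # <<"END_HTML" interpolates $title and $count like a double-quoted string.
    return <<"END_HTML";
<h1>$title</h1>
<p>This page has $count lines of body text.</p>
END_HTML
}

# One print call for the whole block.
my $html = page_html('Is Tight Code Best?', 3);
print $html;
```

Returning the string instead of printing inside the helper also makes it easy to test.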

"...rest of your question is asking for personal opinions, which seem to be in agreement that 'B' is best followed by 'C'. "


Ok, you and bird are in agreement --- (sort of) --- no need for anyone else to reply....<grin>

Part of the Perl Style guide from Larry reads: "Think about reusability...... Consider giving away your code. Consider changing your whole world view. Consider... oh, never mind."

This is my objective. I want to give my code away, but I don't want:

- to have to write 1000 lines of comments to explain why or how 100 lines of code works when my style of variable naming, code formatting and long-line splitting makes the code entirely readable with few comments needed.

- to have people who get the free code cry that they could "name that tune" (in 5 notes or less), when my verbose method works fine and is written that way to make the code easy to modify (e.g., several short regex lines operating on the same string: they could be combined, but aren't, because then any single change would affect the whole combined expression).
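The trade-off described above can be sketched with two hypothetical whitespace-tidying subs. The verbose version keeps one rule per line, so any rule can be changed or commented out on its own; the compact version folds the leading and trailing trims into one alternation. Both return the same result for this input.

```perl
use strict;
use warnings;

# Verbose style: one small, separately editable substitution per line.
sub tidy_verbose {
    my ($text) = @_;
    $text =~ s/^\s+//;     # drop leading whitespace
    $text =~ s/\s+$//;     # drop trailing whitespace
    $text =~ s/\s+/ /g;    # collapse inner runs of whitespace to one space
    return $text;
}

# Compact style: leading and trailing trims folded into one alternation.
sub tidy_compact {
    (my $text = shift) =~ s/^\s+|\s+$//g;
    $text =~ s/\s+/ /g;
    return $text;
}

my $input   = "  Is   tight\tcode\n best?  ";
my $verbose = tidy_verbose($input);
my $compact = tidy_compact($input);
print "verbose: [$verbose]\n";
print "compact: [$compact]\n";
```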


 7:15 pm on Sep 12, 2006 (gmt 0)

In other languages I have seen interpreters [and/or] compilers slow drastically or even choke on long strings.

That shouldn't happen in a well written implementation, unless you're talking about strings many megabytes long. Other than that I can only repeat my mantra: If you want to know whether it makes a difference, run a test.

...when my style of variable naming, code formatting and long-line splitting makes the code entirely readable with few comments needed.

Maybe you should consider writing Python instead of Perl... ;)


 9:43 pm on Sep 12, 2006 (gmt 0)

for my $i (1..3) { print "This is line $i.<br>\n"; }



 11:07 pm on Sep 12, 2006 (gmt 0)



 1:05 pm on Sep 18, 2006 (gmt 0)

If you plan on selling the script, allowing others to work on it, or upgrading it from time to time, then B is better. However, for scripts meant for minor automation or one-off tasks that you'll never need to upgrade or come back to, A is a reasonable choice.

I think comments should be kept to the bare necessities, where code isn't self-explanatory. In some cases, comments can be reduced drastically if a certain set of rules or style is maintained and understood by those who may evaluate or further develop the code.

I do know of some editors / compilers that will give you a production copy that has eliminated white spaces/comments. If you ever needed to edit the code, you would then refer to the developer version.


 3:57 pm on Sep 18, 2006 (gmt 0)

Might I suggest the book Effective Perl Programming by Hall and Schwartz?

In the intro, Hall puts it like this:
Perl baby talk is plain, direct, and verbose. It's not bad--you are allowed and encouraged to write Perl in whatever style works for you. You may reach a point at which you want to move beyond plain, direct, and verbose Perl toward something more succinct and individualistic. This book is written for people who are setting off down that path.
(emphasis in the original)

This book is all about when to use which idiosyncrasies, and should help you decide how much optimization makes sense. It's worth the price for the regular expression section alone.


 5:43 pm on Sep 18, 2006 (gmt 0)

The amount of energy expended trying to optimize code that is not a bottleneck is huge. Before doing optimization, I suggest doing measurements to identify where resources are not keeping up with requirements.

Making code readable should be the priority in all cases, except when a real, urgent need to save machine cycles can be proven.

...just my 2c


 6:04 pm on Sep 18, 2006 (gmt 0)

If you've got a memory like mine, then option B every time.


 12:12 pm on Sep 19, 2006 (gmt 0)

I'm not sure how much code you've got; but I find consistency important. Nobody else looks at my Perl, but I have a few integrators who work with my PHP. Once they've gotten used to my style things are fine, until I do something out of the ordinary, then there's confusion.

I have a certain way of coding similar situations (e.g., I'll retrieve data as objects and use accessor methods), and my integrators get used to that. When there's a reason for me to do it differently (e.g., retrieve it as a hash), my integrators then need to figure out how that works (which usually means asking me). Teaching them my style once (or making them figure it out) is fine. I don't think all the comments or simplification in the world will avoid that.

There are certain things I really like in Perl that may make it harder to read (e.g. map {}). My opinion is that it's not the worst thing in the world to make someone learn how this works as long as they only have to learn it once and you make use of it throughout the code. I think every programmer has their "toolbox". I dislike it when someone makes some code really tight just to be nerdy and the rest of their code is loose/sloppy.
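As an illustration of learning an idiom once and reusing it throughout, here are two common map {} uses; the data is made up for the example. The first builds a lookup hash, the second transforms every element of a list.

```perl
use strict;
use warnings;

# Made-up data for the example.
my @allowed_types = qw(html css js);

# Idiom 1: build a lookup hash, each element becoming a key with value 1.
my %is_allowed = map { $_ => 1 } @allowed_types;

# Idiom 2: transform every element of a list.
my @files = map { "$_.txt" } qw(index about contact);

print join(', ', sort keys %is_allowed), "\n";
print join(', ', @files), "\n";
print 'css is ', ($is_allowed{css} ? 'allowed' : 'not allowed'), "\n";
```

Once a reader has seen each idiom, every later use reads the same way, which is the consistency argument made above.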

So my choice is with C as long as you're consistent.

Admittedly, there are times when I purposefully make things confusing, just as a matter of territory, but that's another topic ;)


 11:30 am on Sep 20, 2006 (gmt 0)

Some good rules of the road when coding that I like to stick by are:

* Try to write code that does not need a lot of comments (good variable names, indentation, function names, consistent styles, etc.).

* Use comments where it makes sense to. Using a two-line comment to explain a 15-line function is better than having someone trawl through the code trying to work out what it does. Also, if you've done something "strange", explain why; you'll be glad you did when you return to it in 6 months' time.

* Stripping out comments for speed gains is pretty drastic and clutching at straws; it reminds me of the old tips about removing the desktop image from your Windows machine to make it go faster. If you have to go to the lengths of doing that, then it's time to upgrade.

* Show your workings. If you have used some code from another site, leave us a link to the page in a comment so we can follow it up should we need to. Got some horrid business logic? Tell us who came up with it in a comment, or include some of the discussion or a human-readable version of the logic; it all helps us get our heads around it more easily.

* Write for ease of code maintenance rather than the tightest possible code. Server time is cheap, coders time ain't. Don't make us dig around in hard to read code any more than we have to.

* Don't worry too much about showing off your coding skills by trying to compact everything into a one-line wonder. Breaking it down and making it easier to read and debug will be more appreciated by those who follow you.

* Break larger functions down into smaller, more targeted functions. A function should do one job well. Then you can combine several functions to do what you want, knowing that each of them does its own thing well. It's better to debug a 5-line function that manages capitalisation, for instance, than to step through a 100-line beast that trims, caps, HTML-encodes, replaces, upper-cases (but only on Mondays), etc.

* Don't be afraid to have a function fail and bring a program to a halt. It will help with debugging. Having a function try to guess what it should return can lead to problems further down the line, as it returns a default value/object that another function is not expecting.

* Don't Repeat Yourself (DRY). Don't have multiple copies of the same function throughout a project; it can lead to you missing one when you have to change the code, and ending up with some odd results. Move functions like this to just one place, so there is only one place to edit them site-wide.

* Write reusable tests. Unit tests, as they are called, are a gift from heaven: automated tests that you can run again and again, so that when you do make a change you can run all the tests you have for your project and make sure they all pass (i.e., they all return an expected value; with the capitalisation function mentioned before, you could pass it "the cat in the hat" and compare the returned result with "The Cat In The Hat").

* Have your functions return a usable string/object rather than writing the result out to screen from within the function. It makes them more usable as you can then chain functions together.

* Optimise when you need to, not ahead of time. There's no point spending a week honing a loop down to be lightning fast if the page it's on only gets 3 hits a week and it does nothing heavy. Of course there are exceptions. Google's homepage does not use quotes around its attributes although it should, but for them little tweaks like that amount to huge savings in bandwidth, an optimisation that was needed there and then.

* Clean up your mess. Don't leave lots of lines of debug code lying around the place. There's nothing worse than coming to a script 6 months later to find 100 lines of commented-out code. What's it there for? Is it important? Should I leave it? Who knows! Clean it up.

* Use source control. To avoid the last problem above, start using a source control system (Subversion is free and easy to use, and there's a book on how to use it). Then you can keep track of all your changes rather than having index1.asp, index2.asp, index_backup.asp, index_old.asp lying around everywhere (come on, we've all done it). If you need to roll back to a previous version, you can.

* Back it up. You know it makes sense. You can even back up your source control system.

* Test your backups. Can you get the data you need out of a backup? If not, what's the point in having them?
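Several of the points above (small single-purpose functions, returning strings instead of printing, failing loudly, re-runnable unit tests) fit in one small sketch. capitalise() is a hypothetical implementation of the capitalisation function mentioned in the list, and the tests use Test::More, which ships with perl.

```perl
use strict;
use warnings;
use Test::More;

# One small function with one job: capitalise each word.
# It returns the new string instead of printing it, so callers can chain it.
sub capitalise {
    my ($text) = @_;
    # Fail loudly on bad input rather than guessing a default.
    die "capitalise() needs a defined string\n" unless defined $text;
    # \u uppercases the next character, \L lowercases the rest of the word.
    (my $result = $text) =~ s/(\w+)/\u\L$1/g;
    return $result;
}

# Re-runnable unit tests: run them again after every change.
is(capitalise('the cat in the hat'), 'The Cat In The Hat', 'capitalises every word');
is(capitalise('tHE hAT'), 'The Hat', 'lowercases the rest of each word');

my $died  = !eval { capitalise(undef); 1 };
my $error = $@;
ok($died, 'dies on undefined input');
like($error, qr/needs a defined string/, 'error message explains the problem');

done_testing();
```

Saved as a .t file under t/, tests like these can be run together with the `prove` tool that comes with perl.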



 2:32 pm on Sep 20, 2006 (gmt 0)

Why not link to as much code as possible? That way most of your long code is off the page and doesn't have to be that clean, and the page itself keeps a good code-to-content ratio.


 4:24 am on Sep 21, 2006 (gmt 0)

B seems to be the consensus, but personally I go with C.

My concession to readability is that it should be easy to maintain in the future and understandable to anyone who has taken the time to become familiar with the logic.

This means tight, but not too tight, with comments, but only where absolutely necessary.

After that it's all about getting as much information into a single screen as possible :-)

People talk a lot about code being readable for future programmers, but when I work on someone else's code I find that all the ultra-verbose commenting (### Look, it's a print statement!), extra tabs and double carriage returns between everything just make maintenance take longer.

People who work on your code in the future are hopefully going to become familiar with your logic and coding style before they do maintenance, at which point having to scroll for 10 pages through code that should fit into one screen isn't making anyone's life easier.

[edited by: IanKelley at 4:25 am (utc) on Sep. 21, 2006]

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved