Forum Moderators: coopster & phranque

Any Tips to Reduce Perl CPU Usage?

Does anyone have any tips on how to reduce Perl CPU usage?

         

hiker_jjw

7:03 pm on Oct 29, 2002 (gmt 0)



I have a client who is constantly running into CPU usage issues with his ISP. I've taken several steps that reduced the CPU usage significantly, but this month it has crept up on us again. I would love to hear this forum's ideas!

Here's what I've done in the past:

1.) Last-Mod Headers: I've included last-modified headers for pages when appropriate.

2.) On-The-Fly Content Generation: We're using an Apache server along with custom Perl routines that write the HTML pages (cache pages). This has been the most effective reduction so far!

In other words, the first time a page is requested, the Perl script runs and writes the HTML page. After that, Apache serves the HTML page straight from the written cache. .htaccess allows us to do this.

3.) I've also resorted to using non-object-oriented Perl code when possible. For example, instead of using CGI.pm we're using cgi-lib.pl. I'm still using the DBI module for our MySQL database. Is there a smaller, similar module that could be used with MySQL to reduce the CPU usage?
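Items 1 and 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the client's actual code: build_page(), the cache directory and the page-name rule are hypothetical, and the .htaccess rule that points Apache at the cached files is not shown. The If-Modified-Since check is a plain string comparison, which only works when the browser echoes back the exact date it was sent.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempfile);

# ---- Item 1: Last-Modified and conditional GET ----

my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format a Unix timestamp as an HTTP date, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
# Day and month names are spelled out so the result does not depend on the locale.
sub http_date {
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime(shift);
    return sprintf('%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAY[$wday], $mday, $MON[$mon], $year + 1900, $hour, $min, $sec);
}

# Return (header_text, not_modified_flag). If the browser sends back the same
# Last-Modified date, answer 304 and skip building the page body entirely.
# Exact string match only; a fuller version would parse and compare dates.
sub conditional_headers {
    my ($mtime, $if_modified_since) = @_;
    my $stamp = http_date($mtime);
    if (defined $if_modified_since && $if_modified_since eq $stamp) {
        return ("Status: 304 Not Modified\n\n", 1);
    }
    return ("Content-type: text/html\nLast-Modified: $stamp\n\n", 0);
}

# ---- Item 2: write-once HTML cache ----

# Hypothetical stand-in for the real page-generating routines.
sub build_page {
    my ($name) = @_;
    return "<html><body><h1>$name</h1></body></html>\n";
}

# Return the HTML for $name, generating and caching it on the first request.
sub cached_page {
    my ($cache_dir, $name) = @_;
    die "Bad page name\n" unless $name =~ /\A[\w-]+\z/;    # rejects "../" tricks
    my $path = File::Spec->catfile($cache_dir, "$name.html");

    if (-e $path) {
        # Normally Apache serves this file itself and Perl never runs.
        open(my $in, '<', $path) or die "Cannot read $path: $!";
        local $/;                    # slurp mode: read the whole file at once
        return scalar <$in>;
    }

    my $html = build_page($name);

    # Write to a temp file in the same directory, then rename it into place,
    # so a concurrent request never sees a half-written page.
    my ($tmp_fh, $tmp_path) = tempfile(DIR => $cache_dir, UNLINK => 0);
    print {$tmp_fh} $html;
    close $tmp_fh or die "Cannot write $tmp_path: $!";
    chmod 0644, $tmp_path;           # tempfile() creates owner-only files; Apache must read it
    rename($tmp_path, $path) or die "Cannot rename to $path: $!";
    return $html;
}
```

The temp-file-plus-rename step is there because two visitors can hit an uncached page at the same moment; rename() within one directory replaces the target in a single step.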

We're not using mod_perl at this time. I'm not 100% sure whether the ISP provides it. Actually, I'm fairly sure they (Veiro) do not.

PHP is not a good option, because we have too much custom code written in Perl. It would take months to rewrite things in PHP, even though I realize that would GREATLY improve performance.

Thanks in advance for any ideas!
Cheers

hiker_jjw

2:54 am on Oct 30, 2002 (gmt 0)



I wish I could use mod_perl.
:(

jatar_k

5:36 pm on Oct 30, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



come on, I know someone has some ideas for hiker_jjw.

jeremy goodrich

5:41 pm on Oct 30, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



For example, instead of using CGI.pm we're using cgi-lib.pl.

CGI.pm is huge - cgi-lib.pl I'm not sure of its size, but do you *need* all the functions it offers?

It sounds like you know Perl much better than I do; however, for CGI-related stuff I tend to roll my own subroutines. This could help reduce the CPU usage, since it eliminates one module that you currently load.
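A minimal sketch of rolling your own name/value parser in place of the cgi-lib.pl routine. The sub names are hypothetical, and it only handles URL-encoded GET and POST data (no multipart file uploads, no cookies); every value it returns is still untrusted user input.

```perl
use strict;
use warnings;

# Decode one URL-encoded token: '+' is a space and %XX is a hex-encoded byte.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Parse "a=1&b=two+words" into a hash ref; for repeated names the last value wins.
sub parse_query {
    my ($query) = @_;
    my %param;
    return \%param unless defined $query;
    for my $pair (split /[&;]/, $query) {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        $param{ url_decode($name) } = url_decode($value);
    }
    return \%param;
}

# GET data arrives in QUERY_STRING; POST data is CONTENT_LENGTH bytes on STDIN.
sub read_cgi_params {
    my $method = $ENV{REQUEST_METHOD} || 'GET';
    my $raw = '';
    if ($method eq 'POST') {
        read(STDIN, $raw, $ENV{CONTENT_LENGTH} || 0);
    }
    else {
        $raw = defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '';
    }
    return parse_query($raw);
}
```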

And, are you using any other modules? How compact is the code? Are there any redundancies that you can take out?

ADDED: what about regex usage? There are lots of options there, and you might be able to reduce the CPU overhead by using less regex if possible.
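A few common ways to lean on the regex engine less, shown on made-up data. Whether any of them saves noticeable CPU depends on the script, so it's worth timing before and after (for example with the standard Benchmark module).

```perl
use strict;
use warnings;

my @lines = ('Art prints on sale', 'Posters', 'Fine art', 'ART SUPPLIES');

# Fixed-substring test: index() avoids starting the regex engine.
my @with_art_index = grep { index($_, 'art') >= 0 } @lines;

# The equivalent regex (same case-sensitive result).
my @with_art_regex = grep { /art/ } @lines;

# Exact comparison: eq is simpler than an anchored regex such as /^Posters$/.
my @exact = grep { $_ eq 'Posters' } @lines;

# For a pattern built from a variable and reused in a loop, compile it once
# with qr// and reuse the compiled object. \Q...\E escapes metacharacters.
my $word = 'art';
my $re = qr/\b\Q$word\E\b/i;
my @whole_word = grep { $_ =~ $re } @lines;
```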

hiker_jjw

6:51 pm on Oct 30, 2002 (gmt 0)



Thanks for the reply Jeremy!,

CGI.pm is huge - cgi-lib.pl I'm not sure of its size, but do you *need* all the functions it offers?

cgi-lib.pl is much smaller and older than CGI.pm. It's just the basic CGI name/value pair parsing subroutine that I'm using. I could trim it up a bit. It would be worth a try, but probably wouldn't help much.

And, are you using any other modules? How compact is the code? Are there any redundancies that you can take out?

The only module I use is DBI, mainly for the MySQL database interactions. That module is only loaded if needed. I don't think there are any redundancies.
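If DBI is pulled in with "use DBI;" at the top of the script, it is compiled on every request, even when no query runs. One way to get "only loaded if needed" behavior is a runtime require inside a helper, sketched below; get_dbh(), show_cart() and the connection settings are hypothetical placeholders.

```perl
use strict;
use warnings;

my $dbh;    # database handle, created on first use only

# Load DBI and connect only when a request really needs the database.
# The DSN, user name and password below are placeholders.
sub get_dbh {
    return $dbh if $dbh;
    require DBI;    # runtime load; requests that never call get_dbh() skip DBI entirely
    $dbh = DBI->connect('dbi:mysql:database=shop', 'user', 'password',
        { RaiseError => 1, PrintError => 0 });
    return $dbh;
}

# A page that doesn't need the database never calls get_dbh().
sub show_cart {
    my ($items) = @_;
    return join("\n", map { "<li>$_</li>" } @$items);
}
```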

ADDED: what about regex usage? There are lots of options there, and you might be able to reduce the CPU overhead by using less regex if possible.

I'm guilty on this one! I do tend to write code with many regular expressions. They're the key to keeping code flexible. I doubt I can reduce them, but it's worth a try.

What about these issues/questions:

1.) Using $|=1; to stream output to the user. Could this be causing more CPU usage?

2.) Another thought... I have some old code that uses a lot of local() variables rather than my(). Which is more efficient?

3.) I also occasionally use time, gmtime, localtime, as well as rand functions. I'm sure these aren't too CPU-friendly.

Then there's the question of mod_perl. My ISP has been giving me the e-mail run-around. I don't think anyone over at Support has a clue.

Cheers.

jeremy goodrich

7:10 pm on Oct 30, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Not sure about the 3 points you made BUT DBI is also HUGE.

Perhaps you can poke through it (don't remember if it's a perl or C module) and only use the subroutines you need for your flavor of DB?

That could further reduce the code...hmmm...for the other stuff, somebody else will have to step in :)

ADDED: e.g., use DBI::subroutine::sub instead of use DBI; because DBI::subroutine will only load that subroutine (I think)

hiker_jjw

7:38 pm on Oct 30, 2002 (gmt 0)



Yeah, DBI is huge! I'll have to look into using only "part" of it.

Thanks.

amoore

8:25 pm on Oct 30, 2002 (gmt 0)

10+ Year Member



It's been my experience in these types of database-backed applications that high CPU load comes from pulling a bunch of data out of the database and then processing it inefficiently. I'd look into making the database do more of the work that it's good at (like sorting and limiting and such) and more efficiently processing the data you pull back. Storing your data differently may also help you avoid some of the processing.

Without seeing what you're doing, it's really tough to optimize any of your code. I do believe, though, that you can make bigger leaps more easily in other ways than not using CGI or DBI modules.

Finally, what process is it that is actually using up the CPU? You should be able to use 'top' or 'ps' to figure this out and it will help you decide what parts of your site to optimize.

hiker_jjw

12:53 am on Oct 31, 2002 (gmt 0)



Thanks for the comments Amoore,

I'd look into making the database do more of the work that it's good at (like sorting and limiting and such) and more efficiently processing the data you pull back.

The site pulled data from a 2MB flat-file before I changed it over to use DBI. We are ORDERing and LIMITing our results. So, we are trying to take "full" advantage of the MySQL processing. The ISP is not tracking that usage, as far as I know.

Here's a typical SELECT query.

SELECT product_id, product, name, title, description, condition, price, date_added, search_keywords, thumbnail, image FROM products WHERE ((UPPER(product) REGEXP UPPER('^Art$'))) ORDER BY name LIMIT 0, 11;

I could change this to SELECT *, if you think it might help. I really think we've got the MySQL DBI portion under control. It's not the best it could be, but it works. I'm fairly certain we cannot use mod_perl or Apache::DBI.

I do believe, though, that you can make bigger leaps more easily in other ways than not using CGI or DBI modules.

I'd like to agree, but using CGI.pm has been shown to increase the CPU usage. I'm sure DBI causes even more CPU usage.

Finally, what process is it that is actually using up the CPU? You should be able to use 'top' or 'ps' to figure this out and it will help you decide what parts of your site to optimize.

I've never heard of these; 'top' or 'ps'. Could you direct me to some documentation/web site?

From looking at the ISP's CPU Usage log, it's one main program that is causing the CPU usage troubles. It's basically a heavily modified Web store script, based on the original Extropia.com Web store. I've practically re-written the code completely; at least - that's the way it feels. I might have to consider breaking this script up into some more component scripts.

Thanks.

jeremy goodrich

1:44 am on Oct 31, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



'top' and 'ps' are Linux/Unix command-line tools. :) Very useful if you have telnet or SSH access to the server.

We have a forum for Linux and *nix related stuff [webmasterworld.com] right here.

Doozer

1:54 am on Oct 31, 2002 (gmt 0)

10+ Year Member



If you are having big issues with your ISP, why not consider moving?

OK, it might be a really big hassle for a couple of days while your domain gets moved, but it may save you a lot of time and money in the long term and keep you from wasting endless days going through 'okay' code with a fine-tooth comb.

Okay, probably not the most helpful post in this thread, but hosting is definitely a buyer's market nowadays.

Brett_Tabke

2:10 am on Oct 31, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Yes, eliminate the slow DBI routines wherever possible.

- reduce any unnecessary disk access.
- don't use file locks if you are just doing appends (logs, etc).
- get rid of any modules you can. Hard code the module code if you have to. Major overhead associated with any USE statement.
- yes, elim CGI.pm and cgi-lib.pl. You can do the same securely in a few lines of code.
- elim handling of mass data in memory. Don't try to substitute a text into a text variable (say one change on a whole html page)
- print stuff on the fly and then print the variable parts where needed. (reduce mem usage).
- print a little stuff at the top of your routines (headers and html head stuff). Then process your data - print stuff as soon as you can - keep the net lines full while your code sneaks away to process stuff. It makes the page load seem faster for slower users.
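A minimal sketch of the last two points: print the top of the page first, then each piece as soon as it is ready. It also bears on the earlier $|=1 question: with autoflush on, every print becomes its own write to the web server, so this version leaves buffering on and flushes once, explicitly, after the header. print_page() and the row source are hypothetical.

```perl
use strict;
use warnings;
use IO::Handle;    # provides ->flush on filehandles

# $out is the output filehandle; $next_row is a code ref that returns the
# next row of text, or undef when there are no more rows.
sub print_page {
    my ($out, $next_row) = @_;

    print {$out} "Content-type: text/html\n\n";
    print {$out} "<html><head><title>Products</title></head><body><ul>\n";
    $out->flush;    # one explicit flush so the browser gets the top of the page early

    # Print each row as soon as it is ready; nothing accumulates in memory.
    while (defined(my $row = $next_row->())) {
        print {$out} "<li>$row</li>\n";
    }

    print {$out} "</ul></body></html>\n";
}

# Example: rows come from an array here; in a real script they might come
# from a database statement handle or a flat file read line by line.
my @products = ('Art print', 'Poster');
print_page(\*STDOUT, sub { shift @products });
```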

cminblues

2:41 am on Oct 31, 2002 (gmt 0)

10+ Year Member



I agree with all the hints.

The best additional tip I can give you with these info, is:
Set the things to run the perl scripts in a 'testing area'.
[Maybe locally, on your pc].
Then, try to modify the scripts, checking the loads with 'ps' & 'top'
[if you're on a *x box. if you're on a win, use its ports :)]

[Ah, one question:
Are the perl scripts consuming also a lot of RAM?]

cminblues

hiker_jjw

2:51 am on Oct 31, 2002 (gmt 0)



Yes, eliminate the slow DBI routines wherever possible.

How? Is there an older module/library you would recommend to access the MySQL database? I'm sure I could use something simpler.

yes, elim CGI.pm and cgi-lib.pl. You can do the same securely in a few lines of code.

I'll look into rewriting cgi-lib.pl or cutting out what I don't need.

elim handling of mass data in memory. Don't try to substitute a text into a text variable (say one change on a whole html page)

I'M GUILTY, CALL THE PERL POLICE! Yeah, that's probably my main problem. I changed that earlier today, based on the regular expression comments earlier in this thread.

I was trying to reduce the file size of the HTML content before I wrote the HTML to file (on-the-fly generation / Apache .htaccess). I used two regular expression switches on a large HTML page.

Thanks Everyone!

Brett_Tabke

3:12 am on Oct 31, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



>I used two regular expression switches on a large HTML page.

The rule of thumb I use is that every regex takes two-fold the memory. If you are regexing on 50k of data, it's going to take an additional 100k of space for every regex. That's not too bad on that size page, but when you get into repetitive regexes and larger amounts of data, it can chew through memory real fast. Cutting that stuff out, along with any in-memory (variable) storage you can, will feel like a new box.
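Applied to the whitespace-trimming idea from earlier in the thread, a line-at-a-time version might look like this. The two substitutions are examples (the thread doesn't show the originals), and collapsing whitespace like this is not safe inside pre or textarea blocks.

```perl
use strict;
use warnings;

# Copy HTML from $in to $out, shrinking whitespace one line at a time, so only
# the current line (not the whole page plus regex copies) is held in memory.
# Example substitutions only; unsafe inside <pre> or <textarea>.
sub squeeze_html {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        $line =~ s/^\s+//;            # drop leading indentation (blank lines become empty)
        $line =~ s/[ \t]{2,}/ /g;     # collapse runs of spaces and tabs
        print {$out} $line if length $line;
    }
}
```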

By switching this software from a system where the page was constructed in full before printing, to print-it-on-the-fly, I reduced mem usage per page view from a max of 6 meg to less than a half meg.

>dbi

Only use it when you have to. It's been my experience that 50% of DBI usage can be eliminated or replaced with simple flat files that are soooo much faster. I know one major BBS that goes to disk for DBI usage 22 times per page view.
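A minimal sketch of the flat-file alternative for small, rarely changing lookup data. The tab-delimited id/name layout and load_lookup() are hypothetical; a file that several requests write to at once would still need locking.

```perl
use strict;
use warnings;

# Load a small tab-delimited lookup file (id<TAB>name per line) into a hash,
# skipping blank lines and lines that start with '#'.
sub load_lookup {
    my ($path) = @_;
    my %name_for;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '' || $line =~ /^#/;
        my ($id, $name) = split /\t/, $line, 2;
        $name_for{$id} = $name;
    }
    close $fh;
    return \%name_for;
}
```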

cminblues

4:55 am on Oct 31, 2002 (gmt 0)

10+ Year Member



>>Don't try to substitute a text into a text variable<<

Yup, exactly.. if you're on a server, be careful with variables.. >:)

e.g., instead of:


open(R, '<', '1.txt') or die "Cannot open 1.txt: $!";
while (<R>) {
    $tot .= $_;
}
close R;
#-- do some stuff with $tot

or, worse:


open(R, '<', '1.txt') or die "Cannot open 1.txt: $!";
@tot = <R>;
close R;
#-- do some stuff with @tot

Much better:


open(R, '<', '1.txt') or die "Cannot open 1.txt: $!";
while (<R>) {
    my $tmp = $_;
    #-- do some stuff with $tmp (one line at a time)
    $tot .= $tmp;
}
close R;
#-- $tot now holds the processed text


cminblues

hiker_jjw

5:19 am on Oct 31, 2002 (gmt 0)



I'm not sure I fully understand that last post. Is this because my() creates a Lexical variable which is then dumped from memory after the while{} loop is completed?

Would something like this also be ok?


my @tot = ();
open(R, '<', '1.txt') or die "Cannot open 1.txt: $!";
while (<R>) {
    my $tmp = $_;
    push(@tot, $tmp);
}
close R;
# - do something with @tot

Brett_Tabke

5:25 am on Oct 31, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month


That's just a slower way of doing this:

----
open(R, '1.txt');
@tot = <R>;
close R;
----

Ya, it is because it is a locally scoped variable with my(), which falls out of scope when the routine is exited. It also doesn't use the ton of memory that @tot=<R> does. Using the whole-file slurp method (@tot=<file>) loads the entire file into the array, whereas the line-by-line loop works with a scalar and uses only the memory it needs.

It still depends upon your application. The whole goal is to reduce the number of variables you use. In list context, that means reducing the size of arrays and only manipulating the parts you need. Then dump them as fast as you can to reclaim the memory wherever possible.

hiker_jjw

5:43 am on Oct 31, 2002 (gmt 0)



So, assume I have a bunch of subroutines that were written years ago with local() statements throughout them. Should I go ahead and rewrite them using my() so the variable(s) will be "dumped", instead of being saved as they are now?
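A small sketch of the difference, with made-up names. local() doesn't create a new variable; it temporarily swaps the value of a package (global) variable, and the old value comes back when the enclosing block exits, so the temporary value isn't kept around either. my() creates a lexical visible only inside its block. Converting old local() code to my() is mostly about correctness and readability; any CPU or memory gain is likely to be small, so it's worth measuring.

```perl
use strict;
use warnings;

our $color = 'red';    # package (global) variable

sub show_color { return $color }

# local() temporarily replaces the global's value; every sub called from here
# sees the new value until this sub returns, then 'red' is restored.
sub with_local {
    local $color = 'blue';
    return show_color();
}

# my() creates a new lexical visible only inside this sub; show_color() still
# sees the global, and the lexical goes away when the sub returns.
sub with_my {
    my $color = 'green';
    return show_color();
}
```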

hiker_jjw

7:25 pm on Oct 31, 2002 (gmt 0)



Ok, now I see your point 'cminblues':

Simply put, do the processing within the While Loop instead of storing all those values in an array and then processing it with a foreach loop after the while loop is executed.
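The two styles side by side, as a sketch with made-up subs: both return the same count, but the first holds every line in @lines before doing any work, while the second keeps only the current line in memory.

```perl
use strict;
use warnings;

# Store-then-loop: the whole file is read into @lines first.
sub count_matches_stored {
    my ($fh, $word) = @_;
    my @lines = <$fh>;
    my $count = 0;
    foreach my $line (@lines) {
        $count++ if index($line, $word) >= 0;
    }
    return $count;
}

# Process-as-you-read: only the current line is held in memory.
sub count_matches_streamed {
    my ($fh, $word) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if index($line, $word) >= 0;
    }
    return $count;
}
```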

In some cases, because the way my code is written, it's not going to be possible. In other situations, I will strive to keep all these considerations in mind.

Thanks to Everyone for Replying and to WebmasterWorld for this Forum!

amoore

3:46 am on Nov 4, 2002 (gmt 0)

10+ Year Member



Jumping back a bit:
WHERE ((UPPER(product) REGEXP UPPER('^Art$')))

is that the same as:
WHERE UPPER(product) like '%ART'?
and also, is that easier on the database? Just a thought.

hiker_jjw

6:37 pm on Nov 4, 2002 (gmt 0)



That's a great point amoore. The logic for building the SQL query has gone through some changes over the past year. Now that we're doing category specific searches, it seems we could simplify this even more.

WHERE product LIKE 'Art'

Case Insensitive for normal (NOT Binary) string types.
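A sketch of the category query rewritten along those lines, assuming an already-connected DBI handle and the products columns from the query posted above; for_each_product() is hypothetical. In MySQL, an = test (or a LIKE pattern without a leading %) on the indexed product column can use that index, whereas wrapping the column in UPPER() and REGEXP prevents it. With a non-binary column, = compares case-insensitively just like the LIKE version. The category goes through a placeholder; the LIMIT values are forced to integers and written into the SQL.

```perl
use strict;
use warnings;

# Run the category query and hand each row to $emit as it is fetched.
# $dbh is assumed to be a connected DBI handle (e.g. from DBI->connect with
# RaiseError => 1); column names follow the query posted in this thread.
sub for_each_product {
    my ($dbh, $category, $offset, $count, $emit) = @_;

    # The database does the matching, sorting and limiting. LIMIT values are
    # clamped to non-negative integers; the user-supplied category is bound
    # through a placeholder instead of being pasted into the SQL.
    my $sql = sprintf(
        'SELECT product_id, name, price, thumbnail FROM products '
      . 'WHERE product = ? ORDER BY name LIMIT %d, %d',
        $offset > 0 ? $offset : 0, $count > 0 ? $count : 0
    );
    my $sth = $dbh->prepare($sql);
    $sth->execute($category);

    # Fetch and handle one row at a time rather than all rows at once.
    while (my $row = $sth->fetchrow_hashref) {
        $emit->($row);
    }
    $sth->finish;
}
```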

sun818

8:04 pm on Nov 4, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



LIKE will perform a table scan which you should avoid if possible.

Creating indexes for PRODUCT and NAME fields may help reduce CPU usage.

hiker_jjw

9:15 pm on Nov 4, 2002 (gmt 0)



Creating indexes for PRODUCT and NAME fields may help reduce CPU usage.

'product' is currently indexed.

It's more of a Perl CPU issue that I have with the current ISP. I don't believe they are tracking the MySQL usage. They do count the MySQL files as part of your total allowed space, but that's not an issue.

My problem is the CPU usage. I am slowly rewriting the Perl script(s) based on the comments made in this thread, especially the regex comments...

Thanks!

hiker_jjw

1:47 am on Nov 11, 2002 (gmt 0)



Not to beat a dead thread with a stick, but I finally got a chance to run a benchmark test; use Benchmark;

The results with "use DBI" module (10 products displayed) was:

1 wallclock secs ( 0.38 usr + 0.05 sys = 0.43 CPU) - sometimes a little more or less.

Without "use DBI" the benchmarking was, of course, much better. For example, to delete an item from a shopping cart took:

0 wallclock secs ( 0.16 usr + 0.01 sys = 0.17 CPU)

Now the confusing part is how the ISP measures this CPU usage: 100 CPU ticks = 1 CPU second. According to the ISP, the script in question is averaging about 0.85 CPU seconds per execution.

I guess they are measuring something in addition to the Perl script? Maybe they're measuring the MySQL usage too? Do these numbers sound excessive?
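The thread doesn't say exactly what the ISP counts, but one plausible part of the gap can be sketched: a Benchmark timer started after the use lines only covers the code that follows it, while the process's total CPU time (what Perl's built-in times() reports) also includes starting perl and compiling the script and its modules, such as DBI. Work done by the MySQL server runs in a separate process and shows up in neither figure. The dummy loop below stands in for the real script.

```perl
use strict;
use warnings;
use Benchmark qw(timediff timestr);

# CPU seconds used by this process so far (user + system), including the time
# spent compiling the script and every module it loaded with `use`.
sub cpu_so_far {
    my ($user, $system) = times;
    return $user + $system;
}

my $t0 = Benchmark->new;    # runs only after all `use` lines were compiled

# Placeholder for the real work of the script.
my $sum = 0;
$sum += $_ for 1 .. 200_000;

my $timed = timediff(Benchmark->new, $t0);

# The Benchmark figure covers only the code between the two timestamps;
# cpu_so_far() also includes perl startup and module compilation.
printf STDERR "timed section: %s\n", timestr($timed);
printf STDERR "whole process: %.2f CPU seconds\n", cpu_so_far();
```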

amoore

9:01 pm on Dec 16, 2002 (gmt 0)

10+ Year Member



Old thread, but this resource seems quite helpful to people in this situation:
[ccl4.org...]
Hope it helps someone.

-Andy

hiker_jjw

12:07 am on Dec 17, 2002 (gmt 0)



amoore,

Thanks a bunch for that url! The information helped a lot, especially the info on using Devel::DProf and the differences between s// and tr//.
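For readers who can't follow the truncated link, a small illustration of the tr/// versus s/// point on a made-up string: tr/// works on single characters only (no patterns), but for counting or deleting characters it avoids the regex engine.

```perl
use strict;
use warnings;

my $text = "Art prints, posters, and more art.";

# Counting characters: tr/// returns the number of matches and leaves the
# string unchanged when the replacement list is empty.
my $commas_tr = ($text =~ tr/,//);

# The same count with a regex runs the pattern engine and builds a match list.
my $commas_re = () = $text =~ /,/g;

# Deleting characters: tr///d versus s///g on copies of $text.
(my $no_commas_tr = $text) =~ tr/,//d;
(my $no_commas_re = $text) =~ s/,//g;
```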

Thanks again!