Grep - pros and cons as a CGI search util.

         

Brett_Tabke

9:14 am on Jun 15, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I was working on some search routines in Perl. I stopped, went to the command line, and grepped the file I was searching to see if my routine had found all the matches.

As I was doing this for about the millionth time in several years, it finally hit me: why not just use grep to start with? Why go through a scripting lang like perl to search a static database file?

Obviously, anything passed in from the user would need some serious sanitizing before it is fed to grep on the command line.

Anything else? Grep is reasonably fast and portable.
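
One way to take care of the user-input concern is to never let a shell see the search term at all. The following is a minimal sketch in Python (the thread's other language), assuming a grep binary on the PATH and a literal-string search; the function name `grep_fixed` is made up for illustration.

```python
import subprocess


def grep_fixed(term, path):
    """Return the lines of `path` that contain `term`, using the external grep."""
    # Passing a list (and no shell=True) means no shell ever parses the user input.
    # -F treats the term as a literal string, not a regex.
    # -e keeps a term starting with "-" from being read as an option.
    # "--" ends option parsing before the file name.
    result = subprocess.run(
        ["grep", "-F", "-e", term, "--", path],
        capture_output=True,
        text=True,
    )
    # grep exits 0 on a match, 1 on no match, and >1 on a real error.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()
```

With this shape, a term such as `foo; rm -rf /` is only ever searched for as text. The file name should still come from the script, not from the user.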

bird

11:46 am on Jun 15, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Short answer: It depends... ;)

More specifically, the answer may depend on whether your data comes from a file or is already in memory, whether your parser for removing structural fluff (you may not want to include e.g. HTML tags/attributes in your results) is a separate program or a language module, and a few other factors.

I don't think that portability is an issue, as your language (Perl) is just as portable as mine (Python). And since all the tools use basically the same regex syntax, writing a simple grep module should be fairly straightforward. Such a module was even included with Python until recently, but was relegated to deprecated status because it used an outdated regex backend. I wouldn't be surprised to find something similar in the Perl library.

Thinking about it a little more, using the external grep might actually involve unnecessary overhead in some cases, when you end up parsing its output for whatever analysis you need to do. We could probably summarize that using grep is simpler in all those cases where you don't need to pre- or postprocess the data in any way. In most other situations, I'd probably consider an inline solution first.
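
For comparison, an inline solution can be very small. This is only a sketch, not the old library module mentioned above: `grep_lines` is a hypothetical name, and Python's `re` syntax is closer to Perl's or `grep -E` than to basic grep patterns.

```python
import re


def grep_lines(pattern, path):
    """Return the lines of `path` that match the regular expression `pattern`."""
    regex = re.compile(pattern)
    matches = []
    # Iterating over the file object reads one line at a time,
    # so a large file is never held in memory as a whole.
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if regex.search(line):
                matches.append(line.rstrip("\n"))
    return matches
```

Because the matches come back as a Python list, any pre- or postprocessing can happen in the same loop, with no output from a second program to parse.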

scotty

1:38 pm on Jun 16, 2002 (gmt 0)

10+ Year Member



Using external "grep" = Forking a new process = Expensive operation.

CGI apps are already expensive from the web server's point of view, whether they are in Python or Perl. Forking a second process to do a grep is a bit too much, especially for tasks that can easily be done with /../ or re.search(..).

Brett_Tabke

2:43 pm on Jun 16, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I agree that is an issue, scotty, but as bird says, it "depends" on the application. For the one I am working on, in a side-by-side test against Perl-based regexing, CLI grep was significantly faster. What took Perl 8 seconds to search (a 45 meg flat file), grep did in 3.
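
A side-by-side test like this is easy to repeat on your own data. The sketch below is in Python rather than Perl and says nothing about the figures above; the helper names are invented, and it assumes a grep that accepts `-c` and `-E`.

```python
import re
import subprocess
import time


def count_with_grep(pattern, path):
    """Count matching lines by running the external grep (one extra process)."""
    # -c prints only the number of matching lines; -E selects extended regex syntax.
    result = subprocess.run(
        ["grep", "-c", "-E", "-e", pattern, "--", path],
        capture_output=True,
        text=True,
    )
    if result.returncode > 1:  # 0 = matches, 1 = no matches, >1 = error
        raise RuntimeError(result.stderr.strip())
    return int(result.stdout.strip())


def count_inline(pattern, path):
    """Count matching lines with the interpreter's own regex engine."""
    regex = re.compile(pattern)
    with open(path, encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if regex.search(line))


def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    value = func(*args)
    return value, time.perf_counter() - start
```

Calling `timed(count_with_grep, pattern, path)` and `timed(count_inline, pattern, path)` on the same file gives two counts that should agree for simple patterns, plus two wall-clock times to compare. Run each a few times, since the first pass also pays for reading the file from disk.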

bird

3:01 pm on Jun 16, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Speed differences can have many reasons, and start up time of a program is only one of them. In fact, once the grep binary is loaded into disk cache, the overhead of using an external call should be negligible. There may be ways to speed up your Perl solution just as much (or more), but it could take some specialized experience to find them.

littleman

5:36 pm on Jun 16, 2002 (gmt 0)



I have used grep in CGI scripts for mining logs for specific strings. It seems faster than what Perl could do on its own.

# Backticks can hand this string to the shell, so all three variables must be trusted or sanitized first.
my $grepped = `grep $options $grep_this $from_this`;

It seems much faster than using pure Perl on larger files.

scotty

12:29 am on Jun 17, 2002 (gmt 0)

10+ Year Member



Yeah, I agree. Since grep is coded in C, it should be faster and have a smaller memory footprint when you grep through a large file, whereas Perl/Python are interpreted...

And I am not talking about the overhead of loading "/bin/grep" from disk into memory, but the overhead of "forking" a new process, which is probably one of the most expensive calls in the Un*x world. You need to create a new virtual memory space, allocate heaps and stacks, link in all the libraries (very expensive in ELF), and do another context switch. But I guess since people are already lazy and use Perl/Python to do most of their work (myself included), this overhead is negligible :)
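
Whether the process-creation cost matters on a given box can be checked directly. This is a rough sketch with a made-up helper name; note that it measures the whole start-run-exit cycle of a command as seen from Python, not the fork() call in isolation.

```python
import subprocess
import time


def average_spawn_seconds(command, runs=20):
    """Average wall-clock time to start `command` and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(runs):
        # Output is discarded so that only process startup and teardown are timed.
        subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    return (time.perf_counter() - start) / runs
```

For example, `average_spawn_seconds(["grep", "-q", "x", "/dev/null"])` gives the per-call cost of a grep that has almost nothing to search, which can then be set against the time a real search takes.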