As I was doing this for about the millionth time in several years, it finally hit me: why not just use grep to start with? Why go through a scripting language like Perl to search a static database file?
Obviously, some serious parsing of anything passed from the user would be required before feeding grep on the command line.
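One way to avoid most of that parsing is to never hand the user's text to a shell at all. Here is a minimal Python sketch of the idea, assuming a POSIX-style grep on the PATH. The names `build_grep_argv` and `run_grep` are made up for illustration, and the same list-of-arguments approach should carry over to other languages.

```python
import subprocess


def build_grep_argv(pattern, path, fixed=True):
    """Build an argument list for grep without involving a shell."""
    argv = ["grep"]
    if fixed:
        # -F treats the pattern as a literal string, not a regex.
        argv.append("-F")
    # -e marks the next argument as the pattern even if it starts with "-";
    # "--" ends option parsing so the path cannot be read as an option.
    argv += ["-e", pattern, "--", path]
    return argv


def run_grep(pattern, path, fixed=True):
    """Run grep and return the matching lines (empty list if none matched)."""
    result = subprocess.run(
        build_grep_argv(pattern, path, fixed),
        capture_output=True,
        text=True,
    )
    # grep exits with 0 on a match, 1 on no match, and >1 on error.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()
```

Because the arguments go to grep as a list, characters like `;`, `|` or backticks in the search term stay inside a single argument and are never interpreted by a shell. The path should still be checked against whatever files the user is allowed to search.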
Anything else? Grep is reasonably fast and portable.
More specifically, the answer may depend on whether your data comes from a file or you already have it in memory, whether your parser for removing structural fluff (you may not want to include e.g. HTML tags/attributes in your results) is a separate program as well or a language module, and a few other factors.
I don't think that portability is an issue, as your language (Perl) is just as portable as mine (Python). And since all the tools basically use the same regex syntax, writing a simple grep module should be fairly straightforward. Such a module was even included with Python until recently, but was relegated to deprecated status because it used an outdated regex backend. I wouldn't be surprised to find something similar in the Perl library.
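To give an idea of how little code such a module needs, here is a rough Python sketch. The function name `grep` and its `ignore_case` and `invert` parameters are my own choices, loosely mirroring grep's -i and -v, and are not the interface of the old library module.

```python
import re


def grep(pattern, lines, ignore_case=False, invert=False):
    """Yield (line_number, line) for each line that matches pattern."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    for number, line in enumerate(lines, start=1):
        matched = regex.search(line) is not None
        # With invert=True, keep the lines that did NOT match (like grep -v).
        if matched != invert:
            yield number, line.rstrip("\n")
```

An open file handle can be passed as `lines`, since Python files iterate line by line. One caveat: Python's `re` syntax is closer to `grep -E` than to plain grep's basic regular expressions, so patterns may need small adjustments.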
Thinking about it a little more, using the external grep might actually involve unnecessary overhead in some cases, when you end up parsing its output for whatever analysis you need to do. We could probably summarize that using grep is simpler in all those cases where you don't need to pre- or postprocess the data in any way. In most other situations, I'd probably consider an inline solution first.
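As an illustration of that extra parsing step, this is roughly what you end up writing just to get line numbers back out of `grep -n` output. It is a sketch that assumes a single input file, so each output line looks like `12:matching text`. With several files grep also prefixes the file name, which this hypothetical `parse_grep_n` helper does not handle.

```python
def parse_grep_n(output):
    """Split `grep -n` output for a single file into (line_number, text) pairs."""
    hits = []
    for raw in output.splitlines():
        # Split on the first colon only, since the matched text may contain colons.
        number, _, text = raw.partition(":")
        hits.append((int(number), text))
    return hits
```

An inline search already has the line number and the text as separate values, so this round trip through a string disappears.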
my $grepped = `grep $options $grep_this $from_this`;
It seems much faster than pure Perl on larger files.
And I am not talking about the overhead of loading "/bin/grep" from disk into memory, but the overhead of "forking" a new process, which is probably one of the most expensive calls in the Un*x world. You need to create a new virtual memory space, allocate heaps and stacks, link in all the libraries (very expensive in ELF) and pay for another context switch. But I guess since people are already lazy and using Perl/Python to do most of their work (myself included), this overhead is negligible :)
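If you want a number for your own machine rather than a guess, the spawn cost is easy to measure. This is a rough Python sketch. `average_spawn_seconds` is a made-up helper, and it times the whole start, run and wait cycle, so the result includes the exec and dynamic linking as well as the fork itself.

```python
import subprocess
import time


def average_spawn_seconds(argv, repeats=20):
    """Return the mean wall-clock time to start argv and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(repeats):
        # Discard the child's output so only process startup and exit are timed.
        subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return (time.perf_counter() - start) / repeats
```

On a Unix box, something like `average_spawn_seconds(["true"])` gives a baseline for a do-nothing child process. Comparing that against the time of an in-process regex search over your data shows whether the overhead matters for your file sizes.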