Forum Library, Charter, Moderators: coopster & jatar k & phranque

Perl Server Side CGI Scripting Forum

    
Programming language for search engines?
Perl would do?
adrianTNT




msg:4080258
 3:41 am on Feb 15, 2010 (gmt 0)

Hello.
Is Perl a good language that could be used for something similar to a search engine?
Meaning it would browse through URLs, extract data, save to database, process, sort, etc.

Would Perl do that? Or are there any better solutions?
What does Googlebot use: which programming language, which OS?

Thanks.

 

janharders




msg:4080362
 10:08 am on Feb 15, 2010 (gmt 0)

Perl can certainly do that, but whether it's the right tool for the job depends on your needs, e.g. how fast it has to be. Perl is, after all, a scripting language and does not compare to C in raw speed. But that's a good thing, too, because it doesn't compare to C in development time either. From what I've read, Google uses a lot of Python, running on Linux and possibly other Unix derivatives.

Scraping data from the web is easy in Perl, and database manipulation is easy as well. Perl is the Practical Extraction and Report Language, after all ... I don't know about the heavy lifting a search engine might need to do to give the best results. But you can always embed other languages in Perl if you feel that you need to.
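
For illustration, the "extract data, save to database, sort" steps could look roughly like the sketch below. It is written in Python (also mentioned in this thread) rather than Perl, uses only the standard library, and works on an inline HTML string instead of fetching a live page. The function names, the `links` table, and the example URLs are assumptions made up for this sketch, not part of any real crawler.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in the HTML, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def save_links(conn, source_url, links):
    """Store (source, target) pairs; duplicate pairs are ignored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links "
        "(source TEXT, target TEXT, UNIQUE(source, target))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO links (source, target) VALUES (?, ?)",
        [(source_url, link) for link in links],
    )
    conn.commit()


def sorted_targets(conn):
    """Return the distinct link targets in alphabetical order."""
    rows = conn.execute("SELECT DISTINCT target FROM links ORDER BY target")
    return [row[0] for row in rows]


if __name__ == "__main__":
    # Hypothetical page content; a real crawler would download it first.
    page = '<a href="/b.html">B</a> <a href="http://example.org/a">A</a>'
    conn = sqlite3.connect(":memory:")
    save_links(conn, "http://example.com/", extract_links(page, "http://example.com/"))
    for target in sorted_targets(conn):
        print(target)
```

The same three steps map directly onto Perl modules for HTTP, HTML parsing, and database access; only the syntax changes.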

adrianTNT




msg:4080368
 10:56 am on Feb 15, 2010 (gmt 0)

OK. Some more "noob" questions... :)
I assume PHP would not work for something like that. Is that because it cannot run scripts in the background?
How are Perl scripts triggered? Are they triggered by command lines in Linux? Or as web scripts like PHP?

I know some PHP but nothing about Perl or Python. I am trying to find some answers about them.

jdMorgan




msg:4080461
 1:35 pm on Feb 15, 2010 (gmt 0)

> Are they triggered by command lines in Linux? Or as web scripts like PHP?

Either or both.

Jim
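
To make "either or both" concrete, here is a small sketch of one script that can be started from a shell (or a cron job) and also served as a CGI script. It is in Python for illustration; the idea is the same in Perl. It relies on the CGI convention that the web server sets the `GATEWAY_INTERFACE` environment variable; the function names and the message text are assumptions invented for this example.

```python
import os
import sys


def running_as_cgi(environ=None):
    """Return True when the CGI variable GATEWAY_INTERFACE is present."""
    environ = os.environ if environ is None else environ
    return "GATEWAY_INTERFACE" in environ


def render(message, as_cgi):
    """Prefix an HTTP header block when the output goes to a web server."""
    if as_cgi:
        return "Content-Type: text/plain\r\n\r\n" + message + "\n"
    return message + "\n"


def main():
    # Same work either way; only the output framing differs.
    sys.stdout.write(render("crawl finished", running_as_cgi()))


if __name__ == "__main__":
    main()
```

Run from a terminal, it prints just the message; run by a web server as CGI, it prints the header block first so the browser receives a valid response.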

adrianTNT




msg:4080501
 2:26 pm on Feb 15, 2010 (gmt 0)

And PHP doesn't work from the command line, right? For example, I would not be able to set a cron job to run a PHP script at a given date?!

jdMorgan




msg:4080534
 3:25 pm on Feb 15, 2010 (gmt 0)

PHP is intended for use producing HTML pages. It's important to choose the right language if you intend to build a search engine...

Learning a programming or scripting language is like learning to ride a bicycle: Once you've learned to ride one bicycle, you can easily ride a different bicycle. Maybe the gear-shifting levers work a bit differently, and maybe the bell is in a different position, but you already know how to pedal, turn, and stay balanced.

Jim

adrianTNT




msg:4080616
 5:08 pm on Feb 15, 2010 (gmt 0)

"PHP is intended for use producing HTML pages"
Yes, I use it at a medium level in sites, but I was not sure whether a PHP script can also be triggered from the command line.

I will have to look into Python. I found (as someone said above) that it is one of the languages used by Googlebot.

jdMorgan




msg:4080625
 5:33 pm on Feb 15, 2010 (gmt 0)

The reason that multiple programming and scripting languages exist is that some are better than others for specific purposes. But if you change the project goals and requirements, it is likely that "the best language" will also change. For example, while you might write your user interface (search form) using PHP, and some of your back-end using Python or Perl, the core indexing routines will likely need to be written in C (or similar) if you want a high-performance, scalable solution.

You don't use a hammer to cut down a tree, or a chainsaw to install a new roof. Although these may be the best hammer and chainsaw available, each is suited to its own specific job.

If you're not looking forward to the challenge and satisfaction of learning several new and useful languages, then also consider the various open-source and paid options for ready-made search engine code.

Jim
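
As a rough idea of what "core indexing routines" do, here is a toy inverted index in Python: it maps each word to the set of documents containing it and answers queries by intersecting those sets. This is only a sketch under simple assumptions (tiny in-memory data, lowercase alphanumeric tokens, no ranking); the function names and sample documents are invented for illustration. A production engine would implement the same idea with far more compact data structures, which is where a language like C pays off.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


if __name__ == "__main__":
    # Hypothetical sample documents keyed by id.
    docs = {
        1: "Perl is a scripting language",
        2: "Python is a scripting language too",
        3: "C is fast",
    }
    index = build_index(docs)
    print(search(index, "scripting language"))
```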

janharders




msg:4081201
 10:52 am on Feb 16, 2010 (gmt 0)

Also, just to complete the info: you can run PHP from the command line with php-cli. Though, as a Perl programmer, I do not recommend it.


> If you're not looking forward to the challenge and satisfaction of learning several new and useful languages, then also consider the various open-source and paid options for ready-made search engine code.


I cannot emphasize this enough. Ten years ago, I wrote a simple local search engine for our sites, because the ones that were available (free or inexpensive) didn't do the job right. Now, I wouldn't do that again; I'd probably go for a back-end engine that I can just put my front end on.

It depends on what you want to do -- if it's going to be a public engine and people are interested enough in ranking well that they start to trick your bots, you'll have to do a lot more than if it's just on the intranet or just a search engine for your website.
