Offline Database Generation

         

alexjc

2:40 pm on Jan 20, 2002 (gmt 0)


I've got Perlfect Search running as recommended in this thread:
[webmasterworld.com...]

The setup was all very nice... Just note that it will not handle subdomains unless you hack it (you need to change about 10 lines of code).

But... my shared server enforces a maximum script running time, and I don't want to use up my precious bandwidth either. The indexer has to retrieve the pages via HTTP because they are generated by PHP scripts.

Anyway, I installed Perlfect on my offline Windows development machine and built the DB there. Both DB_File and Berkeley DB are exactly the same versions as on the production server (a BSD box), but when I upload the index, the server tells me the database format is not compatible. I've given up on the idea of reinstalling DB_File and Berkeley DB on the production server, so I'm looking into alternatives.

Have any of you tackled such a problem? What would you recommend?

ht://Dig doesn't seem to rely on external DB libraries, so that might work, but the size of the index gets big very quickly!

Cheers,
Alex

grnidone

7:03 pm on Jan 20, 2002 (gmt 0)



Alex, I think you've stumped us. Kicking this to the top because I think others would like to know the answer to this question.

alexjc

8:27 am on Jan 21, 2002 (gmt 0)


My mistake, ht://Dig relies on Berkeley DB too!

More thinking, and ... I can only see two ways out of this, and neither is an ideal solution:


  • Implement a URI-to-filesystem mapping system, whereby all the web fetching is done by interpreting the local PHP script (which parses everything)... plenty to worry about there, and there's still no guarantee that it will finish in 30 seconds ;)
  • Ask for a monthly 'script-time' allowance from my ISP for my spider to run... bandwidth will suffer a bit, but hey!
  • Or combine both options, using cunning negotiations and hard work!
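
For illustration only, here is a rough Python sketch of the first option: map each URI to a file under the document root, and work through the pending list in batches that stop before the script time limit. Everything named here is an assumption and not part of Perlfect: `uri_to_path`, `index_batch`, the `/srv/site` document root, the `index.php` default and the 25-second budget are all hypothetical. It does not attempt the hard part, which is actually interpreting the PHP.

```python
import time
from pathlib import PurePosixPath
from urllib.parse import urlsplit, unquote


def uri_to_path(uri, doc_root="/srv/site", index_file="index.php"):
    """Map a site URI to the local file assumed to serve it.

    The query string is ignored, and '..' segments are resolved so
    the result can never climb above doc_root.
    """
    path = unquote(urlsplit(uri).path)
    parts = []
    for segment in path.split("/"):
        if segment in ("", "."):
            continue
        if segment == "..":
            if parts:
                parts.pop()
            continue
        parts.append(segment)
    # A directory-style URI (or the bare host) maps to the index file.
    if path.endswith("/") or not parts:
        parts.append(index_file)
    return str(PurePosixPath(doc_root, *parts))


def index_batch(pending, handle, budget_seconds=25.0, clock=time.monotonic):
    """Process items from `pending` until the time budget runs out.

    Returns whatever is left, so a later run can pick up from there.
    The deadline is only checked between items, so a single slow item
    can still overrun the budget.
    """
    deadline = clock() + budget_seconds
    while pending and clock() < deadline:
        handle(pending.pop(0))
    return pending
```

The list returned by `index_batch` would have to be saved somewhere (a flat file, say) between runs, so the spider resumes where it stopped instead of starting over each time.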

Hmmm...