Bookmark Search Engine

Forum Moderators: coopster & phranque

NFFC

7:53 pm on Dec 8, 2002 (gmt 0)

It's not often we post URLs to software products, but here is an exception. It is a great utility, free, and written by one of WebmasterWorld's most respected members. The developer is looking for feedback and can be contacted via the URL below.

It is a server/bot/bookmark manager. It is GPL'ed, free, and a work in progress.
It works with Netscape, Mozilla, Opera, and Galeon, and should work with Konqueror.

[collectivemind.sourceforge.net...]

GilbertZ

1:59 pm on Dec 9, 2002 (gmt 0)

Looks interesting, though I'm not quite sure how it works. Is the idea to have a server-based search engine that spiders your bookmarks, plus those of anyone else who offers theirs, and becomes a bookmark search engine for the public? Or is it (more likely) something you install on your own server: you supply your bookmarks, a search index is built, and you can then search the pages behind your own bookmarks for matches to your query?

littleman

5:54 pm on Dec 9, 2002 (gmt 0)

GilbertZ,
Don't be put off by the word server; it is a script that runs on your desktop. Once you fire up the script, you point your browser to [127.0.0.1:6800...] , which is where the script listens with the default settings. It is a 'server' because it works via an HTML interface without being a CGI application.

I am currently working on the code to get these scripts to hook up with each other in a p2p network, but that isn't in this version. Just think of it as a self-contained search engine which spiders your bookmarks and sits on your desktop.
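As a rough illustration of that "desktop script with an HTML interface" idea, here is a minimal sketch in Python rather than the Perl the actual script is written in. The names render_page, Handler, and make_server are made up for this example, the port 6800 is taken from the post, and binding to 127.0.0.1 only is an assumption about how such a tool would keep itself private to the local machine.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def render_page(query: str) -> str:
    """Build a tiny search form, echoing the last query safely escaped."""
    safe = html.escape(query)  # escapes <, >, & and both quote characters
    return (
        "<html><body><form method='get' action='/'>"
        f"<input name='search' value='{safe}'>"
        "<input type='submit' value='Search'></form></body></html>"
    )


class Handler(BaseHTTPRequestHandler):
    """Answers every GET with the search form (hypothetical handler)."""

    def do_GET(self):
        params = parse_qs(urlparse(self.path).query)
        query = params.get("search", [""])[0]
        body = render_page(query).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 6800) -> HTTPServer:
    """Create a server that only listens on the local machine."""
    return HTTPServer(("127.0.0.1", port), Handler)
```

Calling make_server().serve_forever() and opening the 127.0.0.1 address in a browser would show the form; no web server or CGI setup is involved.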

andreasfriedrich

6:08 pm on Dec 9, 2002 (gmt 0)

I just downloaded both the *nix and Windows versions. Neither seems to be able to parse IE's bookmarks. :(

littleman

6:23 pm on Dec 9, 2002 (gmt 0)

No, sorry. It doesn't work with IE -- not yet anyway.

IE does bookmarks in an odd kind of way -- each bookmark is saved as an individual file. What you could do is grab one of those free bookmark converting utilities and save the IE bookmarks in either Mozilla or Opera format.
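For readers who want to see what such a conversion involves, here is a minimal Python sketch. It assumes the usual layout of an IE favorite, a small INI-style .url file with an [InternetShortcut] section and a URL= line, and writes the Netscape/Mozilla bookmark HTML layout. The function names are hypothetical and this is not one of the converting utilities mentioned above.

```python
import configparser
import html
from pathlib import Path
from typing import List, Optional, Tuple


def parse_shortcut(text: str) -> Optional[str]:
    """Return the URL from the contents of one .url file, or None."""
    # interpolation=None keeps '%' in URLs from being treated as a template.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    try:
        parser.read_string(text)
    except configparser.Error:
        return None
    return parser.get("InternetShortcut", "URL", fallback=None)


def to_netscape(bookmarks: List[Tuple[str, str]]) -> str:
    """Render (title, url) pairs as a Netscape-style bookmark file."""
    lines = [
        "<!DOCTYPE NETSCAPE-Bookmark-file-1>",
        "<TITLE>Bookmarks</TITLE>",
        "<H1>Bookmarks</H1>",
        "<DL><p>",
    ]
    for title, url in bookmarks:
        lines.append(
            f'    <DT><A HREF="{html.escape(url)}">{html.escape(title)}</A>'
        )
    lines.append("</DL><p>")
    return "\n".join(lines) + "\n"


def convert_folder(favorites: Path) -> str:
    """Walk a Favorites folder and convert every readable .url file."""
    found = []
    for path in sorted(favorites.rglob("*.url")):
        url = parse_shortcut(path.read_text(errors="replace"))
        if url:
            found.append((path.stem, url))  # the file name is the title
    return to_netscape(found)
```

The sketch flattens subfolders into one list; a real converter would probably preserve the folder hierarchy.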

GilbertZ

7:03 pm on Dec 9, 2002 (gmt 0)

OK, that's cool. I can't think of anything I would use it for personally, but it's not a bad idea at all, especially if you have a lot of bookmarks and forgot where you read something.

andreasfriedrich

7:34 pm on Dec 9, 2002 (gmt 0)

That's exactly what I did. The server is currently indexing the bookmarks. It takes a very long time indeed ;)

littleman

8:35 pm on Dec 9, 2002 (gmt 0)

Keyword analysis actually happens during the crawl cycle, which takes a while. The upside is that the search results are pretty snappy.
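The trade-off described here, slow crawl but fast search, is what you get from building an inverted index up front. The Python sketch below illustrates that general technique only; it is an assumption about the approach, not a description of how the script actually stores its keywords, and build_index and search are made-up names.

```python
import re
from collections import defaultdict
from typing import Dict, Set

WORD = re.compile(r"\w+")


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Crawl-time step: map every word to the set of URLs containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in WORD.findall(text.lower()):
            index[word].add(url)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Search-time step: intersect the URL sets of all query words."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

All the expensive text processing happens once per page in build_index, so a query is only a few dictionary lookups and set intersections.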

andreasfriedrich

8:45 pm on Dec 9, 2002 (gmt 0)

It does work now. Reaction time is fast.

I have this code in one of my bookmarked pages:

We&#173;ge nach K&#246;then

When I search for köthen I get nothing. If I change the &#246; to ö and let the server index the bookmarks again it works.

It would be great if you could ignore the soft hyphen.
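One way to get the behaviour Andreas asks for is to normalise page text before indexing: decode character references, drop soft hyphens, and fold case. The following Python sketch shows the idea under those assumptions; normalize is a hypothetical helper and not part of the script.

```python
import html

SOFT_HYPHEN = "\u00ad"  # what &#173; and &shy; decode to


def normalize(text: str) -> str:
    """Decode HTML character references, remove soft hyphens, fold case."""
    decoded = html.unescape(text)  # &#246; becomes ö, &#173; becomes U+00AD
    return decoded.replace(SOFT_HYPHEN, "").casefold()
```

Applying the same function to the search terms keeps both sides of the comparison consistent, so a search for köthen would match the page text shown above.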

Andreas

littleman

11:06 pm on Dec 9, 2002 (gmt 0)

I'll look into a compact integration for UTF-8.

littleman

11:03 am on Dec 10, 2002 (gmt 0)

Andreas,
It does UTF-8 to character conversion now. You'll have to rebuild your database.

andreasfriedrich

2:17 pm on Dec 10, 2002 (gmt 0)

I rebuilt the database and it still does not work. How come?

>>Did you download the new version?

No. You said I just had to rebuild the index.

>>But how did you expect to get the new version without downloading?

I didn't expect anything except for it to work. I don't know how you changed it. It's your code so you should know.

>>Oh I do know and I did the changes. All you need to do is download the new version.

Which new version?

>>The one with UTF-8 support.

What support?

Thanks littleman. I'll try it later today.

Andreas

andreasfriedrich

3:54 pm on Dec 11, 2002 (gmt 0)

I did try the new UTF-8 version. It did index the bookmarks but still didn't find them.

After making sure that the search terms were encoded likewise it did work. However, when displaying the results I got the usual UTF-8 garbage: kÃ¶then instead of köthen.

*** bookmark-server-linux.pl  Wed Dec 11 16:46:48 2002 
--- bmsaf.pl  Wed Dec 11 16:41:56 2002 
*************** 
*** 117,123 **** 
   $| = 1; 
   print $client "HTTP/1.0 200 OK\r\n"; 
   print $client "Connection: close\r\n"; 
!   print $client "Content-type: text/html\r\n\r\n"; 
#  
   ##print the front page 
#  
--- 117,123 ---- 
   $| = 1; 
   print $client "HTTP/1.0 200 OK\r\n"; 
   print $client "Connection: close\r\n"; 
!   print $client "Content-type: text/html; charset=UTF-8\r\n\r\n"; 
#  
   ##print the front page 
#  
*************** 
*** 159,164 **** 
--- 159,167 ---- 
#  
     my $search = $formdata{'search'}; 
#  
+ $search =~s/([\x80-\xFF])/widechar(ord($1))/ge; 
+ warn $search, "\n"; 
+  
     my @word = split ( / /, $search ); 
     print $client '&nbsp;&nbsp;'; 
#
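The garbled output and the header change in the diff above are two sides of the same problem: UTF-8 bytes sent without a declared charset get read as Latin-1. The Python sketch below reproduces the effect and shows a header line with the charset parameter in its standard `charset=` form. The helper names are made up for illustration, and the assumption that the browser submits the search term as percent-encoded UTF-8 depends on the page itself being served as UTF-8.

```python
from urllib.parse import unquote


def content_type_header(charset: str = "UTF-8") -> str:
    """Header line plus the blank line that ends the HTTP header block."""
    return f"Content-Type: text/html; charset={charset}\r\n\r\n"


def misread_as_latin1(text: str) -> str:
    """Show what a reader sees when UTF-8 bytes are decoded as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def decode_search_term(raw: str) -> str:
    """Decode a percent-encoded query value, assuming it is UTF-8."""
    return unquote(raw, encoding="utf-8")
```

misread_as_latin1("köthen") yields the two-character sequence Ã¶ in place of ö, which is the "usual UTF-8 garbage" described in the post.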

I was under the impression that Unicode support was better in Perl 5.8 but haven't tried it so far.

Andreas

littleman

5:11 pm on Dec 11, 2002 (gmt 0)

Thanks for that, Andreas. As an English speaker I'm in new territory here. I had assumed that the browser would automatically work out that the text was UTF-8 and map it to the appropriate character set via the browser preferences.

Though possibly redundant, I am also adding a charset=UTF-8 meta tag.

andreasfriedrich

5:38 pm on Dec 11, 2002 (gmt 0)

littleman, I'm not really convinced that UTF-8 is the way to go, at least not externally. For my CMS I stick with numeric HTML entities. This works across browsers and shows the correct glyphs.

Internally my CMS (written in Perl) uses a mean and stupid hack. I make sure that every character that is outside of the ASCII range is converted to its numeric entity. Then those entities are converted to look like &amp;#246;, i.e. I escape the ampersand again. This prevents any of the modules, XML parsers, etc. from recognizing the entities and messing around with them. Some of those wanted to be really smart and use UTF-8 since that is supposedly the computer-science-savvy way to go. Now all they do is mess around with the ampersand. Right before my CMS outputs the PHP files it converts the escaped ampersand back to a real one and I get back all the numeric entities.

I do not really like this approach, but it has a great advantage over all the others I tried. It actually works. And I figured that internally I may legally use every representation I want as long as I adhere to standards when something leaves my CMS.
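The three steps of that hack can be sketched in a few lines. This is an illustration in Python, not the Perl code of the CMS, and the function names are invented for the example.

```python
import re


def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with its decimal entity."""
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)


def protect(text: str) -> str:
    """Escape the ampersand of numeric entities so parsers leave them alone."""
    return re.sub(r"&(#\d+;)", r"&amp;\1", text)


def restore(text: str) -> str:
    """Undo protect() right before output."""
    return re.sub(r"&amp;(#\d+;)", r"&\1", text)
```

One caveat of this sketch: restore cannot tell a protected entity from text that legitimately contained an escaped ampersand followed by a numeric reference, so content that discusses entities literally would need extra care.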

Andreas