Using ODP Data for a search engine

         

DLord

8:04 am on Oct 25, 2003 (gmt 0)

10+ Year Member



Hello,

I wonder if there are any scripts out there that would let me run a search function like dmoz on our own server. I am aware that I can download the RDF dump, but I am looking for a script that I could use like the search.dmoz.org site (listing sites only, not the categories or other info).

I know that there are some PHP scripts that get the results from DMOZ directly, but DMOZ search has not been working for me since yesterday.

Any suggestions?

Chris_R

8:09 am on Oct 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Have you tried ODP++?
Actually - based on your other post - I think you will just have to wait until dmoz fixes their end.

DLord

8:14 am on Oct 25, 2003 (gmt 0)

10+ Year Member



Same problem, since they seem to use search.dmoz.org too.

Example Search:

<snip url>

Since DMOZ search doesn't work, you see 5 categories but no actual website listings.

That's why I am looking for a way to download the RDF feed and a script that can search it like search.dmoz.org.

:(

[edited by: skibum at 2:01 am (utc) on Oct. 27, 2003]
[edit reason] no links to commercial sites please [/edit]

kctipton

2:59 pm on Oct 25, 2003 (gmt 0)

10+ Year Member



**** has some nice features for those who register (free). They parse the rdf and give results for searches (like how many times geocities.com is listed in both Yahoo and ODP). I believe they stay pretty current with their data, but I can't promise that it's only a week old or anything like that.

<added>
Uhh, I can't believe that whois dot sc is on the banned list here, but that's what I put in the **** above.

JasonHamilton

5:15 am on Nov 26, 2003 (gmt 0)

10+ Year Member



It is trivial to parse the dmoz data. I've written several programs that pull dmoz data and insert it into an RDBMS for searching.

At most, it would take about 6 lines of code to fopen the dmoz xml and parse out the listings (url, title, desc). A few more lines if you want to limit which categories to include or exclude.
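
As a rough illustration of the approach described above (not the poster's actual code), here is a minimal Python sketch that streams listings out of an RDF dump, loads them into SQLite, and runs a simple keyword search. The element and attribute names (ExternalPage, about, Title, Description, topic) and the namespace URIs in the sample are assumptions about the dump layout, so check them against the file you download. The function names, the table name, and the SAMPLE data are made up for the example.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Made-up sample in the assumed layout of the dump (not real ODP data).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/"
     xmlns:d="http://purl.org/dc/elements/1.0/"
     xmlns="http://dmoz.org/rdf/">
  <ExternalPage about="http://example.com/">
    <d:Title>Example Site</d:Title>
    <d:Description>A made-up listing about widgets.</d:Description>
    <topic>Top/Shopping/Widgets</topic>
  </ExternalPage>
  <ExternalPage about="http://example.org/">
    <d:Title>Another Site</d:Title>
    <d:Description>A made-up listing about music.</d:Description>
    <topic>Top/Arts/Music</topic>
  </ExternalPage>
</RDF>
"""


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def iter_listings(source, include_prefix=None):
    """Yield (url, title, description, topic) for each ExternalPage element.

    include_prefix, if given, keeps only listings whose topic starts with it.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "ExternalPage":
            continue
        fields = {local(child.tag): (child.text or "").strip() for child in elem}
        topic = fields.get("topic", "")
        if include_prefix is None or topic.startswith(include_prefix):
            yield (elem.get("about", ""), fields.get("Title", ""),
                   fields.get("Description", ""), topic)
        elem.clear()  # drop the element's children; the real dump is very large


def load(source, db, include_prefix=None):
    """Create the (hypothetical) listings table and fill it from the dump."""
    db.execute("CREATE TABLE IF NOT EXISTS listings "
               "(url TEXT, title TEXT, description TEXT, topic TEXT)")
    db.executemany("INSERT INTO listings VALUES (?, ?, ?, ?)",
                   iter_listings(source, include_prefix))
    db.commit()


def search(db, term):
    """Return (url, title) rows whose title or description contains term."""
    like = f"%{term}%"
    return db.execute(
        "SELECT url, title FROM listings "
        "WHERE title LIKE ? OR description LIKE ? ORDER BY title",
        (like, like)).fetchall()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    load(io.BytesIO(SAMPLE), db)
    print(search(db, "widgets"))
```

For a real dump you would pass an open file (for example `open(path, "rb")`, where the path is whatever you saved the download as) instead of the in-memory sample. A LIKE scan is only a stand-in for a proper search index, and it returns site listings only, which matches what was asked for at the top of the thread.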