- are they getting updated?
- can they get accessed? (they're rather big, and if the main site does not work well, then how about these feeds)
- is it stable?
- other preferred methods (e.g. "scraping" another mirror that works well - an official one of course)?
Does anyone have experience, advice, or comments?
/claus
That's an old one: for security reasons, Netscape had their servers non-pingable before the server upgrade. I didn't check them all, but I assume they are still behind the firewall and for that reason still do not respond to a ping.
However, i could still use a little more feedback - any problems or issues to be aware of?
Any SE issues? Duplicate penalties, or whatever?
Is everything just wonderful, running such a mirror?
/claus
I guess my original post was too specific so it has been removed, sorry mods.
the most important bits of the directory are working perfectly - that's the editor side where thousands of sites are being listed every day
I was wondering... could they move to a 'distributed' kind of network? You know, I've seen this somewhere else, like NASA or something, where you agree to donate some of your computer's unused processor time to help them with the processing. Anyone know what I mean? (This is just a thought, I have NO idea if it's possible.)
--Kenn
NOTE: Sorry if that was off topic. As mentioned, I have no idea if it's possible... and with that in mind, I thought it was similar to mirroring but in a different way.
[edited by: kwasher at 4:57 pm (utc) on Sep. 28, 2003]
As they encourage mirroring (and it is a good idea) they reduce some of the load on their own frontend servers and make the directory available to others at the same time. It's not the same as you suggest, but please, please, keep this thread on the topic of mirroring the ODP.
/claus
1) The central server farm was just made a bit more distributed - except that the servers are all placed in the same building :-) But there are several public servers right now (as was posted in lots of fora). As you surely can imagine (okay, at least I can, for some reasons :) ) changing from a single server to several is quite difficult. Lack of internal bandwidth, thinking of the best way to upgrade your mirrors, ...
2) Apart from some data users like Google, there are some official mirrors in place: ch.dmoz.org and de.dmoz.org (a third one was in preparation, I think it was in India but I don't remember exactly. The server upgrade stopped this, I suppose). Only disadvantage: while the servers were upgraded (see point 1) the synchronisation process was stopped for some reason. I was away for the weekend, but as of Friday it had not been restarted. In my eyes it is quite reasonable to say that the internal problems should be dealt with first, while the external servers can wait. They are still working, only are "a bit" out of date.
As i'm interested in mirroring no more than a small number of cats (between 10-50 out of 460,000), this process has so much overhead that it will be very much easier to just mirror the whole dmoz instead. Then again, this will take an awful lot of server space (i have not yet been able to estimate exactly how much, as those files are huge. At least a few gigabytes i reckon)
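For what it's worth, the overhead of pulling a handful of cats out of a huge dump can be kept down by streaming the file rather than loading it whole. The following is only a rough sketch under assumptions: the element names (ExternalPage, Title, topic) and the "about" attribute are my guess at the dump layout and should be checked against the real file, and iter_listings and the sample data are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def in_wanted(category, wanted):
    """True if category equals a wanted category or lies below one."""
    return any(category == w or category.startswith(w + "/") for w in wanted)


def iter_listings(source, wanted):
    """Yield (category, url, title) for listings in the wanted categories.

    Assumed layout (hypothetical): each listing is an
    <ExternalPage about="URL"> element with <Title> and <topic> children,
    where <topic> holds the category path.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event != "end" or local_name(elem.tag) != "ExternalPage":
            continue
        fields = {local_name(c.tag): (c.text or "").strip() for c in elem}
        category = fields.get("topic", "")
        if in_wanted(category, wanted):
            yield category, elem.get("about"), fields.get("Title", "")
        root.clear()  # drop processed elements so memory use stays flat


# Tiny made-up sample standing in for the multi-gigabyte dump.
SAMPLE = b"""<RDF>
  <ExternalPage about="http://example.com/a">
    <Title>Site A</Title>
    <topic>Top/Computers/Internet</topic>
  </ExternalPage>
  <ExternalPage about="http://example.org/b">
    <Title>Site B</Title>
    <topic>Top/Arts/Music</topic>
  </ExternalPage>
</RDF>"""

rows = list(iter_listings(io.BytesIO(SAMPLE), ["Top/Computers"]))
for row in rows:
    print(row)
```

The whole dump still has to be downloaded and read once, but only the listings in the chosen cats are kept, so disk and memory needs on the mirror side stay small.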
A full dmoz mirror is not an option at present. Partly because of lack of sufficient disk space, and partly (more important) because it would not be relevant for me to mirror all cats. It would be quite easy to just proxy the whole dmoz.org site, but that is not the same as mirroring, it is not what i want to do, and there's no real user benefit in this, neither for me nor for dmoz.
So, at present i'm stuck. There are some links in the subcats of "Directories/Open_Directory_Project/" that i will investigate and then i'll either forget the idea completely or move on with it - others must have been there before i think.
Still, nobody wants to (or can) share experiences?
That's a bit worrying. Surely, i can't be the only one in here having thought about this?
/claus
Basic License. Netscape grants you a non-exclusive, royalty-free license to use, reproduce, modify and create derivative works from, and distribute and publish the Open Directory (...)
It doesn't say you have to use the RDF dump.
And yes the public servers are erratic, to say the least, at this time. But since you'd only want to re-fetch the pages every so often, you could use the mirrors, which are just about as fresh as the public pages and a little bit fresher than the RDF (or at least should be).
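The "re-fetch every so often" part can be as simple as checking the age of the local copy before going back to the mirror. A minimal sketch, assuming the pages are stored as local files; the one-week interval and the name needs_refetch are arbitrary choices of mine, not anything the ODP prescribes.

```python
import os
import tempfile
import time

REFETCH_AFTER = 7 * 24 * 3600  # one week in seconds; an arbitrary assumption


def needs_refetch(path, max_age=REFETCH_AFTER, now=None):
    """Return True if the local copy is missing or older than max_age seconds."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except FileNotFoundError:
        return True  # nothing stored yet, so fetch it
    return age > max_age


# Usage with a throwaway directory standing in for the local page store.
with tempfile.TemporaryDirectory() as tmp:
    copy = os.path.join(tmp, "category_page.html")
    missing = needs_refetch(copy)      # no local copy yet
    open(copy, "w").close()            # pretend the page was just fetched
    fresh = needs_refetch(copy)        # copy is brand new
    stale = needs_refetch(copy, now=time.time() + REFETCH_AFTER + 60)
    print(missing, fresh, stale)
```

Only pages for which this returns True would be requested again, which keeps the load on the mirrors low.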
As to just grabbing the pages, i mentioned the possibility in my first post, and it would be the easiest by far. Normally this is not a thing that webmasters welcome, so i thought i would look carefully for explicit mention of this in the TOS. I agree that it seems to be covered under the general terms "use, reproduce, modify and create derivative works from", so it seems as if you just make sure you have those attributions, there will be no problems.
I think i have to modify them as well, though. The suggested (required verbatim?) HTML markup is not very nice (does it even validate?), as i prefer CSS, and that green color suits only one site, and that's the one it's on. But those are details. At least i hope so.
Now, SE effects:
Has anyone got experience with this (kind of) mirroring being considered duplicate content? Penalties? After all, it is a (partial) copy of another site, even if allowed and encouraged. Opinions?
/claus