On my server, all requests for www.example.com/dir return a 301 and are then redirected to www.example.com/dir/. I now notice in my web log that this particular spider requested /dir but did not follow the redirect. It did not wait for the status 200 page, but quit.
My question is, could this result in my site being automatically dropped from DMOZ, or will the validity of the link be manually checked before excluding?
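For readers who want to see the behaviour described in the question, here is a minimal local sketch, not the poster's actual server setup. It assumes Python's built-in `http.server` as a stand-in web server (it also answers a request for a directory without the trailing slash with a 301) and uses `http.client`, which does not follow redirects, as a stand-in for the spider. The names `QuietHandler`, `fetch_status` and `demo` are made up for this illustration.

```python
import http.client
import os
import tempfile
import threading
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class QuietHandler(SimpleHTTPRequestHandler):
    """Stand-in server handler (assumption); suppresses per-request logging."""

    def log_message(self, format, *args):
        pass


def fetch_status(host, port, path):
    """Request `path` once, without following redirects, like the spider did."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def demo():
    """Serve a temp directory containing dir/index.html; fetch /dir and /dir/."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "dir"))
        with open(os.path.join(root, "dir", "index.html"), "w") as f:
            f.write("<p>hello</p>")
        handler = partial(QuietHandler, directory=root)
        # Port 0 lets the operating system pick a free port.
        server = ThreadingHTTPServer(("127.0.0.1", 0), handler)
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            port = server.server_address[1]
            without_slash = fetch_status("127.0.0.1", port, "/dir")
            with_slash = fetch_status("127.0.0.1", port, "/dir/")
        finally:
            server.shutdown()
            server.server_close()
        return without_slash, with_slash


if __name__ == "__main__":
    print(demo())
```

A client that stops after the first response, as the spider in the log apparently did, only ever sees the 301 for /dir and never the 200 for /dir/.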
Assuming that our robozilla did decide that a listed site was unavailable, it would be flagged "red", not deleted. An editor will eventually check red links manually. I'm not sure, but I think that if a red flag from last month has not yet been handled manually, a second failure this month would move the site to our unreviewed queue. While not the same as auto-delete, this would cause the site to disappear publicly until someone has a chance to check.
-- Rich
We have our own link checker, which identifies itself as "robozilla". And there are some editor-written link checkers, which editors can run on a particular category.
But there is no third-party spider that we use. And a 301 redirect won't cause a site to be removed; it will cause the link to be updated.
They are probably using their own local copy of the RDF file, which is freely available at [rdf.dmoz.org...] as well as older copies in the /archive folder.
But reported to whom? From whom? And how did you intercept it? Exactly what did it say? -- perhaps you misinterpreted it.
Without knowing the details, almost anything that can be said (except the fact that the ODP doesn't use any such thing) is wild-eyed tinfoil-hat mouth-frothing fantasy.
Sorry for being a "wild-eyed tinfoil-hat mouth-frothing" dreamer, but if everybody knew everything there would be no need for this forum. I think it is better that people are encouraged to ask instead of continuing to have fantasies. I am grateful to Rich Franzen, who got my post exactly right and gave a complete and open answer. Now nobody needs to be uninformed about how ODP works.
My own msg #8 contains the explanation of the spider. There is no need to continue the discussion. But it could be added that a research spider programme which omits the end slash, even though the slash is included in the link it uses as its source, and therefore adds an unwarranted 301 to the statistics it collects, obviously has a flaw. In any case this is not a true 301.
If it was a third-party company spidering your site, then I fail to see why the ODP or ODP editors are involved in any way whatsoever. This is half a story, using unconnected factors, speculation and guesswork, and it undoubtedly leads to a wrong conclusion.
Sounds like a non-event to me.
I said in my msg #8 that this spider obviously collects data for statistical purposes, e.g. how many requests to servers worldwide return a 301. mbauser2 has not read that message of mine. Of course this spider is programmed not to follow 301s, as it just needs to count the number of 301 responses it gets (and 200s etc). But I did not know all that when I started this thread.
However, it does look like this spider programme shortens links coded as ending with "index.html", e.g. "/dir/index.html", to just /dir, and so causes the 301 status itself. This can be considered a flaw in a research programme, as it biases the statistics the company prepares. I argued additionally that a redirect caused by a missing trailing slash should in no case be counted as a true redirect. Nearly all servers automatically redirect requests for page /dir to page /dir/ even if there is no mod_rewrite.
That last point has, of course, nothing to do with ODP; I simply found it noteworthy. The above "flaw" caused me to post my question in the first place. Although my posting later proved to be unnecessary, I am disappointed by the discouraging treatment I received here from one fellow member. I find it useless to continue this discussion.
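To illustrate the argument above that a redirect caused only by a missing trailing slash should not be counted as a true redirect, here is a rough sketch of how a statistics-gathering spider could tally such responses separately. Nothing is known about how the actual spider works; the function names, the category labels and the `(url, status, location)` input format are all assumptions made for this example.

```python
from collections import Counter
from urllib.parse import urljoin, urlsplit


def is_trailing_slash_redirect(requested_url, location):
    """True if the redirect target is the requested URL plus a trailing slash."""
    source = urlsplit(requested_url)
    # Location may be relative ("/dir/"), so resolve it against the request.
    target = urlsplit(urljoin(requested_url, location))
    same_origin = (source.scheme, source.netloc.lower()) == (
        target.scheme,
        target.netloc.lower(),
    )
    return (
        same_origin
        and not source.path.endswith("/")
        and target.path == source.path + "/"
        and target.query == source.query
    )


def tally(observations):
    """Count responses, keeping slash-only 301s apart from other 301s.

    `observations` is an iterable of (url, status, location) tuples;
    this input format is hypothetical.
    """
    counts = Counter()
    for url, status, location in observations:
        if status == 301 and location and is_trailing_slash_redirect(url, location):
            counts["301 (trailing slash only)"] += 1
        else:
            counts[str(status)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        ("http://www.example.com/dir", 301, "http://www.example.com/dir/"),
        ("http://www.example.com/old", 301, "http://www.example.com/new/"),
        ("http://www.example.com/dir/", 200, None),
    ]
    print(tally(sample))
```

With a split like this, a spider that strips "index.html" or the end slash from its source links would at least not mix the redirects it provoked itself with genuine ones.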