Seems to me like people register expired domains which are still
in dmoz and get high PR for their sites from both Google and dmoz categories.
It also seems to me that dmoz and google are completely unable
to handle this situation. They aren't able to delete expired
domains from their database when they expire.
This is horrible. Google should not give so much PR value
for dmoz as long as they can't handle this.
Do you know what the ODP editors are doing about the problem? Do you know how many domains expire and are reregistered by someone else every day? Do you know how many expired and hijacked domains are removed from the directory every day?
The ODP is not a domain registry (neither is Google), so they simply can't remove sites from the directory automatically the same moment as they expire. But there are a number of editors who specialize in this kind of research. The standard approach is to search on Google for some of the typical phrases found on hijacked pages. This means that there will be a delay of two or three months until it is possible to remove them, but that doesn't mean they will stay there for much longer than that. If you have a list of domains that have not been detected yet, then any meta editor will be happy to look into them.
Google should not give so much PR value for dmoz
As far as PageRank is concerned, Google treats dmoz.org exactly as any other site out there.
This applies to any site with a link to anywhere. At any moment any of the sites I link to could expire, or could be replaced by a site promoting Nigerian spam porn.
I don't check for this sort of thing regularly enough. Do you?
I do run Xenu Link Sleuth a lot -- but that only tells me about broken links, and a broken link may be due to a downed server rather than an expired domain. Does anyone know an equivalent tool that will run a DNS check to see if a broken link is due to an expired domain?
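One possible approach to the DNS question, as a rough sketch rather than a finished tool: take the host name out of each broken URL and ask DNS whether the name still exists. The function names `url_host` and `dns_status` and the file `broken_links.txt` are made up for illustration, and the sketch assumes the `host` utility is installed. A name that no longer resolves suggests, but does not prove, that the domain has expired.

```sh
#!/bin/sh
# Sketch: classify broken links by whether the host name still exists in DNS.
# Assumes the `host` utility is installed; broken_links.txt is hypothetical.

# Extract the host name from a URL such as http://www.example.com:8080/page.html
url_host() {
    printf '%s\n' "$1" | sed -e 's|^[A-Za-z][A-Za-z0-9+.-]*://||' \
        -e 's|[/?#].*$||' -e 's|^.*@||' -e 's|:[0-9]*$||'
}

# Print "nxdomain" when DNS reports that the name does not exist,
# otherwise "resolves-or-unknown" (the server may simply be down).
dns_status() {
    if host "$1" 2>&1 | grep -q 'NXDOMAIN'; then
        echo nxdomain
    else
        echo resolves-or-unknown
    fi
}

# Usage (needs network access):
#   while read -r url; do
#       h=$(url_host "$url")
#       echo "$h $(dns_status "$h")"
#   done < broken_links.txt
```

Links reported as `nxdomain` are the ones worth checking by hand first; everything else still needs the usual broken-link follow-up.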
And, even if XLS gives a link a clean bill of health, I don't know that the link I made is still to the original content. Does anyone know a tool that will highlight substantial changes to a page, flagging it up for eyeballing?
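For the changed-content question, one simple option is to keep a saved copy of each linked page and compare it with a fresh copy later. The sketch below only counts differing lines with `diff`; the function names, file names and threshold are hypothetical, and a line count is a crude stand-in for "substantial change".

```sh
#!/bin/sh
# Sketch: flag a page for review when many lines differ from a saved snapshot.
# File names and the threshold are hypothetical.

# Count removed plus added lines between two saved copies of a page.
changed_lines() {
    # grep -c exits non-zero when the count is 0, so ignore its status
    diff "$1" "$2" | grep -c '^[<>]' || true
}

# Print "review" if more than $3 lines changed, otherwise "ok".
flag_page() {
    n=$(changed_lines "$1" "$2")
    if [ "$n" -gt "$3" ]; then echo review; else echo ok; fi
}

# Usage: fetch the page however you like (e.g. curl -s URL > new.html), then
#   flag_page old.html new.html 20
```

Pages with rotating ads or dates will always show a few changed lines, so the threshold would need tuning per site.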
Thanks!
One example could be some kind of flag in the WHOIS database that spiders, crawlers, search engines etc. could process automatically. A flag indicating that the owner has changed would let editors from the ODP, Yahoo etc. be informed automatically; they could check the content in seconds, and it could serve as a spam filter too.
Maybe also a flag for porn or adult sites. BUT you would need some rules to punish those who do not comply.
Any new rule brings a new "trick".
The only way is monitoring by surfers.
Surfers could help with reporting but the magnitude of the Dmoz directory requires a proper, automated system.
Regards...jmcc
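#!/bin/sh
# Append every Dmoz URL containing .squatterdomain.com (any case) to squatterresult.
# Quoting the pattern stops the shell from expanding the brackets as a glob.
grep -i '[.]squatterdomain[.]com' dmozurllist >> squatterresult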
It can be done with a loop or just as a simple line for each domain.
It takes each domain from the list that the cybersquatter has and then runs through the Dmoz URLs. If it gets a hit, then it writes the URL to a file. Nothing really complicated about it but it can take some time to go through all the links. I ran it on a Duron1200/128MB that was just idling here and it took about an hour or so to complete it. The main problem from the Dmoz viewpoint would be identifying the linkswamp operators, though when they have unique DNSes these could be put on a watchlist. Thus if a domain's DNS details are changed to a known cybersquatter's DNS, the Dmoz entry could be flagged for removal/attention by the relevant editor.
Regards...jmcc
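A minimal sketch of the loop form mentioned in the post above, assuming one domain per line in a file called `squatterdomains` and one URL per line in `dmozurllist`. The function name and the first file name are hypothetical. Unlike the one-line version, it also matches URLs without a `www.` prefix, and it can still produce false positives (for example `example.com` also matches `example.community`).

```sh
#!/bin/sh
# Sketch of the loop form. File names are hypothetical:
#   squatterdomains - one domain per line (e.g. example.com)
#   dmozurllist     - one Dmoz URL per line
match_squatters() {
    while read -r domain; do
        [ -n "$domain" ] || continue
        # -F treats the domain as a fixed string, so dots are literal.
        # grep exits non-zero when nothing matches; that is not an error here.
        grep -i -F -e "/$domain" -e ".$domain" "$2" || true
    done < "$1" | sort -u
}

# Usage: match_squatters squatterdomains dmozurllist > squatterresult
```

Each pass still scans the whole URL list once per domain, which is why a long domain list takes a while to finish.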
Obviously once a site gets in the DMOZ, even if it goes bye bye it can stay around indefinitely.
Yep. I've got a long-gone page from my former (rhymes with snout-dot-com) site that's still listed in the ODP, and Google continues to rank it even though it no longer exists and is redirecting to another -----.com page. I've used the update form at least twice over the past year and e-mailed editors higher up in the ODP hierarchy (the category doesn't have an editor), but the page is still there.
Similarly, Google continues to list pages from my wife's old -----.com site more than a year after the site disappeared--again, because of redirects.
I don't know what the solution is. Banning URLs that redirect wouldn't work, because redirects are sometimes legitimate. Maybe Google should just use the ODP as a place to find new URLs for spidering and ignore the ODP's listing once a URL has been indexed. That way, the presence of an ODP listing wouldn't trick Google into thinking that a non-existent site or page was still on the Web.
The assumption that started this thread seems to be that a check once a month would be the bare minimum. Now assume that the ODP ran a spider to hit the whois system 3 million times a month. How fast do you think netsol would block that spider?
It takes each domain from the list that the cybersquatter has and then runs through the Dmoz URLs.
Yes, that's obviously the easy part of the problem. The hard part is to compile a complete list of all squatted domains every month. There's no good and efficient way to do this unless you're a domain registrar yourself. And even this ignores the fact that the ODP has domains from pretty much all country code TLDs in its database, each of which may require accessing a different whois server with a different syntax.
Many people really massively underestimate the scale of the technical challenges that an operation like the ODP poses to one lonely technical staff person. We all here could learn a hell of a lot from her. What she does within the given resource constraints is absolutely brilliant. Even if we wanted, we couldn't possibly stretch her very far beyond what she already contributes to the project.
The assumption that started this thread seems to be that a check once a month would be the bare minimum. Now assume that the ODP ran a spider to hit the whois system 3 million times a month. How fast do you think netsol would block that spider?
I think you are getting DNS/nameservers mixed up with WHOIS results, bird. Checking the nameserver details and the webserver IP details of a site when it is submitted is the easy part. Any deviation could be detected easily. The rough .com count from the Dmoz RDF is 1101549 and this is not that difficult to check. It would take a few hours to do it against the zonefiles.
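The deviation check could be sketched roughly as follows. This assumes the nameserver details have already been reduced to plain "domain nameserver" pairs, one per line: one file recorded when the sites were submitted and one derived from the current zonefiles. The file names, the pair format and the function name `ns_changed` are assumptions made for illustration, not a description of how Dmoz stores anything.

```sh
#!/bin/sh
# Sketch: report domains that have lost a nameserver recorded in a snapshot.
# Both inputs are hypothetical files of "domain nameserver" pairs, one per line.
ns_changed() {
    tmp_old=$(mktemp) && tmp_new=$(mktemp) || return 1
    sort -u "$1" > "$tmp_old"
    sort -u "$2" > "$tmp_new"
    # comm -23 keeps pairs found only in the snapshot, i.e. the recorded
    # nameserver is gone or the domain has dropped out of the zone.
    comm -23 "$tmp_old" "$tmp_new" | awk '{print $1}' | sort -u
    rm -f "$tmp_old" "$tmp_new"
}

# Usage: ns_changed ns_at_submission.txt ns_today.txt > watchlist
```

Because only the snapshot side is reported, the current file can be the whole zone without every unrelated domain showing up in the output.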
Yes, that's obviously the easy part of the problem. The hard part is to compile a complete list of all squatted domains every month. There's no good and efficient way to do this unless you're a domain registrar yourself. And even this ignores the fact that the ODP has domains from pretty much all country code TLDs in its database, each of which may require accessing a different whois server with a different syntax.
Many people really massively underestimate the scale of the technical challenges that an operation like the ODP poses to one lonely technical staff person.
Regards...jmcc
It wouldn't be very hard (for instance) to subscribe to the expired domains lists from Exody and then cross-reference them with the ODP database, and then flag the expired domains for investigation. After all, many of these domain name speculators are doing exactly that, just using a collection of off-the-peg tools that cost real money.
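The cross-referencing step could look something like this, as a sketch under assumptions: the expired list is taken to be a plain file with one domain per line (the actual format of any such list is not known here), the ODP side is a file with one URL per line, and the function and file names are hypothetical. It only catches listings whose host is the bare domain or its `www.` form, not other subdomains.

```sh
#!/bin/sh
# Sketch: intersect an expired-domain list with the hosts found in a URL list.
# Both file names are hypothetical; hosts are compared after stripping "www.".
expired_listed() {
    tmp_hosts=$(mktemp) && tmp_exp=$(mktemp) || return 1
    # Reduce each URL to a lowercase host name without scheme, path or port
    sed -e 's|^[A-Za-z][A-Za-z0-9+.-]*://||' -e 's|[/?#:].*$||' "$2" |
        tr '[:upper:]' '[:lower:]' | sed -e 's|^www\.||' | sort -u > "$tmp_hosts"
    tr '[:upper:]' '[:lower:]' < "$1" | sort -u > "$tmp_exp"
    # comm -12 prints only the lines common to both sorted files
    comm -12 "$tmp_exp" "$tmp_hosts"
    rm -f "$tmp_hosts" "$tmp_exp"
}

# Usage: expired_listed expired_domains.txt dmozurllist > flagged_for_review
```

Sorting both sides once and joining them avoids rescanning the URL list for every domain.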
I *do* know that some of the editalls/metas hang around eBay looking for listed names for auction, so attempting to resell a domain can be a risky business.
Yes, there are problems with all search engines and sites with hijacked domains. Why on earth pick out DMOZ? The underlying complaint is valid, but the singling out of DMOZ is just ridiculous.
This happens time and time again. What's the problem for some people? I just don't get it.
Yes, there are problems with all search engines and sites with hijacked domains. Why on earth pick out DMOZ? The underlying complaint is valid, but the singling out of DMOZ is just ridiculous.
This happens time and time again. What's the problem for some people? I just don't get it.
The difference is that some of us actually give a damn. :)
</rant>
Regards...jmcc
But you obviously don't understand what I am saying.
You said it yourself... "The problem is link rot"... which afflicts every directory and site to some degree or another. DMOZ is no better or worse than any other in this respect... hence my point.
It seems that whenever there is a general negative point to be made, DMOZ is wheeled out. Every other directory/site is conveniently overlooked.
This is simply wrong. The ODP is a volunteer organization. Editors are not paid. If a directory or site is to be singled out, I would have thought that a profit driven structure would be a somewhat more appropriate target.
In truth, no directory should be singled out. Link rot is the issue so link rot should be discussed. It should not be used as camouflage for an unjustified attack on a single directory.
I have no idea about your understanding of statistics and how the significance of individual samples is determined. Rafalk was talking about the percentage of dead links over the complete directory. In a structure of that size, it is guaranteed that you'll find categories like the ones you picked. However, most will look much better.
I'm actually pretty sure (without checking) that there is at least one category somewhere in the ODP that contains nothing but two or three dead links: 100%! Does that say anything about the overall quality of the directory? If you think so, please think again.
You said it yourself... "The problem is link rot"... which afflicts every directory and site to some degree or another. DMOZ is no better or worse than any other in this respect... hence my point.
Yes, Napoleon, but unlike every other directory, the data is freely available, so the extent of the link rot can be established and something can be done about it. This availability of data is the reason that Dmoz is always attacked, and it is because the internet community has a more direct involvement that people comment more. Nobody really gives a damn about the Yahoos of the net because they are not of the community and by the community.
The ODP is a volunteer organization. Editors are not paid. If a directory or site is to be singled out, I would have thought that a profit driven structure would be a somewhat more appropriate target.
The jobs of the editors should be made as easy as possible by automation and link rot is one of the tasks that lends itself to automation. On a directory the size of Dmoz, it is futile to totally rely on users to submit link rot sites. A more active link rot detection would make Dmoz more useful and more relevant. And with this would follow more editors. (Well in theory anyway. ;) ) Unlike the other directories, the users have the power to change Dmoz and make it better.
Regards...jmcc
I can crawl DMOZ in a few hours and create a search database, but NO ONE will allow this type of volunteerism. We are all stuck with the "Systems Engineers" paid by Netscape to do it "when they can manage it".
It is time for DMOZ to be spun off as a non-profit corporation...you would then see the true power of "Open Source" and a true Open Directory Project. As long as DMOZ is captured within the corporate structure of a profit making company it is destined to fail.
It is not a proud time to be busting your butt for Netscape's Open Directory Project creating their copyrighted material.
It seems that whenever there is a general negative point to be made, DMOZ is wheeled out. Every other directory/site is conveniently overlooked.
"Every other directory/site" isn't as important as DMOZ.