
Directories Forum

Google and dmoz lose expired domains fight
SEOPTI
msg:482892
11:17 pm on Nov 13, 2002 (gmt 0)

I often see domains listed in various dmoz and Google categories
(children, news, disabled people, church, government ...) that now carry porn content.

It seems people register expired domains which are still listed
in dmoz and get high PR for their sites from both the Google and dmoz categories.

It also seems to me that dmoz and Google are completely unable
to handle this situation. They aren't able to delete expired
domains from their databases when they expire.

This is horrible. Google should not give so much PR value
to dmoz as long as they can't handle this.

 

bird
msg:482893
11:37 pm on Nov 13, 2002 (gmt 0)

completely unable

Do you know what the ODP editors are doing about the problem? Do you know how many domains expire and are reregistered by someone else every day? Do you know how many expired and hijacked domains are removed from the directory every day?

The ODP is not a domain registry (neither is Google), so they simply can't remove sites from the directory automatically the moment they expire. But there are a number of editors who specialize in this kind of research. The standard approach is to search on Google for some of the typical phrases found on hijacked pages. This means that there will be a delay of two or three months until it is possible to remove them, but that doesn't mean they will stay there for much longer than that. If you have a list of domains that have not been detected yet, any meta editor will be happy to look into them.

Google should not give so much PR value to dmoz

As far as PageRank is concerned, Google treats dmoz.org exactly as any other site out there.

rfgdxm1
msg:482894
12:05 am on Nov 14, 2002 (gmt 0)

The ODP probably doesn't have the people to deal with this. I recently became an ODP editor of a piddly bottom-level cat with 9 sites listed because I was so disgusted that 3 of them had LONG been dead and were just redirects to other sites. Even Robozilla never picked up on this. Google did, however, and in the Google directory those 3 sites have no PR. Obviously, once a site gets into DMOZ, even if it goes bye-bye it can stay around indefinitely.

SEOPTI
msg:482895
12:18 am on Nov 14, 2002 (gmt 0)

"This means that there will be a delay of two or three months until it is
possible to remove them"

Seems like a joke to me.

Digimon
msg:482896
12:51 am on Nov 14, 2002 (gmt 0)

It's easy, SEOPTI: if you want to do something about it, help them and report the abuse cases!

victor
msg:482897
8:56 am on Nov 14, 2002 (gmt 0)

SEOPTI:
It also seems to me that dmoz and Google are completely unable to handle this situation. They aren't able to delete expired domains from their databases when they expire.

This applies to any site that links to anywhere. At any moment, any of the sites I link to could expire, or could be replaced by a site promoting Nigerian spam porn.

I don't check for this sort of thing regularly enough. Do you?

I do run Xenu Link Sleuth a lot -- but that only tells me which links are broken, and a broken link may be a downed server rather than an expired domain. Does anyone know an equivalent tool that will run a DNS check to see if a broken link is due to an expired domain?
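
For what it's worth, a first pass needs only a few lines of shell. This is a rough sketch, not a polished tool; the input file "brokenhosts" is made up and would hold whatever hostnames Xenu flagged, one per line:

#!/bin/sh
# Rough sketch: for each flagged hostname, ask DNS whether it still
# resolves. NXDOMAIN suggests an expired or deleted domain; a name
# that resolves but serves nothing is more likely a downed server.
while read -r hostname; do
  if host "$hostname" > /dev/null 2>&1; then
    echo "$hostname: still in DNS (server may just be down)"
  else
    echo "$hostname: no DNS entry (possibly expired domain)"
  fi
done < brokenhosts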

And even if XLS gives a link a clean bill of health, I don't know that the link I made still points to the original content. Does anyone know a tool that will highlight substantial changes to a page, flagging it for eyeballing?
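
Failing a ready-made tool, a crude checksum comparison would at least flag changed pages for eyeballing. A sketch only: it flags any change, not just substantial ones, and the file names ("urls", "checksums") are invented:

#!/bin/sh
# Crude sketch: keep one checksum per URL and report any page whose
# checksum differs from the last run. Flags trivial edits too; a
# smarter tool would diff the actual text.
touch checksums
while read -r url; do
  sum=$(wget -q -O - "$url" | md5sum | cut -d' ' -f1)
  old=$(grep "^$url " checksums | cut -d' ' -f2)
  if [ -n "$old" ] && [ "$sum" != "$old" ]; then
    echo "$url changed -- eyeball it"
  fi
  grep -v "^$url " checksums > checksums.tmp
  echo "$url $sum" >> checksums.tmp
  mv checksums.tmp checksums
done < urls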

Thanks!

angiolo
msg:482898
10:36 am on Nov 14, 2002 (gmt 0)

To manage this situation you would need new rules, new limitations.

One example could be some kind of flag in the whois database that could be managed automatically by spiders, crawlers, search engines etc. A flag indicating that the owner has changed: editors from ODP, Yahoo etc. could be notified automatically and could check the content in seconds; it could serve as a spam filter too.

Maybe a flag marking porn or adult sites. BUT you need rules to punish those who don't comply with them.

Every new rule invites a new "trick".
The only way is monitoring by surfers.

jmccormac
msg:482899
12:07 pm on Nov 14, 2002 (gmt 0)

Actually an automated process that would check the DNS details at inclusion date and a spider that would regularly check DNS details for each inclusion is a better way than relying on surfers. I've run a program to detect incidences of a particular linkswamp operation that buys up expired domains that are listed in the Dmoz RDF. It has something like 97K domains and I found approximately 1800 hits.

Surfers could help with reporting but the magnitude of the Dmoz directory requires a proper, automated system.

Regards...jmcc

SEOPTI
msg:482900
1:10 pm on Nov 14, 2002 (gmt 0)

jmccormac, very good suggestion, and it's so easy to do; you don't need to be a Perl or PHP expert. Any skilled programmer would be able to write this application.

I thought they had some paid staff at dmoz? What are they doing all day long?

rfgdxm1
msg:482901
1:44 pm on Nov 14, 2002 (gmt 0)

I'm not sure, but I seem to recall the ODP only has one staff member handling the software. This sort of thing may be beyond their abilities.

jmccormac
msg:482902
2:04 pm on Nov 14, 2002 (gmt 0)

It is essentially a shell script, rfgdxm1. :)

#!/bin/sh
# Append every DMOZ URL on the squatter's domain to a results file.
# Quoting keeps the shell from glob-expanding the bracket expressions.
grep -i '[.]squatterdomain[.]com' dmozurllist >> squatterresult

It can be done with a loop or just as a simple line for each domain.

It takes each domain from the list that the cybersquatter has and then runs it through the Dmoz URLs. If it gets a hit, it writes the URL to a file. Nothing really complicated about it, but it can take some time to go through all the links. I ran it on a Duron 1200/128MB that was just idling here and it took about an hour or so to complete. The main problem from the Dmoz viewpoint would be identifying the linkswamp operators, though when they have unique DNSes they could be put on a watchlist. Thus if a domain's DNS details are changed to a known cybersquatter's DNS, the Dmoz entry could be flagged for removal/attention by the relevant editor.
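
The loop variant would look something like this (a sketch; "squatterdomains" holds one domain per line, and the file names are just illustrative):

#!/bin/sh
# Loop variant: check every known squatter domain against the DMOZ
# URL list. Dots are escaped so they match literally rather than as
# regex wildcards. File names are illustrative only.
while read -r domain; do
  pattern=$(echo "$domain" | sed 's/\./[.]/g')
  grep -i "$pattern" dmozurllist >> squatterresult
done < squatterdomains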

Regards...jmcc

europeforvisitors
msg:482903
2:30 pm on Nov 14, 2002 (gmt 0)

Obviously, once a site gets into DMOZ, even if it goes bye-bye it can stay around indefinitely.

Yep. I've got a long-gone page from my former (rhymes with snout-dot-com) site that's still listed in the ODP, and Google continues to rank it even though it no longer exists and is redirecting to another -----.com page. I've used the update form at least twice over the past year and e-mailed editors higher up in the ODP hierarchy (the category doesn't have an editor), but the page is still there.

Similarly, Google continues to list pages from my wife's old -----.com site more than a year after the site disappeared--again, because of redirects.

I don't know what the solution is. Banning URLs that redirect wouldn't work, because redirects are sometimes legitimate. Maybe Google should just use the ODP as a place to find new URLs for spidering and ignore the ODP's listing once a URL has been indexed. That way, the presence of an ODP listing wouldn't trick Google into thinking that a non-existent site or page was still on the Web.

bird
msg:482904
5:52 pm on Nov 14, 2002 (gmt 0)

Actually an automated process that would check the DNS details at inclusion date and a spider that would regularly check DNS details for each inclusion is a better way than relying on surfers.

The assumption that started this thread seems to be that a check once a month would be the bare minimum. Now assume that the ODP ran a spider to hit the whois system 3 million times a month. How fast do you think netsol would block that spider?

It takes each domain from the list that the cybersquatter has and then runs it through the Dmoz URLs.

Yes, that's obviously the easy part of the problem. The hard part is to compile a complete list of all squatted domains every month. There's no good and efficient way to do this unless you're a domain registrar yourself. And even this ignores the fact that the ODP has domains from pretty much all country code TLDs in its database, each of which may require accessing a different whois server with a different syntax.

Many people massively underestimate the scale of the technical challenges that an operation like the ODP poses to one lonely technical staff person. We could all learn a hell of a lot from her. What she does within the given resource constraints is absolutely brilliant. Even if we wanted to, we couldn't possibly stretch her very far beyond what she already contributes to the project.

jmccormac
msg:482905
6:50 pm on Nov 14, 2002 (gmt 0)

bird posted:
The assumption that started this thread seems to be that a check once a month would be the bare minimum. Now assume that the ODP ran a spider to hit the whois system 3 million times a month. How fast do you think netsol would block that spider?

I think you are getting DNS/nameservers mixed up with WHOIS results, bird. Checking the nameserver details and the webserver IP details of a site when it is submitted is the easy part, and any deviation after that could be detected easily. The rough .com count from the Dmoz RDF is 1,101,549, and that is not that difficult to check. It would take a few hours to run it against the zonefiles.
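
The deviation check itself needs nothing exotic. A sketch, under the assumption of a baseline file (domain|nameservers, recorded at inclusion) which is purely illustrative and not anything Dmoz actually keeps:

#!/bin/sh
# Sketch: compare each listed domain's current nameservers against
# the set recorded when the site was accepted. Any difference gets
# flagged for an editor to eyeball. File names/format are assumptions.
while IFS='|' read -r domain baseline; do
  current=$(host -t ns "$domain" 2>/dev/null | sort | tr '\n' ';')
  if [ "$current" != "$baseline" ]; then
    echo "$domain" >> flagged_for_review
  fi
done < ns_baseline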

Yes, that's obviously the easy part of the problem. The hard part is to compile a complete list of all squatted domains every month. There's no good and efficient way to do this unless you're a domain registrar yourself. And even this ignores the fact that the ODP has domains from pretty much all country code TLDs in its database, each of which may require accessing a different whois server with a different syntax.


You don't have to be a domain registrar to compile the list of squatted domains. There are easier ways, but a knowledge of DNS operations and domain issues is essential. Otherwise it is all back to individual surfers detecting squatted sites and notifying Dmoz. The immediate problem is cybersquatting in the CNO TLDs (.com/.net/.org). These can be tackled first since they are pan-national and the easiest targets for cybersquatters.

Many people massively underestimate the scale of the technical challenges that an operation like the ODP poses to one lonely technical staff person.

I don't underestimate the challenges of finding domains and linkswamps, nor do I underestimate what is involved with Dmoz. (I had to implement a static version earlier this year for an Irish website.) Part of the work I do is identifying which .com/.net/.org domains are registered by which country. The domain problem with Dmoz is not that complex, and since you can establish a starting point, there is a solution. It may not be 100%, and that is where the users would come in. I don't dispute the good work that has been done, but thinking about domain/DNS issues does require a somewhat different approach to the one used to date.

Regards...jmcc

Dynamoo
msg:482906
2:55 pm on Nov 15, 2002 (gmt 0)

You *could* automate it. But it would cost money, and for all its importance the ODP seems to run on a shoestring.

It wouldn't be very hard (for instance) to subscribe to the expired-domains lists from Exody, cross-reference them with the ODP database, and then flag the expired domains for investigation... after all, many of these domain-name speculators are doing exactly that, just using a collection of off-the-peg tools that cost real money.
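
The cross-reference itself is then nearly a one-liner (a sketch; the file names and the idea of a daily expired-domains feed are assumptions):

#!/bin/sh
# Sketch of the cross-reference: intersect a day's expired-domains
# list with the domains pulled out of the ODP RDF dump. Both input
# files are hypothetical, one domain per line.
sort -u expired_today > expired.sorted
sort -u dmoz_domains > dmoz.sorted
comm -12 expired.sorted dmoz.sorted > flag_for_investigation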

I *do* know that some of the editalls/metas hang around eBay looking for listed names up for auction, so attempting to resell a domain can be a risky business.

Napoleon
msg:482907
3:13 pm on Nov 15, 2002 (gmt 0)

Here we go again... another totally unjustified assault on the ODP.

Yes, there are problems with all search engines and sites with hijacked domains. Why on earth pick out DMOZ? The underlying complaint is valid, but the singling out of DMOZ is just ridiculous.

This happens time and time again. What's the problem for some people? I just don't get it.

jmccormac
msg:482908
3:45 pm on Nov 15, 2002 (gmt 0)


Yes, there are problems with all search engines and sites with hijacked domains. Why on earth pick out DMOZ? The underlying complaint is valid, but the singling out of DMOZ is just ridiculous.

<rant>
The problem is link rot. Many directories wither and die because the people behind them don't pay enough attention to the problem of link rot. Dmoz, being one of the biggest directories and a feeder to many other directories and search engines, should be somewhat better. It is not, for the simple reason that nobody has apparently sat down and thought about the problem. Instead it is all up to random users, who may or may not notice that a kid's website has turned into a site for hardcore porn.

This happens time and time again. What's the problem for some people? I just don't get it.

The difference is that some of us actually give a damn. :)
</rant>
Regards...jmcc

Napoleon
msg:482909
4:01 pm on Nov 15, 2002 (gmt 0)

>> The difference is that some of us actually give a damn <<

But you obviously don't understand what I am saying.

You said it yourself... "The problem is link rot"... which afflicts every directory and site to some degree or another. DMOZ is no better or worse than any other in this respect... hence my point.

It seems that whenever there is a general negative point to be made, DMOZ is wheeled out. Every other directory/site is conveniently overlooked.

This is simply wrong. The ODP is a volunteer organization. Editors are not paid. If a directory or site is to be singled out, I would have thought that a profit driven structure would be a somewhat more appropriate target.

In truth, no directory should be singled out. Link rot is the issue so link rot should be discussed. It should not be used as camouflage for an unjustified attack on a single directory.

rafalk
msg:482910
4:10 pm on Nov 15, 2002 (gmt 0)

Dmoz, being one of the biggest directories and a feeder to many other directories and search engines, should be somewhat better.

Link rot at the ODP runs at less than 1% - how can you expect it to be any better than that?

petertdavis
msg:482911
4:19 pm on Nov 15, 2002 (gmt 0)

Can you back up that 1% figure with some hard evidence? IME, it's much higher than that. I have only anecdotal evidence, from small obscure categories, but from what I've seen maybe five to ten percent of links are expired domains that were bought out by pornsters and now lead to porn sites, and another ten percent are sites that no longer exist, or Tripod- or Geocities-type sites that have moved. The porn links are particularly disturbing.

rafalk
msg:482912
5:32 pm on Nov 15, 2002 (gmt 0)

Can you back up that 1% figure with some hard evidence?

You have posted all sorts of fanciful conspiracy theories about DMOZ without one iota of evidence to back them up. My data, OTOH, is gleaned right from the source.

petertdavis
msg:482913
5:44 pm on Nov 15, 2002 (gmt 0)

Randomly selecting categories.

[dmoz.org...]
11% dead links

[dmoz.org...]
11% dead links, 2% porn links

Just the first two categories I looked at. Now, what was that you were saying about 1%?

Mike_Mackin
msg:482914
6:00 pm on Nov 15, 2002 (gmt 0)

A sample of 2 categories out of over 460,000 categories?

dmoz has always been in the 1% range overall.

<side note> can't we all just get along </side note>

bird
msg:482915
6:04 pm on Nov 15, 2002 (gmt 0)

So you "randomly" picked two closely related categories with about 50 links, out of a directory of around 3 million links in almost half a million categories.

I have no idea about your understanding of statistics and how the significance of individual samples is determined. Rafalk was talking about the percentage of dead links over the complete directory. In a structure of that size, it is guaranteed that you'll find categories like the ones you picked; even if only one category in a thousand looked that bad, there would still be hundreds of them. However, most will look much better.

I'm actually pretty sure (without checking) that there is at least one category somewhere in the ODP that contains nothing but two or three dead links: 100%! Does that say anything about the overall quality of the directory? If you think so, please think again.

Macguru
msg:482916
6:07 pm on Nov 15, 2002 (gmt 0)

Funny, when I ask Copernic to validate the links found on all major search engines for any given query's top 100 results, I usually get 7% to 10% link rot.

I guess humans do it better, at least they try.

jmccormac
msg:482917
6:26 pm on Nov 15, 2002 (gmt 0)

You said it yourself... "The problem is link rot"... which afflicts every directory and site to some degree or another. DMOZ is no better or worse than any other in this respect... hence my point.

Yes, Napoleon, but unlike every other directory, the data is freely available, so the extent of the link rot can be established and something can be done about it. This availability of data is the reason Dmoz is always attacked, and it is because the internet community has a more direct involvement that people comment more. Nobody really gives a damn about the Yahoos of the net because they are not of the community and by the community.

The ODP is a volunteer organization. Editors are not paid. If a directory or site is to be singled out, I would have thought that a profit driven structure would be a somewhat more appropriate target.

The editors' jobs should be made as easy as possible by automation, and link rot is one of the tasks that lends itself to automation. In a directory the size of Dmoz, it is futile to rely totally on users to report link-rotted sites. More active link-rot detection would make Dmoz more useful and more relevant. And with this would follow more editors. (Well, in theory anyway. ;) ) Unlike the other directories, the users have the power to change Dmoz and make it better.
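
Even the most basic sweep would help. A sketch (file names invented; a real tool would follow redirects, retry, and rate-limit itself):

#!/bin/sh
# Minimal link-rot sweep: record every listed URL whose server
# returns an error or doesn't answer at all. "dmozurllist" and
# "rot_candidates" are invented names for illustration.
while read -r url; do
  wget -q --spider "$url" || echo "$url" >> rot_candidates
done < dmozurllist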

Regards...jmcc

skibum
msg:482918
6:36 pm on Nov 15, 2002 (gmt 0)

Maybe someone would consider creating whatever is needed to better weed out expired domains, post the code on the net, and send staff or some meta a link to it.

Jump in and help out.

Dumpy
msg:482919
7:27 pm on Nov 15, 2002 (gmt 0)

Netscape has demonstrated it will never allocate the resources to make DMOZ better. If DMOZ were a profit center instead of a prestige center, it would never have allowed itself to be unable to crawl itself to create a search database for almost two months. Netscape does not seem to care about its loss of prestige in its failure to make the work of its thousands of editors available through the RDF dumps.

I can crawl DMOZ in a few hours and create a search database, but NO ONE will allow this type of volunteerism. We are all stuck with the "Systems Engineers" paid by Netscape to do it "when they can manage it".

It is time for DMOZ to be spun off as a non-profit corporation... you would then see the true power of "Open Source" and a true Open Directory Project. As long as DMOZ is captured within the corporate structure of a profit-making company, it is destined to fail.

It is not a proud time to be busting your butt for Netscape's Open Directory Project creating their copyrighted material.

choster
msg:482920
7:36 pm on Nov 15, 2002 (gmt 0)

While the "official" tools depend on their implementation by ODP staff, there are a large number of sanctioned volunteer-produced tools used on a daily basis. If there are any more constructive suggestions for building a tool for checking snatched domains (as earlier in this thread), I will happily ask some of those editor-developers to try their hand.

europeforvisitors
msg:482921
10:55 pm on Nov 15, 2002 (gmt 0)

It seems that whenever there is a general negative point to be made, DMOZ is wheeled out. Every other directory/site is conveniently overlooked.

"Every other directory/site" isn't as important as DMOZ.
