I am becoming increasingly frustrated by webmasters who get sites into the ODP under various topics by first writing a nice-looking site that qualifies for the directory.
What's wrong with this, you might think?
Without giving the sites away: they have since had their original content overwritten and are now ULTRA HARD ** - and I mean ***.
They are in categories ranging from sportswear to weather channels which are being spammed out in this manner by adult content webmasters.
They all rank well on google, thanks to their dmoz entries and a quick recode for a different phrase.
I have NINE examples of these now for the same or related phrase.
Well, given that I am a competitor - I have only three options.
1. Give up competing.
2. Spam the directory as I am more than capable of organising simple uploads to overwrite my original sites.
3. Get the offenders out.
Two sites have been dropped by Dmoz as I have reported them, however they are still in Google two DANCES LATER!
If google is unable to filter the dmoz, then this will get worse. Unless they do something about it before the next dance, I can assure them that my teams from around the world and I will systematically, and with relentless vigor, fill the Dmoz with absolute dross and filth, take the money and run. I would prefer, however, to follow route 3, as I do have some morals when it comes to an 11 year old girl looking at sportswear and finding adult material (which is how I discovered what was going on and how it works).
There's an old phrase, you see - if you can't beat 'em...
and my addition to it is - and by the god, for every single site my competitors do - I can do 100.
Righty ho - got it off my chest now.
Any ideas how to get these mental spammers out?
Fiver.
[edited by: Brett_Tabke at 7:19 pm (utc) on Dec. 2, 2002]
[edit reason] please, leave adult site references out -thanks. [/edit]
Shak
I can tell you that locating and removing snatched domains has been elevated to priority status at ODP for some time since the dot-com collapse (I cannot say whether the same is true over at Yhoo). If you wish to report a bad url to the local editor, there is an "Update URL" function on all dmoz.org pages with listed sites.
Does anyone in here know of anyone in any kind of helpful authority?
This way I can compete normally - provide lots of content and good websites without having to step it up a gear and spam the directory out.
Any help appreciated.
Cheers
Fiver.
Thanks,
Re reporting - I have had two removed from Dmoz. Indeed, credit where it is due: the editors removed the sites from the categories they each edit within 12 hours!
However - it appears that google doesn't bother checking whether dmoz sites are still in dmoz during its dance or during everflux either - and herein lies the problem.
In some industries, this would be worth over $100,000 per month to me - so you can see how infuriated I am and how tempted I am to join in. However, I would rather not - not even for that money.
Fiver.
If it's a matter of $100k a month, I can guarantee you that I, along with 99% of members, would be doing whatever it takes.
I suppose the lesson to learn is:
1, Do NOT rely purely on Google.
2, Do venture into PPC, off line, banner ads etc.
3, Do what you have to DO :)
Shak
There are some editors that manage very large category space, and those cats have a huge number of listings in them. I'm a DMOZ editor of a very small cat, and this would never happen. However, I can see why editors in charge of LOTS of listings could take a while before they spot this sort of thing. The problem: the DMOZ is run by volunteers. How much can be expected from them? The only way around this would be for someone, like Google, to pay to have humans manually check for this sort of thing. The problem is more the cheaters than the DMOZ editors.
Reporting sites to Dmoz is easy and successful in my experience. Therefore - the following stands true.
Either (CPU and bandwidth heavy):
1. Google deletes all Dmoz entries from its index.
2. Google reads the Dmoz and creates a list of live Dmoz sites.
3. Google re-indexes the Dmoz.
Or:
1. Dmoz provides the google guys with a monthly file of the straight URLs which their directory contains.
2. Google makes a simple sequential pass through this file, checking whether or not each URL is in its index.
If not, google indexes it in the normal way.
If so, google re-calculates for it in the normal way.
3. Google makes a simple index-sequential read by "in dmoz?" flag through its index, isolating its own list of dmoz entries.
For each record, google searches the Dmoz-provided file for that record's URL.
If it is in the Dmoz-provided file, then it's still in dmoz. If not, then Dmoz editors have removed it - so google should remove it.
Simple.
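To make the file-based check concrete, here is a rough sketch in Python. It is only an illustration of the idea in this post, not how Google or Dmoz actually work: the function name reconcile, the "in_dmoz" flag and the example URLs are all made up, and the index is assumed to be a plain dictionary keyed by URL.

```python
def reconcile(dmoz_urls, index):
    """Compare a Dmoz-provided URL file against a search index.

    dmoz_urls: iterable of URL strings (the hypothetical monthly file).
    index: dict mapping URL -> record, where record["in_dmoz"] is the
           hypothetical "in dmoz?" flag kept by the search engine.
    Returns (to_index, to_remove) as sorted lists of URLs.
    """
    live = set(dmoz_urls)

    # Pass over the Dmoz file: anything not yet in the index gets indexed.
    to_index = sorted(url for url in live if url not in index)

    # Pass over the index by flag: anything flagged as a Dmoz entry that is
    # no longer in the file has been removed by an editor.
    to_remove = sorted(
        url for url, record in index.items()
        if record["in_dmoz"] and url not in live
    )
    return to_index, to_remove


# Made-up example data.
google_index = {
    "http://a.example/": {"in_dmoz": True},
    "http://b.example/": {"in_dmoz": True},   # dropped by a Dmoz editor
    "http://c.example/": {"in_dmoz": False},  # never came from Dmoz
}
dmoz_file = ["http://a.example/", "http://d.example/"]

to_index, to_remove = reconcile(dmoz_file, google_index)
print(to_index)   # ['http://d.example/']
print(to_remove)  # ['http://b.example/']
```

Loading the file into a set keeps each lookup cheap, so both passes stay roughly linear in the size of the file and the index. Note that a site found by other means (c.example here) is left alone, since only flagged entries are checked.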
If you need any help, Dmoz or Google, I'll do it for you.
So - step 1 Dmoz editor deletes.
Step 2 Dmoz give google file
Step 3 Google parse file and readjust index.
What's the problem?
Fiver.
What's the RDF? Is it the feed to G?
>The RDF has not been updated for "a long time" but that is another story.
I know for a fact that Google spiders the DMOZ, and uses *that* as its basis for rankings. Thus, if a site is dropped from the DMOZ, while still in the Google directory, it will get no PR from the DMOZ.
If it's dropped from dmoz - why shouldn't google drop it - given that the only reasons it gets dropped are that it is spam or is down? I cannot think of any reason on earth why a site dumped from Dmoz should not be dumped from Google.
Seems straight forward to me.
I know the editors are busy and all that.
But by reporting to "staff" as well, there is no real problem with getting the crud out of there.
It's just google's retention of it which is the pain.
Fiver.
anyway, no, there's nothing wrong with dmoz passing google a flat file, but there's nothing right with it either. the flat file wouldn't contain relevant information to google, other than include/exclude.
that's taken care of by the RDF dump. so google's stance is likely: why would we care?
How do you work that out?
A flat file or relational database can contain absolutely any bit of information required by the receiving system - i.e. google.
Why would they care?
Because they put 11 year old kids by the thousand in front of hard core porn under the guise of kids sites and weather and sport - that's why.
Fiver.
Fiver, I know for a fact after seeing proof with the case of a specific site that Google *does* spider the DMOZ, and uses that to calculate PR, and other things relevant to the algo. Thus, while the site is still in the Google directory, Googlebot when spidering the DMOZ will spot that the site has been removed, and thus gets NO PR benefit from the DMOZ anymore.
One thing you may not understand here. If a DMOZ editor removes a spam site, that does NOT mean it is removed from Google. The site just loses the PR benefit from the DMOZ listing. The reason these sites are still in Google is they likely have other inbound links besides the DMOZ.
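A toy calculation may help show why losing one link is not the same as vanishing. The sketch below uses the simplified PageRank iteration from the published description of the algorithm, with a damping factor of 0.85. The four-page graph and the page names are invented for illustration, and nothing here claims to match what Google really runs.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links: dict mapping page -> list of pages it links to.
    Pages with no outbound links spread their rank evenly over all pages.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Invented graph: "spam" is linked from a directory page and one other site.
listed = {"dmoz": ["spam", "good"], "other": ["spam"], "spam": [], "good": []}
# Same graph after the directory editor removes the listing.
delisted = {"dmoz": ["good"], "other": ["spam"], "spam": [], "good": []}

before = pagerank(listed)["spam"]
after = pagerank(delisted)
print(before > after["spam"])         # True: the directory link was worth something
print(after["spam"] > after["dmoz"])  # True: the remaining inbound link still counts
```

In this toy graph the delisted page drops in rank but still sits above pages with no inbound links at all, which is the point being made about other inbound links.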
How do you work that out?
A flat file or relational database can contain absolutely any bit of information required by the receiving system - i.e. google.
yes, but no more information than spidering the website. why go out of their way?
Why would they care?
Because they put 11 year old kids by the thousand in front of hard core porn under the guise of kids sites and weather and sport - that's why.
Well, google doesn't really, the web site owners do. Google's not responsible for mislabeling within DMOZ and I doubt they wish to take on the responsibility. If they cared very much about dmoz abuse they might update the RDF file more frequently.
If the sites themselves still resemble kids sites in title and meta information, I'm sure reporting them to abuse@google.com would help. DMOZ can get you indexed, but not deindexed. I know sites permanently banned from dmoz that rank well in google, and deserve to.
rfgdxm1 makes the stellar point that removal from dmoz does not = removal from google, so you'll still have to report them to google abuse regardless. And if they aren't using spam techniques then there may be nothing you can do but wait, hope they lose the benefit of the DMOZ PR boost after they lose the link, and hope they haven't boosted the site on other kids terms with other link reciprocation. In other words, just hope they don't rank well in the future.
">Two sites have been dropped by Dmoz as I have reported them, however they are still in Google two DANCES LATER!
One thing you may not understand here. If a DMOZ editor removes a spam site, that does NOT mean it is removed from Google. The site just loses the PR benefit from the DMOZ listing. The reason these sites are still in Google is they likely have other inbound links besides the DMOZ. "
Fair cop - I do understand. I'd believe what you say more, though, if google didn't also show the directory category it was still in within the serps.
Also, I would believe it more if the directory entry didn't show up with the link: command.
As far as I can see, google has no indication that these sites have been removed from Dmoz, and as it still reports them as being there, I can only assume that they still get the PR. Otherwise, why show them as still being there if the process has detected their removal and dropped the PR value? ALSO - the sites wouldn't actually rank without help, as they are poorly optimized and nest within a large group of returned results.
So - back to square 1 - they are dropped in Dmoz, they are in Google and are still getting PR boosts from Dmoz.
Fiver.
The Google directory != the DMOZ directory. The Google directory comes from the ODP RDF dump, and that is over 2 months out of date now. Thus, while the site gets no PR benefit from the DMOZ, it still shows as being listed in the Google directory. Note also that if Google spiders the DMOZ just before the site is booted, in the next update it will get that DMOZ PR bump for 1 more month. My sites just got the deep crawl today. If the DMOZ has also got the deep crawl, and you report a site and it gets yanked tomorrow, it would still show up with link: in the next Google Dance. This delay is unavoidable the way Google works.
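The timing argument boils down to a single comparison. A minimal sketch, assuming the next update is built purely from the last crawl snapshot; the function name and the dates are only illustrative:

```python
from datetime import date


def link_counts_in_next_update(crawl_day, removal_day):
    """True if the crawl snapshot was taken before the listing was removed,
    so the removed listing is still in the data the next update uses."""
    return crawl_day < removal_day


deep_crawl = date(2002, 12, 2)

# Listing yanked the day after the crawl: the snapshot still contains it.
print(link_counts_in_next_update(deep_crawl, date(2002, 12, 3)))  # True

# Listing yanked the day before the crawl: the snapshot never saw it.
print(link_counts_in_next_update(deep_crawl, date(2002, 12, 1)))  # False
```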
Sounds encouraging. Will read about the RDF dump.
Before I do - just in case it isn't in there, can anyone tell me what "RDF" stands for?
Either way - I'll report all the other sites to Dmoz - shall I use that Godzilla - is he active?
Cheers
Fiver.
BUT in terms of ODP resource I guess it would be a nice-to-have but there are more pressing issues such as fixing the RDF dump. Sadly the ODP doesn't have the same resources as Google which is better at detecting these things.
I can't see that this particular problem is going to get any better. There's a growing market in expired domains and, more importantly, expired traffic.