
Google AdSense Forum

    
Adult websites blacklist
How to filter the list of websites?
Pilsen




msg:4170588
 11:00 am on Jul 15, 2010 (gmt 0)

Hi everybody,

We just got an "adult content warning" from AdSense, so we need to filter our list of websites (a few million of them) and drop all adult-related domains from our service.

I've tried services like urlblacklist.com, but that didn't work. Their database is extremely weak and is missing many non-English #*$! websites ;(

Does anyone know of a better database for this? What we need is a list of domains containing adult content.

thanks )

 

netmeg




msg:4170596
 11:33 am on Jul 15, 2010 (gmt 0)

Hunh? You don't know what's on your own site?

Pilsen




msg:4170601
 11:41 am on Jul 15, 2010 (gmt 0)

Yeah ;)

Our content is based on the content of many other websites (domains). It's easy to filter it with a dictionary of #*$!-related keywords, but how do we handle pages in languages other than English? Arabic, Russian, German, etc.

That's why I'm looking for a database like urlblacklist.com but with a complete list of all adult domains.

londrum




msg:4170602
 11:46 am on Jul 15, 2010 (gmt 0)

how do you manage to keep track of a "few million" websites? is that a literal number? you've got to be scraping the content, surely.

netmeg




msg:4170605
 11:51 am on Jul 15, 2010 (gmt 0)

Yea, you may not get a lot of help here with that. Some of us spend a lot of time fighting that sort of thing.

Pilsen




msg:4170606
 11:52 am on Jul 15, 2010 (gmt 0)

The question is not how we manage the websites...

I'm talking about _1_ website that indexes around 120 million other websites (of course, not all of them are indexed so far, just a few million ;) ) and produces content based on their content (e.g. keyword density stats). I need to drop all adult websites from our database and I'm looking for a way to do this.

londrum




msg:4170609
 11:56 am on Jul 15, 2010 (gmt 0)

oh, we thought you were a scraper.

getting a list of websites is probably a waste of time, because new ones would pop up every five minutes. you'd also have no way of knowing whether the list is complete, which would mean you'd still have exactly the same problem.

you'd be better off scanning your own pages for a dictionary of certain words, and then stopping adsense from appearing on those that have them. that way you can still run other ads, but not adsense.
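A rough sketch of that dictionary check, in Python. Everything named here is an assumption for illustration: the two placeholder terms, the function names, and the "other" label standing in for whatever fallback ad network you run. It only matches whole words in one language, so treat it as a starting point rather than a finished filter.

```python
import re

# Hypothetical placeholder terms; a real dictionary would be far longer.
BLOCKED_TERMS = {"erotic", "nude"}

def contains_blocked_term(text, terms=BLOCKED_TERMS):
    """Return True if any blocked term appears as a whole word in text."""
    words = set(re.findall(r"\w+", text.lower()))
    return not words.isdisjoint(terms)

def choose_ad_network(page_text):
    """Serve AdSense only on pages that pass the dictionary check."""
    return "other" if contains_blocked_term(page_text) else "adsense"
```

The page stays in the index either way; only the ad slot changes.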

why throw away thousands of pages?

topr8




msg:4170642
 12:36 pm on Jul 15, 2010 (gmt 0)

>>oh, we thought you were a scraper.

same difference! someone using my bandwidth to produce keyword density stats - which they then try to make money from by serving them up with adsense ads

Pilsen




msg:4170648
 12:50 pm on Jul 15, 2010 (gmt 0)

yeah, we are bad.. i know that ;)

incrediBILL




msg:4170726
 2:16 pm on Jul 15, 2010 (gmt 0)

I don't think you're going to find any universal list of adult domains but a partial list is a good start.
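To make use of a partial list, a minimal Python sketch could drop any indexed domain that matches a listed domain or sits underneath one. The function names and the `bad.example` entry are made up for illustration, and the blocklist is assumed to be a plain set of lowercase domain names loaded from whatever list you end up with.

```python
def is_blocked_domain(domain, blocklist):
    """True if the domain itself or any parent domain is in the blocklist."""
    labels = domain.lower().rstrip(".").split(".")
    # Check "a.b.example.com", then "b.example.com", then "example.com".
    # The loop stops before the bare top-level label.
    for i in range(len(labels) - 1):
        if ".".join(labels[i:]) in blocklist:
            return True
    return False

def filter_domains(domains, blocklist):
    """Keep only the domains that are not covered by the blocklist."""
    return [d for d in domains if not is_blocked_domain(d, blocklist)]
```

Set lookups keep this cheap even for a few million domains, but it only catches what the list already knows about.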

I think you're going to have to do your own content filtering and look for adult words on the page and simply disable AdSense on those pages and replace it with something else.

The foreign languages may require translating in order to filter, and sadly the translation tools often deliberately skip over certain words you'll need to filter.

I would start by yanking AdSense until I figured this problem out.

Then deploy it on English sites after filtering out the bad stuff.

FYI, I ran into this same problem with AdSense once, on an art site even, and simple words like erotic or nude will send AdSense into a tailspin.

Pilsen




msg:4170743
 2:33 pm on Jul 15, 2010 (gmt 0)

Thank you

This is what I'm doing right now. I'll probably have to forget about placing AdSense on any non-English page forever ;(

incrediBILL




msg:4170786
 3:02 pm on Jul 15, 2010 (gmt 0)

After thinking about it, I have a solution that I also use, but it's very time-consuming and maybe impractical for the number of pages you have.

You could try running a screen shot tool and then reviewing maybe 500 screen shots per page to spot adult sites.

It's the fastest way I know to scan that many sites without browsing them individually as your screen shot tool does that for you in the background.
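As a sketch of the review step only, assuming the screen shots already exist on disk as image files: the Python below groups them into simple HTML pages of 500 thumbnails each. The function name, the thumbnail width, and the (domain, image path) input format are assumptions, not any particular tool's output.

```python
from html import escape

def contact_sheets(screenshots, per_page=500):
    """Yield one HTML page per batch of (domain, image_path) pairs."""
    for start in range(0, len(screenshots), per_page):
        batch = screenshots[start:start + per_page]
        cells = "\n".join(
            f'<figure><img src="{escape(path)}" width="160" alt="">'
            f"<figcaption>{escape(domain)}</figcaption></figure>"
            for domain, path in batch
        )
        page_number = start // per_page + 1
        yield f"<!doctype html>\n<title>Batch {page_number}</title>\n{cells}\n"
```

Each yielded string can be written to its own file and skimmed in a browser, with the domain shown under every thumbnail.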

If you make enough from your site, perhaps consider outsourcing the compilation of the offensive terms to filter to someone that speaks the native language.
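Once native speakers have compiled those terms, one way to apply them is a separate word list per language. The Python sketch below assumes exactly that; the language codes, the handful of mild sample terms, and the function names are placeholders. It splits text on word boundaries, so languages written without spaces between words would need a different tokenizer.

```python
import re
import unicodedata

# Hypothetical per-language term lists; the entries are mild placeholders.
TERMS_BY_LANGUAGE = {
    "en": {"erotic", "nude"},
    "de": {"erotik", "nackt"},
    "ru": {"эротика"},
}

def normalize(text):
    """Fold case and Unicode compatibility forms so comparisons line up."""
    return unicodedata.normalize("NFKC", text).casefold()

def flagged_languages(text, terms_by_language=TERMS_BY_LANGUAGE):
    """Return the sorted language codes whose term list matches the text."""
    words = set(re.findall(r"\w+", normalize(text)))
    return sorted(
        lang
        for lang, terms in terms_by_language.items()
        if words & {normalize(term) for term in terms}
    )
```

Any non-empty result would be a reason to pull AdSense from that page.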

Additionally, you may want to look for safe site filtering technology in foreign languages, the "net nanny" type of stuff.
