Forum Moderators: incrediBILL & martinibuster

Adult websites blacklist

How to filter the list of websites?

   
11:00 am on Jul 15, 2010 (gmt 0)

5+ Year Member



Hi everybody,

We just got an "adult content warning" from AdSense, so we need to filter our list of websites (a few million of them) and drop all adult-related domains from our service.

I've tried services like urlblacklist.com, but that didn't work. Their database is extremely weak and is missing many non-English #*$! websites ;(

Does anyone know of a better database for this? What we need is a list of domains containing adult content.

thanks )
11:33 am on Jul 15, 2010 (gmt 0)

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Hunh? You don't know what's on your own site?
11:41 am on Jul 15, 2010 (gmt 0)

5+ Year Member



Yeah ;)

Our content is based on the content of many other websites (domains). It's easy to filter it using a dictionary of #*$!-related keywords, but how do we handle pages in languages other than English? Arabic, Russian, German, etc.

That's why I'm looking for a database like urlblacklist.com, but with a complete list of all adult domains.
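
For whatever part of the problem a domain list does cover, here is a minimal sketch of dropping blacklisted domains from a list. The names `parent_domains`, `is_blacklisted` and `filter_domains` and the sample domains are made up for illustration; the sketch assumes the blacklist is a plain set of lowercase domain names and that a listed domain should also block its subdomains.

```python
def parent_domains(host):
    """Yield the host and each parent domain, stopping before the bare TLD.

    "a.b.example.com" -> "a.b.example.com", "b.example.com", "example.com"
    """
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels) - 1):
        yield ".".join(labels[i:])


def is_blacklisted(host, blacklist):
    """True if the host or any of its parent domains is in the blacklist."""
    return any(domain in blacklist for domain in parent_domains(host))


def filter_domains(domains, blacklist):
    """Return the domains that are not covered by the blacklist."""
    return [d for d in domains if not is_blacklisted(d, blacklist)]


# Hypothetical sample data; blacklist entries are assumed to be lowercase.
blacklist = {"adult-example.test"}
domains = [
    "news.example.org",
    "www.adult-example.test",
    "Adult-Example.test",
    "example.org",
]
clean = filter_domains(domains, blacklist)
```

With a set, each lookup is cheap, so a few million domains is not a problem for the filtering step itself. The weak point is still the coverage of the list.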
11:46 am on Jul 15, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



how do you manage to keep track of a "few million" websites? is that a literal number? you've got to be scraping the content, surely.
11:51 am on Jul 15, 2010 (gmt 0)

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Yea, you may not get a lot of help here with that. Some of us spend a lot of time fighting that sort of thing.
11:52 am on Jul 15, 2010 (gmt 0)

5+ Year Member



The question is not how we manage the websites...

I'm talking about _1_ website that indexes around 120 million other websites (of course, not all of them are indexed so far, just a few million ;) ) and produces content based on their content (e.g. keyword density stats). I need to drop all adult websites from our database and I'm looking for a way to do this.
11:56 am on Jul 15, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



oh, we thought you were a scraper.

getting a list of websites is probably a waste of time, because new ones would pop up every five minutes. you'd also have no way of knowing whether the list is complete, which would mean you'd still have exactly the same problem.

you'd be better off scanning your own pages against a dictionary of certain words, and then keeping adsense off the pages that match. that way you can still run other ads there, just not adsense.

why throw away thousands of pages?
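
A minimal sketch of the dictionary scan described above, assuming English pages only. The word list is a tiny placeholder, and `count_adult_terms`, `choose_ad_network` and the `threshold` parameter are hypothetical names, not anything AdSense provides.

```python
import re

# Placeholder word list; a real one would be much longer.
ADULT_TERMS = {"erotic", "nude", "xxx"}

# Whole words only, so "nude" does not match inside "denuded".
WORD_RE = re.compile(r"[a-z]+")


def count_adult_terms(text):
    """Count how many words in the page text are on the adult word list."""
    words = WORD_RE.findall(text.lower())
    return sum(1 for word in words if word in ADULT_TERMS)


def choose_ad_network(text, threshold=1):
    """Return "other" for flagged pages and "adsense" for the rest."""
    if count_adult_terms(text) >= threshold:
        return "other"
    return "adsense"
```

The page template would then call `choose_ad_network(page_text)` and emit the AdSense code only when it returns "adsense". Raising `threshold` trades false positives for the risk of missing a page.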
12:36 pm on Jul 15, 2010 (gmt 0)

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



>>oh, we thought you were a scraper.

same difference! someone using my bandwidth to produce keyword density stats - which they then try to make money from by serving them up with adsense ads
12:50 pm on Jul 15, 2010 (gmt 0)

5+ Year Member



yeah, we are bad.. i know that ;)
2:16 pm on Jul 15, 2010 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I don't think you're going to find any universal list of adult domains, but a partial list is a good start.

I think you're going to have to do your own content filtering and look for adult words on the page and simply disable AdSense on those pages and replace it with something else.

Foreign-language pages may need translating before you can filter them, and sadly the translation tools often deliberately skip over the very words you'll need to filter.

I would start by yanking AdSense until I figured this problem out.

Then deploy it on English sites after filtering out the bad stuff.

FYI, I ran into this same problem with AdSense once, on an art site even, and simple words like erotic or nude will send AdSense into a tailspin.
2:33 pm on Jul 15, 2010 (gmt 0)

5+ Year Member



Thank you

This is what I'm doing right now. I'll probably have to forget about placing AdSense on any non-English page forever ;(
3:02 pm on Jul 15, 2010 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



After thinking about it, I have a solution that I also use, but it's very time-consuming and maybe impractical for the number of pages you have.

You could try running a screenshot tool and looking through maybe 500 screenshots per page to spot adult sites.

It's the fastest way I know to scan that many sites without browsing them individually, as your screenshot tool does that for you in the background.

If you make enough from your site, perhaps consider outsourcing the compilation of the offensive terms to filter to someone who speaks each language natively.

Additionally, you may want to look for safe site filtering technology in foreign languages, the "net nanny" type of stuff.
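
If term lists do get compiled per language, one way they might be plugged in is sketched below. The lists are tiny placeholders, `TERMS_BY_LANG` and `flagged_terms` are hypothetical names, and the language code is assumed to come from somewhere else (for example, a language detector or the site's own metadata).

```python
import re
import unicodedata

# Placeholder lists keyed by language code; real lists would come from
# native speakers. Entries are assumed to be already lowercase.
TERMS_BY_LANG = {
    "en": {"erotic", "nude"},
    "de": {"erotik", "nackt"},
    "ru": {"эротика"},
}

# \w is Unicode-aware for str patterns, so this also splits Cyrillic text.
TOKEN_RE = re.compile(r"\w+")


def normalise(text):
    """Unicode-normalise and case-fold so comparisons ignore case and width."""
    return unicodedata.normalize("NFKC", text).casefold()


def flagged_terms(text, lang):
    """Return the listed terms found in the text, or None if there is
    no list for this language (unknown, not clean)."""
    terms = TERMS_BY_LANG.get(lang)
    if terms is None:
        return None
    tokens = set(TOKEN_RE.findall(normalise(text)))
    return tokens & terms
```

Returning `None` for an unlisted language keeps "no list yet" separate from "nothing found", so those pages can stay without AdSense until a list exists. Splitting on word characters will not work for languages written without spaces between words, and it ignores inflected forms, so treat this as a starting point only.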
 
