
Forum Moderators: incrediBILL & martinibuster


Adult websites blacklist

How to filter the list of websites?

     
11:00 am on July 15, 2010 (gmt 0)

New User

5+ Year Member

joined:Apr 21, 2009
posts: 13
votes: 0


Hi everybody,

We just got an "adult content warning" from AdSense, so we need to filter our list of websites (a few million) and drop all adult-related domains from our service.

I've tried services like urlblacklist.com, but that didn't work. Their database is weak and is missing many non-English #*$! websites ;(

Does anyone know of a better database for this? What we need is a list of domains containing adult content.

thanks )
11:33 am on July 15, 2010 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 30, 2005
posts:12901
votes: 193


Hunh? You don't know what's on your own site?
11:41 am on July 15, 2010 (gmt 0)

New User

5+ Year Member

joined:Apr 21, 2009
posts:13
votes: 0


Yeah ;)

Our content is based on the content of many other websites (domains). It's easy to filter English pages using a dictionary of #*$!-related keywords, but how do we handle pages in other languages? Arabic, Russian, German, etc.

That's why I'm looking for a database like urlblacklist.com but with a complete list of all adult domains.
11:46 am on July 15, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 12, 2006
posts:2558
votes: 44


how do you manage to keep track of a "few million" websites? is that a literal number? you've got to be scraping the content, surely.
11:51 am on July 15, 2010 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member netmeg is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 30, 2005
posts:12901
votes: 193


Yea, you may not get a lot of help here with that. Some of us spend a lot of time fighting that sort of thing.
11:52 am on July 15, 2010 (gmt 0)

New User

5+ Year Member

joined:Apr 21, 2009
posts:13
votes: 0


The question is not how we manage the websites...

I'm talking about _1_ website that indexes around 120 million other websites (of course, not all of them are indexed so far, just a few million ;) ) and produces content based on their content (e.g. keyword density stats). I need to drop all adult websites from our database and I'm looking for a way to do this.
11:56 am on July 15, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 12, 2006
posts:2558
votes: 44


oh, we thought you were a scraper.

getting a list of websites is probably a waste of time, because new ones would pop up every five minutes. you'd also have no way of knowing whether the list is complete, which would mean you'd still have exactly the same problem.

you'd be better off scanning your own pages against a dictionary of certain words, and then stopping adsense from appearing on the pages that contain them. that way you can still run other ads, but not adsense.

why throw away thousands of pages?
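
A rough Python sketch of that per-page check, for anyone who wants a starting point. The term list, the function names, and the "other" label are all placeholders made up for illustration; none of this is an AdSense API, it only decides which ad slot your own template should render.

```python
import re

# Placeholder terms for illustration only; a real list would be far longer.
FLAGGED_TERMS = {"adult", "erotic", "nude"}

# Lowercase ASCII words; good enough for an English-only first pass.
WORD_RE = re.compile(r"[a-z0-9']+")


def flagged_terms_in(text, terms=FLAGGED_TERMS):
    """Return the flagged terms that occur as whole words in the text."""
    words = set(WORD_RE.findall(text.lower()))
    return words & set(terms)


def choose_ad_network(text, terms=FLAGGED_TERMS):
    """Pick 'adsense' for clean pages and a fallback network otherwise."""
    return "other" if flagged_terms_in(text, terms) else "adsense"
```

Matching whole words keeps "adulthood" from tripping the "adult" entry, at the cost of missing plurals and other variants unless they are listed too.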
12:36 pm on July 15, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 19, 2002
posts:3196
votes: 12


>>oh, we thought you were a scraper.

same difference! someone using my bandwidth to produce keyword density stats - which they then try to make money from by serving them up with adsense ads
12:50 pm on July 15, 2010 (gmt 0)

New User

5+ Year Member

joined:Apr 21, 2009
posts:13
votes: 0


yeah, we are bad.. i know that ;)
2:16 pm on July 15, 2010 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14650
votes: 94


I don't think you're going to find any universal list of adult domains but a partial list is a good start.

I think you're going to have to do your own content filtering and look for adult words on the page and simply disable AdSense on those pages and replace it with something else.
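
Here is a minimal Python sketch of how the two ideas could be combined: check the partial domain list first, then fall back to a word check on the page text. The function names, the example domains, and the word set are hypothetical, and the sketch assumes every URL includes its scheme (http:// or https://).

```python
import re
from urllib.parse import urlsplit


def host_is_blocked(url, blocked_domains):
    """True if the URL's host equals or is a subdomain of a blocked domain."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Try "a.b.example.com", then "b.example.com", then "example.com", ...
    return any(".".join(labels[i:]) in blocked_domains for i in range(len(labels)))


def adsense_allowed(url, page_text, blocked_domains, adult_words):
    """Blocklist first, then a whole-word check on the page text."""
    if host_is_blocked(url, blocked_domains):
        return False
    words = set(re.findall(r"\w+", page_text.lower()))
    return not (words & adult_words)
```

The blocklist catches known domains cheaply, while the word check covers whatever the list has not caught up with yet.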

The foreign languages may require translating in order to filter, and sadly the translation tools often deliberately skip over certain words you'll need to filter.

I would start by yanking AdSense until I figured this problem out.

Then deploy it on English sites after filtering out the bad stuff.

FYI, I ran into this same problem with AdSense once, on an art site even, and simple words like erotic or nude will send AdSense into a tailspin.
2:33 pm on July 15, 2010 (gmt 0)

New User

5+ Year Member

joined:Apr 21, 2009
posts:13
votes: 0


Thank you

This is what I'm doing right now. I'll probably have to forget about placing AdSense on any non-English page forever ;(
3:02 pm on July 15, 2010 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14650
votes: 94


After thinking about it, I have a solution that I also use, but it's very time-consuming and maybe impractical for the number of pages you have.

You could try running a screenshot tool and reviewing maybe 500 screenshots per page to spot adult sites.

It's the fastest way I know to scan that many sites without browsing them individually, as the screenshot tool does the browsing for you in the background.
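
To make the review step concrete, here is a small Python sketch that lays already-captured screenshots out as HTML review pages, 500 thumbnails each. It assumes some other tool has saved the images; the function name, the (domain, image path) pairs, and the thumbnail width are placeholders for illustration.

```python
from html import escape


def build_contact_sheets(screenshots, per_page=500, thumb_width=160):
    """Return one HTML page per batch of (domain, image_path) pairs."""
    pages = []
    for start in range(0, len(screenshots), per_page):
        batch = screenshots[start:start + per_page]
        cells = "\n".join(
            f'<figure><img src="{escape(path, quote=True)}" '
            f'width="{thumb_width}" loading="lazy">'
            f"<figcaption>{escape(domain)}</figcaption></figure>"
            for domain, path in batch
        )
        title = f"Screenshots {start + 1}-{start + len(batch)}"
        pages.append(f"<!doctype html>\n<title>{title}</title>\n{cells}\n")
    return pages
```

Each returned string can be written to its own file and skimmed in a browser; the caption under each thumbnail tells you which domain to drop.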

If you make enough from your site, perhaps consider outsourcing the compilation of the offensive terms to filter to someone who speaks each language natively.
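
Once you have those per-language lists, plugging them in could look something like this Python sketch. The entries shown are placeholders, not a vetted list, and the sketch assumes you already know each page's language; the names are made up for illustration.

```python
import re
import unicodedata

# Placeholder entries only; the real lists would come from native speakers.
TERMS_BY_LANGUAGE = {
    "en": {"erotic", "nude"},
    "de": {"erotik"},
    "ru": {"эротика"},
}


def normalize(text):
    """Casefold and apply NFKC so equivalent Unicode forms compare equal."""
    return unicodedata.normalize("NFKC", text).casefold()


def flagged_terms(text, language, terms_by_language=TERMS_BY_LANGUAGE):
    """Return flagged whole words for the page's language (empty if unknown)."""
    terms = {normalize(t) for t in terms_by_language.get(language, set())}
    words = set(re.findall(r"\w+", normalize(text)))
    return words & terms
```

Splitting on `\w+` only works for languages that put spaces between words; Chinese or Japanese pages would need a different tokenizer.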

Additionally, you may want to look for safe site filtering technology in foreign languages, the "net nanny" type of stuff.