Forum Moderators: incrediBILL & martinibuster

Blekko shows your other sites with the same adsense pub id

     
7:26 am on Dec 30, 2010 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member 10+ Year Member

joined:June 18, 2005
posts:1781
votes: 40


If you search blekko for the domain name of a site that runs AdSense ads, the results include a link called "adsense"; click on it and it displays all the other sites blekko knows of that use the same AdSense pub ID. I know some other sites provide this service, but at least people have to pay for it there, so it's not public information for any casual visitor or, worse, for people who will reuse that info in some mashed-up, scrappy site.

Considering this is rather personal information, I'm thinking about blocking this new search engine in my robots.txt file, as it isn't really bringing any traffic, it's using my bandwidth and it's already pushing some boundaries regarding my privacy. As a webmaster, I know we should be open to new search technologies and give newcomers a chance, but what do I really have to gain by allowing them to crawl my sites if, so far, the negatives outweigh the positives?
7:51 am on Dec 30, 2010 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator martinibuster is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 13, 2002
posts:14525
votes: 354


Blekko's crawler is named ScoutJet [blekko.com].

User-agent: ScoutJet
Disallow: /
8:48 am on Dec 30, 2010 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


This could be a huge problem for people trying to keep their competitors from knowing all their sites.

I'm thinking it might be time to cloak the ads on my site away from all search engines to avoid such potential problems.

However, since I whitelist my robots.txt, blekko never crawled my site in the first place so I'm not worried at the moment.
9:05 am on Dec 30, 2010 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member 10+ Year Member

joined:June 18, 2005
posts:1781
votes: 40


incrediBILL, how do you whitelist your robots.txt? Do you block them all first and then allow some trusted search engines individually? I'm getting tired of having to watch my back like that (aboutus.org, archive.org, blekko.com, etc).
9:56 am on Dec 30, 2010 (gmt 0)

Full Member

5+ Year Member

joined:Sept 14, 2010
posts: 205
votes: 0


@koan: I might be wrong, but I believe it's:

# Block all
User-agent: *
Disallow: /

# Whitelist
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow:

# etc
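
One way to sanity-check a whitelist like the one above before deploying it is to run it through Python's standard urllib.robotparser. This is only a sketch: the example.com URL and the helper name allowed() are made up for illustration, and it shows how one well-behaved parser reads the file, not how any particular crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# The whitelist-style robots.txt from the post above.
ROBOTS_TXT = """\
# Block all
User-agent: *
Disallow: /

# Whitelist
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow:
"""


def allowed(user_agent: str, url: str = "http://example.com/page.html") -> bool:
    """Return True if `user_agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Compare two whitelisted bots with one that only matches the "*" group.
    for agent in ("Googlebot", "msnbot", "ScoutJet"):
        print(agent, "allowed" if allowed(agent) else "blocked")
```

An empty "Disallow:" line means "nothing is disallowed", which is why the named bots get through while everything else falls back to the block-all group.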
10:17 am on Dec 30, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 19, 2002
posts:3315
votes: 28


@koan you need a dynamic robots.txt. The default is:

Block all

Then test for the spiders you want and serve them a different set of rules.
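
A dynamic robots.txt along those lines could be sketched roughly like this in Python. The names ALLOWED_BOTS and robots_txt_for() are hypothetical, the whitelist tokens are just examples, and the function would still need to be wired into whatever serves /robots.txt on your site. Keep in mind that the User-Agent header can be spoofed, so this only filters crawlers that identify themselves honestly.

```python
# Hypothetical whitelist: lowercase tokens matched as substrings of the
# User-Agent header. Adjust to the crawlers you actually trust.
ALLOWED_BOTS = ("googlebot", "msnbot")

# Rules served to whitelisted crawlers (nothing disallowed).
ALLOW_ALL = "User-agent: *\nDisallow:\n"
# Default rules served to everyone else (everything disallowed).
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def robots_txt_for(user_agent: str) -> str:
    """Pick the robots.txt body to serve for a given User-Agent header."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in ALLOWED_BOTS):
        return ALLOW_ALL
    return BLOCK_ALL


if __name__ == "__main__":
    print(robots_txt_for("Googlebot/2.1"))
    print(robots_txt_for("ScoutJet"))
```

Serving block-all by default means an unknown crawler never learns which bots are on the whitelist.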
1:48 pm on Dec 30, 2010 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 17, 2005
posts: 459
votes: 0


What were they thinking! Thanks for that. Blocked.
1:53 pm on Dec 30, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 19, 2002
posts:3315
votes: 28


well it's not the first of their antics!

[webmasterworld.com...]
2:17 pm on Dec 30, 2010 (gmt 0)

Administrator from GB 

WebmasterWorld Administrator engine is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month Best Post Of The Month

joined:May 9, 2000
posts:24436
votes: 566


I was warning folks of this at PubCon. I was surprised some folks didn't see it as a problem.
2:38 pm on Dec 30, 2010 (gmt 0)

New User

5+ Year Member

joined:Jan 6, 2009
posts:1
votes: 0


blekko's been doing this for a while; it was in a couple of presentations at PubCon. You can also do the same with the GA code.
2:56 pm on Dec 30, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 23, 2002
posts:659
votes: 0


If you use ModSecurity 2.x, here is a rule to serve that ScoutJet user agent a 403 Forbidden page.

SecRule HTTP_User-Agent "ScoutJet" "deny,log,status:403"


According to Blekko, ScoutJet crawls from the following IP ranges:

64.13.159.*
38.99.96.*, 38.99.97.*, 38.99.98.*, 38.99.99.*
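
If you would rather match on those addresses in application code than in a firewall, the wildcard ranges above can be written as CIDR blocks and checked with Python's standard ipaddress module. A minimal sketch, assuming the ranges listed above are still current (they may well change); the names SCOUTJET_NETWORKS and is_scoutjet_ip() are made up for illustration.

```python
import ipaddress

# The ranges listed above, rewritten as CIDR blocks:
#   64.13.159.*                    -> 64.13.159.0/24
#   38.99.96.* through 38.99.99.*  -> 38.99.96.0/22
SCOUTJET_NETWORKS = (
    ipaddress.ip_network("64.13.159.0/24"),
    ipaddress.ip_network("38.99.96.0/22"),
)


def is_scoutjet_ip(address: str) -> bool:
    """Return True if `address` falls inside one of the listed ranges."""
    addr = ipaddress.ip_address(address)
    return any(addr in network for network in SCOUTJET_NETWORKS)


if __name__ == "__main__":
    for ip in ("64.13.159.7", "38.99.98.20", "38.99.100.1"):
        print(ip, is_scoutjet_ip(ip))
```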
3:04 pm on Dec 30, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member wheel is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Feb 11, 2003
posts:5072
votes: 12


Wow. Thanks for that!
4:50 pm on Dec 30, 2010 (gmt 0)

Moderator from US 

WebmasterWorld Administrator travelin_cat is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Feb 28, 2004
posts:3225
votes: 12


Our main site has AdSense on interior pages only, not on the home page and Blekko does not show the link to view our other properties with AdSense when searching for our domain name.
5:21 pm on Dec 30, 2010 (gmt 0)

Preferred Member from US 

10+ Year Member

joined:June 6, 2005
posts:524
votes: 1


What are people thinking?!

Blocked!

I am thinking that the 'white list' idea is a very good one.
5:28 pm on Dec 30, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 23, 2002
posts:659
votes: 0


I am thinking that the 'white list' idea is a very good one.


That is only if you actually trust spiders to respect your robots.txt. The Spider Forums here are replete with tales of spiders that ignore robots.txt.

I just ban them via firewall or ModSecurity.

I have lots of website hosting/colo IP ranges banned; it makes life more pleasant.
6:10 pm on Dec 30, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 14, 2006
posts:684
votes: 52


I knew that this was a crap search engine ...
6:17 pm on Dec 30, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member sgt_kickaxe is a WebmasterWorld Top Contributor of All Time 5+ Year Member

joined:Apr 14, 2010
posts:3169
votes: 0


robots.txt will not stop your sites from being discoverable, and AdSense isn't the only footprint that links them together. Analytics, other 3rd party tracking, other ad network identifiers, your footer copyright link, etc. The list of ways to connect the dot coms is LONG.

A+ to Blekko for giving it a shot but I don't suspect it will attract the sort they want.
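
To see which of these footprints your own pages expose, you can scan the HTML for the identifiers. The sketch below assumes the usual ID shapes ("pub-" followed by 16 digits for AdSense, sometimes with a "ca-" prefix, and "UA-number-number" for Analytics); the patterns and the function name footprints() are illustrative assumptions, not an official specification, and they cover only two of the footprints mentioned above.

```python
import re

# Assumed ID shapes (not an official spec):
#   AdSense publisher ID:  pub-1234567890123456, optionally prefixed "ca-"
#   Analytics property ID: UA-1234567-1
ADSENSE_RE = re.compile(r"\b(?:ca-)?pub-(\d{16})\b")
ANALYTICS_RE = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")


def footprints(html: str) -> dict:
    """Return the AdSense and Analytics IDs found in a page's HTML."""
    return {
        # Normalise to the bare "pub-..." form so both spellings compare equal.
        "adsense": sorted({"pub-" + digits for digits in ADSENSE_RE.findall(html)}),
        "analytics": sorted(set(ANALYTICS_RE.findall(html))),
    }


if __name__ == "__main__":
    sample = (
        '<script>google_ad_client = "pub-1234567890123456";</script>'
        '<script>_gaq.push(["_setAccount", "UA-1234567-2"]);</script>'
    )
    print(footprints(sample))
```

Running this over the pages of each site you own shows which IDs are shared between them.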
6:22 pm on Dec 30, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 14, 2006
posts:684
votes: 52


The good news is that Blekko is the new Cuil. Cuil went live on July 28th 2008 and the servers were shut down on September 17th 2010.
9:30 pm on Dec 30, 2010 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


I've been sifting through some blekko AdSense data today and I'm completely amazed at what I could effortlessly learn about many sites, some almost shocking (to me anyway).

The IP search is equally enlightening, especially for sites that use their own dedicated servers.

All I can say is ... WOW ...

To many webmasters this is a nasty privacy violation. Many of us use private registrations to maintain a certain level of independence/anonymity between the sites we run, for either business or personal reasons, and unraveling all this information could be massively damaging to some people.
10:41 pm on Dec 30, 2010 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member 10+ Year Member

joined:June 18, 2005
posts:1781
votes: 40


incrediBILL, I was also exploring their SEO tools when I found out about the AdSense data. Blekko does in fact have a lot of useful tools for webmasters, and at first I was pleasantly surprised. But then you realize that what you learn about others, others can learn about you too, and there were no real benefits from being indexed by them as they're not sending any traffic. Still, I wondered what others thought. I decided today to add them to the robots.txt of all my sites, especially after reading the no-archive thread [webmasterworld.com].
6:49 am on Dec 31, 2010 (gmt 0)

Junior Member

5+ Year Member

joined:Oct 5, 2010
posts:146
votes: 0


Is it legal to do this? I hate this kind of information being shared, and I will write to Google about it. Search engines should learn to respect people's privacy. It can also do financial damage.
7:07 am on Dec 31, 2010 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


Sure it's legal: we publish it publicly, and there's nothing wrong with indexing public data.

The question is "is it ethical?", to which I'd say "NO!"
12:07 pm on Dec 31, 2010 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 26, 2003
posts: 46
votes: 0


If you use ModSecurity 2.x, here is a rule to serve that scoutjet user agent a 403 Forbidden page.

SecRule HTTP_User-Agent "ScoutJet" "deny,log,status:403"


Thanks frontpage

After I added this rule I realised the extent of their bot's crawling with hundreds of 403s in the log. This should keep them away from now on.
1:27 pm on Dec 31, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 16, 2004
posts:854
votes: 0


I wonder how many of you that are freaking out about this are the same people who build multiple sites on the same topic to saturate a niche/serps and hide this fact via private whois because this is a violation of the ToS?
1:32 pm on Dec 31, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 19, 2002
posts:3315
votes: 28


>>I wonder how many of you that are freaking out about this are the same people who build multiple sites on the same topic to saturate a niche/serps and hide this fact via private whois because this is a violation of the ToS?

Google obviously knows which sites all belong to the same person anyway! So it isn't this.
2:41 pm on Dec 31, 2010 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member ken_b is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 5, 2001
posts:5801
votes: 93


robots.txt.....

Does Blekko honor robots.txt?
6:13 pm on Dec 31, 2010 (gmt 0)

Junior Member

5+ Year Member

joined:Dec 2, 2010
posts: 71
votes: 0


I don't understand the uproar here ... you do realize that if you search for the adsense id in many other search engines (except google), it will return you a list of matching pages right? ;-)

... I mean sure, they're making it more convenient by rolling it up like this, but for anyone resourceful, it's not impossible to get this data otherwise.
6:30 pm on Dec 31, 2010 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 17, 2005
posts: 459
votes: 0


I wonder how many of you that are freaking out about this are the same people who build multiple sites on the same topic to saturate a niche/serps and hide this fact via private whois because this is a violation of the ToS?


Where in the TOS does it say you can't do that? Just asking. I think most people are concerned because it makes it easy to spy on their hard work with a single click. Basically what Blekko is doing is telling users, "sure you can take a good look at a t-bone by shoving your head far up a butcher's a.. but wouldn't you rather take my word for it? Here click here..". :-)

It is unacceptable for Blekko to do this. They are going down the same path as Cuil. Block these little wannabes.
8:50 pm on Dec 31, 2010 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38071
votes: 16


I don't understand the uproar here ... you do realize that if you search for the adsense id in many other search engines (except google), it will return you a list of matching pages right? ;-)


Exactly. If I am a competitor, I am not going to go to blekko to scope you out - I'm going to bing/hoo and put in your adsense code.
10:00 pm on Dec 31, 2010 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


I am not going to go to blekko to scope you out - I'm going to bing/hoo and put in your adsense code.


Exactly what syntax would you use to do this in Bing/Hoo?

Couldn't find any of my AdSense codes in either unless I'm missing something obvious.