Forum Moderators: incrediBILL & martinibuster

Blekko shows your other sites with the same adsense pub id

   
7:26 am on Dec 30, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



If you search the domain name of a site that contains adsense ads with blekko, it displays a link called "adsense" in the results and if you click on it, it'll display all the other sites it knows using the same adsense pub id. I know some other sites provide this service, but at least people have to pay for it so it's not public information for any casual visitors, or worse, people who will reuse that info in some mashed up, scrappy site.

Considering this is rather personal information, I'm considering blocking this new search engine in my robots.txt file: it isn't really bringing any traffic, it's using my bandwidth, and it's already pushing some boundaries regarding my privacy. As a webmaster, I know we should be open to new search technologies and give newcomers a chance, but what do I really have to gain by allowing them to crawl my sites if, so far, the negatives outweigh the positives?
7:51 am on Dec 30, 2010 (gmt 0)

WebmasterWorld Administrator martinibuster is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Blekko's Crawler is named ScoutJet [blekko.com].

User-agent: ScoutJet
Disallow: /
8:48 am on Dec 30, 2010 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



This could be a huge problem for people trying to keep their competitors from knowing all their sites.

I'm thinking it might be time to cloak the ads on my site away from all search engines to avoid such potential problems.

However, since I whitelist my robots.txt, blekko never crawled my site in the first place so I'm not worried at the moment.
9:05 am on Dec 30, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



incrediBILL, how do you whitelist your robots.txt? Do you block them all first and then allow some trusted search engines individually? I'm getting tired of having to watch my back like that (aboutus.org, archive.org, blekko.com, etc).
9:56 am on Dec 30, 2010 (gmt 0)



@koan: I might be wrong, but I believe it's:

# Block all
User-agent: *
Disallow: /

# Whitelist
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow:

# etc
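
One way to sanity-check a whitelist like the one above before deploying it is to run it through Python's standard robots.txt parser. This is only a sketch: the example.com URL is a placeholder, and it shows how Python's parser reads the file, not how any particular crawler will behave.

```python
from urllib.robotparser import RobotFileParser

# The whitelist from the post above, minus the "# etc" placeholder.
ROBOTS_TXT = """\
# Block all
User-agent: *
Disallow: /

# Whitelist
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow:
"""


def is_allowed(user_agent: str, url: str = "https://example.com/page.html") -> bool:
    """Return True if the rules above let `user_agent` fetch `url`.

    The URL is a hypothetical placeholder; any path gives the same answer
    here because the rules are all-or-nothing per crawler.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "msnbot", "ScoutJet"):
        print(agent, is_allowed(agent))
```

An empty "Disallow:" line means "nothing is disallowed", so the named crawlers get full access while everything else falls through to the "User-agent: *" block.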
10:17 am on Dec 30, 2010 (gmt 0)

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



@koan you need a dynamic robots txt - the default is

Block all

then test for the spiders you want and serve them a different set of rules
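
A rough sketch of that dynamic approach, assuming the "test" is a simple substring match on the User-Agent header (the post does not say how the spiders are identified). The names TRUSTED_SPIDERS and robots_txt_for are made up for illustration.

```python
# Rules served to crawlers on the whitelist: nothing is disallowed.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# Default rules served to everyone else: everything is disallowed.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Hypothetical whitelist: lowercase substrings of trusted User-Agent headers.
TRUSTED_SPIDERS = ("googlebot", "msnbot")


def robots_txt_for(user_agent: str) -> str:
    """Pick the robots.txt body to serve for a given User-Agent header."""
    ua = (user_agent or "").lower()
    if any(name in ua for name in TRUSTED_SPIDERS):
        return ALLOW_ALL
    return BLOCK_ALL
```

The web server or application would call robots_txt_for() when /robots.txt is requested. User-Agent headers are trivially spoofed, so a stricter test (for example, checking the requesting IP) may be worth adding on top of this.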
1:48 pm on Dec 30, 2010 (gmt 0)

5+ Year Member



What were they thinking! Thanks for that. Blocked.
1:53 pm on Dec 30, 2010 (gmt 0)

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



well it's not the first of their antics!

[webmasterworld.com...]
2:17 pm on Dec 30, 2010 (gmt 0)

WebmasterWorld Administrator engine is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month Best Post Of The Month



I was warning folks of this at PubCon. I was surprised some folks didn't see it as a problem.
2:38 pm on Dec 30, 2010 (gmt 0)

5+ Year Member



blekko's been doing this for a while, was in a couple of presentations at pubcon. you can also do the same with the GA code.
2:56 pm on Dec 30, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you use ModSecurity 2.x, here is a rule to serve that ScoutJet user agent a 403 Forbidden page.

SecRule HTTP_User-Agent "ScoutJet" "deny,log,status:403"


According to Blekko, ScoutJet crawls from the following IP ranges:

64.13.159.*
38.99.96.*, 38.99.97.*, 38.99.98.*, 38.99.99.*
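
For anyone filtering in application code or log analysis instead, the ranges above can be checked with Python's ipaddress module. This is a sketch only: the CIDR blocks are my translation of the wildcard ranges listed above (38.99.96.* through 38.99.99.* collapse to a single /22), and the function name is made up.

```python
import ipaddress

# Wildcard ranges from the post above, rewritten as CIDR networks.
SCOUTJET_NETWORKS = (
    ipaddress.ip_network("64.13.159.0/24"),  # 64.13.159.*
    ipaddress.ip_network("38.99.96.0/22"),   # 38.99.96.* through 38.99.99.*
)


def is_scoutjet_ip(address: str) -> bool:
    """Return True if `address` falls inside one of the listed ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in SCOUTJET_NETWORKS)
```

The same two CIDR blocks can be pasted into most firewall configurations. The list reflects what Blekko published at the time and may not stay current.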
3:04 pm on Dec 30, 2010 (gmt 0)

WebmasterWorld Senior Member wheel is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Wow. Thanks for that!
4:50 pm on Dec 30, 2010 (gmt 0)

WebmasterWorld Administrator travelin_cat is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Our main site has AdSense on interior pages only, not on the home page, and Blekko does not show the link to view our other properties with AdSense when searching for our domain name.
5:21 pm on Dec 30, 2010 (gmt 0)

5+ Year Member



What are people thinking?!

Blocked!

I am thinking that the 'white list' idea is a very good one.
5:28 pm on Dec 30, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>I am thinking that the 'white list' idea is a very good one.


That is only if you actually trust spiders to respect your robots.txt. The Spider Forums here are replete with tales of spiders that ignore robots.txt.

I just ban them via firewall or ModSecurity.

I have lots of website hosting/colo IP ranges banned, it makes life more pleasant.
6:10 pm on Dec 30, 2010 (gmt 0)

5+ Year Member



I knew that this was a crap search engine ...
6:17 pm on Dec 30, 2010 (gmt 0)

WebmasterWorld Senior Member sgt_kickaxe is a WebmasterWorld Top Contributor of All Time 5+ Year Member



robots.txt will not stop your sites from being discoverable, and adsense isn't the only footprint that links them together. Analytics, other 3rd party tracking, other ad network identifiers, your footer copyright link, etc. The list of ways to connect the dot coms is LONG.
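
To make the footprint point concrete, here is a rough sketch of how pages sharing an identifier could be grouped. Everything in it is an assumption for illustration: the regular expressions are loose guesses at the AdSense publisher ID and Google Analytics account ID formats, the function name is made up, and nothing here reflects how Blekko actually does it.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Assumed identifier shapes (not taken from the thread):
PUB_ID = re.compile(r"\bpub-\d{10,20}\b")  # AdSense publisher ID, with or without "ca-"
GA_ID = re.compile(r"\bUA-\d+-\d+\b")      # Google Analytics account ID


def group_sites_by_footprint(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each identifier to the domains whose HTML contains it.

    `pages` maps a domain name to that site's HTML source. Only identifiers
    that appear on more than one domain are returned.
    """
    groups: Dict[str, Set[str]] = defaultdict(set)
    for domain, html in pages.items():
        for pattern in (PUB_ID, GA_ID):
            for identifier in pattern.findall(html):
                groups[identifier].add(domain)
    return {ident: sites for ident, sites in groups.items() if len(sites) > 1}
```

Anyone holding a crawl of the pages can do this, which is why blocking one crawler only removes one of many places the link can be made.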

A+ to Blekko for giving it a shot but I don't suspect it will attract the sort they want.
6:22 pm on Dec 30, 2010 (gmt 0)

5+ Year Member



The good news is that Blekko is the new Cuil. Cuil went live on July 28th 2008 and the servers were shut down on September 17th 2010.
9:30 pm on Dec 30, 2010 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I've been sifting through some blekko AdSense data today and I'm completely amazed at what I could effortlessly learn about many sites, some almost shocking (to me anyway).

The IP search is equally enlightening, especially for sites that use their own dedicated servers.

All I can say is ... WOW ...

To many webmasters this is a nasty privacy violation. Many of us use private registrations to maintain a certain level of independence/anonymity between sites we run for either business or personal reasons, and unraveling all this information could be massively damaging to some people.
10:41 pm on Dec 30, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



incrediBILL, I was also exploring their SEO tools when I found out about the Adsense data. Blekko does in fact have a lot of useful tools for webmasters, and at first I was pleasantly surprised. But then you realize that what you learn about others, others can also learn about you, and there were no real benefits from being indexed by them as they're not sending any traffic. Still, I wondered what others thought. I decided today to block them in the robots.txt of all my sites, especially after reading the no-archive thread [webmasterworld.com].
6:49 am on Dec 31, 2010 (gmt 0)



Is it legal to do this? I hate this kind of information being shared, and I will write to Google about it. Search engines should learn to respect people's privacy. It can also cause financial damage.
7:07 am on Dec 31, 2010 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



sure it's legal, we publish it publicly, nothing wrong with indexing public data.

the question is "is it ethical", which I'd say "NO!"
12:07 pm on Dec 31, 2010 (gmt 0)

10+ Year Member



>>If you use ModSecurity 2.x, here is a rule to serve that scoutjet user agent a 403 Forbidden page.

>>SecRule HTTP_User-Agent "ScoutJet" "deny,log,status:403"


Thanks frontpage

After I added this rule I realised the extent of their bot's crawling with hundreds of 403s in the log. This should keep them away from now on.
1:27 pm on Dec 31, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I wonder how many of you that are freaking out about this are the same people who build multiple sites on the same topic to saturate a niche/serps and hide this fact via private whois because this is a violation of the ToS?
1:32 pm on Dec 31, 2010 (gmt 0)

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



>>I wonder how many of you that are freaking out about this are the same people who build multiple sites on the same topic to saturate a niche/serps and hide this fact via private whois because this is a violation of the ToS?

google obviously know which sites all belong to the same person anyway! so it isn't this.
2:41 pm on Dec 31, 2010 (gmt 0)

WebmasterWorld Senior Member ken_b is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



robots.txt.....

Does Blekko honor robots.txt?
6:13 pm on Dec 31, 2010 (gmt 0)



I don't understand the uproar here ... you do realize that if you search for the adsense id in many other search engines (except google), it will return a list of matching pages, right? ;-)

... I mean sure, they're making it more convenient by rolling it up like this, but for anyone resourceful, it's not impossible to get this data otherwise.
6:30 pm on Dec 31, 2010 (gmt 0)

5+ Year Member



>>I wonder how many of you that are freaking out about this are the same people who build multiple sites on the same topic to saturate a niche/serps and hide this fact via private whois because this is a violation of the ToS?


Where in the TOS does it say you can't do that? Just asking. I think most people are concerned because it is easy to spy on their hard work with a single click. Basically what Blekko is doing is telling users, "sure you can take a good look at a t-bone by shoving your head far up a butcher's a.. but wouldn't you rather take my word for it? Here click here..". :-)

It is unacceptable for Blekko to do this. They are going down the same path as Cuil. Block these little wannabes.
8:50 pm on Dec 31, 2010 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month Best Post Of The Month



>>I don't understand the uproar here ... you do realize that if you search for the adsense id in many other search engines (except google), it will return you a list of matching pages right? ;-)


Exactly. If I am a competitor, I am not going to go to blekko to scope you out - I'm going to bing/hoo and put in your adsense code.
10:00 pm on Dec 31, 2010 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



>>I am not going to go to blekko to scope you out - I'm going to bing/hoo and put in your adsense code.


Exactly what syntax would you use to do this in Bing/Hoo?

Couldn't find any of my AdSense codes in either unless I'm missing something obvious.
This 49-message thread spans 2 pages.