Moderators: incrediBILL & jatar k & martinibuster

Google AdSense Forum

This 49 message thread spans 2 pages.
Blekko shows your other sites with the same adsense pub id
koan




msg:4247328
 7:26 am on Dec 30, 2010 (gmt 0)

If you search Blekko for the domain name of a site that carries AdSense ads, the results include a link called "adsense". Click it and Blekko lists all the other sites it knows of that use the same AdSense pub id. I know some other sites provide this service, but at least people have to pay for it, so the information isn't available to any casual visitor or, worse, to people who will reuse it in some mashed up, scrappy site.

Considering this is rather personal information, I'm considering blocking this new search engine in my robots.txt file: it isn't really bringing any traffic, it's using my bandwidth and it's already pushing some boundaries regarding my privacy. As a webmaster, I know we should be open to new search technologies and give newcomers a chance, but what do I really have to gain by letting them crawl my sites if, so far, the negatives outweigh the positives?

 

martinibuster




msg:4247331
 7:51 am on Dec 30, 2010 (gmt 0)

Blekko's Crawler is named ScoutJet [blekko.com].

User-agent: ScoutJet
Disallow: /

incrediBILL




msg:4247350
 8:48 am on Dec 30, 2010 (gmt 0)

This could be a huge problem for people trying to keep their competitors from knowing all their sites.

I'm thinking it might be time to cloak the ads on my site away from all search engines to avoid such potential problems.

However, since I whitelist my robots.txt, blekko never crawled my site in the first place so I'm not worried at the moment.

koan




msg:4247358
 9:05 am on Dec 30, 2010 (gmt 0)

incrediBILL, how do you whitelist your robots.txt? Do you block them all first and then allow some trusted search engines individually? I'm getting tired of having to watch my back like that (aboutus.org, archive.org, blekko.com, etc).

tristanperry




msg:4247387
 9:56 am on Dec 30, 2010 (gmt 0)

@koan: I might be wrong, but I believe it's:

# Block all
User-agent: *
Disallow: /

# Whitelist
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow:

# etc
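
One way to sanity-check a whitelist like the one above before deploying it is to feed it to Python's standard `urllib.robotparser` and ask what each crawler would be allowed to fetch. This is only a sketch: it shows how a parser that follows the rules would read the file, not how any particular spider actually behaves. The `is_allowed` helper is a name made up for this example.

```python
from urllib.robotparser import RobotFileParser

# The whitelist-style robots.txt from the post above, minus the "# etc" line.
ROBOTS_TXT = """\
# Block all
User-agent: *
Disallow: /

# Whitelist
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow:
"""


def is_allowed(user_agent: str, url: str = "/") -> bool:
    """Return True if a rule-following crawler with this name may fetch url.

    Hypothetical helper for illustration; it only parses the text above.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "msnbot", "ScoutJet"):
        print(agent, is_allowed(agent))
```

An empty `Disallow:` line means "nothing is disallowed", so the named crawlers get full access while everything else falls through to the `*` block.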

topr8




msg:4247392
 10:17 am on Dec 30, 2010 (gmt 0)

@koan you need a dynamic robots.txt. The default is to block all, then you test for the spiders you want and serve them a different set of rules.
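
A minimal sketch of the dynamic robots.txt idea, assuming the choice is made on the User-Agent header alone. The names `TRUSTED_SPIDERS` and `robots_txt_for` are made up for this example, and the trusted list is just a placeholder. A User-Agent string is trivial to spoof, so a real setup would probably also verify the requester's IP or reverse DNS.

```python
# Default rules: block every crawler.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Rules served to spiders we recognise: allow everything.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# Hypothetical whitelist of User-Agent substrings (lowercase).
TRUSTED_SPIDERS = ("googlebot", "msnbot")


def robots_txt_for(user_agent: str) -> str:
    """Pick the robots.txt body to serve for a given User-Agent header."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in TRUSTED_SPIDERS):
        return ALLOW_ALL
    return BLOCK_ALL


if __name__ == "__main__":
    print(robots_txt_for("Mozilla/5.0 (compatible; Googlebot/2.1)"))
    print(robots_txt_for("SomeOtherBot/1.0"))
```

The function would be called from whatever handler serves /robots.txt, so an unlisted spider never even sees which crawlers are whitelisted.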

cien




msg:4247439
 1:48 pm on Dec 30, 2010 (gmt 0)

What were they thinking! Thanks for that. Blocked.

topr8




msg:4247440
 1:53 pm on Dec 30, 2010 (gmt 0)

well it's not the first of their antics!

[webmasterworld.com...]

engine




msg:4247444
 2:17 pm on Dec 30, 2010 (gmt 0)

I was warning folks of this at PubCon. I was surprised some folks didn't see it as a problem.

streko




msg:4247449
 2:38 pm on Dec 30, 2010 (gmt 0)

Blekko's been doing this for a while; it was in a couple of presentations at PubCon. You can also do the same with the GA code.

frontpage




msg:4247459
 2:56 pm on Dec 30, 2010 (gmt 0)

If you use ModSecurity 2.x, here is a rule to serve that ScoutJet user agent a 403 Forbidden page.

SecRule HTTP_User-Agent "ScoutJet" "deny,log,status:403"

According to Blekko, ScoutJet crawls from the following IP ranges:

64.13.159.*
38.99.96.*, 38.99.97.*, 38.99.98.*, 38.99.99.*
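
If you would rather match on IP than on user agent, the ranges listed above can be written in CIDR form and checked with Python's standard `ipaddress` module. A sketch only: the CIDR blocks are simply the ranges from this post rewritten (38.99.96.* through 38.99.99.* collapses to one /22), and `is_scoutjet_ip` is a name made up for the example. The ranges may well have changed since they were published.

```python
import ipaddress

# The ranges quoted above, rewritten as CIDR blocks.
SCOUTJET_NETWORKS = [
    ipaddress.ip_network("64.13.159.0/24"),  # 64.13.159.*
    ipaddress.ip_network("38.99.96.0/22"),   # 38.99.96.* through 38.99.99.*
]


def is_scoutjet_ip(address: str) -> bool:
    """Return True if the address falls inside one of the listed ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in SCOUTJET_NETWORKS)


if __name__ == "__main__":
    for candidate in ("38.99.97.12", "64.13.159.200", "38.99.100.1"):
        print(candidate, is_scoutjet_ip(candidate))
```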

wheel




msg:4247460
 3:04 pm on Dec 30, 2010 (gmt 0)

Wow. Thanks for that!

travelin cat




msg:4247541
 4:50 pm on Dec 30, 2010 (gmt 0)

Our main site has AdSense on interior pages only, not on the home page and Blekko does not show the link to view our other properties with AdSense when searching for our domain name.

Kufu




msg:4247550
 5:21 pm on Dec 30, 2010 (gmt 0)

What are people thinking?!

Blocked!

I am thinking that the 'white list' idea is a very good one.

frontpage




msg:4247553
 5:28 pm on Dec 30, 2010 (gmt 0)

>>I am thinking that the 'white list' idea is a very good one.


That is only if you actually trust spiders to respect your robots.txt. The Spider Forums here are replete with tales of spiders that ignore robots.txt.

I just ban them via firewall or ModSecurity.

I have lots of website hosting/colo IP ranges banned; it makes life more pleasant.

chrisv1963




msg:4247570
 6:10 pm on Dec 30, 2010 (gmt 0)

I knew that this was a crap search engine ...

Sgt_Kickaxe




msg:4247575
 6:17 pm on Dec 30, 2010 (gmt 0)

robots.txt will not stop your sites from being discoverable, and AdSense isn't the only footprint that links them together: Analytics, other 3rd party tracking, other ad network identifiers, your footer copyright link, etc. The list of ways to connect the dot coms is LONG.

A+ to Blekko for giving it a shot but I don't suspect it will attract the sort they want.
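
To make the footprint point concrete, here is a rough sketch of how little it takes to pull shared identifiers out of a page's source. It assumes the usual shapes of the IDs (an AdSense publisher id as `pub-` plus 16 digits, a Google Analytics account as `UA-` plus digits, a dash and more digits). The function name and patterns are illustrative, and nothing here is specific to how Blekko does it.

```python
import re

# Assumed ID shapes; both patterns are illustrative.
ADSENSE_ID = re.compile(r"\bpub-\d{16}\b")
ANALYTICS_ID = re.compile(r"\bUA-\d+-\d+\b")


def find_footprints(html: str) -> dict:
    """Collect the AdSense and Analytics identifiers found in a page."""
    return {
        "adsense": set(ADSENSE_ID.findall(html)),
        "analytics": set(ANALYTICS_ID.findall(html)),
    }


if __name__ == "__main__":
    # Made-up page fragment with made-up IDs.
    sample = (
        'google_ad_client = "ca-pub-1234567890123456";\n'
        '_gaq.push(["_setAccount", "UA-1234567-1"]);\n'
    )
    print(find_footprints(sample))
```

Two sites that return the same id from a check like this are trivially linked, whichever crawler collected the pages.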

chrisv1963




msg:4247577
 6:22 pm on Dec 30, 2010 (gmt 0)

The good news is that Blekko is the new Cuil. Cuil went live on July 28th 2008 and the servers were shut down on September 17th 2010.

incrediBILL




msg:4247668
 9:30 pm on Dec 30, 2010 (gmt 0)

I've been sifting through some blekko AdSense data today and I'm completely amazed at what I could effortlessly learn about many sites, some almost shocking (to me anyway).

The IP search is equally enlightening, especially for sites that use their own dedicated servers.

All I can say is ... WOW ...

To many webmasters this is a nasty privacy violation. Many of us use private registrations to maintain a certain level of independence/anonymity between sites we run for either business or personal reasons, and unraveling all this information could be massively damaging to some people.

koan




msg:4247687
 10:41 pm on Dec 30, 2010 (gmt 0)

incrediBILL, I was also exploring their SEO tools when I found out about the AdSense data. Blekko in fact has a lot of useful tools for webmasters and at first I was pleasantly surprised. But then you realize that what you learn about others, others can also learn about you, and there were no real benefits from being indexed by them as they're not sending any traffic. Still, I wondered what others thought. I decided today to add them to the robots.txt of all my sites, especially after reading the no-archive thread [webmasterworld.com].

Rockyou




msg:4247775
 6:49 am on Dec 31, 2010 (gmt 0)

Is it legal to do this? I hate this kind of information being shared; I will write to Google regarding this. Search engines should learn to respect people's privacy. It can also do financial damage.

incrediBILL




msg:4247786
 7:07 am on Dec 31, 2010 (gmt 0)

sure it's legal, we publish it publicly, nothing wrong with indexing public data.

the question is "is it ethical", to which I'd say "NO!"

acemi




msg:4247821
 12:07 pm on Dec 31, 2010 (gmt 0)

>>If you use ModSecurity 2.x, here is a rule to serve that scoutjet user agent a 403 Forbidden page.

>>SecRule HTTP_User-Agent "ScoutJet" "deny,log,status:403"


Thanks frontpage

After I added this rule I realised the extent of their bot's crawling with hundreds of 403s in the log. This should keep them away from now on.
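
For anyone who wants to put a number on those 403s, here is a rough sketch that counts them in an access log. It assumes Apache's "combined" log format, where the line ends with the quoted request, the status, the byte count, the quoted referer and the quoted user agent. The function name and regex are made up for this example and would need adjusting for other log layouts.

```python
import re

# Tail of an Apache "combined" log line:
#   "request" status bytes "referer" "user-agent"
LOG_TAIL = re.compile(
    r'"[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)


def count_blocked(lines, agent_token="ScoutJet", status="403"):
    """Count log lines with the given status whose user agent contains agent_token."""
    count = 0
    for line in lines:
        match = LOG_TAIL.search(line.strip())
        if match and match.group("status") == status and agent_token in match.group("agent"):
            count += 1
    return count


if __name__ == "__main__":
    # The log path is an assumption; point it at your own access log.
    with open("access.log", encoding="utf-8", errors="replace") as handle:
        print(count_blocked(handle))
```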

drall




msg:4247834
 1:27 pm on Dec 31, 2010 (gmt 0)

I wonder how many of you that are freaking out about this are the same people who build multiple sites on the same topic to saturate a niche/serps and hide this fact via private whois because this is a violation of the ToS?

topr8




msg:4247838
 1:32 pm on Dec 31, 2010 (gmt 0)

>>I wonder how many of you that are freaking out about this are the same people who build multiple sites on the same topic to saturate a niche/serps and hide this fact via private whois because this is a violation of the ToS?

google obviously know which sites all belong to the same person anyway! so it isn't this.

ken_b




msg:4247855
 2:41 pm on Dec 31, 2010 (gmt 0)

robots.txt.....

Does Blekko honor robots.txt?

nmfam




msg:4247934
 6:13 pm on Dec 31, 2010 (gmt 0)

I don't understand the uproar here ... you do realize that if you search for the adsense id in many other search engines (except google), it will return you a list of matching pages right? ;-)

... I mean sure, they're making it more convenient by rolling it up like this, but for anyone resourceful, it's not impossible to get this data otherwise.

cien




msg:4247940
 6:30 pm on Dec 31, 2010 (gmt 0)

>>I wonder how many of you that are freaking out about this are the same people who build multiple sites on the same topic to saturate a niche/serps and hide this fact via private whois because this is a violation of the ToS?


Where in the TOS does it say you can't do that? Just asking. I think most people are concerned because it is easy to spy on their hard work with a single click. Basically what Blekko is doing is telling users, "sure you can take a good look at a t-bone by shoving your head far up a butcher's a.. but wouldn't you rather take my word for it? Here click here..". :-)

It is unacceptable for Blekko to do this. They are going down the same path as Cuil. Block these little wannabes.

Brett_Tabke




msg:4247987
 8:50 pm on Dec 31, 2010 (gmt 0)

>>I don't understand the uproar here ... you do realize that if you search for the adsense id in many other search engines (except google), it will return you a list of matching pages right? ;-)


Exactly. If I am a competitor, I am not going to go to blekko to scope you out - I'm going to bing/hoo and put in your adsense code.

incrediBILL




msg:4248000
 10:00 pm on Dec 31, 2010 (gmt 0)

>>I am not going to go to blekko to scope you out - I'm going to bing/hoo and put in your adsense code.


Exactly what syntax would you use to do this in Bing/Hoo?

Couldn't find any of my AdSense codes in either unless I'm missing something obvious.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved