martinibuster

msg:4247331 | 7:51 am on Dec 30, 2010 (gmt 0) |
Blekko's crawler is named ScoutJet [blekko.com].

User-agent: ScoutJet
Disallow: /
|
|
incrediBILL

msg:4247350 | 8:48 am on Dec 30, 2010 (gmt 0) |
This could be a huge problem for people trying to keep their competitors from knowing all their sites. I'm thinking it might be time to cloak the ads on my site away from all search engines to avoid such potential problems. However, since I whitelist my robots.txt, blekko never crawled my site in the first place so I'm not worried at the moment.
|
koan

msg:4247358 | 9:05 am on Dec 30, 2010 (gmt 0) |
incrediBILL, how do you whitelist your robots.txt, do you block them all first and then allow some trusted search engines individually? I'm getting tired of having to watch my back like that (aboutus.org, archive.org, blekko.com, etc).
|
tristanperry

msg:4247387 | 9:56 am on Dec 30, 2010 (gmt 0) |
@koan: I might be wrong, but I believe it's:

# Block all
User-agent: *
Disallow: /

# Whitelist
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow:

# etc
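One way to sanity-check a whitelist-style robots.txt like this before deploying it is Python's standard-library urllib.robotparser. The sketch below is only an illustration: it shows how one parser reads the rules, and real crawlers may interpret them differently (or ignore them entirely).

```python
from urllib.robotparser import RobotFileParser

# The whitelist rules from the post, one directive per line.
ROBOTS_TXT = """\
# Block all
User-agent: *
Disallow: /

# Whitelist
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# An empty "Disallow:" means nothing is disallowed for that agent.
googlebot_allowed = parser.can_fetch("Googlebot", "/some/page.html")
msnbot_allowed = parser.can_fetch("msnbot", "/")

# Any agent without its own record falls back to the "*" block-all record.
scoutjet_allowed = parser.can_fetch("ScoutJet", "/")

print(googlebot_allowed, msnbot_allowed, scoutjet_allowed)
```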
|
topr8

msg:4247392 | 10:17 am on Dec 30, 2010 (gmt 0) |
@koan you need a dynamic robots.txt: the default is to block all, then test for the spiders you want and serve them a different set of rules.
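A minimal sketch of that idea, assuming the decision is made on the User-Agent header alone. The names `ALLOWED_BOT_TOKENS` and `robots_txt_for` are hypothetical, and the whitelist is only an example; how the result gets wired into your web server is left out.

```python
# Hypothetical whitelist of substrings to look for in the User-Agent header.
ALLOWED_BOT_TOKENS = ("googlebot", "msnbot")

# Default rules: block everything.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Rules served to whitelisted spiders: nothing disallowed.
ALLOW_ALL = "User-agent: *\nDisallow:\n"


def robots_txt_for(user_agent: str) -> str:
    """Return the robots.txt body to serve for the given User-Agent header."""
    ua = user_agent.lower()
    if any(token in ua for token in ALLOWED_BOT_TOKENS):
        return ALLOW_ALL
    return BLOCK_ALL
```

User-Agent strings are trivial to spoof, so a stricter version would also verify the requesting IP (for example via reverse DNS) before serving the permissive rules.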
|
cien

msg:4247439 | 1:48 pm on Dec 30, 2010 (gmt 0) |
What were they thinking! Thanks for that. Blocked.
|
topr8

msg:4247440 | 1:53 pm on Dec 30, 2010 (gmt 0) |
well it's not the first of their antics! [webmasterworld.com...]
|
engine

msg:4247444 | 2:17 pm on Dec 30, 2010 (gmt 0) |
I was warning folks of this at PubCon. I was surprised some folks didn't see it as a problem.
|
streko

msg:4247449 | 2:38 pm on Dec 30, 2010 (gmt 0) |
blekko's been doing this for a while, was in a couple of presentations at pubcon. you can also do the same with the GA code.
|
frontpage

msg:4247459 | 2:56 pm on Dec 30, 2010 (gmt 0) |
If you use ModSecurity 2.x, here is a rule to serve that ScoutJet user agent a 403 Forbidden page.
SecRule HTTP_User-Agent "ScoutJet" "deny,log,status:403"

According to Blekko, ScoutJet crawls from the following IP ranges:
64.13.159.*
38.99.96.*, 38.99.97.*, 38.99.98.*, 38.99.99.*
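If you would rather match on address than on user agent, the wildcard ranges quoted above collapse into two CIDR blocks: 64.13.159.0/24 and 38.99.96.0/22 (which covers 38.99.96.* through 38.99.99.*). The sketch below assumes those ranges are still accurate, which may not be the case; `is_scoutjet_ip` is a hypothetical helper name.

```python
import ipaddress

# CIDR equivalents of the wildcard ranges quoted in the post (assumed current).
SCOUTJET_NETWORKS = [
    ipaddress.ip_network("64.13.159.0/24"),
    ipaddress.ip_network("38.99.96.0/22"),  # 38.99.96.* through 38.99.99.*
]


def is_scoutjet_ip(addr: str) -> bool:
    """Return True if addr falls inside one of the listed ScoutJet ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in network for network in SCOUTJET_NETWORKS)
```

The same two CIDR blocks can be dropped into a firewall rule instead of being checked in application code.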
|
wheel

msg:4247460 | 3:04 pm on Dec 30, 2010 (gmt 0) |
Wow. Thanks for that!
|
travelin cat

msg:4247541 | 4:50 pm on Dec 30, 2010 (gmt 0) |
Our main site has AdSense on interior pages only, not on the home page and Blekko does not show the link to view our other properties with AdSense when searching for our domain name.
|
Kufu

msg:4247550 | 5:21 pm on Dec 30, 2010 (gmt 0) |
What are people thinking?! Blocked! I am thinking that the 'white list' idea is a very good one.
|
frontpage

msg:4247553 | 5:28 pm on Dec 30, 2010 (gmt 0) |
| I am thinking that the 'white list' idea is a very good one. |
| That is only if you actually trust spiders to respect your robots.txt. The Spider Forums here are replete with tales of spiders that ignore robots.txt. I just ban them via firewall or ModSecurity. I have lots of website hosting/colo IP ranges banned, it makes life more pleasant.
|
chrisv1963

msg:4247570 | 6:10 pm on Dec 30, 2010 (gmt 0) |
I knew that this was a crap search engine ...
|
Sgt_Kickaxe

msg:4247575 | 6:17 pm on Dec 30, 2010 (gmt 0) |
robots.txt will not stop your sites from being discoverable, and AdSense isn't the only footprint that links them together: Analytics, other 3rd-party tracking, other ad network identifiers, your footer copyright link, etc. The list of ways to connect the dot coms is LONG. A+ to Blekko for giving it a shot, but I don't suspect it will attract the sort they want.
|
chrisv1963

msg:4247577 | 6:22 pm on Dec 30, 2010 (gmt 0) |
The good news is that Blekko is the new Cuil. Cuil went live on July 28th 2008 and the servers were shut down on September 17th 2010.
|
incrediBILL

msg:4247668 | 9:30 pm on Dec 30, 2010 (gmt 0) |
I've been sifting through some blekko AdSense data today and I'm completely amazed at what I could effortlessly learn about many sites, some of it almost shocking (to me anyway). The IP search is equally enlightening, especially for sites that use their own dedicated servers. All I can say is ... WOW ... To many webmasters this is a nasty privacy violation: many of us use private registrations to maintain a certain level of independence/anonymity between sites we run for either business or personal reasons, and unraveling all this information could be massively damaging to some people.
|
koan

msg:4247687 | 10:41 pm on Dec 30, 2010 (gmt 0) |
incrediBILL, I was also exploring their SEO tools when I found out about the AdSense data. Blekko in fact has a lot of useful tools for webmasters and at first I was pleasantly surprised. But then you realize that what you learn about others... others can learn about you also, and there were no real benefits from being indexed by them as they're not sending any traffic. Still I wondered what others thought. I decided today to add them to the robots.txt of all my sites, especially after reading the no-archive thread [webmasterworld.com].
|
Rockyou

msg:4247775 | 6:49 am on Dec 31, 2010 (gmt 0) |
Is it legal to do this? I hate this kind of information being shared, and I will write to Google regarding this. Search engines should learn to respect people's privacy. It can also do financial damage.
|
incrediBILL

msg:4247786 | 7:07 am on Dec 31, 2010 (gmt 0) |
sure it's legal, we publish it publicly, nothing wrong with indexing public data. the question is "is it ethical", which I'd say "NO!"
|
acemi

msg:4247821 | 12:07 pm on Dec 31, 2010 (gmt 0) |
| If you use ModSecurity 2.x, here is a rule to serve that scoutjet user agent a 403 Forbidden page.
SecRule HTTP_User-Agent "ScoutJet" "deny,log,status:403" |
| Thanks frontpage. After I added this rule I realised the extent of their bot's crawling, with hundreds of 403s in the log. This should keep them away from now on.
|
drall

msg:4247834 | 1:27 pm on Dec 31, 2010 (gmt 0) |
I wonder how many of you that are freaking out about this are the same people who build multiple sites on the same topic to saturate a niche/serps and hide this fact via private whois because this is a violation of the ToS?
|
topr8

msg:4247838 | 1:32 pm on Dec 31, 2010 (gmt 0) |
>>I wonder how many of you that are freaking out about this are the same people who build multiple sites on the same topic to saturate a niche/serps and hide this fact via private whois because this is a violation of the ToS?
google obviously know which sites all belong to the same person anyway! so it isn't this.
|
ken_b

msg:4247855 | 2:41 pm on Dec 31, 2010 (gmt 0) |
robots.txt..... Does Blekko honor robots.txt?
|
nmfam

msg:4247934 | 6:13 pm on Dec 31, 2010 (gmt 0) |
I don't understand the uproar here ... you do realize that if you search for the adsense id in many other search engines (except google), it will return you a list of matching pages, right? ;-) ... I mean sure, they're making it more convenient by rolling it up like this, but for anyone resourceful, it's not impossible to get this data otherwise.
|
cien

msg:4247940 | 6:30 pm on Dec 31, 2010 (gmt 0) |
| I wonder how many of you that are freaking out about this are the same people who build multiple sites on the same topic to saturate a niche/serps and hide this fact via private whois because this is a violation of the ToS? |
| Where in the TOS does it say you can't do that? Just asking. I think most people are concerned because it is easy to spy on their hard work with a single click. Basically what Blekko is doing is telling users, "sure you can take a good look at a t-bone by shoving your head far up a butcher's a.. but wouldn't you rather take my word for it? Here click here..". :-) It is unacceptable for Blekko to do this. They are going down the same path as Cuil. Block these little wannabes.
|
Brett_Tabke

msg:4247987 | 8:50 pm on Dec 31, 2010 (gmt 0) |
| I don't understand the uproar here ... you do realize that if you search for the adsense id in many other search engines (except google), it will return you a list of matching pages right? ;-) |
| Exactly. If I am a competitor, I am not going to go to blekko to scope you out - I'm going to bing/hoo and put in your adsense code.
|
incrediBILL

msg:4248000 | 10:00 pm on Dec 31, 2010 (gmt 0) |
| I am not going to go to blekko to scope you out - I'm going to bing/hoo and put in your adsense code. |
| Exactly what syntax would you use to do this in Bing/Hoo? Couldn't find any of my AdSense codes in either unless I'm missing something obvious.
|
This 49-message thread spans 2 pages.
|
|