Hitwise Spider

How do you block

         

RichTC

4:57 pm on Oct 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi all,

Does anyone know how Hitwise collects data from your site regarding search activity to your pages?

I would like to block its spider from providing it with data from a number of sites, but I don't know what it's called or how I can identify it from site stats.

Does anyone know how, or has anyone managed, to block it? I'm fed up with it giving market data to competitors for them to copy. Time to squash it, I think.

Thanks in advance

wilderness

7:07 pm on Oct 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Haven't a clue.

One of their informational pages

http ://researchstore.hitwise.com/

Go to lower left corner of page and enter a domain.
Then view your logs and see what that came in as.

One of mine returned an "error occurred" message. I can only assume this is the result of a 403.

You should also keep in mind that their company is worldwide, thus crawlers could feasibly come from anywhere.
If you do a DNS lookup on their domain name you'll see some differences.

wilderness

8:03 pm on Oct 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I did not check my logs immediately after the inquiry, so I'm not positive whether this is the same request or not; however, it's the only 403 I have in close proximity.

It hit the main page and two sub-folders, no robots.txt, no images.

209.85.54.*** - - [20/Oct/2006:11:41:19 -0700] "GET / HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"

Please note: this provider's Class C range has a history of being a pest.
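For anyone wanting to pull entries like the one above out of a raw log, here is a minimal sketch. It assumes the server writes Apache-style combined log format (as the quoted line appears to be); the names `LOG_PATTERN`, `parse_line` and `forbidden_hits` are made up for illustration.

```python
import re

# Assumed layout: host ident user [time] "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def forbidden_hits(lines):
    """Keep only the parsed entries that were answered with a 403."""
    entries = (parse_line(line) for line in lines)
    return [entry for entry in entries if entry and entry["status"] == "403"]
```

Feeding the log file's lines to `forbidden_hits` gives a short list to eyeball for host, time and user agent. Nothing here identifies who sent the request; it only narrows down what to look at.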

Mokita

9:49 pm on Oct 20, 2006 (gmt 0)

10+ Year Member



I tried and got the same error as wilderness.

But I checked my logs immediately and there were no attempts to access the site, blocked or not at that time. I opened the Latest Visitors page, then refreshed it immediately after trying to get a report. To make sure I wasn't missing anything, I also checked the raw logs.

So it seems they are getting their info some other way, at other times.

bhartzer

9:57 pm on Oct 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



block its spider

I don't think they have a spider--they get their data another way, by having agreements with ISPs and they watch it all at the router level, not through a spider.

What data do you think they could get by sending a spider to crawl your site anyway? Surely they couldn't get traffic data by sending a spider to your site, no?

bateman_ap

9:57 pm on Oct 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As far as I know from my dealings with them over the year, they actually install boxes at major ISPs, between the ISP and you, that log activity, so there is actually no way for you to stop this.

bateman_ap

9:58 pm on Oct 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Posted at the same time!

RichTC

11:16 pm on Oct 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So in effect, this "spyware", as I will call it (because that's what it is), can establish what the best-performing keywords for my site are and hand that data over to any competitor that wants to pay for it, and I can't do a blinding thing about it?

I have to say I think it's absolutely outrageous. If I can find any way to prevent this, I want to do it. Enough's enough: one of our sites gets copied on a regular basis.

Often I see client.hitwise in the logs of this one site. In other words, the competitor sees that I'm gaining X% of traffic from "blue night time widgets" keywords, for example, and visits my page to see what it's about. A few days later they have a similar page to get my visitors.
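A rough sketch of how those visits could be tallied, assuming the referer and requested path have already been extracted from each log line. The function name `hitwise_referrals` and the substring match on "hitwise" are illustrative assumptions; the full referring hostname is not given here, so the match is deliberately loose.

```python
from collections import Counter
from urllib.parse import urlparse


def hitwise_referrals(entries, needle="hitwise"):
    """Count requested paths whose referer hostname contains `needle`.

    `entries` is an iterable of (referer, path) pairs taken from the logs.
    """
    counts = Counter()
    for referer, path in entries:
        # A referer of "-" (none sent) has no hostname, so fall back to "".
        host = urlparse(referer).hostname or ""
        if needle in host.lower():
            counts[path] += 1
    return counts
```

The pages with the highest counts are the ones being looked up from that referer, which at least shows which sections are attracting the attention.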

I've blocked the IPs of all the major competitors; now I want to stop this company obtaining and selling my hard-won market intelligence.

Any ideas would be most welcome.

vite_rts

11:23 pm on Oct 20, 2006 (gmt 0)

10+ Year Member



One supposes that once your competitors type in your site address and can't reach it, they, being proficient webbies like you, probably take a trip down to the nearest net cafe and...

Plus, they're probably reading this thread right now, being conscientious active or silent www members.

It's good to have an idea of how Hitwise works. Ingenious, but it must be expensive; then again, they must know what they're about.

wilderness

11:40 pm on Oct 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



As far as I know from my dealings with them over the year, they actually install boxes at major ISPs, between the ISP and you, that log activity, so there is actually no way for you to stop this.

bateman and bhartzer,
I guess it's possible; however, I find it a bit far-fetched that Internet Service Providers who sell the majority of their business to private individuals would allow such a breach of integrity and security.

Same goes for major hubs.

Now! From a colo or a large hosting service of websites, I would find such a possibility plausible and within their scruples of "business issues".
Course these limited options would only provide limited stats.

RichTC

1:04 am on Oct 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



vite_rts,

Bang on the money, and don't I know it. However, all I can do is make things as difficult as possible for them.

wilderness,
"Course these limited options would only provide limited stats"

Yes, but let's say my team has been expanding and researching a section about "blue widgets" and we have 50 documents about variations of these widgets. Let's call them "a1 blue widget", "b1 blue widget", "c1 blue widget", etc. Then users of Google search start, for some reason, searching on "h1 blue widget", and I start getting traffic from it to that page.

At this point my competitor goes to Hitwise, reviews the "blue widget market", and notices that my site has slightly increased its share of that market. On further investigation they find that my pages relating to "h1 blue widget" are getting the bulk of the traffic, so they copy me and put out a dedicated section on h1 blue widget.

In other words, they haven't had to put the work into expanding the blue widget section and building or researching 50 documents; they just copy the one precise document that gets the search engine traffic.

So this data, albeit limited, is enough for them to maximise their own position in the market without doing any of the hard work, just because they can cheat and get hold of the data. This is the issue I have with it: they benefit from my hard work and, to be honest, it feels like theft!

vite_rts

1:36 am on Oct 21, 2006 (gmt 0)

10+ Year Member



Hi RichTC,

I dunno what kinda money you guys battle over, but it appears Hitwise reports start at $695 per report...

Unless this kinda money is peanuts, well...

wilderness

4:26 am on Oct 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



In other words, they haven't had to put the work into expanding the blue widget section and building or researching 50 documents; they just copy the one precise document that gets the search engine traffic.

So this data, albeit limited, is enough for them to maximise their own position in the market without doing any of the hard work, just because they can cheat and get hold of the data. This is the issue I have with it: they benefit from my hard work and, to be honest, it feels like theft!

Rich,
The only solution you have, then, is to monitor your logs/visitors and limit the spidering (in addition to limiting access time between pages for non-spidering visitors, as well as noting and taking action against "snoop" visitors). Limiting regional access as well is another option.

It's the only solution when focused on a very limited market share.

Please note: all of the above places you in a category of extremism.
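As a rough illustration of the "access time between pages" idea, the sketch below flags visitors who request pages faster than a human reader plausibly would. The function name `fast_visitors` and both thresholds are assumptions chosen for the example, not recommended values, and the input is assumed to be (ip, unix_timestamp) pairs already sorted by time.

```python
from collections import defaultdict


def fast_visitors(hits, min_gap_seconds=2.0, max_violations=3):
    """Return the set of IPs that repeatedly request pages too quickly.

    `hits` is an iterable of (ip, unix_timestamp) pairs in time order.
    An IP is flagged once it has made `max_violations` or more requests
    that arrived less than `min_gap_seconds` after its previous request.
    """
    last_seen = {}
    violations = defaultdict(int)
    for ip, timestamp in hits:
        previous = last_seen.get(ip)
        if previous is not None and timestamp - previous < min_gap_seconds:
            violations[ip] += 1
        last_seen[ip] = timestamp
    return {ip for ip, count in violations.items() if count >= max_violations}
```

The flagged addresses are only candidates for a closer look or a block rule; fast legitimate visitors and shared proxies will trip it too, so the thresholds need tuning against real traffic.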