Does anyone know how Hitwise collects data from your site regarding search activity to your pages?
I would like to block its spider from gathering data from a number of sites, but I don't know what it's called or how I can identify it from site stats.
Does anyone know it, or has anyone managed to block it? I'm fed up with it giving market data to competitors for them to copy. Time to squash it, I think.
Thanks in advance
One of their informational pages
http ://researchstore.hitwise.com/
Go to lower left corner of page and enter a domain.
Then view your logs and see what that came in as.
One of mine returned an "error occurred". I can only assume this is the result of a 403.
You should also keep in mind that their company is worldwide, thus crawlers could feasibly come from anywhere.
If you do a DNS lookup on their domain name you'll see some inconsistency.
It hit the main page and two sub-folders, no robots.txt, no images.
209.85.54.*** - - [20/Oct/2006:11:41:19 -0700] "GET / HTTP/1.1" 403 - "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
Please note: this provider's Class C range has a history of being a pest.
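For anyone wanting to check their own logs the same way, here is a rough Python sketch. It assumes the Apache "combined" log format that the line above appears to use; the names `LOG_RE`, `parse_line` and `denied_from_prefix` are made up for illustration, and the masked last octet is left masked.

```python
import re

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None

def denied_from_prefix(lines, prefix):
    """Return parsed entries answered with 403 whose host starts with prefix."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry and entry["status"] == "403" and entry["host"].startswith(prefix):
            hits.append(entry)
    return hits
```

Calling `denied_from_prefix(open("access.log"), "209.85.54.")` (the file name is only an example) would list the blocked requests from that range.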
But I checked my logs immediately and there were no attempts to access the site, blocked or not at that time. I opened the Latest Visitors page, then refreshed it immediately after trying to get a report. To make sure I wasn't missing anything, I also checked the raw logs.
So it seems they are getting their info some other way, at other times.
"block its spider"
What data do you think they could get by sending a spider to crawl your site anyway? Surely they couldn't get traffic data by sending a spider to your site, no?
I have to say I think it's absolutely outrageous. If I can find any way to prevent this, I want to do it. Enough's enough: one of our sites gets copied on a regular basis.
Often I see client.hitwise in the logs of this one site. In other words, the competitor sees that I'm gaining X% of traffic from "blue night time widgets" keywords, for example, and visits my page to see what it's about. A few days later they have a similar page to get my visitors.
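A rough Python sketch of how such visits could be pulled out of a raw log. It assumes the combined log format and assumes the client.hitwise string shows up in the referrer field, which is a guess; `REFERER_RE` and `referer_hits` are names made up for illustration.

```python
import re

# Matches the tail of a combined-format line:
# "request" status bytes "referer" "agent"
REFERER_RE = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"$')

def referer_hits(lines, needle="hitwise"):
    """Return the log lines whose referrer field contains needle (case-insensitive)."""
    needle = needle.lower()
    hits = []
    for line in lines:
        line = line.strip()
        match = REFERER_RE.search(line)
        if match and needle in match.group("referer").lower():
            hits.append(line)
    return hits
```

Each returned line shows which page was requested and from which address, which is the information needed to decide whether to block that visitor.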
I've blocked the IPs of all the major competitors; now I want to stop this company from obtaining and selling my hard-won market intelligence.
Any ideas would be most welcome.
Plus, they're probably reading this thread right now, being conscientious www members, active or silent.
It's good to have an idea of how Hitwise works. Ingenious, but it must be expensive; then again, they must know what they're about.
As far as I know from my dealings with them over the year, they actually install boxes at major ISPs, between the ISP and you, that log activity, so there is actually no way for you to stop this.
bateman and bhartzer,
I guess it's possible; however, I find it a bit far-fetched that Internet Service Providers, who sell the majority of their business to private individuals, would allow such a breach of integrity and security.
Same goes for major hubs.
Now! From a colo or a large website hosting service, I would find such a possibility plausible and within their scruples of "business issues".
Course these limited options would only provide limited stats.
Bang on the money... and don't I know it. However, all I can do is make things as difficult as possible for them.
wilderness,
"Course these limited options would only provide limited stats"
Yes, but let's say my team has been expanding and researching a section about "blue widgets" and we have 50 documents about variations of these widgets (let's call them "a1 blue widget", "b1 blue widget", "c1 blue widget", etc.). Then users of Google search start, for some reason, searching on "h1 blue widget" and I start getting traffic from it to that page.
At this point my competitor goes to Hitwise, reviews the "blue widget market" and notices that my site has slightly increased its share of that market. On further investigation they find that my pages relating to "h1 blue widget" are getting the bulk of the traffic, so they copy me and put out a dedicated section on h1 blue widget.
In other words, they haven't had to put the work into expanding the blue widget section and building/researching 50 documents; they just copy the one precise document that gets the search engine traffic.
So this data, albeit limited, is enough for them to maximise their own position in the market without doing any of the hard work, just because they can cheat and get hold of the data. This is the issue I have with it: they benefit from my hard work and, to be honest, it feels like theft!
Rich,
The only solution that you have, then, is to monitor your logs/visitors and limit the spidering (in addition to limiting access time between pages for non-spidering visitors, as well as noting and taking action against "snoop" visitors). You could also limit regional access.
It's the only solution when focused on a very limited market share.
Please note: all of the above places you in a category of extremism.
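As one possible illustration of the "access time between pages" idea, here is a minimal Python sketch that flags hosts requesting pages faster than a chosen gap. The function name `fast_hosts`, the two-second default and the assumption that timestamps use the Apache log format are all mine; what to do with a flagged host is left to you.

```python
from collections import defaultdict
from datetime import datetime

# Assumed timestamp layout, as in Apache logs: 20/Oct/2006:11:41:19 -0700
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

def fast_hosts(entries, min_gap_seconds=2.0):
    """entries: iterable of (host, timestamp_string) pairs.

    Returns the set of hosts with at least two consecutive requests
    closer together than min_gap_seconds.
    """
    times = defaultdict(list)
    for host, stamp in entries:
        times[host].append(datetime.strptime(stamp, TIME_FORMAT))

    flagged = set()
    for host, stamps in times.items():
        stamps.sort()
        for earlier, later in zip(stamps, stamps[1:]):
            if (later - earlier).total_seconds() < min_gap_seconds:
                flagged.add(host)
                break
    return flagged
```

The flagged set is only a list of candidates for a closer look; a real person clicking quickly can also trip a short threshold, so the gap needs tuning per site.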