| 3:00 pm on Dec 9, 2009 (gmt 0)|
As long as you obey their robots.txt rules, and are not republishing the information, I don't see a problem.
Sure seems like a lot of work/time you could be spending on other things though. Just my opinion.
| 4:22 pm on Dec 9, 2009 (gmt 0)|
Thanks, I thought that would be the case - I'm definitely not going to be publishing the data, just using it internally.
|Sure seems like a lot of work/time you could be spending on other things though. Just my opinion. |
I can see why you think this, however I think benchmarking competitors is an essential business activity, and should be standard for any analytics/optimisation team. If my penetration levels are 80% then that looks really good, but I'll never know how good or bad this actually is until I benchmark my competition.
They could have 90% penetration - which puts my 80% in more context.
| 5:00 pm on Dec 9, 2009 (gmt 0)|
I'm sorry for the "dumb" question, but what exactly do you want to measure, and what do you mean by "penetration"?
As a network admin / developer, this isn't my strong field, but I'm sure I can learn something about this :)
| 5:53 pm on Dec 9, 2009 (gmt 0)|
Not a problem. By penetration I mean your share of a customer base. For example, say you sell forum software...
100 websites use forum software: 55 of them use your software, 30 use your main competitor's, and the rest use other forum software.
This would mean you have 55% of the market, so your market penetration is 55%.
Coming back to my original post - if you don't know how many websites use forum software in the first place, then you have no way of knowing your market penetration. I know how many members my site has, but I don't know how many my competitor has - or how many are members of both our sites. Spidering would allow me to find out this information.
You may think that having 55 websites using your software is really good, but what would you think if you found out that 10,000 websites use forum software, and 5,000 of them use your competitor's software? All of a sudden you have barely "penetrated" the market at all.
I hope this helps.
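To put numbers on the example above, here is a minimal sketch of the arithmetic. The function name `penetration` is just an illustrative choice; the figures are the ones from the post (55 of 100 sites, then 55 of 10,000).

```python
def penetration(your_customers: int, total_market: int) -> float:
    """Return your share of the total market as a percentage."""
    if total_market <= 0:
        raise ValueError("total_market must be positive")
    return 100.0 * your_customers / total_market


# Same 55 customers, two different assumptions about the market size.
small_market = penetration(55, 100)      # market of 100 sites
large_market = penetration(55, 10_000)   # market of 10,000 sites

print(f"{small_market:.2f}% vs {large_market:.2f}%")  # 55.00% vs 0.55%
```

The customer count is identical in both cases; only the denominator changes, which is why you need the competitor-side numbers before the percentage means anything.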
| 6:30 pm on Dec 9, 2009 (gmt 0)|
Ok, I see where you're going with this. But I suppose this only really helps if you have a very specialized product? In the case of forum software, it's almost impossible to see who has forum software installed. Take phpBB / SMF / other OSS versions for example, how will the developers ever know how many instances are installed, exactly? With vBulletin / Invision board / other commercial scripts, this is obviously easy, since they probably track sales. But why would they publish this on their website?
In our case, web hosting, this is pretty much an impossible "battle". None of my competitors will advertise an exact number of clients on their website :)
| 9:14 am on Dec 10, 2009 (gmt 0)|
Yes, that is very true. We have a very specialised niche, whose numbers can be counted.
Forum software was a bad example, but I used it to explain what I meant by penetration.
Thanks for the replies. If anyone thinks spidering for internal use is not legal, please let me know.
| 12:00 pm on Dec 10, 2009 (gmt 0)|
Webmasters put up with spiders eating bandwidth because of the benefits of being listed by search engines. Legal or not, I think that finding a competitor consuming their bandwidth with a bot would get a lot of people phoning their lawyers for advice.
| 12:31 pm on Dec 10, 2009 (gmt 0)|
Which is why I am asking on here whether it is legal. I've had companies contact us asking if we wanted to spend thousands on a list of their members that they'd get by spidering, so I am wondering if we can do it ourselves...
But I want to check the legality of it first, hence my post on here.
| 1:03 pm on Dec 10, 2009 (gmt 0)|
Most of us are not lawyers and are just sharing our opinions. I would consult an attorney for real legal advice.
In my opinion, you are researching information that is public. Whether you read it manually or via a bot that obeys their site rules, that to me is not illegal (again, I'm not a lawyer).
Then there is the point of saving the information locally. That in itself is not illegal. How many web browsers cache pages locally on your hard drive? If it was illegal, browsers would not be able to do that. Nor would they have a "Save as" option for any webpage you visit.
Some folks like to save pages offline to view later while they are on trips, etc. All of this seems to fall under fair use of a site's public information.
But again, I'm not a lawyer... ;-)
| 1:07 pm on Dec 10, 2009 (gmt 0)|
Whether they can do anything about it legally is another matter.
| 2:22 pm on Dec 10, 2009 (gmt 0)|
Plus most sites don't require you to agree to their terms and conditions when you visit the site. Usually only when you sign up.
| 2:59 pm on Dec 10, 2009 (gmt 0)|
One technical note:
When you spider your competitor's website with software like wget, make sure you wait a second or two between fetches. Otherwise you could overload your competitor's server, which could look like a DoS attempt and get you in trouble.
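If you end up scripting the fetches yourself rather than using wget, the same idea might look like the sketch below. The names `fetch` and `crawl_politely` and the 2-second default are illustrative assumptions, not anything prescribed in this thread.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable


def fetch(url: str) -> bytes:
    """Download one URL using only the standard library."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def crawl_politely(
    urls: Iterable[str],
    delay_seconds: float = 2.0,                  # assumed default: "a second or two"
    fetcher: Callable[[str], bytes] = fetch,     # swappable, e.g. for testing
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, bytes]:
    """Fetch each URL in order, pausing between consecutive requests."""
    pages: Dict[str, bytes] = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(delay_seconds)
        pages[url] = fetcher(url)
    return pages
```

With wget itself, the `--wait` option (for example `--wait=2`) does the same job.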
| 3:19 pm on Dec 10, 2009 (gmt 0)|
Before I post a particular question - is it OK to ask for a product recommendation on WebmasterWorld, or should I leave the research down to some googling?
Thanks in advance
| 3:45 pm on Dec 10, 2009 (gmt 0)|
A request for product recommendations is likely to violate one or more of the WebmasterWorld terms of service [webmasterworld.com] (such as 13, 20 & 25).
| 4:00 pm on Dec 10, 2009 (gmt 0)|
Good to know, I will avoid asking! Thanks everyone for your advice. I intend to thoroughly read the terms of service and then go ahead if there is no mention of it being disallowed.
| 4:05 pm on Dec 10, 2009 (gmt 0)|
I wouldn't worry so much about the terms of service for the site you are crawling.
I WOULD respect the robots.txt directives, and if your user agent or the URL you intend to request is excluded, then don't spider that URL.
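For anyone scripting this, a rough sketch of that check using Python's standard `urllib.robotparser` is below. The robots.txt content, the bot name `my-benchmark-bot` and the example.com URLs are made up for illustration; against a real site you would point the parser at the live file with `set_url()` and `read()` instead of parsing a string.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical robots.txt that excludes every bot from /members/.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /members/
"""

parser = build_parser(EXAMPLE_ROBOTS)
USER_AGENT = "my-benchmark-bot"  # hypothetical user agent name

for url in ("https://example.com/members/list", "https://example.com/about"):
    if parser.can_fetch(USER_AGENT, url):
        print("allowed:", url)
    else:
        print("excluded, skipping:", url)
```

Here the `/members/list` URL is excluded and `/about` is allowed, so only the second one should be requested.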