
Anyone Try To Crawl Superpages.com?

4:31 am on Sep 21, 2007 (gmt 0)

5+ Year Member

I am looking to start building a local search site for my state, and I would want to fill in the business information from a large trusted source like Superpages.com or Yellowpages.com. I have searched around and have seen threads where people say that they have done it, but they post very little information on the how. There are any number of open-source spiders out there, as well as programs like eGrabber and Content Grabber. I was wondering if anyone has experience doing this that they would like to share? I don't really have a timeline for when I would like to have this finished; it's more of a hobby, which is why I don't want to shell out the big bucks to a large data provider.


4:34 am on Sep 21, 2007 (gmt 0)

WebmasterWorld Senior Member vincevincevince is a WebmasterWorld Top Contributor of All Time 10+ Year Member

It's fairly straightforward to crawl most of that kind of site, so long as the admin isn't doing something to stop you (page rate limits, bot detection, etc.).
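For what it's worth, the polite-crawling part (staying under page rate limits) is easy to sketch. This is just an illustration in Python; the fetch function and the interval are placeholders, not anything specific to Superpages:

```python
import time

class RateLimiter:
    """Enforces a minimum delay between successive requests so a
    crawl stays well under any per-page rate limit."""

    def __init__(self, min_interval_seconds):
        self.min_interval = min_interval_seconds
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()

def crawl(urls, fetch, limiter):
    """Fetch each URL politely; in practice `fetch` would wrap
    urllib and the crawler would also honour robots.txt."""
    pages = []
    for url in urls:
        limiter.wait()
        pages.append(fetch(url))
    return pages
```

Whether crawling is *allowed*, of course, is a separate question entirely.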

On the other hand, as a collection of data, Superpages or Yellowpages does have a copyright claim to it, and you aren't free to just create a derivative work.

12:33 pm on Sep 21, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

You don't have to shell out big bucks for data from just one state. $1000 will get you data that's in a nice format to work with. With data at that price it's going to be cheaper to buy than 'harvest' and you'll stay on the right side of the law.

6:51 pm on Sep 21, 2007 (gmt 0)

5+ Year Member

Thanks inbound, I know of the site you speak of, and I'll be honest, I thought about buying that information. When they launched their product, they were careful to say that they didn't harvest from Yahoo Local but instead went to sites where spidering didn't violate the TOS. I am looking at this whole site as a project which includes spidering sites (I've never done that before and would like to try something new) and seeing what I can learn.

I have seen other data brokers say that they spider a large number of sites. I can't imagine that they would use sites like Chamber of Commerce sites, but maybe they do.

3:08 am on Sep 22, 2007 (gmt 0)

WebmasterWorld Senior Member vincevincevince is a WebmasterWorld Top Contributor of All Time 10+ Year Member

The key comes down to what is copyrightable and what isn't.

As I've seen it explained: individual items of information (name, address, phone number) aren't copyrightable, as there's no creative step, but when you compile them into a directory or similar, you've created something new on which you can claim copyright.

If you are going to take a large section of someone's copyrighted directory and modify it to make your own directory, then you will be 'creating a derivative work', something you aren't entitled to do without the permission of the original copyright holder.

Not banning spidering isn't granting you permission to create a derivative work, nor is the absence of a copyright symbol evidence that someone "doesn't claim copyright". Only a specific statement granting you that permission is good enough, and copyright is an automatic right that arises when you create anything, whether or not you attach a symbol or register it.

Be aware that most major directories have 'deliberate mistakes': fictional entries or incorrectly spelt addresses which allow them to easily demonstrate when someone has copied their content. The only way to find these is to manually check each and every entry you copy.

4:46 am on Sep 22, 2007 (gmt 0)

5+ Year Member

Very interesting. I'm learning all the time...lol. Just a quick question, sorry if I am being a pain. I am looking at a large data broker's website, one of the top ones, and on a page where they describe how they get their information, they say "We scrutinize and catalog 5,200 phone books" among other documents. How would me spidering the Yellow Pages or Superpages site be any different from going out and manually inputting the information myself? Granted, they can prove I spidered the site, but really, is there a difference?

Another site, the one that inbound mentioned for me to pay $1000 for the data, addresses their data gathering as such: "I just wanted to address spidering - we are talking about lightweight spidering on sites that do allow." This site has around 14,000,000 listings. How can anyone possibly lightly spider so many sites? There aren't that many large business directories. Sorry if I sound a bit stupid when I say this, but one would have to assume that a) they spidered the large sites in some way, shape or form along with other smaller ones, b) they bought the information from someone, deduped it, and are reselling it, or c) they spidered and deduped an incredible number of sites, which would take a lot of time and resources. I am sure there is a missing option d) that I am just not seeing, mainly because I am so new to this; I don't fully understand just how in the heck some sites do it and really would like to learn how.
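On the dedupe point: merging listings from many sources mostly comes down to normalizing names and phone numbers before comparing. A rough Python sketch (the field names here are made up for illustration):

```python
import re

def dedupe_listings(listings):
    """Collapse duplicate business listings gathered from multiple
    sources. Keys on the phone number's digits plus a lowercased,
    whitespace-collapsed name; the first occurrence of each key wins."""
    seen = {}
    for listing in listings:
        phone = re.sub(r"\D", "", listing.get("phone", ""))
        name = re.sub(r"\s+", " ", listing.get("name", "")).strip().lower()
        key = (phone, name)
        if key not in seen:
            seen[key] = listing
    return list(seen.values())
```

Real brokers go much further than this (address standardization, fuzzy name matching), but the principle is the same.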

9:37 pm on Sep 24, 2007 (gmt 0)

10+ Year Member

I guess I should clarify [as I assume it is our product that is being discussed at $1000]

We buy data from various sources. You would be surprised by how many companies you deal with that are more than willing to sell your data [and by companies, I am including your municipal/state/federal government]. So, for example, we buy data from credit companies. We buy data from the state and federal governments. Delivery companies are more than happy to sell your data, and we also get data from large organizations (think restaurants, dentists, etc. - anything with an association a 'professional' will want to be a part of). There are also CMRs and other middlemen whose job is to get their clients' information out there (from small-time mom and pop shops to Fortune 100 companies).

The lightweight spidering refers to augmenting the data when we can. It is nowhere near our primary source.

I should add - the hard part is not the data collection. The hard part is making sense of it.

A second addition - data brokers of all kind (local data, mapping, weather, etc) - they all poison-pill their data.

[edited by: AhmedF at 9:46 pm (utc) on Sep. 24, 2007]

3:20 am on Sep 25, 2007 (gmt 0)

5+ Year Member

Ahmed, of course that was your site I mentioned, as it is well respected. As for your reply, I never seriously considered purchasing the information from the various organizations/government agencies, since there are just so many and the cost would exceed buying it from a data broker.

I would like to thank you for providing an answer to a question that has been bugging me for I don't know how long. And while I have you on the line, since it has been asked in other threads: did you consider buying the information from one of those mailing list/business lead providers that are obviously far lower in cost (and, one assumes, lower in quality)? If you did buy one, what was your experience with it?

3:38 am on Sep 25, 2007 (gmt 0)

WebmasterWorld Senior Member vincevincevince is a WebmasterWorld Top Contributor of All Time 10+ Year Member

AhmedF, is yours the site which famously allows people to "download the data for free" for "one fixed price"? It would be good to have that explained.

3:19 pm on Sep 25, 2007 (gmt 0)

10+ Year Member

vincevincevince - yes that is us.

tennis - the basic story is simple: we launched in Toronto over 18 months ago. As we mulled over launching in other cities [Canada and US], we didn't like some of the terms [e.g. revshare]. Coming from a tech background, and with a lot of experience in city databases etc., we decided it would be a better move for us to build our own database.

We had already worked in the whitepages industry (again - the amount of data you can purchase will *stun* the Average Joe), and thus it was easier for us to buy the data than it would be for someone starting new.

So - yes, we did use one of the data brokers, and we ended up deciding it was a better move for us (background considered) to do our own. Plus none were keen on the entire opening-up/wiki-style system :)

8:18 pm on Sep 26, 2007 (gmt 0)

5+ Year Member

It is copyright infringement to take Superpages listing data and display it elsewhere.

It would be an infringement against Superpages as well as against a number of their partners who provide them with data.

Further, it's a violation of their terms of use to make multiple, automated queries in that fashion, since that can impact performance for real users.

Superpages does have a partnership API and an affiliate program API that would allow you to redisplay some of their content on your website, if you qualify.

3:16 pm on Oct 5, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

What do you mean by poison-pill their data?

3:49 pm on Oct 5, 2007 (gmt 0)

10+ Year Member

Poison pill - purposely putting in incorrect information that you seeded - if anyone else has it, they must have gotten it (directly or indirectly) from you.

E.g. mapping (NAVTEQ/Mapquest/Tele-atlas) - they may put a little dead-end road somewhere that doesn't really exist. If they find that dead-end road appearing anywhere else, someone must have ripped them off.

E.g. business data - you can slightly modify an address (223 Elm Street instead of the correct 222 Elm Street), slightly modify the name, or even just put in an incorrect listing.

This happens for all commercial-grade 'data', whatever it may be. Data sellers usually do customized poison pills for every customer, making it damn easy to trace back.
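To make the customized-per-customer part concrete, here is a toy Python sketch. Real sellers use subtle, plausible-looking variants; here the pill simply embeds the customer id so the example stays short:

```python
def seed_records(records, customer_id):
    """Return a copy of `records` with one customer-specific fake
    listing appended, plus the pill itself for later matching."""
    pill = {"name": "Elm Street Widgets",
            "address": "222 Elm Street, Unit " + customer_id}
    return records + [pill], pill

def trace_leak(leaked, pills_by_customer):
    """Identify which customers' poison pills show up in a leaked
    dataset - those customers are the (direct or indirect) source."""
    leaked_keys = {(r["name"], r["address"]) for r in leaked}
    return [customer for customer, pill in pills_by_customer.items()
            if (pill["name"], pill["address"]) in leaked_keys]
```

Each customer gets a different pill, so a single leaked copy points straight back at its buyer.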

10:35 pm on Oct 9, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member

Thanks for the info AhmedF I did not know this.

I think it will be really, really hard for Superpages and local data brokers to find their poison pill, because local is fragmented, 99.9% of local webmasters enjoy a nice -950 penalty, and additionally 99% of local sites suffer from supplemental index syndrome due to the number of URLs required for getting traffic.

[edited by: SEOPTI at 10:36 pm (utc) on Oct. 9, 2007]

4:14 pm on Oct 10, 2007 (gmt 0)

10+ Year Member

Well - all they do is search Google or Yahoo or MSN for the invalid address/name/whatever, and any site that comes up is using their data.

Cross reference with their customer URLs and voila - you have a list of sites using your data when they shouldn't be :)
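That cross-referencing step is trivial to automate. Assuming you have the URLs that came back from searching for a seeded value, plus your licensed customers' domains (both hypothetical inputs here), a Python sketch:

```python
from urllib.parse import urlparse

def flag_leaks(search_result_urls, customer_domains):
    """Given URLs that turned up when searching for a poison-pill
    value, return the domains that are NOT licensed customers -
    i.e. the sites using the data when they shouldn't be."""
    offenders = set()
    for url in search_result_urls:
        domain = urlparse(url).netloc.lower()
        if domain and domain not in customer_domains:
            offenders.add(domain)
    return sorted(offenders)
```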

