I'm in the process of developing a site that will probably have millions of pages, thanks to APIs with millions of entities. Roughly 99% of the content (companies/services/products/places) is almost certainly already published somewhere else on the web. Will I run into massive duplicate content issues?
Even if I combine the content from different sources so it won't look like an exact 1:1 copy?
Looking only at the business listings, from a search engine's point of view the site drills down like this:
homepage->category->business (tens or hundreds of thousands of businesses per category, so the pagination is huge)
And of course there's a search form, too.
I assume I should use noindex,follow on the pages generated by the search form?
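For illustration, here is a minimal Python sketch of how a page template might pick the robots meta tag per page type, assuming the site tags each page with a type string. The page type names and the `robots_meta` helper are hypothetical, not part of any framework.

```python
# Hypothetical page types; only internal search results are kept out of the index.
NOINDEX_PAGE_TYPES = {"search"}


def robots_meta(page_type: str) -> str:
    """Return the robots meta tag for a page of the given (hypothetical) type.

    Search result pages get noindex,follow: crawlers are asked not to index
    the page itself but may still follow its links to business pages.
    Homepage, category and business pages stay indexable.
    """
    content = "noindex,follow" if page_type in NOINDEX_PAGE_TYPES else "index,follow"
    return f'<meta name="robots" content="{content}">'


if __name__ == "__main__":
    print(robots_meta("search"))
    print(robots_meta("business"))
```

Since index,follow is what crawlers assume when no robots meta tag is present, a template could also just omit the tag on indexable pages.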
Should I list all of a business's categories on its listing page? If so, should each one link back to its category page? I'm not sure how that affects the link juice.
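The linking option in question could look like the following Python sketch, which renders a business's categories as links back to their category pages. The `/category/<slug>/` URL scheme and the slug rule are assumptions made up for the example. The sketch only shows the mechanics and doesn't answer the link juice question.

```python
from html import escape
from urllib.parse import quote


def category_links(categories: list[str]) -> str:
    """Render a business's categories as an HTML list of links.

    Assumes a hypothetical URL scheme /category/<slug>/, where the slug is the
    lower-cased category name with spaces turned into hyphens and then
    percent-encoded. Link text is HTML-escaped.
    """
    items = []
    for name in categories:
        slug = quote(name.lower().replace(" ", "-"))
        items.append(f'<li><a href="/category/{slug}/">{escape(name)}</a></li>')
    return "<ul>" + "".join(items) + "</ul>"


if __name__ == "__main__":
    print(category_links(["Italian Restaurants", "Pizza"]))
```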
[edited by: tedster at 6:06 pm (utc) on Feb 7, 2012]
[edit reason] moved from another location [/edit]