I am about to launch a site with content from another site (I got written permission) that's intermingled with other pertinent content (think glossary).
Would it be in my best interest to disallow robots from all those pages if each page is roughly 75% borrowed content mixed with 25% different content? For example, can a spider pick out a handful of familiar sentences within a large paragraph that match another site and flag them as duplicate content?
If so, and you think it would be in my best interest to disallow those pages, do I still need to include them in a Google sitemap?
Thanks,
Eric
Either way, a page you disallow in robots.txt won't be listed.
Nobody knows exactly what algorithms are used to determine duplicate content apart from the people writing them :-)
I think the only thing to consider is whether the site you are getting the information from will take the duplicate content penalty instead of your site.
If you are going to deny a page in robots.txt, you probably don't want to have it in a sitemap built specifically for a search engine (an XML sitemap), but you probably do want it in a sitemap destined for browsers (an HTML sitemap page your visitors can click through).
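As a rough sketch, the robots.txt for that setup could look like this (the /glossary/ path is just a placeholder for wherever your borrowed-content pages actually live):

# Keep all crawlers out of the pages that reuse the borrowed content
# (/glossary/ is a hypothetical directory - substitute your own)
User-agent: *
Disallow: /glossary/

# Point crawlers at an XML sitemap that does NOT list the disallowed pages
Sitemap: https://www.example.com/sitemap.xml

robots.txt only affects crawlers, so your visitor-facing HTML sitemap can still link to those glossary pages without any problem.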