I am about to launch a site with content from another site (I got written permission) that's intermingled with other pertinent content (think glossary).
Would it be in my best interest to disallow robots from all those pages if each page is roughly 75% borrowed content mixed with 25% different content? For example, can a spider pick out a handful of familiar sentences within a large paragraph that match another site and flag them as duplicate content?
If so, and you think it would be in my best interest to disallow those pages, do I still need to include them in a Google sitemap?
Thanks,
Eric
Either way, a page you disallow in robots.txt won't be listed.
Nobody knows exactly what algorithms are used to determine duplicate content apart from the people writing them :-)
I think the only thing to consider is whether the site you are getting the information from will take the duplicate content penalty instead of your site.
If you are going to deny a page in robots.txt, you probably don't want to have it in a sitemap built specifically for a search engine (an XML sitemap), but you probably do want it in a sitemap destined for browsers (an HTML sitemap page your visitors can click through).
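As a rough sketch, the robots.txt for that setup could look like this (the /glossary/ path is just a placeholder for wherever your borrowed-content pages actually live):

# Keep all crawlers out of the pages that reuse the borrowed content
# (/glossary/ is a hypothetical directory - substitute your own)
User-agent: *
Disallow: /glossary/

# Point crawlers at an XML sitemap that does NOT list the disallowed pages
Sitemap: https://www.example.com/sitemap.xml

robots.txt only affects crawlers, so your visitor-facing HTML sitemap can still link to those glossary pages without any problem.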