Is it duplicate content if you don't link to it?

natural number

11:30 pm on Jul 13, 2006 (gmt 0)

10+ Year Member



I was looking around my web directory and realized I have 3 years of edits and re-edits.

I have files such as index.php, indexa.php, index01.php and so on. Although I don't link to the other files, do you think Google knows about them? If so, would Google give me a duplicate content penalty for them?

BTW I have not really discouraged any bots using robots.txt

Quadrille

8:08 am on Jul 15, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If Google spiders your pages, then Google knows.

Any site that throws up duplicate URLs (and PHP scripts often do that by the million) should use robots.txt to avoid excessive duplication.

Your directory will do much better in Google if you eliminate all the senseless clutter that can accumulate - with careful use of robots.txt, you can often almost eliminate supplementary entries, which can bury a directory all too easily.

I used to use an 'off the shelf' directory script which produced 4 or 5 URLs for each page, plus an extra page for each directory entry, a comment page for each entry, and many more for 'user profiles' etc. (most of which I had disabled). At one time, instead of the couple of hundred real pages (categories), Google thought I had 37,000, and buried the 'real stuff' under supplementary listings.

And I've seen small directories with MILLIONS of pages.

Love your robots.txt, use it well - but don't expect quick changes. Many directory owners count their success by the number of pages counted by Google; WRONG! Directory success is counted by the number of non-spam sites listed - and the number of unique human visitors (especially returning visitors). ROI will surely follow them!

natural number

8:35 pm on Jul 15, 2006 (gmt 0)

10+ Year Member



Thanks Quadrille, I'm starting a clean-up. I am ashamed to say, I never really trusted myself with robots.txt; I don't want to accidentally ban my whole site. Paranoid, I know.
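
One way to ease that worry is to test a draft robots.txt locally before uploading it. The following is only a minimal sketch using Python's standard urllib.robotparser module. The file names are the ones mentioned in the first post; the draft rules, the check_rules helper, and the "Googlebot" agent string are assumptions for illustration, and Python's parser uses simple prefix matching, which may not match every crawler's behavior exactly.

```python
from urllib.robotparser import RobotFileParser

# Draft rules (an assumption for illustration): block the old copies only.
DRAFT_ROBOTS_TXT = """\
User-agent: *
Disallow: /indexa.php
Disallow: /index01.php
"""


def check_rules(robots_txt, paths, agent="Googlebot"):
    """Return {path: True if `agent` may fetch it} for a draft robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


results = check_rules(
    DRAFT_ROBOTS_TXT,
    ["/", "/index.php", "/indexa.php", "/index01.php"],
)
for path, allowed in results.items():
    print(path, "allowed" if allowed else "blocked")
```

If "/" or "/index.php" ever shows up as blocked, the draft would ban more than intended (a bare "Disallow: /" is the usual culprit) and should be fixed before it goes live.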

SuddenlySara

8:55 pm on Jul 15, 2006 (gmt 0)



Not only is this duplicate content, it can also be looked at as a "doorway", according to Google. Lately the Google bots have been exploring everything on our servers. Get rid of your orphan files that are not linked within your website. I sure figured this out when doing site: searchme and finding files 6 or more years old showing up now.
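
For anyone hunting for orphan files like these, here is a rough sketch of how the search could be automated. It assumes the page sources are already loaded into a dict keyed by file name, that links are plain <a href> tags, and that all files sit in one directory (only the last path segment is compared). The LinkCollector and find_orphans names are hypothetical, not part of any tool mentioned in this thread.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the file names of same-site href targets found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.targets = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                parsed = urlparse(value)
                if not parsed.netloc:  # skip links that point to other hosts
                    self.targets.add(Path(parsed.path).name)


def find_orphans(pages, entry="index.php"):
    """Return files in `pages` that cannot be reached by links from `entry`.

    `pages` maps file name -> page source (assumed layout, see lead-in).
    """
    reachable, queue = {entry}, [entry]
    while queue:
        current = queue.pop()
        collector = LinkCollector()
        collector.feed(pages.get(current, ""))
        for target in collector.targets:
            if target in pages and target not in reachable:
                reachable.add(target)
                queue.append(target)
    return sorted(set(pages) - reachable)


pages = {
    "index.php": '<a href="about.php">About</a>',
    "about.php": '<a href="index.php">Home</a>',
    "indexa.php": '<a href="about.php">About</a>',
    "index01.php": "",
}
print(find_orphans(pages))
```

In this made-up example, indexa.php and index01.php are reported because nothing reachable from index.php links to them. They are candidates for deletion or for a Disallow line.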