Forum Moderators: Robert Charlton & goodroi
I have files such as index.php, indexa.php, index01.php and so on. Although I don't link to the other files, do you think Google knows about them? If so, would Google give me a duplicate content penalty for them?
BTW, I have not really discouraged any bots using robots.txt.
Any site that throws up duplicate URLs (and PHP scripts often generate them by the million) should use robots.txt to avoid excessive duplication.
Your directory will do much better in Google if you eliminate all the senseless clutter that can accumulate. With careful use of robots.txt you can often almost eliminate supplementary entries, which can bury a directory all too easily.
I used to use an 'off the shelf' directory script which produced four or five URLs for each page, plus an extra page for each directory entry, a comment page for each entry, and many more for 'user profiles' etc. (most of which I had disabled). At one time, instead of the couple of hundred real pages (categories), Google thought I had 37,000, and buried the 'real stuff' under supplementary listings.
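To give a rough idea, a robots.txt for a directory script like that might disallow the extra URL patterns while leaving the real category pages crawlable. This is only a sketch - the paths below (comment pages, user profiles, the extra index variants from the original question) are hypothetical examples, and you would need to substitute the actual URL patterns your own script generates:

```
# Block the duplicate/low-value URLs a directory script can generate.
# All paths here are examples - adjust to your own script's patterns.
User-agent: *
Disallow: /comments/
Disallow: /profiles/
Disallow: /indexa.php
Disallow: /index01.php
```

Note that Disallow rules are simple path prefixes in the original robots.txt standard, so `Disallow: /comments/` blocks everything under that directory. Keep the rules narrow enough that your genuine category pages stay crawlable.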
And I've seen small directories with MILLIONS of pages.
Love your robots.txt and use it well - but don't expect quick changes. Many directory owners count their success by the number of pages indexed by Google; WRONG! Directory success is measured by the number of non-spam sites listed - and by the number of unique human visitors (especially returning visitors). ROI will surely follow them!