Forum Moderators: phranque


bots caching folders outside of domain structure


c0c0c0

10:38 pm on Nov 25, 2004 (gmt 0)

10+ Year Member



Good (Insert part of day here) ladies and gentlemen.

I present to you a problem that is probably easy to fix, but after searching a few forums I still have not found a solution. The problem is perplexing to me as well.

I have a site, mywidgets.com. I installed phpnuke through cPanel, which wanted to install into a directory, so I put it in /index. I have a redirect that sends visitors into /index.

I have been reading SEO for Dummies (because…). Anyway, there was a tip in there for seeing what Google had indexed (site:mywidgets.com -pppppp), and that is where I noticed the perplexing and shocking part.

Google had indexed and cached all of the other directories in /public_html, including myotherdomain.com, which lives in /carrot (domain parking took care of this, and it points to /carrot as its root directory), plus some other folders that I do not want any bot, good or bad, indexing or associating with mywidgets.com.

I still want to be able to reach these folders by going to them directly, and I don't want to break anything affecting myotherdomain.com.

Do I do this with .htaccess? If so how? If not, then what have I done wrong?

Thanks for reading and replying.

sonjay

4:24 am on Nov 26, 2004 (gmt 0)

10+ Year Member



Anything inside your /public_html folder is part of your domain structure, and is fair game to Google and the other search engines unless you tell them not to index it.

How do you do that? Use a robots.txt [robotstxt.org] file.

Here's a nice easy tutorial [searchengineworld.com].
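For what it's worth, here is a minimal sketch of what such a file could look like for the setup described above, with a quick local check using Python's standard urllib.robotparser module. The /private/ folder is a made-up placeholder (it is not from the original post), and the helper name is_allowed is only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for mywidgets.com.
# /carrot/ comes from the thread; /private/ is an invented placeholder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /carrot/
Disallow: /private/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for path in ("/index/", "/carrot/page.html", "/private/notes.txt"):
        url = "http://mywidgets.com" + path
        print(path, is_allowed(ROBOTS_TXT, url))
```

Because robots.txt is read per hostname, a file served at mywidgets.com/robots.txt should not affect myotherdomain.com, which crawlers treat as a separate site with its own root. Keep in mind that robots.txt only asks well-behaved bots to stay out; the folders themselves remain reachable by URL.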

c0c0c0

4:49 am on Nov 26, 2004 (gmt 0)

10+ Year Member



Is there a way to allow only /index?

I don't want to have to remember to modify robots.txt every time I add or change a folder.

sonjay

5:28 am on Nov 26, 2004 (gmt 0)

10+ Year Member



Not with robots.txt, no. You can only disallow specific directories or files, or disallow everything by saying Disallow: / (meaning disallow everything within the site root).
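If remembering to edit the file by hand is the worry, one option is to generate it. The sketch below is only an illustration, assuming a hypothetical build_robots_txt helper and assuming that every top-level folder except the ones you name should be disallowed. It does not add any kind of allow rule; it just writes out the Disallow lines for you.

```python
import os
import tempfile


def build_robots_txt(web_root: str, keep: set) -> str:
    """Build robots.txt text disallowing every top-level folder not in `keep`."""
    lines = ["User-agent: *"]
    for name in sorted(os.listdir(web_root)):
        full_path = os.path.join(web_root, name)
        # Only folders are listed; loose files in the web root are left alone.
        if os.path.isdir(full_path) and name not in keep:
            lines.append(f"Disallow: /{name}/")
    if len(lines) == 1:
        # A record needs at least one Disallow line; empty means "allow all".
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Demo against a throwaway folder standing in for /public_html.
    with tempfile.TemporaryDirectory() as root:
        for folder in ("index", "carrot", "work"):  # "work" is a made-up name
            os.mkdir(os.path.join(root, folder))
        print(build_robots_txt(root, {"index"}), end="")
```

You would still need to rerun it after adding a folder (a scheduled job could do that), and the generated file publicly lists the folder names it disallows, which may matter for the personal and work folders.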

Do you really not want anything in your site indexed by Google except the home page?

There's probably some method using mod_rewrite in your .htaccess file to accomplish this, but I'm no mod_rewrite expert. Someone else would have to help you with that, if that's what you want.

If you use templates in your site, you could put the per-page "noindex, nofollow" robots meta instruction in the template(s), and then all new pages based on the template(s) will be noindexed and nofollowed. You wouldn't have to remember to change robots.txt if you go that route.

Or maybe there's some perfect method out there that I'm not thinking of. Maybe someone else will come along and suggest it.

c0c0c0

5:37 am on Nov 26, 2004 (gmt 0)

10+ Year Member



I want everything in /index and its subdirectories indexed.

Since I have /carrot for my other site, I don't want Google or other bots indexing anything in that folder or subfolders because then it will look like I am cloning...

Also, I have some personal and work-related stuff in the other directories that I don't want to be searchable.