Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

Google scanned .txt files!

How to avoid unwanted crawling


specter

6:15 pm on Aug 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello,

Google began to scan some text files in a subdirectory of my site. Is this normal?

Also, how do I prevent this unwanted crawling?

Say I have some subdirectories I don't want Google to visit; how can I achieve this?

Thanks!

tedster

9:05 pm on Aug 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Use robots.txt to block directories and files you don't want spidered. Then, if those files were already indexed, you can request their removal through Google's url removal tool.
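For example, a minimal robots.txt blocking one directory and one file would look like this (the paths /private/ and /data/list.txt are just placeholders for your own):

```
User-agent: *
Disallow: /private/
Disallow: /data/list.txt
```

The file must sit at the root of the host (e.g. example.com/robots.txt), and note that blocking a URL in robots.txt stops future crawling but does not by itself remove an already-indexed copy; that's what the removal tool is for.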

londrum

9:19 pm on Aug 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



stick this in your robots.txt, and it will stop googlebot spidering all pages with a .txt extension

User-agent: *
Disallow: /*.txt$

it doesn't work for all bots though. but googlebot understands it.
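You can sanity-check a wildcard rule like this before deploying it by emulating Google's matching (`*` matches any run of characters, `$` anchors the end of the URL path). This is just an illustrative sketch; the function names are my own, not any official API:

```python
import re

def google_rule_to_regex(rule: str) -> re.Pattern:
    """Convert a Google-style robots.txt path rule, which may contain
    the wildcards * (any characters) and a trailing $ (end of URL),
    into an anchored regular expression."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape regex metacharacters, then turn the escaped * back into ".*"
    pattern = re.escape(body).replace(r"\*", ".*")
    return re.compile("^" + pattern + ("$" if anchored else ""))

def is_blocked(path: str, rule: str) -> bool:
    """True if the given URL path would match the Disallow rule."""
    return google_rule_to_regex(rule).match(path) is not None

print(is_blocked("/notes/data.txt", "/*.txt$"))       # True
print(is_blocked("/notes/data.txt.html", "/*.txt$"))  # False
```

One caveat: a rule like /*.txt$ also matches /robots.txt itself, though Googlebot fetches robots.txt regardless of the rules inside it.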

specter

9:58 pm on Aug 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for your useful replies!

netmeg

12:26 am on Aug 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I discovered this happening a few years ago, and got around the problem by using other file extensions, like .lst or .meg or whatever.

vincevincevince

12:35 am on Aug 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I discovered this happening a few years ago, and got around the problem by using other file extensions, like .lst or .meg or whatever.

That won't work. Do a search for 'filetype:lst' or 'filetype:meg' and you'll see plenty of such files indexed.

The fact is that Google will try to put anything textual it finds into the index, so a robots.txt rule or a server-side deny is the only way to stop this.
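A server-side deny is the stronger option, since it blocks all visitors, not just well-behaved bots. Assuming an Apache server, a sketch of the .htaccess approach (2.2-era syntax) would be:

```
# Return 403 Forbidden for every .txt file in this directory tree
<FilesMatch "\.txt$">
    Order allow,deny
    Deny from all
</FilesMatch>
```

Unlike robots.txt, this makes the files genuinely unreachable, so use it only for files that no visitor should see.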