Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

Google scanned .txt files!

How to avoid unwanted crawling


specter

6:15 pm on Aug 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello,

Google began to scan some text files in a subdirectory of my site. Is this normal?

Also, how do I prevent this unwanted crawling?

Say I have some subdirectories I don't want Google to visit; how can I achieve this?

Thanks!

tedster

9:05 pm on Aug 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Use robots.txt to block directories and files you don't want spidered. Then, if those files were already indexed, you can request their removal through Google's url removal tool.
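For example, a minimal robots.txt blocking one directory and one file would look like this (the paths /private/ and /data/list.txt are just placeholders for your own):

```
User-agent: *
Disallow: /private/
Disallow: /data/list.txt
```

The file must sit at the root of the host (e.g. example.com/robots.txt), and note that blocking a URL in robots.txt stops future crawling but does not by itself remove an already-indexed copy; that's what the removal tool is for.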

londrum

9:19 pm on Aug 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



stick this in your robots.txt, and it will stop googlebot spidering all pages with a .txt extension

User-agent: *
Disallow: /*.txt$

it doesn't work for all bots though. but googlebot understands it.
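You can sanity-check a wildcard rule like this before deploying it by emulating Google's matching (`*` matches any run of characters, `$` anchors the end of the URL path). This is just an illustrative sketch; the function names are my own, not any official API:

```python
import re

def google_rule_to_regex(rule: str) -> re.Pattern:
    """Convert a Google-style robots.txt path rule, which may contain
    the wildcards * (any characters) and a trailing $ (end of URL),
    into an anchored regular expression."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape regex metacharacters, then turn the escaped * back into ".*"
    pattern = re.escape(body).replace(r"\*", ".*")
    return re.compile("^" + pattern + ("$" if anchored else ""))

def is_blocked(path: str, rule: str) -> bool:
    """True if the given URL path would match the Disallow rule."""
    return google_rule_to_regex(rule).match(path) is not None

print(is_blocked("/notes/data.txt", "/*.txt$"))       # True
print(is_blocked("/notes/data.txt.html", "/*.txt$"))  # False
```

One caveat: a rule like /*.txt$ also matches /robots.txt itself, though Googlebot fetches robots.txt regardless of the rules inside it.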

specter

9:58 pm on Aug 1, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for your useful replies!

netmeg

12:26 am on Aug 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I discovered this happening a few years ago, and got around the problem by using other file extensions, like .lst or .meg or whatever.

vincevincevince

12:35 am on Aug 2, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I discovered this happening a few years ago, and got around the problem by using other file extensions, like .lst or .meg or whatever.

That won't work. Do a search for 'filetype:lst' or 'filetype:meg' and you'll see plenty of such files indexed.

The fact is that Google will try to put anything textual it finds into the index, so a robots.txt rule or a server-side deny is the only way to stop this.
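A server-side deny is the stronger option, since it blocks all visitors, not just well-behaved bots. Assuming an Apache server, a sketch of the .htaccess approach (2.2-era syntax) would be:

```
# Return 403 Forbidden for every .txt file in this directory tree
<FilesMatch "\.txt$">
    Order allow,deny
    Deny from all
</FilesMatch>
```

Unlike robots.txt, this makes the files genuinely unreachable, so use it only for files that no visitor should see.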