
GoogleBot finding scripts I'd rather it didn't

How do you stop GoogleBot from finding scripts?

         

inbound

2:15 am on Jan 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmm,

Just noticed Google added several "pages" from one of my sites that it shouldn't have found.

It's gone and found some PHP scripts that are only triggered by a form asking for location information (which then sends the visitor to the local site for the service). This means Google can (and probably does) follow every form submission as if it were a link - and it constructed the URLs itself, since they aren't complete in the form.

Also, links Google finds this way don't pass any PageRank (I'm certain of this).

I'm sure many people will know about this already, and some may even have benefited from it, but I'd just like to get back to having the real pages on the site listed. Does anyone know a simple fix so Google can't follow URLs it constructs from forms?

I've had a look at quite a few sites and can't see the same issue, could this be new?

Thanks

keywordguru

1:53 pm on Jan 3, 2005 (gmt 0)

10+ Year Member



htaccess should solve this quickly. Just list what you don't want indexed and, in theory, it won't be.

Remember, not all spiders abide by the .htaccess yet google bot usually does
KG

DanA

2:22 pm on Jan 3, 2005 (gmt 0)

10+ Year Member



You can also use robots.txt (for robots) rather than .htaccess (which configures the Apache server and blocks any visitor, bot or human).
In robots.txt you can disallow the page with the form. Google will obey the rule (after a few weeks), but many other bots won't.
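For example, a minimal robots.txt sketch - the paths here are hypothetical stand-ins for the actual form-handler script:

```text
# Placed at the site root as /robots.txt.
# The paths below are examples - substitute the real script URLs.
User-agent: *
Disallow: /scripts/
Disallow: /locator.php
```

Keep in mind this only asks well-behaved crawlers not to fetch those URLs; anything that ignores the file can still request them.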

inbound

2:38 pm on Jan 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Duh,

Thanks for that, I can't believe I didn't immediately think of it. Sometimes you miss the obvious.

DerekH

2:53 pm on Jan 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Remember, not all spiders abide by the .htaccess yet google bot usually does

robots.txt is a "voluntary code of conduct" that spiders usually choose to adhere to. Human visitors aren't even aware such files exist.

htaccess controls what Apache will grant access to. It makes no difference whether the visitor is a spider or a human visitor - if htaccess blocks it, you can't have it. No question of "abiding by it".
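That said, if you want the form to keep working for humans while turning Googlebot away at the server, a per-user-agent rule in .htaccess is one option. A sketch using mod_rewrite - the script name locator.php is hypothetical, and this assumes mod_rewrite is enabled on the server:

```apache
# .htaccess in the directory containing the form-handler script.
# Return 403 Forbidden to requests whose User-Agent contains "Googlebot";
# ordinary visitors still reach the script.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteRule ^locator\.php$ - [F]
```

Note that a 403 doesn't remove URLs Google has already indexed; a robots.txt disallow (plus time) handles that part.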

DerekH