How do I block GoogleBot from Dynamic Content

Lisa

9:56 pm on Jul 22, 2002 (gmt 0)

How do I block Googlebot from dynamic content? Suppose you run your own search engine and its results live at "/?q=something". How do I stop Googlebot from crawling any URL with a "?" in it? I know it will not crawl such URLs on low-PR sites, but I fear that if my PR improves, Google will start following links into my search results pages. I already had to block that nasty ia_archiver altogether because it started following the dynamic content.

I guess this would be an easy question if the default page were not also the interface to the search results ("/?").

Would I use:

User-agent: *
Disallow: /?

jdMorgan

10:23 pm on Jul 22, 2002 (gmt 0)

Lisa,

That's a nasty problem! I don't know if there are any "rules" for how robots handle query
strings passed to scripts. It wouldn't surprise me if different robots treated them
differently, either. Since the "?" and what follows it are technically not part of the URL
path, robots may not even include them in the URL comparison. So "/" and "/?xxxx" may be
treated as the same as far as interpreting robots.txt is concerned...

I know you don't want to hear this, but is there any way you could "move" the query URL to
something like "/search?q=xxx"? That would make handling this problem in robots.txt and/or
.htaccess, etc. a lot easier.

Good luck,
Jim

Lisa

6:42 am on Jul 24, 2002 (gmt 0)

UPDATE:

Googlebot will obey:

User-agent: *
Disallow: /?

Whew! That is good news!

However, Inktomi doesn't obey it... So inside my robots.txt I will target the Googlebot user agent specifically and give the rule that blocks dynamic URLs only to it.
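
For anyone who wants to see how a Googlebot-only rule like this would be read, here is a minimal Python sketch. It assumes a crawler that does plain prefix matching of each Disallow value against the path plus query string, which is what the result above suggests Googlebot does. The sample robots.txt, the helper names (parse_groups, is_allowed) and the user-agent strings are made up for illustration, and real crawlers may well behave differently (Inktomi apparently does).

```python
# Hypothetical robots.txt: only Googlebot gets the rule blocking "/?..." URLs.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /?
"""


def parse_groups(text):
    """Map each lowercased User-agent token to its list of Disallow prefixes."""
    groups = {}
    agents = []
    in_rules = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents = []
                in_rules = False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "disallow":
            in_rules = True
            if value:  # an empty Disallow blocks nothing
                for agent in agents:
                    groups[agent].append(value)
    return groups


def is_allowed(text, user_agent, path_and_query):
    """Assumed behavior: a URL is blocked if it starts with any Disallow prefix."""
    groups = parse_groups(text)
    ua = user_agent.lower()
    # Use the first named group whose token appears in the UA, else the "*" group.
    rules = next(
        (r for token, r in groups.items() if token != "*" and token in ua),
        groups.get("*", []),
    )
    return not any(path_and_query.startswith(prefix) for prefix in rules)


print(is_allowed(ROBOTS_TXT, "Googlebot/2.1", "/?q=something"))  # blocked
print(is_allowed(ROBOTS_TXT, "Googlebot/2.1", "/"))              # home page still allowed
print(is_allowed(ROBOTS_TXT, "Slurp", "/?q=something"))          # no rule for other agents
```

Under that prefix-matching assumption, "/" and "/?q=something" are not treated as the same URL, so the home page stays crawlable while the search results do not.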

Sinner_G

7:16 am on Jul 24, 2002 (gmt 0)

Will Googlebot always look for the robots.txt, even if it arrives via a link from another site that points directly to a dynamic page (i.e. www.somesite.com links to www.anothersite.com/pagename.asp?var=5683)?

Technonotice

1:48 pm on Jul 27, 2002 (gmt 0)

Before crawling a domain, Google will check the robots.txt file in its database (or download it if it's a new URL).
