Forum Moderators: phranque
What is the right syntax - if there is one...
The dynamic pages are all in one directory. I still want them to work for users, but I don't want Google to grab them. On the other hand, I DO want Google to pick up the page WITHOUT any parameters! So, the following is what I need...
OK for users and Google:
/mysite/dynamic/page.php
OK for users but NOT for Google
/mysite/dynamic/page.php?version=1
So how to do it?
Thanks for any help!
MC
Thanks for the advice - I can see how it can be really handy!
Problem is - I don't ACTUALLY have pages with a parameter - but I don't want those URLs to turn into an error for users, in case someone still has them in bookmarks or something like that.
So my question still remains:
Does anyone know of the correct SYNTAX to keep Googlebot from spidering dynamic pages - using the robots.txt file?
Thanks
MC
[robotstxt.org...]
and if Google's description of their extensions is correct, then this should work -
User-agent: Googlebot
Allow: /mysite/dynamic/page.php$
Disallow: /mysite/dynamic
Per the working draft spec above, the robot should use the first matching pattern. If the URL has no parameters, then the first pattern should match, and the page will be read. If the URL contains parameters, then the first pattern will NOT match, but the second pattern (the disallow) will.
You will need to try this to see if it works, as it depends on Google supporting the later "working draft" spec, as well as supporting the "$" extension in an "Allow" line.
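To make that first-match rule concrete, here's a minimal sketch in Python - this is NOT anything Google publishes, just a simulation of the working-draft behavior described above, with a trailing "$" treated as an end-of-URL anchor:

```python
# Simulate working-draft robots.txt matching: rules are checked in order,
# the first pattern that matches wins, and a trailing "$" anchors the
# pattern to the end of the URL. No rule matching means "allowed".
def matches(pattern: str, url: str) -> bool:
    if pattern.endswith("$"):
        return url == pattern[:-1]
    return url.startswith(pattern)

def is_allowed(url: str, rules) -> bool:
    for action, pattern in rules:
        if matches(pattern, url):
            return action == "allow"
    return True  # default: allowed

rules = [
    ("allow", "/mysite/dynamic/page.php$"),
    ("disallow", "/mysite/dynamic"),
]

print(is_allowed("/mysite/dynamic/page.php", rules))            # True
print(is_allowed("/mysite/dynamic/page.php?version=1", rules))  # False
```

The parameter-free URL hits the anchored Allow first and gets fetched; anything with "?version=..." falls through to the Disallow.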
Doesn't the disallow section override the allow section in that working draft version? Do you know?
Be sure to read Marcia's WebmasterWorld Welcome and Guide to the Basics [webmasterworld.com] post.
Could you please explain what you are trying to achieve with this approach? AFAIK this won't really help with query strings in the URL, since a) by the time you reset the parameter, Googlebot will have already requested that page, and you cannot change the URL unless you return a 301 status code and give the new location in the Location header field, and b) most scripts rely on the parameters to know which content to produce.
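For the 301 case mentioned above, here's a rough sketch of the redirect decision in Python for illustration only (the hostname is made up; a real PHP page would send the same status code and Location header itself):

```python
from urllib.parse import urlsplit, urlunsplit

# If the requested URL carries a query string, answer with a 301 and
# point the Location header at the same path without any parameters;
# otherwise serve the page normally.
def redirect_if_query(url: str):
    parts = urlsplit(url)
    if parts.query:
        clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
        return 301, clean   # permanent redirect to the parameter-free URL
    return 200, None        # no query string: no redirect needed

print(redirect_if_query("http://example.com/mysite/dynamic/page.php?version=1"))
# (301, 'http://example.com/mysite/dynamic/page.php')
```

Googlebot follows the 301 and should eventually index only the clean URL, while old bookmarks with parameters still land users on the right page.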
Andreas
I understood it as: he didn't want Google to supply any parameters to his scripts - this is similar to a problem I ran into myself recently, where I basically clear the parameters if the page is accessed without a valid referrer.
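A sketch of that referrer check - the function and host names are mine, not from the original script: if the referrer isn't from the site's own host, the parameters are simply dropped before the script uses them.

```python
from urllib.parse import urlsplit

# Keep the query parameters only when the request came from our own site;
# with no referrer, or an external one, behave as if there were none.
def effective_params(params: dict, referrer: str,
                     own_host: str = "example.com") -> dict:
    if referrer and urlsplit(referrer).netloc == own_host:
        return params
    return {}
```

Since Googlebot normally sends no Referer header, it only ever sees the parameter-free version of the page.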
I'll be sure to read the _entire_ thread from now on before answering :)