I have hundreds of cached pages which I need to disallow. They are database detail pages like the examples below:
car_detail.asp?cdID=346580
car_detail.asp?cdID=323480
car_detail.asp?cdID=342580
car_detail.asp?cdID=312380
How can I disallow them using the robots.txt file without listing them all (there are too many)?
The example below doesn't work:
Disallow: /car_detail.asp
Help!
Richard
Lord Majestic
2:11 pm on Nov 2, 2005 (gmt 0)
What is the full path to the file car_detail.asp? If it's in the root of your webserver, then the disallow directive is correct.
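For reference, a minimal complete robots.txt would look like this (a sketch, assuming you want the rule to apply to all crawlers; the User-agent line is required):

User-agent: *
Disallow: /car_detail.asp

Disallow works as a prefix match, so that single line covers every URL that starts with /car_detail.asp, query strings included.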
Autoweb
4:41 pm on Nov 2, 2005 (gmt 0)
It is in the root directory, but Google is caching all the individual URLs like the ones shown above.
The disallow is obviously blocking car_detail.asp, but not car_detail.asp?cdID=312380.
Lord Majestic
4:51 pm on Nov 2, 2005 (gmt 0)
Your syntax is correct. The fact that Google still caches them may just mean it has not yet taken your changes into account; it's not immediate. Also, if the URLs are already in the index, there is no obligation or guarantee that the search engine will remove them after the fact.
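If you want to convince yourself that the rule really does cover the query-string URLs, here is a quick sketch using Python's standard urllib.robotparser module (example.com and the cdID value are just placeholders):

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /car_detail.asp",
])

# Disallow is a prefix match, so the query-string variants are blocked too
print(rp.can_fetch("*", "http://example.com/car_detail.asp?cdID=312380"))  # False
print(rp.can_fetch("*", "http://example.com/car_detail.asp"))              # False
# Unrelated pages are still allowed
print(rp.can_fetch("*", "http://example.com/index.asp"))                   # True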
Autoweb
4:55 pm on Nov 2, 2005 (gmt 0)
So in theory,
Disallow: /car_detail.asp
should remove all the car_detail.asp?cdID=236237 references?
Lord Majestic
5:17 pm on Nov 2, 2005 (gmt 0)
robots.txt is meant to prevent crawling of disallowed URLs. Your statement correctly disallows crawling of that file; however, if the file was crawled before, it is up to the search engine to decide whether to actually remove it from its database. It can also take some time for your change to robots.txt to take effect, sometimes 24 hours, sometimes weeks; it all depends on the search engine.
From a robots.txt point of view, you did everything correctly.
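Once the file is live, you can check what a standards-compliant crawler would see, again with urllib.robotparser (a sketch; www.example.com is a placeholder for your own domain):

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")  # placeholder domain
rp.read()  # fetches and parses the live robots.txt

# Should print False once the Disallow rule is in place
print(rp.can_fetch("Googlebot", "http://www.example.com/car_detail.asp?cdID=346580"))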
Autoweb
8:25 am on Nov 3, 2005 (gmt 0)
Thanks for the help. It took a while to sink in, but I think I am on it now.