I have hundreds of cached pages which I need to disallow. They are database detail pages like the examples below:
car_detail.asp?cdID=346580
car_detail.asp?cdID=323480
car_detail.asp?cdID=342580
car_detail.asp?cdID=312380
How can I disallow them using the robots.txt file without listing them all (there are too many)?
The example below doesn't work:
Disallow: /car_detail.asp
Help!
Richard
Lord Majestic
2:11 pm on Nov 2, 2005 (gmt 0)
What is the full path to the file car_detail.asp? If it's in the root of your webserver, then the disallow directive is correct.
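For reference, a minimal complete robots.txt would look like this (a sketch, assuming you want the rule to apply to all crawlers; the User-agent line is required):

User-agent: *
Disallow: /car_detail.asp

Disallow works as a prefix match, so that single line covers every URL that starts with /car_detail.asp, query strings included.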
Autoweb
4:41 pm on Nov 2, 2005 (gmt 0)
It is in the root directory, but Google is caching all the individual URLs like the ones shown above.
The disallow is obviously blocking car_detail.asp, but not car_detail.asp?cdID=312380.
Lord Majestic
4:51 pm on Nov 2, 2005 (gmt 0)
Your syntax is correct. The fact that Google still caches them may just mean it has not yet taken your changes into account; it's not immediate. Also, if the URLs are already in the index, there is no obligation or guarantee that the search engine will remove them after the fact.
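If you want to convince yourself that the rule really does cover the query-string URLs, here is a quick sketch using Python's standard urllib.robotparser module (example.com and the cdID value are just placeholders):

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /car_detail.asp",
])

# Disallow is a prefix match, so the query-string variants are blocked too
print(rp.can_fetch("*", "http://example.com/car_detail.asp?cdID=312380"))  # False
print(rp.can_fetch("*", "http://example.com/car_detail.asp"))              # False
# Unrelated pages are still allowed
print(rp.can_fetch("*", "http://example.com/index.asp"))                   # True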
Autoweb
4:55 pm on Nov 2, 2005 (gmt 0)
So in theory,
Disallow: /car_detail.asp
should remove all the car_detail.asp?cdID=236237 references?
Lord Majestic
5:17 pm on Nov 2, 2005 (gmt 0)
robots.txt is meant to prevent crawling of disallowed URLs. Your statement correctly disallows crawling of that file; however, if the file was crawled before, it is up to the search engine to decide whether to actually remove it from its database. It can also take some time for your change to robots.txt to take effect, sometimes 24 hours, sometimes weeks; it all depends on the search engine.
From a robots.txt point of view, you did everything correctly.
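Once the file is live, you can check what a standards-compliant crawler would see, again with urllib.robotparser (a sketch; www.example.com is a placeholder for your own domain):

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")  # placeholder domain
rp.read()  # fetches and parses the live robots.txt

# Should print False once the Disallow rule is in place
print(rp.can_fetch("Googlebot", "http://www.example.com/car_detail.asp?cdID=346580"))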
Autoweb
8:25 am on Nov 3, 2005 (gmt 0)
Thanks for the help. It took a while to sink in, but I think I am on it now.