I want to ask about some pages of my site that have been crawled by Google.
I only found out about this recently.
Someone told me to block those pages using robots.txt, and I did. But two weeks later all of the blocked pages were still appearing in Google, although their cached copies were no longer being updated.
I want to remove them completely. Please note that I can't add a noindex tag, because the site is dynamic and all pages share one header.
My first question: will robots.txt help remove pages that Google has already crawled?
My second question:
Say I have these two URLs in Google:
www.example.com/demo/jokes_category.php?cat_id=78
www.example.com/demo/jokes_category.php?cat_id=78&jtype=1
The second one is still in Google but is blocked by robots.txt.
Will Google still treat it as duplicate content, or will it just ignore the second URL?
Please give me an exact answer.
First of all, let me say that I have already blocked them using these rules:
User-Agent: *
Disallow: /*jtype*
When I check URLs of this type in Google Webmaster Tools, it shows them as blocked,
e.g.
www.example.com/demo/jokes_category.php?cat_id=78&jtype=1
My question now is this: even though the jtype URLs are blocked by robots.txt, both URLs still appear in Google:
www.example.com/demo/jokes_category.php?cat_id=78&jtype=1
and
www.example.com/demo/jokes_category.php?cat_id=78
If the first one is blocked, is there any risk of a future penalty for duplicate content or duplicate URLs?
Thanks for your help.
You have put my mind at ease. Many experts on the net were not sure, because both URLs are still appearing in Google even though one of them (the jtype one mentioned above) was blocked two weeks ago.
Please confirm one more time, as both of them are still in Google:
www.example.com/demo/jokes_category.php?cat_id=78&jtype=1
and
www.example.com/demo/jokes_category.php?cat_id=78
Is there any chance that they will be removed in future?
OK, now about the code: how can I use
User-Agent: Googlebot
when I want Yahoo and MSN to follow the same rule? What should I do about that?
And do you mean I should remove the
*
at the end of jtype in my rule? I got this syntax from this page:
[google.com...]
There it says:
To block access to all URLs containing the word "private", you could use:
User-agent: *
Disallow: /*private*
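To illustrate how wildcard rules of this kind behave, here is a minimal Python sketch. It assumes the matching semantics Google documents for robots.txt: a rule is matched from the start of the path (including the query string), `*` stands for any run of characters, and a trailing `$` pins the end of the URL. The helper names `rule_to_regex` and `is_blocked` are made up for this illustration, and this is not Googlebot's actual code.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path rule into a regex (assumed semantics)."""
    anchored = rule.endswith("$")  # a trailing "$" pins the end of the URL
    if anchored:
        rule = rule[:-1]
    # Escape literal text; each "*" becomes "match any run of characters".
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(rule, path):
    """True if `path` (path plus query string) matches the Disallow rule."""
    # re.match anchors at the start, like a robots.txt prefix match.
    return rule_to_regex(rule).match(path) is not None


with_jtype = "/demo/jokes_category.php?cat_id=78&jtype=1"
without_jtype = "/demo/jokes_category.php?cat_id=78"

# Compare the rule with and without the trailing "*".
for rule in ("/*jtype*", "/*jtype"):
    print(rule, is_blocked(rule, with_jtype), is_blocked(rule, without_jtype))
# Prints:
# /*jtype* True False
# /*jtype True False
```

Under these assumptions the trailing `*` changes nothing, because a rule already matches any URL that begins with the pattern.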
My last question: is there a rule I can use to block every URL that has extra characters after this URL?
www.example.com/demo/jokes_category.php?cat_id=78
meaning, for example:
www.example.com/demo/jokes_category.php?cat_id=78blahblahblah
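One possible way to express this, sketched under stated assumptions: if the extra characters always begin with `&` (an additional query parameter, as in the `jtype` example), a rule such as `Disallow: /*cat_id=*&` might cover them. Both that rule and the helper `matches_rule` are hypothetical and are only meant to show how such a pattern could be checked before it goes into robots.txt. The check assumes that `*` matches any run of characters and that rules match from the start of the path.

```python
import re


def matches_rule(rule, path):
    """Check a path against a wildcard rule (assumed '*' semantics, no '$')."""
    # Escape the literal pieces and join them with ".*" for each "*".
    regex = ".*".join(re.escape(piece) for piece in rule.split("*"))
    # re.match only succeeds if the pattern matches from the first character.
    return re.match(regex, path) is not None


candidate_rule = "/*cat_id=*&"  # hypothetical rule, not taken from the thread

paths = [
    "/demo/jokes_category.php?cat_id=78",            # should stay crawlable
    "/demo/jokes_category.php?cat_id=78&jtype=1",    # extra parameter
    "/demo/jokes_category.php?cat_id=78&anything=x", # extra parameter
]

for path in paths:
    state = "blocked" if matches_rule(candidate_rule, path) else "allowed"
    print(state, path)
```

A URL such as `cat_id=78blahblahblah`, where the extra characters do not start with `&`, would not be caught by this candidate rule, so it only fits if the unwanted variants are all extra query parameters.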
I also want to mention that this is my first time using this forum, and it has really helped me a lot.
My best regards to you.
The cached pages for the URLs blocked by robots.txt now just show:
We're sorry, but we could not process your request for the cache of http://example.com/demo.php?cat_id=ahhf&jtype=1. Please click here to check the current page or check for previous versions at the Internet Archive.
So would you say they are properly blocked in Yahoo as well?
And yes, most of the pages were crawled with
jtype=1
for example:
www.example.com/demo/jokes_category.php?cat_id=78&jtype=1
Now that I have blocked them, I want all search engines to crawl the URLs without the extra parameter.
One example is
www.example.com/demo/jokes_category.php?cat_id=78
So should I submit a new sitemap with the new required URLs, or just wait a few weeks or months?
I don't know whether the URLs blocked by robots.txt will be removed from the search engines or not.