Forum Moderators: goodroi
Some pages of my site have been crawled by Google, and I have only just found out about it.
Someone told me to block those pages using robots.txt, and I did. Two weeks later, all of the blocked pages are still appearing in Google, although their cached copies are no longer being updated.
I want to remove them completely. Please note that I can't add a noindex tag, because the site is dynamic and all pages share one header.
My first question: will robots.txt help remove pages that Google has already crawled?
My second question:
Let's say I have these two URLs in Google
and the 2nd one is still in Google but is blocked by robots.txt.
Will Google still treat this as duplicate content, or will it just ignore the 2nd URL?
Please give me an exact answer.
Blocking a URL in robots.txt stops Google from crawling it, but that doesn't stop them listing the resource as a URL-only entry in the SERPs.
If they are no longer accessing it, then they will not consider the content in any way.
First of all, let me say that I have already blocked them using these commands
and when I check these types of URLs in Google Webmaster Tools, it shows them as blocked.
My question now: even though they are blocked by robots.txt, both URLs are still appearing in Google.
If the first one is blocked, is there any risk of a future penalty for duplicate content or duplicate URLs?
Thanks for your help.
If they can't see the content, how can they know it is a copy of some content on another page?
They can't and don't. You are safe from Duplicate Content issues in that case.
Not all User-agents are wildcard aware, and the trailing * is not required.
This might be better, but you must also copy all of your other directives that you want Google to see into this rule block:
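The reason for copying the other directives is that a crawler which finds a group addressed to it by name follows only that group and ignores the `User-agent: *` group. Here is a minimal sketch of that behaviour using Python's standard-library parser. The paths `/admin/` and `/private/` and the bot name `OtherBot` are made up for illustration and are not taken from the rules discussed in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one general group, one Googlebot-specific group.
rules = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot matches its own group, so the /admin/ rule in the
# "*" group does not apply to it.
googlebot_admin = rp.can_fetch("Googlebot", "http://example.com/admin/page")
googlebot_private = rp.can_fetch("Googlebot", "http://example.com/private/x")

# A crawler with no named group falls back to the "*" group.
otherbot_admin = rp.can_fetch("OtherBot", "http://example.com/admin/page")

print(googlebot_admin, googlebot_private, otherbot_admin)
```

In this sketch Googlebot is allowed into `/admin/` because that rule was never copied into its own group, which is exactly the mistake to avoid.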
You have put my mind at ease. Many experts on the net were not sure, because there are two URLs in Google, both are still appearing, and the one I mentioned above was blocked two weeks ago.
Please confirm one more time, since both of them are still in Google:
is there any chance that they will be removed in the future?
OK, now about the code. How can I use
because I want Yahoo and MSN to follow the same rule? What should I do about that?
And do you mean I should remove
which is at the end of the jtype part of my command? I got this command from this page
where it is written:
To block access to all URLs containing the word "private", you could use:
My last question: is there any command I can use to block everything that comes after this URL
I also want to mention that this is my first time using this forum, and it has really helped me a lot.
My best regards to you.
I don't know if Yahoo and MSN are wildcard aware. If they are not then showing them a directive with a * in it would not be a good idea.
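To illustrate what "wildcard aware" means, here is a minimal sketch comparing prefix-only matching (the original robots.txt convention) with the `*` and `$` pattern matching that Google documents. The rule `/*jtype=` is a hypothetical stand-in for the directive discussed above, and both function names are made up for this example.

```python
import re

def prefix_match(pattern: str, path: str) -> bool:
    """Matching as done by a crawler that is NOT wildcard aware:
    the rule is a literal prefix, so '*' is just another character."""
    return path.startswith(pattern)

def wildcard_match(pattern: str, path: str) -> bool:
    """Matching as done by a wildcard-aware crawler:
    '*' matches any run of characters, a trailing '$' anchors the end,
    and otherwise the rule still behaves as a prefix."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then join the pieces with '.*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

path = "/demo.php?cat_id=ahhf&jtype=1"

# Hypothetical rules, with and without the trailing '*'.
print(wildcard_match("/*jtype=", path))   # wildcard-aware, no trailing *
print(wildcard_match("/*jtype=*", path))  # wildcard-aware, trailing *
print(prefix_match("/*jtype=", path))     # prefix-only crawler
```

Because every rule is already a prefix match, the trailing `*` adds nothing for a wildcard-aware crawler. A prefix-only crawler looks for a path that literally begins with `/*jtype=`, so the rule never matches and the URL is not blocked for it.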
The cached pages for the URLs blocked by robots.txt now just show:
We're sorry, but we could not process your request for the cache of http://example.com/demo.php?cat_id=ahhf&jtype=1. Please click here to check the current page or check for previous versions at the Internet Archive.
So would you say they are properly blocked in Yahoo as well?
And yes, most of the pages were crawled with
Since I have blocked those, I now want all search engines to crawl the URLs without the extra parameter.
One example is
So can I submit a new sitemap with the new required URLs, or should I just wait a few weeks or months?
I don't know whether URLs blocked by robots.txt will be removed from the search engines or not.
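If a new sitemap is submitted, the parameter-free URLs for it can be generated from the old ones. This is only a sketch: it assumes the extra parameter is `jtype`, as in the cached URL quoted above, and the helper name `strip_param` is made up for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_param(url: str, name: str) -> str:
    """Return the URL with every occurrence of query parameter `name` removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# Assumption: 'jtype' is the extra parameter to drop.
clean = strip_param("http://example.com/demo.php?cat_id=ahhf&jtype=1", "jtype")
print(clean)
```

The cleaned URLs are the ones to list in the sitemap, while the `jtype` versions stay blocked in robots.txt.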