Forum Moderators: Robert Charlton & goodroi
I was doing a test on my site and created another folder with the same content, and unfortunately ran a sitemap for it.
I removed that sitemap after a few hours, but now when I search in Google I can see that Google has crawled those pages as well. I don't know whether Google gives a penalty for this or just ignores it.
Let's say I have two folders: one is
example.com/articles
and the test one was
example.com/posts
I have changed the posts folder and all files under it to return 404. Will Google penalize me for this?
And what can I do to remove it quickly from Google?
Do I have to worry about it? Will it affect my site's ranking?
[edited by: Robert_Charlton at 9:03 pm (utc) on Sep. 28, 2008]
[edit reason] changed to example.com; it can never be owned [/edit]
I'd suggest a robots.txt disallow rule for that accidental folder now, so that Google stops spending crawl budget on URLs you don't want in the index - and in fact no longer have on the domain.
The removal tool option is: "Remove all files and subdirectories in a specific directory on your site from appearing in Google search results."
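As a sketch of what that disallow rule could look like for the accidental folder in the example above (assuming /posts is the test folder to be removed):

```
User-agent: Googlebot
Disallow: /posts/
```

Once Google has re-read robots.txt, the directory can also be submitted through the removal tool.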
But it gives a "denied" error. Are you sure that adding this directory to robots.txt will help in removing the URLs using Webmaster Tools?
www.example.com/messages
So I added this to the robots.txt file:
User-agent: Googlebot
Disallow: /messages
Is this OK?
And as a second step, should I then request removal in Google Webmaster Tools like this:
"Remove all files and subdirectories in a specific directory on your site from appearing in Google search results."
Directory URL: http://www.example.com/messages
All the pages are returning 404 as well, so have I done everything correctly?
My best regards
I have another question, if you can please help with that as well.
My site has been crawled both with www and without www.
I added a 301 permanent redirect and Google is removing the duplicate URLs, but it only removes 100 to 300 each week and there are still 3000 left. Is there any way to remove them faster?
And a second question: my site has been crawled with duplicate links.
Let's say one is
www.example.com/demo/articles&jtype=1
and the second is
www.example.com/demo/articles
I added a redirect to strip jtype, but it is slow and taking too long to remove the jtype URLs.
Is there any quick way to handle both problems?
Likewise, the non-www URLs will still send traffic wherever they appear in the SERPs, and the redirect forces the correct URL. I would let Google remove them in its own time. It would be silly to remove the non-www URLs when there isn't yet a full complement of www URLs in the SERPs to send that traffic through.
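For reference, a minimal .htaccess sketch of the kind of non-www to www 301 redirect being discussed, assuming Apache with mod_rewrite (the domain is the placeholder example.com - substitute your own):

```
RewriteEngine On
# If the host header arrives without the www prefix...
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# ...issue a permanent (301) redirect to the www version
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```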
The real problem has occurred now.
I have some files that are not in directory form, as I'm using a dynamic script.
Example URLs look like this:
www.example.com/demo/top_emailed_jokes.php?cat_id=110&jtype=emailed
www.example.com/demo/top_ten_jokes.php?cat_id=46&jtype=ten
I added the URLs in Google Webmaster Tools like this:
www.example.com/demo/top_emailed_jokes.php/
www.example.com/demo/top_ten_jokes.php/
but it removed only the one URL I submitted and left the other URLs as they are. What can I do to remove all the URLs?
[edited by: Receptional_Andy at 8:39 am (utc) on Oct. 3, 2008]
[edit reason] Please use example.com - it can never be owned [/edit]
What more is there to say? Just wait for the re-crawl and it will fix itself.
Fsmobilez - return a 404 error for the PHP parameters you don't want indexed.
A re-crawl and fix isn't going to happen overnight, but in the future, for any test websites you're working on, an important tag is:
<meta name="robots" content="noindex">
If the 404 doesn't work (which it will), place that tag in the header of your checkout.php. That would work 100% if all else fails.
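The 404 can be sent from the PHP script itself, or handled at the server level. As a hedged sketch assuming Apache with mod_rewrite and the jtype parameter from the URLs above, any request carrying jtype in its query string could be answered with a 404:

```
RewriteEngine On
# If the query string contains a jtype parameter...
RewriteCond %{QUERY_STRING} (^|&)jtype= [NC]
# ...return a 404 instead of serving the duplicate page
RewriteRule ^ - [R=404,L]
```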
If that crawl budget gets used in spidering essentially unimportant URLs (and a pile of 404s for accidental URLs is pretty unimportant), then less budget is left to crawl the rest of the site more frequently and deeply.
This is also a reason why server response time, though probably not directly part of the ranking algorithm, can affect a website's performance in Google Search. It's also a reason why responding to the If-Modified-Since header with 304s when appropriate matters, as well as using file compression, such as mod_gzip on Apache.
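As an illustration of the compression point, here is a minimal Apache sketch assuming the newer mod_deflate module is available (mod_gzip is its Apache 1.3-era counterpart):

```
<IfModule mod_deflate.c>
    # Compress the text formats that make up most crawlable content
    AddOutputFilterByType DEFLATE text/html text/css application/javascript
</IfModule>
```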
So in this particular case, I recommended the robots.txt disallow rule. Once the file is spidered, that will stop Googlebot from spending any more cycles trying to fetch those accidentally exposed URLs.
My point, especially for this thread, is that each site does have a crawl budget. It is being adjusted continually for all kinds of reasons (including Google's internal needs) - but whatever your site's allotted crawl budget is in any cycle, you don't want to squander it.
This time the question was not about the accidental pages; those have been removed by the method you told me.
It was for another site, so instead of starting a new thread I carried on with this one.
First of all, regarding your answer about robots.txt: I had already tried it, but after one month the URLs were still not removed. Their cached pages were removed, but the URLs were still in Google, and on another forum someone told me that Google will keep those URLs forever until I remove the disallow from robots.txt and change the pages to return 404.
So I changed the pages to 404.
You told me to use robots.txt plus the quick removal tool in WMT, which was very effective.
I wonder if I can do the same for this dynamic site as well, to remove the pages quickly.
I have some files that are not in directory form, as I'm using a dynamic script.
Example URLs look like this:
www.example.com/demo/top_emailed_jokes.php?cat_id=110&jtype=emailed
www.example.com/demo/top_ten_jokes.php?cat_id=46&jtype=ten
I added the URLs in Google Webmaster Tools like this:
www.example.com/demo/top_emailed_jokes.php/
www.example.com/demo/top_ten_jokes.php/
but it removed only the one URL I submitted and left the other URLs as they are. What can I do to remove all the URLs?
[edited by: tedster at 5:03 pm (utc) on Oct. 3, 2008]
[edit reason] switched to example.com [/edit]