I have redirected (301) all those pages to newer pages (completely different URLs) with different, newer content, hoping to pass any backward links from the old pages on to the new ones.
However, even though Google followed the redirects to the new pages, as I can see from my log files (it followed them 3 and 4 months ago), Google still holds old copies of about 80,000 pages. None were updated, even though Google took the redirects. These copies are 8 months old!
Does anyone know how to get around it? Is there any way to have these old pages completely booted out of Google?
You can check if Google will delete your pages soon:
There is a recent thread about Google taking measures against scraper sites at [webmasterworld.com...] If the assumption is correct and Google is currently experimenting with the optimal solution to wipe out scraper sites, they might have suspended deletion of files from their index, because the more garbage is in the index, the better they can test if the algorithm is working correctly.
Any idea how long it would take to get these pages out of Google?
macdave: robots.txt has been disallowing those pages for 5 months now. Google just doesn't drop them.
I moved content from one domain to another and could never get the same content to rank on the new domain. It still ranked on the old domain, but as a "supplemental result". PR was still assigned to pages that were no longer there, and the Google cache still showed the old content. The old domain had entirely new content by then, but the swap was done overnight.
Have you moved the content? Is there some reason Google would flag that content "set"? Perhaps it is in some way not clean for Google, e.g. somebody else has copied it. Have you done a footprint search, putting chunks of the text in quotes, to see if it resides somewhere else on the web?
Hope this helps.
P.S. I have not bothered to address the issue, as the domains are not important to me. I am going to this week, and I think that changing the content on the new domain (away from that of the "set" that was transferred from the old one, now showing as supplemental) may help.
1. Put the URLs you want to have removed into robots.txt. (you've already done this)
2. Submit your robots.txt to the Remove URL tool: [services.google.com:8882...]
3. Have a beer or two -- in 24-36 hours or so those URLs will be out of the index.
marval - The remove tool doesn't need to see the 301 in order to remove the URL. It just looks at robots.txt and matches it against files in the index. When using robots.txt the remove tool doesn't do any spidering (except for grabbing robots.txt).
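To make the "matches robots.txt against the index" idea concrete, here is a rough Python sketch. It is only an illustration of the matching step, not how Google's tool is actually implemented; the robots.txt contents, the example.com URLs and the function name urls_blocked_by_robots are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the paths are invented for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-catalog/
Disallow: /archive/page1.html
"""

def urls_blocked_by_robots(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]

# Hypothetical list of URLs that are still showing up in the index.
indexed = [
    "http://www.example.com/old-catalog/item1.html",
    "http://www.example.com/archive/page1.html",
    "http://www.example.com/new/item1.html",
]

# Only the disallowed URLs would be candidates for removal.
print(urls_blocked_by_robots(ROBOTS_TXT, indexed))
```

Note that no page is fetched here: the check works purely from robots.txt and a list of known URLs, which is why a 301 on those URLs would not matter for this method.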
Maybe if you could post an example of what you mean by putting the URLs in the robots.txt file, I can try a test to see if I can get that to work.
If you 301 a URL, the change will be picked up by Googlebot in the course of its normal crawling. Once it's seen the 301, it will take a day or two for the URL to drop out of the index, and possibly longer for Googlebot to come back and index the new URL. In flex55's case, robots.txt is preventing Googlebot from crawling those URLs to even see the redirects, so the URLs are staying in the index.
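If you want to confirm what a crawler would get back from an old URL (assuming it were allowed to request it), a small Python sketch like the one below sends a HEAD request without following the redirect. The function name fetch_status and the example.com URL are placeholders, and some servers answer HEAD differently from GET, so treat the result as a hint only.

```python
import http.client
from urllib.parse import urlsplit

def fetch_status(url, timeout=10):
    """Return (status code, Location header) for url without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # http.client never follows redirects on its own, so a 301 is
        # reported as-is together with its Location header.
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()

# Example call (placeholder URL, not run here):
# status, location = fetch_status("http://www.example.com/old-page.html")
```

A 301 with the expected Location means the redirect itself is fine; if the URL is also disallowed in robots.txt, a well-behaved crawler will simply never make this request.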
There are 3 ways to use the Remove URL tool:
1) "Remove pages, subdirectories or images using a robots.txt file." Set up your robots.txt to disallow the URLs you want to remove, then tell the Remove URL tool to read your robots.txt. This is by far the easiest way to remove a large number of URLs from the index.
2) "Remove a single page using meta tags." Add meta "noindex" to individual pages and submit those URLs to the Remove URL tool one-by-one. This works well, but can be tedious if you have more than a few URLs, because you must add meta tags to each page and submit each individually.
3) "Remove an outdated link." Submit individual URLs that return a 404 status code. Easier than 2, but you can't provide redirects for your visitors. The page must really be gone.
Methods 2 and 3 retrieve the URLs you feed them and require specific responses in order to operate. Neither understands redirects of any kind, so it's no surprise that you'd get an error. But using method 1 (robots.txt), the tool doesn't even try to spider your site, so it doesn't matter how or if you've redirected a URL. The tool just looks at robots.txt for a list of URLs to remove, and matches that against what's already in the index.
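As a way of summarising the three methods, here is a rough Python sketch that takes what a URL returns and lists which methods it would qualify for. It encodes only the requirements described in this post, not Google's real logic; the function names and method labels are made up for the example.

```python
from html.parser import HTMLParser

class _RobotsMetaParser(HTMLParser):
    """Detects a <meta name="robots" content="...noindex..."> tag."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if (attr_map.get("name", "").lower() == "robots"
                and "noindex" in attr_map.get("content", "").lower()):
            self.noindex = True

def has_noindex(html):
    parser = _RobotsMetaParser()
    parser.feed(html)
    return parser.noindex

def eligible_methods(status, html="", disallowed_in_robots=False):
    """List the removal methods a URL would qualify for, per the post above."""
    methods = []
    if disallowed_in_robots:
        # Method 1 never fetches the page, so the status code is irrelevant.
        methods.append("robots.txt")
    if status == 200 and has_noindex(html):
        methods.append("meta noindex")
    if status == 404:
        methods.append("outdated link (404)")
    return methods

# A 301'd URL qualifies only if it is also disallowed in robots.txt.
print(eligible_methods(301))
print(eligible_methods(301, disallowed_in_robots=True))
```

This is why a redirected URL gives an error with methods 2 and 3 but goes through with method 1.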
Once it's seen the 301, it will take a day or two for the URL to drop out of the index, and possibly longer for Googlebot to come back and index the new URL
It is amazing how they managed to double their index size a few months ago isn't it?!
n.b. I'm referring to 301s within a single site. Google may treat cross-domain 301s differently.