Keep in mind that manually removing URLs from the index is not permanent if other pages still link to the removed pages. The removed pages will simply be reindexed.
The meta tag 'noindex' is a very good solution. It won't hurt you and the pages won't be reindexed. You just need to be patient while Google reprocesses your pages.
In fact, you may want to use 'noindex,follow' to avoid losing PageRank. It is even better.
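To make the 'noindex,follow' suggestion concrete, here is a minimal Python sketch of how a site might decide which robots directive a page should carry. It assumes the unwanted pages can be recognised by an attachment_id query parameter; the helper names and the example.com URLs are hypothetical and not taken from any plugin.

```python
from urllib.parse import urlparse, parse_qs

# Assumption: unwanted attachment pages carry this query parameter.
ATTACHMENT_PARAM = "attachment_id"


def robots_directive(url: str) -> str:
    """Return the robots directive for a URL (hypothetical helper)."""
    query = parse_qs(urlparse(url).query, keep_blank_values=True)
    if ATTACHMENT_PARAM in query:
        # Keep the page out of the index but let its links be followed.
        return "noindex,follow"
    return "index,follow"


def robots_meta_tag(url: str) -> str:
    """Render the directive as a meta tag for the page <head>."""
    return f'<meta name="robots" content="{robots_directive(url)}">'


if __name__ == "__main__":
    print(robots_meta_tag("http://example.com/?attachment_id=123"))
    print(robots_meta_tag("http://example.com/my-post/"))
```

Because the rule lives in one place, re-indexing a few pages later only means exempting them from the check.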
You're talking about WordPress, I presume. I had the same issue on a site that had 500 posts, so around 2000 images, and they all had their own attachment pages.
There is a well-known SEO plugin that you can tell to 301 the attachment pages to the post they are attached to.
It is taking months, but I tend to see a few hundred per week drop out of WMT. It is pretty gradual, so I would not worry; it can only help, IMO, from a thin-content point of view.
Is this the only type of file that uses the attachment_id parameter? If so, you've got one more option. In the parameters area of WMT, you can tell it to ignore any URL containing a specified parameter. That is, don't just ignore the parameter (the more common approach) but the page itself.
Setting parameter ignore in GWT is an option, but then you would have to configure this in all search engine webmaster tools.
Moreover, with 'noindex,follow', if later you want to index a couple of pages again, you just have to remove it on those pages. It is more flexible than GWT configuration.
To avoid setting "noindex, follow" on all those image URLs (dozens of them), which could be a very tedious job, I have just set "Disallow: /*?" in robots.txt. This way I'm trying to exclude all those pages from the next round of indexing. What do you think?
However, this will not remove the dozens of URLs from Google's index any time soon; it may take months, as happened to @chalkywhite.
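For anyone unsure what "Disallow: /*?" actually covers, here is a rough Python sketch of Google-style wildcard matching, where "*" stands for any run of characters and a trailing "$" anchors the end of the URL. The function names and example.com URLs are made up for illustration, and this is a simplification, not Google's own matcher. Note that a match only means the crawler is asked not to fetch the URL; it says nothing about whether the URL stays in the index.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule with '*' and '$' into a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape everything, then turn the escaped '*' back into '.*'.
    body = re.escape(rule).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(url: str, rule: str = "/*?") -> bool:
    """True if the URL's path plus query string matches the rule."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return rule_to_regex(rule).match(target) is not None


if __name__ == "__main__":
    for url in ("http://example.com/?attachment_id=123",
                "http://example.com/my-post/"):
        print(url, is_disallowed(url))
```

The rule blocks every URL with a query string, not only the attachment pages, so it is worth checking that nothing useful is caught by it.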
Disallowing does not stop indexing, so it's really not a solution for getting them out of the index -- What I really don't understand is if they're simply "not useful" [meaning they're also not doing harm and likely don't rank for anything "meaningful"] why even worry about it or spend time on them rather than just leaving them alone and letting the search engines decide how to handle them?
@JD_Toims Disallowing is useful to prevent further indexing (an effect similar to adding "noindex, follow"), but I need to keep them out of the index (and I'm looking for a way to do it).
I began to believe this is necessary after asking on the Google Webmaster support forum, where one assistant made a point about useful content: websites with content that is not useful should rank lower. These URLs are not useful because they contain only images.
|@JD_Toims Disallowing is useful to prevent further indexing (an effect similar to adding "noindex, follow")... |
Disallowing will not stop further indexing if Google decides that it should index that page regardless (e.g. there are links to that page). It *may* stop Google indexing disallowed pages where there are no links to them - but it is not guaranteed.
Also, the effect is not similar to noindex, follow, because pages that are disallowed will not circulate PageRank juice within the site, whilst noindexed pages will.
|@JD_Toims Disallowing is useful to prevent further indexing (an effect similar to adding "noindex, follow"), but I need to keep them out of the index (and I'm looking for a way to do it). |
No, it really isn't. Google is notorious for indexing disallowed URLs when it finds links to those URLs anywhere on the Internet -- a disallow and a noindex are two totally different things, and disallowing is not an effective means of keeping Google from indexing URLs.
I don't have time to track down dozens of threads for verification of my point, but here's one of the most recent: [webmasterworld.com...]
You'll have to canonicalize/301 to an essentially identical page, noindex, or serve an error when gBot requests them, both to get them out of the index and to ensure no other URLs are indexed in the future.
[edited by: JD_Toims at 8:27 pm (utc) on Nov 7, 2013]
Sure. It's not very similar.
These pages consist of only one image placed in the site template. This is a peculiarity of WordPress, which also gives images a permalink. I don't think these pages pass any juice and, at the same time, these pages are not linked from anywhere. They were indexed because the old version of the sitemap guided the crawler there.
I'd like to deindex them, even if you @aakk9999 (thanks for your support) are saying that my attempts could be ineffective...
If there are links to these urls, and the links are not malignant, then you should claim these urls via 301.
N.B. I am definitely NOT advocating 'catch-all' 301 redirects.
The rule I follow is this:
- if you are prepared to take responsibility for a url, 301 it to another page
- if you are not (or do not recognise it) then 404
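The rule above can be sketched in a few lines of Python. This is only an illustration: the CLAIMED_URLS map, the respond function, and the paths in it are hypothetical, and a real site would do this in its server or CMS configuration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical map: URLs we take responsibility for -> the page that replaces them.
CLAIMED_URLS: Dict[str, str] = {
    "/?attachment_id=123": "/my-post/",
}


def respond(path: str,
            claimed: Dict[str, str] = CLAIMED_URLS) -> Tuple[int, Optional[str]]:
    """Return (status code, Location header value) for a retired URL."""
    target = claimed.get(path)
    if target is not None:
        # Claimed URL: permanent redirect to one specific page.
        return 301, target
    # Not claimed or not recognised: plain 404, no catch-all redirect.
    return 404, None


if __name__ == "__main__":
    print(respond("/?attachment_id=123"))
    print(respond("/something-unknown"))
```

Each redirect is listed explicitly, which is what keeps this from turning into a catch-all 301.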
|This is a peculiarity of WordPress, which also gives images a permalink. |
|these pages are not linked from anywhere. |
You can't have it both ways.
|I don't think these pages pass any juice... |
All links that are not nofollowed pass at least some weight.
Google is going so far as to use "mentions", meaning unlinked URLs, for discovery.
Added: There's currently a question/speculation circulating that some nofollow links may continue to pass weight.