| 10:31 am on Sep 13, 2012 (gmt 0)|
Nope. Why do you want to do this? Across over fifty WP sites, I've never seen this be an issue.
| 10:40 am on Sep 13, 2012 (gmt 0)|
But most of those administrative links including edit comments are enabled or presented to you only if you are logged in. So why bother?
Even if the bots somehow manage to trace them, most WordPress webmasters stop the bots using robots.txt, and the requirement to log in is only an extra step to prevent the nastier, not-so-obedient bots.
I even go further and restrict access to the wp-admin folder to my IP address. So the bots or users are never presented with a login page; a 403 is issued instead.
| 11:58 am on Sep 13, 2012 (gmt 0)|
|Why do you want to do this? |
|But most of those administrative links including edit comments are enabled or presented to you only if you are logged in. So why bother? |
That's what I thought, nowhere are links to the edit pages visible to anyone but me, until...
|site:example.com example.com |
On the last page of google results, inside the duplicate results section, is this result (and 300 more like it)...
A description for this result is not available because of this site's robots.txt – learn more
...and it got there because the wp-admin section was blocked by the default robots.txt: "fine, can't crawl it, so index the URL". In fixing the issue I'd prefer Google (and anyone not logged in) to get a 404 error instead when loading these URLs.
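For reference, the robots.txt rules WordPress generated by default around this time looked roughly like the following (treat the exact paths as an assumption about your install). Note that Disallow only stops crawling, not indexing, which is exactly how the blocked URLs end up in results with no description:

```
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
```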
Helpful related link, this issue has been brought up before but no solution is posted for the current version of wordpress: [yoast.com...]
Also, the solution suggested involves sending an X-Robots-Tag header, which is not possible with all hosts.
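For hosts that do allow it, the X-Robots-Tag approach is just a response header set from .htaccess. A minimal sketch, assuming Apache with mod_headers enabled (which is the part many shared hosts lack) and targeting WordPress's comment-edit endpoint:

```apache
# Requires mod_headers; many shared hosts don't enable it
<Files "comment.php">
    Header set X-Robots-Tag "noindex, nofollow"
</Files>
```

Unlike a robots.txt Disallow, this only works if the crawler is allowed to fetch the URL and see the header.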
| 12:21 pm on Sep 13, 2012 (gmt 0)|
One potential workaround:
On my WP sites, I upload an .htaccess file to my wp-admin directories that blocks access (403 Forbidden) to anyone coming from an IP other than my own. I do it for security reasons, but something like that might have the side effect of doing what you want (keeping Google out).
| 1:08 pm on Sep 13, 2012 (gmt 0)|
|I'd prefer google (and anyone not logged in) get a 404 error |
Please note - as mentioned by indyank above, one proper HTTP status code might be "403 Forbidden", but a "404 Not Found" status isn't accurate. The most accurate error code would probably be "401 Unauthorized".
| 1:39 pm on Sep 13, 2012 (gmt 0)|
I have never come across that so far, and I do check the supplemental index from time to time. I am not seeing it on my sites. I do block wp-admin with robots.txt in addition to restricting access to it to my IP alone.
But I do see some tag pages, which I never link to on my sites, in the supplemental index. These have a noindex meta tag on them and are also blocked by robots.txt. I am planning to remove the robots.txt block so their bots see the noindex meta tag.
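For what it's worth, the noindex tag in question is just a meta element in the page head, along the lines of the fragment below. The key point is that a crawler only honors it if it is allowed to fetch the page, which is why the robots.txt block has to come off first:

```html
<head>
  <!-- Tells compliant crawlers not to index this page, but still follow its links -->
  <meta name="robots" content="noindex, follow">
</head>
```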
| 1:40 pm on Sep 13, 2012 (gmt 0)|
I must add that their crawl behavior has been very creepy and suspicious since 2011. They are probably using some unannounced, disguised bots that don't obey robots.txt, or that often play mischief with it.
| 2:05 pm on Sep 13, 2012 (gmt 0)|
Honestly, I don't think it's an issue.
| 5:31 pm on Sep 19, 2012 (gmt 0)|
After fixing the problem by implementing a 403 error before any redirect (Sept 14th), the "not selected" number of pages has dropped by 98% inside my GWT "index status" report. In my particular case the edit comment redirects were all indexed and treated as URLs without descriptions because they were blocked by robots.txt.
Unblocking them in robots.txt might have also worked, i.e. removing the Disallow: /wp-admin/ line.
I won't post my solution here because it involved core changes to wordpress and I haven't worked out an equivalent function to do the same just yet.
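Since the poster didn't share his core change, here is one possible .htaccess-only sketch of the same idea (serve the 403 before WordPress's login redirect ever fires). This is an assumption, not his actual solution: it requires mod_rewrite and keys off the wordpress_logged_in_* cookie that WordPress sets for authenticated sessions:

```apache
RewriteEngine On
# Return 403 Forbidden for wp-admin requests from visitors
# who don't carry a WordPress login cookie
RewriteCond %{REQUEST_URI} ^/wp-admin/ [NC]
RewriteCond %{HTTP_COOKIE} !wordpress_logged_in_ [NC]
RewriteRule . - [F,L]
```

One caveat: some themes and plugins call wp-admin/admin-ajax.php for logged-out visitors, so a rule this broad may need an exception for that file.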
I strongly suggest you do the following google search - "site:example.com example.com" and check the supplemental index for robots.txt blocked pages.
| 7:15 pm on Sep 19, 2012 (gmt 0)|
I'd say it's probably better to remove redirects: you have a crawling budget and the redirect URLs will have to be counted against it, together with "good" URLs. Even though Google will (may?) realize the URLs are bad for indexing, they still need to crawl them (or attempt to crawl and get a 403). That just leaves less of the crawl budget for the stuff you actually want them to see.
P.S. Just re-read Sgt's last post: a 403 *before* the redirect would take care of that; this is how I'd also do it. Did you do a JS redirect or a meta refresh? My guess is that you are no longer able to administer the blog using IE 'cause it [used to] show its own 403 error page instead of what you sent after a 403 header? Sorry, it's been years since I used IE for anything [important]...
| 3:03 am on Sep 20, 2012 (gmt 0)|
I don't use IE to administer the blog, I always preferred a non-search engine backed browser to do that for privacy reasons, not that it helped at all.
I used the .htaccess file along with some changes in a WordPress core file to do the trick, because JS can be disabled and meta refresh can be a problem too. I am leaning towards Netmeg's opinion that it probably has little ranking impact - Google knows and expects some things like this - but I do feel some relief in not seeing so many *not included* URLs, and non-existent URLs in general, in my GWT report.
| 4:01 am on Sep 20, 2012 (gmt 0)|
|Google knows and expects some things like this |
Yes - especially for widely used content management systems like WordPress.
If your WordPress installation is highly customized, you may need to continue doing your own further customization. That said, there are also some excellent plug-ins available that can take care of many SEO issues like this. I've used them on several large-scale international projects, and the client and I were highly pleased with the results in each case.
| 3:30 pm on Sep 20, 2012 (gmt 0)|
Argh, I hate seeing people mess with core WordPress files. Are you going to have to keep making this change every time there's an update?
Seriously, dude. I have over fifty WP sites in my purview. It's really really not an issue.
| 3:45 pm on Sep 20, 2012 (gmt 0)|
I do agree it is not an issue, but it does give a sense of satisfaction when you don't see all those errors on your dashboard.
A 403 isn't really an error; it is just an HTTP status code. As I said before, all you had to do is restrict access to the wp-admin folder to your IP address. It doesn't need any alteration to the core WordPress files.
You just had to add this to .htaccess in your wp-admin folder.
Order deny,allow
Deny from all
Allow from <your ip address>
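For completeness: that Deny/Allow syntax is for Apache 2.2. On Apache 2.4 the access-control directives changed, so the equivalent rule (the IP below is a placeholder from the documentation range) would be:

```apache
# Apache 2.4 syntax (mod_authz_core); replaces Order/Deny/Allow
Require ip 203.0.113.10
```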