
Google SEO News and Discussion Forum

    
A wordpress based SEO question involving redirects
Sgt_Kickaxe




msg:4494522
 12:21 am on Sep 13, 2012 (gmt 0)

My WordPress-based site is "helpful" in that it redirects me to a login page every time I click a URL or feature that requires me to be logged in as administrator. This includes individual comment moderation URLs such as

example.com/wp-admin/comment.php?action=editcomment&c=12345

I want to stop the helpful redirects completely. If I'm not logged in, I want a proper 404 error so that search bots receive the same. I've asked how best to do this on the WordPress forums, but it seems to stump everyone who responds.
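Before changing anything, it helps to confirm what an anonymous visitor or bot actually receives for one of these URLs. The sketch below is a minimal illustration, not something from the thread: it assumes WordPress answers an anonymous request with a redirect whose Location points at wp-login.php (typically carrying a redirect_to parameter), and the function names are hypothetical. Python's http.client does not follow redirects on its own, so it reports the raw status.

```python
import http.client
from urllib.parse import parse_qs, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}

def fetch_without_redirect(url, timeout=10):
    """Send one GET request and return (status, Location header) without following redirects."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    try:
        conn.request("GET", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()

def is_login_redirect(status, location):
    """True if the response is a redirect whose target is a wp-login.php page."""
    if status not in REDIRECT_CODES or not location:
        return False
    return urlsplit(location).path.endswith("/wp-login.php")

def login_redirect_target(status, location):
    """Return the decoded redirect_to value of a login redirect, or None."""
    if not is_login_redirect(status, location):
        return None
    values = parse_qs(urlsplit(location).query).get("redirect_to")
    return values[0] if values else None

# Usage against your own site (this makes a real network request):
# status, location = fetch_without_redirect("http://www.example.com/wp-admin/comment.php?action=editcomment&c=12345")
# print(status, login_redirect_target(status, location))
```

If the check reports a redirect to the login page, that redirect is what a crawler would see too, provided robots.txt lets it fetch the URL at all.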

Has anyone done this already? Can you provide instructions?

 

netmeg




msg:4494678
 10:31 am on Sep 13, 2012 (gmt 0)

Nope. Why do you want to do this? Across over fifty WP sites, I've never seen this be an issue.

indyank




msg:4494679
 10:40 am on Sep 13, 2012 (gmt 0)

But most of those administrative links, including edit comments, are enabled or presented to you only if you are logged in. So why bother?

Even if the bots somehow manage to find them, most WordPress webmasters block bots with robots.txt, and the login requirement is just an extra step to stop those nasty, not-so-obedient bots.

I go even further and restrict access to the wp-admin folder to my IP address, so bots and users are never presented with a login page; they get a 403 instead.

Sgt_Kickaxe




msg:4494712
 11:58 am on Sep 13, 2012 (gmt 0)

"Why do you want to do this?"

"But most of those administrative links including edit comments are enabled or presented to you only if you are logged in. So why bother?"


That's what I thought: nowhere are links to the edit pages visible to anyone but me, until...

site:example.com example.com

On the last page of Google results, inside the duplicate results section, is this result (and 300 more like it)...
http://www.example.com/wp-admin/comment.php?action=editcomment&c=123
A description for this result is not available because of this site's robots.txt.

...and it got there because the wp-admin section was blocked by the default robots.txt: "fine, can't crawl it, so index the URL". In fixing the issue, I'd prefer that Google (and anyone not logged in) get a 404 error instead when loading these URLs.
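The mechanism described above follows from robots.txt controlling crawling rather than indexing: a disallowed URL is never fetched, so Google never sees whatever status code or login redirect it would return, yet it can still list the bare URL if it discovers links to it. The sketch below checks a URL against a robots.txt using Python's standard urllib.robotparser. The robots.txt text is an assumption modeled on a typical WordPress default of the time, and can_crawl is a hypothetical helper name. This parser implements plain prefix matching and does not understand Google's wildcard extensions.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt, modeled on what many WordPress installs served at the time.
BLOCKING_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
"""

# Same file with the /wp-admin/ block removed (the alternative mentioned later in the thread).
OPEN_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-includes/
"""

def can_crawl(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

EDIT_URL = "http://www.example.com/wp-admin/comment.php?action=editcomment&c=123"
print(can_crawl(BLOCKING_ROBOTS_TXT, "Googlebot", EDIT_URL))  # False: never fetched, so no 403/404 is ever seen
print(can_crawl(OPEN_ROBOTS_TXT, "Googlebot", EDIT_URL))      # True: the crawler can now see the real response
```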

Helpful related link: this issue has been brought up before, but no solution is posted for the current version of WordPress: [yoast.com...]

Also, the suggested solution involves sending an X-Robots-Tag header, which is not possible with all hosts.

Sand




msg:4494716
 12:21 pm on Sep 13, 2012 (gmt 0)

One potential workaround:

On my WP sites, I upload an .htaccess file to my wp-admin directories that blocks access (403 Forbidden) to anyone coming from an IP other than my own. I do it for security reasons, but something like that might have the side effect of doing what you want (keeping Google out).

tedster




msg:4494739
 1:08 pm on Sep 13, 2012 (gmt 0)

"I'd prefer Google (and anyone not logged in) get a 404 error"

Please note: as indyank mentioned above, one proper HTTP status code might be "403 Forbidden", but a "404 Not Found" status isn't accurate. The most accurate error code would probably be "401 Unauthorized".
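One practical wrinkle with 401: the HTTP specification (RFC 2616 at the time, later RFC 7235) requires a 401 response to carry a WWW-Authenticate challenge, which fits HTTP authentication such as Basic auth rather than WordPress's cookie-based login. The sketch below picks between the two codes on that basis. The function name, its auth_scheme parameter and the realm string are hypothetical; only the status codes and reason phrases come from Python's standard http.HTTPStatus.

```python
from http import HTTPStatus

def denial_response(auth_scheme=None):
    """Build (status line, headers) for an anonymous request to an admin-only URL.

    auth_scheme is a hypothetical parameter: pass e.g. "Basic" only if the area is
    really protected by HTTP authentication, because a 401 must include a
    WWW-Authenticate challenge. Without one, fall back to 403 Forbidden.
    """
    if auth_scheme:
        status = HTTPStatus.UNAUTHORIZED
        headers = [("WWW-Authenticate", f'{auth_scheme} realm="wp-admin"')]
    else:
        status = HTTPStatus.FORBIDDEN
        headers = []
    return f"{status.value} {status.phrase}", headers

print(denial_response())         # ('403 Forbidden', [])
print(denial_response("Basic"))  # ('401 Unauthorized', [('WWW-Authenticate', 'Basic realm="wp-admin"')])
```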

indyank




msg:4494744
 1:39 pm on Sep 13, 2012 (gmt 0)

I have never come across that so far, and I do check the supplemental index from time to time. I am not seeing it on my sites. I do block wp-admin with robots.txt in addition to restricting access to it to my IP alone.

But I do see some tag pages in the supplemental index that I never link to on my sites. These have a noindex meta tag on them and are also blocked by robots.txt. I am planning to remove the robots.txt block so their bots see the noindex meta tag.
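The reasoning above hinges on ordering: a crawler can only read a noindex meta tag on a page it is allowed to fetch. The simplified model below makes that explicit; it is not how Google actually decides indexing. It returns None when robots.txt blocks the URL (the meta tag is never seen), otherwise whether the page carries a robots noindex. The /tag/ path, the sample page and the helper names are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

class MetaRobotsParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""
    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            for directive in (attrs.get("content") or "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())

def crawler_sees_noindex(robots_txt, url, html, user_agent="Googlebot"):
    """None if robots.txt blocks url (page never fetched, meta tag never read);
    otherwise True/False depending on whether the page carries a robots noindex."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(user_agent, url):
        return None
    parser = MetaRobotsParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives

TAG_PAGE = '<html><head><meta name="robots" content="noindex,follow"></head><body></body></html>'
TAG_URL = "http://www.example.com/tag/widgets/"  # hypothetical tag archive URL
print(crawler_sees_noindex("User-agent: *\nDisallow: /tag/\n", TAG_URL, TAG_PAGE))  # None
print(crawler_sees_noindex("User-agent: *\nDisallow:\n", TAG_URL, TAG_PAGE))        # True
```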

indyank




msg:4494745
 1:40 pm on Sep 13, 2012 (gmt 0)

I must add that their crawl behavior has been very creepy and suspicious since 2011. They are probably using some unannounced, disguised bots that often disobey or play mischief with robots.txt.

netmeg




msg:4494749
 2:05 pm on Sep 13, 2012 (gmt 0)

Honestly, I don't think it's an issue.

Sgt_Kickaxe




msg:4497429
 5:31 pm on Sep 19, 2012 (gmt 0)

After fixing the problem by issuing a 403 error before any redirect (Sept 14th), the "not selected" number of pages in my GWT "Index Status" report has dropped by 98%. In my particular case, the edit-comment redirects were all indexed and treated as URLs without descriptions because they were blocked by robots.txt.

Unblocking them in robots.txt, i.e. removing the "Disallow: /wp-admin/" line, might also have worked.

I won't post my solution here because it involved changes to WordPress core files, and I haven't yet worked out an equivalent function that does the same thing.
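Since the actual fix wasn't posted, here is a hypothetical sketch of the general idea: answer 403 before anything gets a chance to redirect to the login page. It is written as Python WSGI middleware purely for illustration. On a WordPress site the same check would live in server configuration or PHP, and nothing here is WordPress code. The wordpress_logged_in_ cookie-name prefix is what WordPress sets for logged-in users. The helper names, the admin-ajax.php exemption (many themes and plugins call that file from public pages) and the stand-in app are assumptions. The cookie test is only a presence check; real authentication still happens in the application.

```python
LOGIN_COOKIE_PREFIX = "wordpress_logged_in_"  # prefix WordPress uses for its logged-in cookie
PROTECTED_PREFIX = "/wp-admin/"
EXEMPT_PATHS = {"/wp-admin/admin-ajax.php"}   # hypothetical exemption for front-end AJAX calls

def has_login_cookie(environ):
    """Presence check only: True if any cookie name starts with the logged-in prefix."""
    for part in environ.get("HTTP_COOKIE", "").split(";"):
        name = part.split("=", 1)[0].strip()
        if name.startswith(LOGIN_COOKIE_PREFIX):
            return True
    return False

def forbid_anonymous_admin(app):
    """Wrap a WSGI app so anonymous admin requests get 403 before the app can redirect."""
    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if (path.startswith(PROTECTED_PREFIX) and path not in EXEMPT_PATHS
                and not has_login_cookie(environ)):
            body = b"403 Forbidden"
            start_response("403 Forbidden", [("Content-Type", "text/plain"),
                                             ("Content-Length", str(len(body)))])
            return [body]
        return app(environ, start_response)
    return middleware

# Stand-in for the "helpful" behavior: every request is redirected to the login page.
def redirecting_app(environ, start_response):
    start_response("302 Found", [("Location", "/wp-login.php")])
    return [b""]

def status_for(app, path, cookie=""):
    """Call a WSGI app with a minimal environ and return the status line it sends."""
    seen = []
    app({"PATH_INFO": path, "HTTP_COOKIE": cookie},
        lambda status, headers: seen.append(status))
    return seen[0]

guarded = forbid_anonymous_admin(redirecting_app)
print(status_for(guarded, "/wp-admin/comment.php"))                              # 403 Forbidden
print(status_for(guarded, "/wp-admin/comment.php", "wordpress_logged_in_x=abc"))  # 302 Found
```

Putting the check in front of the application, rather than inside it, is what guarantees the 403 is sent before any redirect logic runs.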

I strongly suggest you do the following Google search, "site:example.com example.com", and check the supplemental index for pages blocked by robots.txt.

1script




msg:4497477
 7:15 pm on Sep 19, 2012 (gmt 0)

I'd say it's probably better to remove the redirects: you have a crawl budget, and the redirect URLs count against it along with the "good" URLs. Even though Google will (may?) realize the URLs are bad for indexing, it still needs to crawl them (or attempt to crawl them and get a 403). That leaves less of the crawl budget for the stuff you actually want it to see.


P.S. Just re-read Sgt's last post: a 403 *before* the redirect would take care of that, and that's how I'd do it too. Did you do a JS redirect or a meta refresh? My guess is that you are no longer able to administer the blog using IE, because it [used to] show its own 403 error page instead of what you sent after a 403 header? Sorry, it's been years since I used IE for anything [important]...

Sgt_Kickaxe




msg:4497583
 3:03 am on Sep 20, 2012 (gmt 0)

I don't use IE to administer the blog; I've always preferred a browser that isn't backed by a search engine, for privacy reasons, not that it helped at all.

I used the .htaccess file along with some changes to a WordPress core file to do the trick, because JS can be disabled and meta refresh can be a problem too. I am leaning towards netmeg's opinion that it probably has little ranking impact, since Google knows and expects some things like this, but I do feel some relief at not seeing so many *not included* URLs, and nonexistent URLs in general, in my GWT report.

tedster




msg:4497592
 4:01 am on Sep 20, 2012 (gmt 0)

"Google knows and expects some things like this"

Yes, especially for widely used content management systems like WordPress.

If your WordPress installation is highly customized, you may need to keep doing further customization of your own. That said, there are also some excellent plug-ins that can take care of many SEO issues like this. I've used them on several large-scale international projects, and the client and I were highly pleased with the results in each case.

netmeg




msg:4497869
 3:30 pm on Sep 20, 2012 (gmt 0)

Argh, I hate seeing people mess with core WordPress files. Are you going to have to keep making this change every time there's an update?

Seriously, dude. I have over fifty WP sites in my purview. It's really really not an issue.

indyank




msg:4497878
 3:45 pm on Sep 20, 2012 (gmt 0)

I do agree it is not an issue, but it does give a sense of satisfaction when you don't see all those errors on your dashboard.

A 403 isn't really an error; it is just an HTTP response. As I said before, all you had to do was restrict access to the wp-admin folder to your IP address. It doesn't need any alteration to the core WordPress files.

You just had to add this to the .htaccess file in your wp-admin folder:

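# Apache 2.2 syntax (on Apache 2.4 these lines need mod_access_compat)
Order deny,allow
Deny from all
Allow from <your ip address>
# Apache 2.4 equivalent of the three lines above:
# Require ip <your ip address>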
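Apache 2.2's "Order deny,allow" evaluates Deny rules first and Allow rules second, and allows access by default, so a client gets in if it matches no Deny rule or matches any Allow rule. With "Deny from all" that reduces to "only the allowed IP gets in". The Python sketch below models just that rule, for IPs, CIDR blocks and the keyword all; the function names and the documentation-range addresses are hypothetical, and hostnames or partial IPs, which Apache also accepts, are not handled. One caveat worth knowing: some themes and plugins call wp-admin/admin-ajax.php from public pages, so a blanket block on the folder may break those features for visitors.

```python
import ipaddress

def _matches(client, rules):
    """True if client matches any rule: the keyword 'all', a single IP, or a CIDR block."""
    for rule in rules:
        if rule == "all":
            return True
        if client in ipaddress.ip_network(rule, strict=False):
            return True
    return False

def order_deny_allow(client_ip, deny, allow):
    """Simplified model of Apache 2.2 'Order deny,allow'.

    Deny rules are evaluated first, then Allow rules; access is allowed by default,
    so a client is let in if it matches no Deny rule or matches any Allow rule.
    """
    client = ipaddress.ip_address(client_ip)
    return (not _matches(client, deny)) or _matches(client, allow)

MY_IP = "203.0.113.7"  # hypothetical: your own address (documentation range)
print(order_deny_allow(MY_IP, deny=["all"], allow=[MY_IP]))            # True: wp-admin is served
print(order_deny_allow("198.51.100.23", deny=["all"], allow=[MY_IP]))  # False: 403 Forbidden
```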


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved