Should I block Googlebot from WordPress wp-admin, wp-includes?

     
8:39 pm on Apr 23, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:Nov 8, 2010
posts:78
votes: 0


Hi, I read on Digital Inspiration about blocking Googlebot from crawling wp-admin, wp-includes and wp-content. Do you agree we should do that?

And is this the right code to use for robots.txt in Google Webmaster Tools?

User-Agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Disallow: /wp/wp-

Thanks,
5:39 am on Apr 24, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


That seems OK - and possibly too minimal. Here's a sample robots.txt file from wordpress.org

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads

[codex.wordpress.org...]
6:32 am on Apr 24, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:Nov 8, 2010
posts: 78
votes: 0


Many thanks, Tedster. So blocking all these folders will not hurt me in the search engines? I also have a few more folders I created that I need to add to this file.
7:07 am on Apr 24, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:Nov 8, 2010
posts: 78
votes: 0


Hi Tedster,
Here is my robots.txt. Do you see anything wrong?

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-admin-old
Disallow: /wp-admin-4-11-2011
Disallow: /wp-includes
Disallow: /health-topics
Disallow: /wp-includes-old
Disallow:/wp-includes-4-11-2011
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow:/wp-content-original
Disallow:/wp-content-original/cache
Disallow:/wp-content-original/themes
Disallow:/wp-content-original/plugins
Disallow: /wp-content/themes
Disallow: /trackback
Disallow:/therapists
Disallow:/rootfile_bk
Disallow:/old
Disallow:/plesk-stat
Disallow:/doctors
Disallow:/bk_wp2.91
Disallow: /feed
Disallow: /comments
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads

Thanks in Advance,
7:14 am on Apr 24, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


It's all going to depend on what you are doing with Wordpress, specifically. I can't say if you really want to block a directory like /therapists or /doctors, for instance.
8:08 am on Apr 24, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Attention to detail.

Add a space after the colon where it is missing.

The final trailing * is not required (one entry).
8:11 am on Apr 24, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Disallow: /wp-admin
Disallow: /wp-admin-old
Disallow: /wp-admin-4-11-2011


The first entry blocks anything beginning with /wp-admin and therefore the next two entries are redundant.
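If you want to check the prefix behaviour yourself, here is a minimal sketch using Python's standard urllib.robotparser module. The host www.example.com and the sample paths are placeholders, and this parser only does plain prefix matching, so it illustrates the rule above rather than reproducing exactly how any particular search engine behaves.

```python
from urllib.robotparser import RobotFileParser

# Only the first of the three entries is kept; example.com is a placeholder host.
rules = """\
User-agent: *
Disallow: /wp-admin
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)


def is_blocked(path):
    """Return True if the rules above disallow this path for a generic crawler."""
    return not parser.can_fetch("*", "http://www.example.com" + path)


# Hypothetical paths: the second and third are never listed in the rules.
for path in ["/wp-admin/", "/wp-admin-old/", "/wp-admin-4-11-2011/index.php",
             "/wp-content/uploads/photo.png"]:
    print(path, "blocked" if is_blocked(path) else "allowed")
```

The two backup paths come out blocked even though only /wp-admin is listed, because a Disallow value is compared against the start of the URL path.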
2:40 pm on Apr 24, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:Nov 8, 2010
posts: 78
votes: 0


Help me out, I am new to this backend thing.
So blocking /wp-admin will also block anything that starts with /wp-admin, such as /wp-admin-4-21-2011? But they are different directories.

Please be patient with me as I am learning this.

[edited by: My_Media at 2:44 pm (utc) on Apr 24, 2011]

2:43 pm on Apr 24, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:Nov 8, 2010
posts: 78
votes: 0


Here is the latest. I do not want Google to crawl any of those directories, since some of them are tests.

Here is the revision:
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-admin-old
Disallow: /wp-admin-4-11-2011
Disallow: /wp-includes
Disallow: /health-topics
Disallow: /wp-includes-old
Disallow: /wp-includes-4-11-2011
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content-original
Disallow: /wp-content-original/cache
Disallow: /wp-content-original/themes
Disallow: /wp-content-original/plugins
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /therapists
Disallow: /rootfile_bk
Disallow: /old
Disallow: /plesk-stat
Disallow: /doctors
Disallow: /bk_wp2.91
Disallow: /feed
Disallow: /comments
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Allow: /wp-content/uploads

Please advise, and thanks in advance.
2:49 pm on Apr 24, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:Nov 8, 2010
posts: 78
votes: 0


Will these robots.txt rules change or hinder how my current permalinks perform?
3:09 pm on Apr 24, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member pageoneresults is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 27, 2001
posts: 12166
votes: 51


Why would you Disallow stuff that is usually behind a login? You've basically provided a roadmap for prying eyes for where all your password protected directories are. That's a hacker's delight right there. Careful with robots.txt files, they may not be the best option in this instance. Anything behind a login should not be in the robots.txt file, the bot is going to get a 403.
3:22 pm on Apr 24, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:Nov 8, 2010
posts: 78
votes: 0


But I am sure that everyone who uses WordPress will know about /wp-admin. Can you specify what I should do?
3:32 pm on Apr 24, 2011 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member pageoneresults is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 27, 2001
posts: 12166
votes: 51


Can you specify what I should do?


Not really, I've never really used WordPress. I try to keep my robots.txt files at a bare minimum. Everything else is done at the server and/or document level with noindex or noindex, nofollow.

Items such as those you list above are password protected so there is no need for a robots.txt entry. All of our admin documents have noindex by default. I do that just in case there is ever a mishap and something is made available for indexing and it shouldn't be. I like to play it safe.

I'd rather Google NOT show thousands of URI Only entries due to robots.txt. I don't want people to be able to perform a site: search and find ALL of my Disallowed stuff. They don't need to know that and I don't need to provide Google with any hints as to where it is all at.
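To illustrate what a server-level noindex can look like, here is a rough sketch in Python. It is not how a WordPress (PHP) site would do it; the WSGI wrapper, the function names add_noindex and demo_app, and the path prefixes are all assumptions made up for the example. The only real piece is the X-Robots-Tag response header, which carries the same noindex, nofollow values as the meta tag.

```python
def add_noindex(app, prefixes=("/doctors", "/therapists")):
    """Wrap a WSGI app so responses under the given path prefixes carry X-Robots-Tag.

    The prefixes are hypothetical examples borrowed from the thread.
    """
    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start(status, headers, exc_info=None):
            # Append the header only for URLs under one of the chosen prefixes.
            if path.startswith(tuple(prefixes)):
                headers = list(headers) + [("X-Robots-Tag", "noindex, nofollow")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in application that returns a plain-text page for every URL."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"test page"]


application = add_noindex(demo_app)
```

Keep in mind that a crawler can only see a noindex signal on a URL it is allowed to fetch, so URLs handled this way should not also be Disallowed in robots.txt.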
4:08 pm on Apr 24, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


So blocking /wp-admin will also block anything that starts with /wp-admin, such as /wp-admin-4-21-2011? But they are different directories.

Forget directories. Think URLs.

Disallow: /wp-admin
will block any and every URL that begins with "/wp-admin".

However, as pageone notes above, you don't need to list stuff that will automatically return the HTTP 401 status code to the bots.
4:37 pm on Apr 24, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:Nov 8, 2010
posts: 78
votes: 0


I have a few directories under the home directory that I use to test things, so should I block those directories?
/doctors --test dir
/wp-content-4-11-2011 --backup dir
..etc

So should I Disallow them, because Google will crawl those too, right?
5:09 pm on Apr 24, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


If they need a password to get in, they are already blocked.

In fact, for test folders, I would use .htpasswd to keep bots and humans alike out.
9:00 pm on Apr 24, 2011 (gmt 0)

Full Member

5+ Year Member

joined:Jan 9, 2007
posts:254
votes: 0


If a blog is hosted at directory level, e.g. example.com/blog, would this be the right string to disallow bots? The robots.txt entry goes like

/blog/*/trackback

and robots.txt is hosted at the root level of the main domain.

Please let me know if it's the right entry.
11:51 pm on Apr 25, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 27, 2003
posts: 1642
votes: 0


Bear in mind that if you have images in your blog posts that you want indexed, and you uploaded them with the built-in WordPress button, then blocking wp-content will block them too. Perhaps let the image bot into wp-content/themes?
8:46 pm on Apr 27, 2011 (gmt 0)

Junior Member

5+ Year Member

joined:Nov 8, 2010
posts: 78
votes: 0



My robots.txt has Disallow: */trackback
Google Webmaster Tools shows Restricted by robots.txt (455).
One of them is this:
http://www.example.com/collarbone-pain-causes-and-treatment.html/trackback

Question:
This links back to my page http://www.example.com/collarbone-pain-causes-and-treatment.html. So does this mean that robots.txt also blocked this URL?


Should I just use Disallow: /trackback instead?

I greatly appreciate your help.
Thanks,

[edited by: tedster at 10:23 pm (utc) on Apr 27, 2011]
[edit reason] switch to example.com [/edit]

10:22 pm on Apr 27, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 26, 2000
posts:37301
votes: 0


The rule you are using in robots.txt does not stop spidering for http://www.example.com/collarbone-pain-causes-and-treatment.html
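To see why, here is a small sketch of wildcard matching as Google documents it for robots.txt: "*" stands for any run of characters, a trailing "$" anchors the end, and otherwise a rule only has to match from the start of the URL path. The helper name rule_matches is made up for this example, and the code is a simplified illustration, not Google's actual implementation.

```python
import re


def rule_matches(pattern, path):
    """Return True if a robots.txt path pattern matches the given URL path.

    Simplified Google-style matching: '*' matches any characters, a trailing
    '$' anchors the end, and the match always starts at the beginning of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then let every '*' match anything.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


page = "/collarbone-pain-causes-and-treatment.html"
trackback = page + "/trackback"

for pattern in ["*/trackback", "/trackback"]:
    for path in [page, trackback]:
        print(pattern, path, rule_matches(pattern, path))
```

With "*/trackback" the trackback URL matches and the page itself does not, so only the trackback URL is restricted. Switching to "/trackback" would match neither of these two URLs, because that rule only covers paths that begin with /trackback.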