
Google SEO News and Discussion Forum

    
Should I block Googlebot from WordPress wp-admin, wp-includes?
My_Media




msg:4303084
 8:39 pm on Apr 23, 2011 (gmt 0)

Hi, I read on Digital Inspiration about blocking Googlebot from crawling wp-admin, wp-includes and wp-content. Do you agree we should do that?

And is this the right code to use for robots.txt in Google Webmaster Tools?

User-Agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Disallow: /wp/wp-

Thanks,

 

tedster




msg:4303203
 5:39 am on Apr 24, 2011 (gmt 0)

That seems OK - and possibly too minimal. Here's a sample robots.txt file from wordpress.org:

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads

[codex.wordpress.org...]

My_Media




msg:4303213
 6:32 am on Apr 24, 2011 (gmt 0)

Thanks very much, Tedster. So blocking all these folders does not harm the site in search engines? I have a few more folders I created that I also need to add to this file.

My_Media




msg:4303219
 7:07 am on Apr 24, 2011 (gmt 0)

Hi Tedster,
Here is my robots.txt. Do you see anything wrong?

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-admin-old
Disallow: /wp-admin-4-11-2011
Disallow: /wp-includes
Disallow: /health-topics
Disallow: /wp-includes-old
Disallow:/wp-includes-4-11-2011
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow:/wp-content-original
Disallow:/wp-content-original/cache
Disallow:/wp-content-original/themes
Disallow:/wp-content-original/plugins
Disallow: /wp-content/themes
Disallow: /trackback
Disallow:/therapists
Disallow:/rootfile_bk
Disallow:/old
Disallow:/plesk-stat
Disallow:/doctors
Disallow:/bk_wp2.91
Disallow: /feed
Disallow: /comments
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads

Thanks in Advance,

tedster




msg:4303220
 7:14 am on Apr 24, 2011 (gmt 0)

It's all going to depend on what you are doing with Wordpress, specifically. I can't say if you really want to block a directory like /therapists or /doctors, for instance.

g1smd




msg:4303229
 8:08 am on Apr 24, 2011 (gmt 0)

Attention to detail.

Add a space after the colon where it is missing.

The final trailing * is not required (one entry).

g1smd




msg:4303230
 8:11 am on Apr 24, 2011 (gmt 0)

Disallow: /wp-admin
Disallow: /wp-admin-old
Disallow: /wp-admin-4-11-2011


The first entry blocks anything beginning with /wp-admin and therefore the next two entries are redundant.

My_Media




msg:4303310
 2:40 pm on Apr 24, 2011 (gmt 0)

Help me out, I am new to this backend stuff.
So blocking /wp-admin will also block anything that starts with /wp-admin, such as /wp-admin-4-21-2011? But aren't they different directories?

Please be patient with me as I am learning this.

[edited by: My_Media at 2:44 pm (utc) on Apr 24, 2011]

My_Media




msg:4303311
 2:43 pm on Apr 24, 2011 (gmt 0)

Here is the latest. I do not want Google to crawl any of those directories, since some of them are tests.

Here is the revision:
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-admin-old
Disallow: /wp-admin-4-11-2011
Disallow: /wp-includes
Disallow: /health-topics
Disallow: /wp-includes-old
Disallow: /wp-includes-4-11-2011
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content-original
Disallow: /wp-content-original/cache
Disallow: /wp-content-original/themes
Disallow: /wp-content-original/plugins
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /therapists
Disallow: /rootfile_bk
Disallow: /old
Disallow: /plesk-stat
Disallow: /doctors
Disallow: /bk_wp2.91
Disallow: /feed
Disallow: /comments
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Allow: /wp-content/uploads

Please advise, and thanks in advance.

My_Media




msg:4303312
 2:49 pm on Apr 24, 2011 (gmt 0)

Will these robots.txt changes hinder my current permalink performance?

pageoneresults




msg:4303319
 3:09 pm on Apr 24, 2011 (gmt 0)

Why would you Disallow stuff that is usually behind a login? You've basically provided a roadmap for prying eyes for where all your password protected directories are. That's a hacker's delight right there. Careful with robots.txt files, they may not be the best option in this instance. Anything behind a login should not be in the robots.txt file, the bot is going to get a 403.

My_Media




msg:4303323
 3:22 pm on Apr 24, 2011 (gmt 0)

But I am sure that everyone who uses WordPress will know about /wp-admin. Can you specify what I should do?

pageoneresults




msg:4303328
 3:32 pm on Apr 24, 2011 (gmt 0)

Can you specify what I should do?


Not really, I've never really used WordPress. I try to keep my robots.txt files at a bare minimum. Everything else is done at the server and/or document level with noindex or noindex, nofollow.

Items such as those you list above are password protected so there is no need for a robots.txt entry. All of our admin documents have noindex by default. I do that just in case there is ever a mishap and something is made available for indexing and it shouldn't be. I like to play it safe.

I'd rather Google NOT show thousands of URI Only entries due to robots.txt. I don't want people to be able to perform a site: search and find ALL of my Disallowed stuff. They don't need to know that and I don't need to provide Google with any hints as to where it is all at.
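As an illustration of the server-level noindex approach described above, here is a minimal Python sketch. It is only an assumption about how such a setup might look, not pageoneresults' actual configuration: the middleware name `noindex_middleware`, the `/admin` prefix and the demo app are all hypothetical. It adds the `X-Robots-Tag: noindex, nofollow` response header to anything under a protected path.

```python
def noindex_middleware(app, protected_prefixes=("/admin",)):
    """Wrap a WSGI app so responses under the given path prefixes
    carry an X-Robots-Tag header. Prefixes are hypothetical examples."""
    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def start_with_header(status, headers, exc_info=None):
            # str.startswith accepts a tuple of prefixes
            if path.startswith(protected_prefixes):
                headers = list(headers) + [("X-Robots-Tag", "noindex, nofollow")]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)
    return wrapped


def demo_app(environ, start_response):
    """Tiny stand-in application used only for this illustration."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]
```

Note that a crawler can only see this header (or a noindex meta tag in the document) if it is allowed to fetch the URL, which fits with keeping such paths out of robots.txt.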

g1smd




msg:4303336
 4:08 pm on Apr 24, 2011 (gmt 0)

So blocking /wp-admin will also block anything that starts with /wp-admin, such as /wp-admin-4-21-2011? But aren't they different directories?

Forget directories. Think URLs.

Disallow: /wp-admin will block any and every URL that begins with " / w p - a d m i n ".

However, as pageone notes above, you don't need to list stuff that will automatically return the HTTP 401 status code to the bots.
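The "think URLs, not directories" point can be checked with a short sketch using Python's standard-library robots.txt parser. This is only an illustration: the standard-library parser does plain prefix matching (it does not implement the * wildcard extensions), and the helper name `allowed` and the sample paths are assumptions chosen for the demo.

```python
from urllib.robotparser import RobotFileParser

# A single prefix rule, as discussed in the thread
rules = [
    "User-agent: *",
    "Disallow: /wp-admin",
]

parser = RobotFileParser()
parser.parse(rules)


def allowed(path):
    """Return True if a crawler may fetch the given path on example.com."""
    return parser.can_fetch("Googlebot", "http://www.example.com" + path)


for path in ("/wp-admin/",
             "/wp-admin-old/",
             "/wp-admin-4-11-2011/index.php",
             "/wp-content/uploads/photo.jpg"):
    print(path, "allowed" if allowed(path) else "blocked")
```

The first three paths all begin with the characters /wp-admin, so the one rule covers them; the last path does not, so it stays crawlable.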

My_Media




msg:4303346
 4:37 pm on Apr 24, 2011 (gmt 0)

I have a few directories in the home directory that I use to test things, so should I block those directories?
/doctors --test dir
/wp-content-4-11-2011 --backup dir
..etc

So should I Disallow them, because Google will crawl those too, right?

g1smd




msg:4303354
 5:09 pm on Apr 24, 2011 (gmt 0)

If they need a password to get in, they are already blocked.

In fact, for test folders, I would use .htpasswd to keep bots and humans alike out.

zehrila




msg:4303425
 9:00 pm on Apr 24, 2011 (gmt 0)

If a blog is hosted at the directory level, e.g. example.com/blog, would this be the right string to disallow bots? The robots.txt entry goes like

/blog/*/trackback

and robots.txt is hosted at the root level of the main domain.

Please let me know if it's the right entry.

leadegroot




msg:4303890
 11:51 pm on Apr 25, 2011 (gmt 0)

Bear in mind that if you have images in your blog posts that you want indexed, but you uploaded them with the built-in WordPress button, then blocking wp-content will block them too - perhaps let imagebot into wp-content/themes?

My_Media




msg:4305028
 8:46 pm on Apr 27, 2011 (gmt 0)


My robots.txt has Disallow: */trackback
Google Webmaster Tools shows Restricted by robots.txt (455).
One of them is this:
http://www.example.com/collarbone-pain-causes-and-treatment.html/trackback

Question:
This links back to my page http://www.example.com/collarbone-pain-causes-and-treatment.html. So does this mean that robots.txt also blocked this URL?


Should I just use Disallow: /trackback instead?

I greatly appreciate your help.
Thanks,

[edited by: tedster at 10:23 pm (utc) on Apr 27, 2011]
[edit reason] switch to example.com [/edit]

tedster




msg:4305058
 10:22 pm on Apr 27, 2011 (gmt 0)

The rule you are using in robots.txt does not stop spidering for http://www.example.com/collarbone-pain-causes-and-treatment.html
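To see why, here is a rough Python sketch of wildcard matching as Google-style crawlers apply it: * stands for any run of characters, a trailing $ anchors the end, and otherwise a pattern matches from the start of the URL path. The function names are made up for this illustration, and the sketch ignores Allow rules and rule precedence, so treat it as an approximation rather than Google's actual implementation.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.
    '*' matches any characters; a trailing '$' anchors the end of the path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, disallow_patterns):
    """True if any Disallow pattern matches the path from its first character."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)


page = "/collarbone-pain-causes-and-treatment.html"

print(is_blocked(page + "/trackback", ["*/trackback"]))             # True
print(is_blocked(page, ["*/trackback"]))                            # False
print(is_blocked(page + "/trackback", ["/trackback"]))              # False
print(is_blocked("/blog/some-post/trackback", ["/blog/*/trackback"]))  # True
```

Under these assumptions, */trackback blocks only the trackback URL and leaves the article itself crawlable, while a plain /trackback rule would only cover URLs whose path begins with /trackback and so would miss the per-post trackback URLs. A pattern like /blog/*/trackback, placed in the robots.txt at the domain root, covers trackback URLs under a blog in a subdirectory.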
