
Forum Moderators: Robert Charlton & goodroi


Google Indexing Wordpress CMS Admin Folder URLs

     
8:12 am on Jul 4, 2018 (gmt 0)

New User

joined:July 5, 2017
posts: 30
votes: 1


For some reason, Google has been aggressively crawling and now indexing sets of CMS admin URLs.

It crawled and indexed over 50 /wp-includes URLs. I have since blocked these in robots.txt.

It has now crawled and indexed a similar number of URLs from the /app folder, e.g. /app/mu-plugins/advanced-custom-fields/images/add-ons/, into the primary index.

The problem is that I can't block this folder in robots.txt, as it contains the CSS and JS of the actual site, which I understand Google needs in order to render the site. Giving Google access to those files is now considered best practice.

Why is Google suddenly finding and indexing these URLs?
What can I do to stop Google from crawling and indexing them?
8:53 am on July 4, 2018 (gmt 0)

New User

joined:July 5, 2017
posts: 30
votes: 1


Maybe one solution is to set noindex for these files/URLs with the X-Robots-Tag HTTP header via .htaccess:

<Files ~ "\app $">
Header append X-Robots-Tag "noindex"
</Files>

Does this seem right?
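
One way to check whether such a header is actually being served is to look at the response headers for a few /app URLs. Below is a minimal sketch in Python (not Apache configuration). `has_noindex` is a hypothetical helper written for this illustration, not part of any library. It only looks for a `noindex` or `none` directive in X-Robots-Tag values. Separately, as far as the Apache documentation describes it, `<Files>` matches file names rather than directory paths, so the pattern above may not match a folder such as /app.

```python
def has_noindex(header_values):
    """Return True if any X-Robots-Tag value carries a noindex directive.

    header_values: list of raw X-Robots-Tag header strings, for example
    ["noindex"] or ["googlebot: noindex, nofollow"].
    """
    for value in header_values:
        for directive in value.split(","):
            # Drop an optional "botname:" prefix and normalise case.
            token = directive.rsplit(":", 1)[-1].strip().lower()
            # "none" is treated as shorthand for "noindex, nofollow".
            if token in ("noindex", "none"):
                return True
    return False


if __name__ == "__main__":
    print(has_noindex(["noindex"]))
    print(has_noindex(["googlebot: noindex, nofollow"]))
    print(has_noindex(["nofollow"]))
```

With the standard library, the header values for a live URL could be obtained from `urllib.request.urlopen(url).headers.get_all("X-Robots-Tag") or []` and passed to the helper.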
11:28 am on July 4, 2018 (gmt 0)

Senior Member from US 

keyplyr

joined:Sept 26, 2001
posts:12913
votes: 893


@Rysk100 - why not simply noindex the folders directly in the robots.txt?
11:32 am on July 4, 2018 (gmt 0)

New User

joined:July 5, 2017
posts: 30
votes: 1


Unless I'm very mistaken, robots.txt stops crawling, not indexing. Unlike /wp-admin and /wp-includes, the files under /app include the site's CSS, images and JS, which Google needs to be able to crawl.
11:37 am on July 4, 2018 (gmt 0)

Senior Member from US 

keyplyr

joined:Sept 26, 2001
posts:12913
votes: 893


My point is, you said you were going to append a header tag to noindex the app folder, and I suggested just writing it directly in the robots.txt.

In robots.txt, just disallow the URLs you don't want indexed. Then remove them from the index in Google Search Console with the removal tool. That way they won't be crawled and reindexed in the future.

[fix typo]

[edited by: keyplyr at 11:55 am (utc) on Jul 4, 2018]

11:44 am on July 4, 2018 (gmt 0)

Preferred Member from CA 

Top Contributors Of The Month

joined:Feb 7, 2017
posts: 575
votes: 59


I am also interested. I've never had Google or any other search engine index my WP admins. Did you set permissions on your folders to not allow viewing? [codex.wordpress.org...]
1:28 pm on July 4, 2018 (gmt 0)

Administrator from US 

not2easy

joined:Dec 27, 2006
posts:4504
votes: 347


You can disallow folders with robots.txt and allow specific file types within those folders.
Disallow: /wp-includes/
Allow: /wp-includes/js/
Allow: /wp-includes/css/
Note that 'Allow' follows 'Disallow'
Allow: /*.css
Allow: /*.js
covers all .js and .css in disallowed folders - those would be after all disallows. Also note that this applies to Google's bots and 'some' others - not all; also note that bad bots don't even bother to read robots.txt.

If you want to know whether it works, use the robots.txt Tester in GSC (not the 'new' version).
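
As a rough offline check before using the Tester, the rules above can be run through a small matcher. This is only a sketch in Python, and it assumes the longest-match precedence described in RFC 9309 and in Google's robots.txt documentation (the longest matching path wins, and Allow wins a tie). The thread does not say which precedence a given crawler applies. The helper names `parse_rules` and `is_allowed` and the sample paths are made up for illustration, and user-agent grouping is ignored.

```python
import re

ROBOTS_TXT = """\
Disallow: /wp-includes/
Allow: /wp-includes/js/
Allow: /wp-includes/css/
Allow: /*.css
Allow: /*.js
"""


def _pattern_to_regex(pattern):
    """Translate a robots.txt path pattern: '*' is a wildcard, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def parse_rules(robots_txt):
    """Collect (directive, pattern) pairs; user-agent groups are ignored in this sketch."""
    rules = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (piece.strip() for piece in line.split(":", 1))
        field = field.lower()
        if field in ("allow", "disallow") and value:
            rules.append((field, value))
    return rules


def is_allowed(rules, path):
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best = None  # (pattern length, is_allow)
    for field, pattern in rules:
        if _pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), field == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    rules = parse_rules(ROBOTS_TXT)
    for path in (
        "/wp-includes/js/jquery/jquery.js",
        "/wp-includes/version.php",
        "/wp-includes/images/media/style.css",
        "/app/themes/site/main.css",
    ):
        print(path, is_allowed(rules, path))
```

Under this assumed precedence, the short wildcard rule `/*.css` does not outrank the longer `Disallow: /wp-includes/`, so a stylesheet outside /wp-includes/css/ still comes out as disallowed in the sketch. That is a reason to confirm the real behaviour in the Tester rather than rely on rule order.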

6:28 pm on July 4, 2018 (gmt 0)

Senior Member

aristotle

joined:Aug 4, 2008
posts:3660
votes: 373


Fake Googlebots might try to crawl those files, looking for weaknesses that a hacker could exploit.
7:43 pm on July 11, 2018 (gmt 0)

Moderator This Forum from US 

robert_charlton

joined:Nov 11, 2000
posts:12388
votes: 409


Mod's note: This discussion continued by OP under new topic...

X-Robots Noindex or 403 Forbidden?
https://www.webmasterworld.com/google/4910541.htm

10:48 am on July 16, 2018 (gmt 0)

Administrator

phranque

joined:Aug 10, 2004
posts:11842
votes: 242


Google has been aggressively crawling and now indexing sets of CMS admin URL

Unlike/wp-admin and /wp-includes...

requests for /wp-includes/ paths should get a 403 (Forbidden) status code.
see this from the Codex:
https://codex.wordpress.org/Hardening_WordPress#WP-Includes

requests for /wp-admin/ paths should get a 401 status code, which is typically a challenge for HTTP Basic Authentication.
see this from the Codex:
https://codex.wordpress.org/Hardening_WordPress#WP-Admin

both the 401 and 403 status code responses will prevent Google from indexing the requested URL.
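
To see what a server actually returns for these paths, something like the following Python sketch could be used. The helper names `status_of` and `blocks_indexing` are made up for illustration, and `blocks_indexing` simply encodes the claim above about 401 and 403. It is not a statement of how any search engine is implemented.

```python
import urllib.error
import urllib.request

# Status codes that, per the post above, keep a URL out of the index.
BLOCKING_STATUSES = {401, 403}


def status_of(url, timeout=10.0):
    """Return the HTTP status code for a HEAD request to url (redirects are followed)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is still available.
        return err.code


def blocks_indexing(status):
    """True if the status code is one of the blocking codes listed above."""
    return status in BLOCKING_STATUSES
```

For example, `blocks_indexing(status_of(url))` could be run against a handful of /wp-includes/ and /wp-admin/ URLs on your own site to confirm they answer with 403 or 401.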