How do I protect one sub directory

Preventing Google from indexing and following a sub directory

         

mboydnv

10:27 pm on Dec 22, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



I'm not that savvy with htaccess. Was wondering if anyone could spare a quick line of code.

Google has indexed my booking form which is a no-no. I had the meta tag on the template and they still did it. It's blocked in my robots.txt file and yet they still indexed it. It's a puzzle to me.

So on further digging I viewed my htaccess file in the sub directory: https://mydomain.com/secured-booking/.htaccess

and line one of code is:
1. Options All -Indexes

From reading around the net, I think it was supposed to be (because it's a sub directory): Options -Indexes

More reading led me to consider replacing line 1 with:

Header set X-Robots-Tag "noindex, noarchive, nosnippet"


All I want is for every file in a sub-directory to be neither followed nor indexed. I have 3 or 4 of these types of directories: /affiliate/, /secured-booking/, etc.

What can I do once I've blocked it in the robots.txt file and put the meta tag up? I can't password protect these directories.

will (in the sub-directory htaccess)

Header set X-Robots-Tag "noindex, noarchive, nosnippet"


do the trick?


not2easy

11:01 pm on Dec 22, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The code you have can help prevent indexing, but under some circumstances the directory can be crawled anyway. If a link on your site or on another site leads to a file in that directory, the file may still be indexed unless the link is a nofollow link.

lucy24

12:04 am on Dec 23, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The "Options Indexes" line has nothing whatsoever to do with search engine indexing. It's about auto-generated indexing of directories that don't contain an "index.html" file. (Or other named file specified via DirectoryIndex.) Normally you want the line to say -Indexes. What you do not want it to say is any combination of "blahblah" with "+blahblah" or "-blahblah" as it will lead to unintended consequences; see Apache docs for the proper scary language. Get rid of the word "All".
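To illustrate the point about Options, here is a minimal sketch of what line one of the sub-directory .htaccess could look like with the word "All" dropped. It assumes the server's AllowOverride setting permits Options in .htaccess. It only controls auto-generated directory listings and has no effect on search engine indexing.

```apache
# /secured-booking/.htaccess
# Disable auto-generated directory listings.
# Using only a "-" prefix leaves every other inherited option as it was.
Options -Indexes
```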

Look at the html of your page. Not your local code, the actual page that you get on the Internet. Verify that it says
<meta name = "robots" content = "noindex">
I suspect it doesn't, because I have never heard of google ignoring this line. Here, of course, it's irrelevant because roboting-out the directory means that search engines will never see the line. They will similarly never see any index-related header, because they will never request the page in the first place.

So you have to make a decision: allow search engines to crawl, enabling them to see any "noindex" directives, or don't let them crawl, meaning that the page URL can theoretically appear in search results, but its content won't.
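As a sketch of the first choice (let crawlers in so they can see the directive), the header from the original question could go in the sub-directory .htaccess as below. This assumes mod_headers is enabled, that AllowOverride permits the Header directive, and that the Disallow line for this directory has been removed from robots.txt. Otherwise the header is never fetched.

```apache
# /secured-booking/.htaccess
# Send a noindex directive with responses from this directory.
# The IfModule wrapper avoids a 500 error if mod_headers is not loaded
# (in that case no header is sent at all).
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex, noarchive, nosnippet"
</IfModule>
```

One way to check the result is to request a page from that directory with curl -I and look for X-Robots-Tag in the response headers.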

Now, it is very rare for a roboted-out page to appear in search-engine results, especially near the top. So either there's an exact-text match on the page title-- which doesn't seem likely if it's something generic like "hotel booking"-- or this page has racked up some phenomenally good links. What, exactly, do you see in search results that include your page, and how do you get to these results?

mboydnv

12:39 am on Dec 23, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi n2,

I put rel="nofollow" on all "buy now" button links that lead to the /secured-booking/ directory. I'm puzzled why Google indexed it.

<meta name="robots" content="noindex,nofollow,noodp,noydir" /> is on the pages in there.

So I'll try this in the htaccess. Thanks so much,

Header set X-Robots-Tag "noindex, noarchive, nosnippet"

mboydnv

1:01 am on Dec 23, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi Lucy:

I see this in Google:

https://www.mydomain.com/secured-booking...
A description for this result is not available because of this site's robots.txt – learn more.


Step 1 of my booking form is loaded with PHP, passing database variables.

The PHP file begins with this code:

<?
session_start(); // ADD-130214
$inc_dir = "../affiliate-tours/inc/";
include ($inc_dir."data.php");
include ($inc_dir."config.php");
include ($inc_dir."functions.php");
include('Date.class.php');
require('/home/mydomain/public_html/wp-blog-header.php');
/******************************************************************************
* connect to MySQL database


and then somewhere many lines after this... I begin what would be the head of the document

?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "https://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="https://www.w3.org/1999/xhtml" dir="ltr" lang="en-US">
<head profile="https://gmpg.org/xfn/11">


which then leads to

<meta name="robots" content="noindex,nofollow,noodp,noydir" />

So I'm thinking the PHP code at the top may be preventing the meta tag from being read correctly?

So htaccess may be the way to go in this situation.

Robots.txt doc has this code

User-Agent: *
Disallow: /secured-booking/


so i really hope

Header set X-Robots-Tag "noindex, noarchive, nosnippet" 


is the answer.

I would like to drop one of these htaccess files in 3 other sub directories.

Thanks so much! Always appreciate the time you give. Your help is valuable. ty

lucy24

4:48 am on Dec 23, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



mboydnv wrote: "I see this in Google:"

Yes, that's exactly what you're supposed to see for a roboted-out page. (Google's and Bing's texts are very similar if not identical.) But the key question is: what search did you perform that brought up this result in the first place? If the only way to bring it up is to force a search with terms that only you know will lead to the page, then there's really nothing to worry about because no real-life human will ever see it. (And so what if they do? Surely there's nothing in the page URL that would lead people to choose it as an entry page over all other pages on your site.)

Again: All the headers in the world will not help if the page is roboted-out. If the robot can't crawl the page, it won't see its content (including the "robots" meta) and it won't see the response headers (including "noindex").
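For the several directories mentioned earlier in the thread, one alternative to copying a file into each is a single rule in the site-root .htaccess. This is an untested sketch. It assumes mod_setenvif and mod_headers are available, that the files in those directories are served directly rather than through a rewrite, and that robots.txt does not disallow those paths (otherwise the header is never seen). NOINDEX_DIR is a made-up variable name.

```apache
# .htaccess in the document root
# Flag requests whose path starts with one of the listed directories.
# NOINDEX_DIR is a hypothetical name; any unused variable name would do.
SetEnvIf Request_URI "^/(secured-booking|affiliate)/" NOINDEX_DIR

# Add the header only to flagged requests.
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex, noarchive, nosnippet" env=NOINDEX_DIR
</IfModule>
```

Other directories can be added to the alternation inside the parentheses.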

mboydnv

6:12 pm on Dec 23, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Just added this one line of code. Thanks so much.
Header set X-Robots-Tag "noindex, noarchive, nosnippet"