Home / Forums Index / Code, Content, and Presentation / WordPress
Forum Library, Charter, Moderators: lorax & rogerd

WordPress Forum

    
Disable all external feeds
Google somehow got my entire site structure
Davidcjmad1
msg:4345967
7:05 pm on Jul 31, 2011 (gmt 0)

Hi,

I am developing a site on a throwaway subdomain with the intention of moving the WP CMS to the main domain. Users must be logged in to see the development site. I just did a site:sub.domain.com search and there are thousands of pages indexed. I do have

"
// functions.php: answer feed requests with an error page instead of a feed
function cwc_disable_feed() {
    wp_die( __('No feed available, please visit our homepage!') );
}
add_action('do_feed', 'cwc_disable_feed', 1);
add_action('do_feed_rdf', 'cwc_disable_feed', 1);
add_action('do_feed_rss', 'cwc_disable_feed', 1);
add_action('do_feed_rss2', 'cwc_disable_feed', 1);
add_action('do_feed_atom', 'cwc_disable_feed', 1);
"

added to functions.php as a result, but I wonder whether I missed anything.

My plan now is to 301-redirect the subdomain pages to the matching main-domain pages, keeping the URL structure intact, and hope a duplicate-content penalty is not applied.

Has anybody been through this? Does any advice or experience come to mind?

Thanks
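On the "did I miss anything?" question: the snippet above stops the feed responses, but WordPress still advertises the feed URLs in the `<head>` of every page, and `wp_die()` answers with HTTP status 500 by default, which crawlers may treat as a temporary error worth retrying. Below is a minimal sketch for functions.php that could stand in for the snippet above, assuming WordPress 2.8 or later (where `feed_links()` and `feed_links_extra()` are hooked to `wp_head`); `cwc_disable_feed_404` is a hypothetical name. As far as I can tell, WordPress only fires the format-specific `do_feed_{format}` hooks (comment feeds reuse them), so the bare `do_feed` hook is left out.

```php
<?php
// Stop printing <link rel="alternate"> feed tags in the page <head>.
remove_action( 'wp_head', 'feed_links', 2 );        // site-wide posts and comments feeds
remove_action( 'wp_head', 'feed_links_extra', 3 );  // per-post comment, category, tag, author and search feeds

// Hypothetical variant of cwc_disable_feed(): same message, but a 404 status
// instead of wp_die()'s default 500.
function cwc_disable_feed_404() {
    wp_die( __( 'No feed available, please visit our homepage!' ), '', array( 'response' => 404 ) );
}

// Priority 1 runs ahead of WordPress's own feed handlers (default priority 10),
// and wp_die() ends the request before they get a chance to run.
foreach ( array( 'do_feed_rdf', 'do_feed_rss', 'do_feed_rss2', 'do_feed_atom' ) as $feed_hook ) {
    add_action( $feed_hook, 'cwc_disable_feed_404', 1 );
}
```

This only hides the feeds; the individual pages are still reachable unless they are gated separately.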

 

londrum
msg:4345978
8:44 pm on Jul 31, 2011 (gmt 0)

All that function does is remove the RSS feeds; it does nothing to the actual pages.

Davidcjmad1
msg:4345979
8:51 pm on Jul 31, 2011 (gmt 0)

Thanks, londrum, I do understand. Essentially the site requires visitors to be logged in before they can view it, using:

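"
<?php
// Send anonymous visitors to the login form.
if ( ! is_user_logged_in() ) {
    // Absolute URL: a bare 'wp-login.php' is resolved relative to the current page's path.
    wp_redirect( wp_login_url() );
    // Without exit, PHP keeps running and the page content is still sent after the redirect header.
    exit;
}
?>
"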

in the theme's header template. So I'm assuming that something WordPress does by default fed Google the data (an annoying default behaviour, it seems). I would rather have full control over how my server sends data out to the outside world; it's "all" feed methods I am trying to kill.
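A check in the header template only runs once the theme starts rendering a page, and WordPress builds its feeds from separate feed templates that don't load header.php, so the feeds (and the post URLs listed in them) can stay public. Below is a minimal sketch of a site-wide gate, assuming it lives in functions.php or a small plugin; `devsite_require_login` is a hypothetical name. It hooks `template_redirect`, which WordPress fires before it outputs either a theme template or a feed.

```php
<?php
// Hypothetical name: redirect every anonymous front-end request to the login form.
function devsite_require_login() {
    // Logged-in users pass; robots.txt is also routed through this hook,
    // so let crawlers keep reading it.
    if ( is_user_logged_in() || is_robots() ) {
        return;
    }
    wp_redirect( wp_login_url() ); // absolute URL to wp-login.php, 302 by default
    exit;                          // no page or feed content follows the redirect
}
add_action( 'template_redirect', 'devsite_require_login' );
```

wp-login.php and the admin screens don't go through `template_redirect`, so this redirect should not loop.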

tangor
msg:4345981
8:54 pm on Jul 31, 2011 (gmt 0)

Gotta ask: does nobody do their dev on local machines these days? That would STOP GOOGLE or anyone else dead in their tracks.

lorax
msg:4346166
12:42 pm on Aug 1, 2011 (gmt 0)

I typically develop on a sub-domain and use WP's privacy setting, plus put the site behind a username/password setup. That has worked well: no pages indexed until I'm ready to let the bots in.

At this point, you could do as you suggest, make sure robots.txt is set up to disallow crawling, and put username/password protection on the directory. You could also use Google Webmaster Tools to remove the sub-domain's pages from the index, couldn't you?
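WordPress's privacy setting already makes its generated robots.txt disallow everything. If you would rather force that from code on the dev subdomain, here is a minimal sketch using the `robots_txt` filter; `devsite_robots_txt` is a hypothetical name, and it assumes WordPress is serving robots.txt itself (no physical robots.txt in the web root, which in practice also means pretty permalinks are enabled). It would need to come out again before the site moves to the main domain.

```php
<?php
// Hypothetical name. Replaces the body of WordPress's generated robots.txt
// regardless of the privacy setting ($public is accepted but ignored).
function devsite_robots_txt( $output, $public ) {
    return "User-agent: *\nDisallow: /\n";
}
add_filter( 'robots_txt', 'devsite_robots_txt', 10, 2 );
```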

Davidcjmad1
msg:4346167
12:57 pm on Aug 1, 2011 (gmt 0)

Thank you. The site is online in the first place only because I am collaborating with two people remotely. What I had done was block access to the site with a password (but inside WP itself), thinking this was sufficient, so I don't understand how Google got the URLs. Am I wrong to think that WP is like a sieve, with feeds enabled by default?

I have now set the privacy settings to block search engines, but to my mind they should have been blocked anyway (i.e. they should only have seen a redirect to the WP login page).

londrum
msg:4346176
1:08 pm on Aug 1, 2011 (gmt 0)

Can't you just put a robots.txt in the subdomain root, disallowing the whole lot while you test?
That will keep it out of the index.

But I don't think there is any way to stop bots crawling it. Even if they have to log in to see stuff, there is nothing to stop them visiting the URL. Google can pick up URLs from deep within the site, beyond the login page, from people using Chrome, their toolbar...

Davidcjmad1
msg:4346178
1:20 pm on Aug 1, 2011 (gmt 0)

Yes, londrum. However, given that they have indexed "every page", even orphan pages that were not (and are not) in the site structure and that I know I never visited even during development, I still suspect that "they" got a "feed" from somewhere. Even if the bot arrived via a deep link, it should never have seen the content, because the header would redirect it to the login page. And yes, I have noindex, nocache and a robots disallow in place "now"; however, it baffles me how Google got a "map" of every page, including orphans.
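One wrinkle with combining noindex and a robots.txt disallow: as far as I know, Google only sees a noindex directive on URLs it is allowed to crawl, so a disallow can keep already-known URLs in the index as bare URL listings. Below is a minimal sketch that sends the directives as an `X-Robots-Tag` HTTP header on every front-end response WordPress generates, feeds included, so they don't depend on theme markup; `devsite_noindex_header` is a hypothetical name, and `noarchive` is used here as the header equivalent of "no cache".

```php
<?php
// Hypothetical name. send_headers fires for every front-end request WordPress
// handles (pages and feeds alike), before any template output.
function devsite_noindex_header() {
    header( 'X-Robots-Tag: noindex, nofollow, noarchive' );
}
add_action( 'send_headers', 'devsite_noindex_header' );
```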

lorax
msg:4346424
11:33 pm on Aug 1, 2011 (gmt 0)

>> (but inside WP itself) thinking this was sufficient

Understood, but did you apply the security to those pages? Each page is public unless you tell WP otherwise.

Visibility - This determines how your post appears to the world. Public posts will be visible by all website visitors once published. Password Protected posts are published to all, but visitors must know the password to view the post content. Private posts are visible only to you (and to other editors or admins within your site)

[codex.wordpress.org...]

Davidcjmad1
msg:4347997
10:58 pm on Aug 4, 2011 (gmt 0)

Thanks, lorax, I will take those suggestions on board.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved