
Disable all external feeds

Google somehow got my entire site structure

     

Davidcjmad1

7:05 pm on Jul 31, 2011 (gmt 0)



Hi,

I am developing a site on a throwaway subdomain, with the intention of moving the WP CMS to the main domain. Users must be logged in to see the development site. I just did a site:sub.domain.com and there are 1000s of pages indexed. I do have

" function cwc_disable_feed() {
wp_die( __('No feed available,please visit our homepage!') );
}
add_action('do_feed', 'cwc_disable_feed', 1);
add_action('do_feed_rdf', 'cwc_disable_feed', 1);
add_action('do_feed_rss', 'cwc_disable_feed', 1);
add_action('do_feed_rss2', 'cwc_disable_feed', 1);
add_action('do_feed_atom', 'cwc_disable_feed', 1);
"

added to functions.php as a result, but I wonder if I missed anything?
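(One gap worth noting: even with every do_feed action killed, WordPress still advertises the feed URLs via <link> tags in the page <head>. A minimal sketch of removing those as well, using the core feed_links hooks; treat it as an illustration, not a tested fix:)

"
// Stop WordPress printing the feed <link rel="alternate"> tags in <head>,
// so crawlers are not handed the feed addresses in the first place.
remove_action( 'wp_head', 'feed_links', 2 );       // main post and comment feeds
remove_action( 'wp_head', 'feed_links_extra', 3 ); // category, tag, etc. feeds
"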

My plan now is to 301 the subdomain pages to the main domain pages, keeping the URL structure intact, and hope a duplicate-content penalty is not applied.
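(For illustration, a minimal sketch of such a redirect in PHP, assuming hypothetical hostnames sub.example.com and www.example.com; the real move would more likely be done in the server config:)

"
// Hypothetical sketch: 301 every request on the dev subdomain to the
// same path on the main domain, preserving the URL structure.
if ( $_SERVER['HTTP_HOST'] === 'sub.example.com' ) {
    header( 'Location: http://www.example.com' . $_SERVER['REQUEST_URI'], true, 301 );
    exit; // stop execution so nothing else is sent after the redirect
}
"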

Anybody been through this? Any advice or experience come to mind?

Thanks

londrum

8:44 pm on Jul 31, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



All that function does is remove the RSS feeds; it does nothing to the actual pages.

Davidcjmad1

8:51 pm on Jul 31, 2011 (gmt 0)



Thanks londrum, I do understand. Essentially the site is behind a requirement to be logged in to view it, using:

"

<?php

get_currentuserinfo();

global $user_ID;

if ($user_ID == '')

{

header('Location: wp-login.php');

} ?>

"

in the header. Thus I'm assuming that something WP did fed Google the data (it's an annoying default behaviour, it seems). I would rather have full control over what my server is sending out to the outside world; it's "all" feed methods I am trying to kill.
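(A header.php check only protects pages that load the theme header; feeds and other endpoints never reach it. A minimal sketch of a site-wide gate instead, hooked on template_redirect, which fires before pages and feeds alike are rendered; the function name is made up for illustration:)

"
// Sketch: refuse every front-end request from visitors who are not
// logged in. template_redirect runs before any page or feed output.
add_action( 'template_redirect', 'cwc_require_login' );
function cwc_require_login() {
    if ( ! is_user_logged_in() ) {
        auth_redirect(); // sends the visitor to wp-login.php and exits
    }
}
"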

tangor

8:54 pm on Jul 31, 2011 (gmt 0)

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



Gotta ask, does nobody do their dev on local machines these days? That would STOP GOOGLE or anyone else dead in their tracks.

lorax

12:42 pm on Aug 1, 2011 (gmt 0)

WebmasterWorld Senior Member lorax is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I typically develop on a sub-domain and use WP's privacy setting plus set the site behind a uname/pwd setup. That has worked well. No pages indexed until I'm ready to let the bots in.

At this point, you could do as you suggest, make sure robots.txt is set up to disallow crawling, and put uname/pwd security on the directory. You could use Google's Webmaster Tools to remove the sub-domain's pages from the index, couldn't you?
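(For reference, the whole-site disallow is two lines in the subdomain's robots.txt; note this blocks crawling by compliant bots, while already-indexed URLs still need the removal tool:)

"
User-agent: *
Disallow: /
"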

Davidcjmad1

12:57 pm on Aug 1, 2011 (gmt 0)



Thank you. It is because I am collaborating with 2 people online that it's online in the first place. What I had done was block access to the site with the password (but inside WP itself), thinking this was sufficient (so I don't understand how Google got the URLs). Am I wrong to think that WP is like a sieve when it comes to feeds being enabled by default?

I have now set the privacy settings to block search engines, but to my mind they should have been blocked anyway (i.e. seen the redirect to the WP login page).
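(A belt-and-braces option on top of the privacy setting: send a noindex header on every response, which Google honours even for URLs it discovers elsewhere. A sketch, with a made-up function name:)

"
// Sketch: attach an X-Robots-Tag header to every WordPress response
// so anything that does get fetched is kept out of the index.
add_action( 'send_headers', 'cwc_dev_noindex' );
function cwc_dev_noindex() {
    header( 'X-Robots-Tag: noindex, nofollow', true );
}
"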

londrum

1:08 pm on Aug 1, 2011 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Can't you just put a robots.txt in the subdomain root, disallowing the whole lot while you test?
That will keep it out of the index.

But I don't think there is any way to stop bots crawling it. Even if they have to log in to see stuff, there is nothing to stop them visiting the URL. Google can pick up URLs from deep within the site, beyond the login page, from people using Chrome, their toolbar...

Davidcjmad1

1:20 pm on Aug 1, 2011 (gmt 0)



Yes londrum. However, given they have indexed "every page", even orphan pages that were/are not in the site structure, that I know I never visited even in development, I still suspect that "they" got a "feed" from somewhere. Even if the bot arrived at a deep link, it should never have seen the content, because the header would redirect it to login. And yes, I have the noindex, nocache and robots disallow "now", but it's baffling me how Goog got a "map" of every page, including orphans.

lorax

11:33 pm on Aug 1, 2011 (gmt 0)

WebmasterWorld Senior Member lorax is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



>> ( but inside WP itself ) thinking this was sufficent

Understood but did you apply the security to those pages? Each page is public unless you tell WP it is not.

Visibility - This determines how your post appears to the world. Public posts will be visible by all website visitors once published. Password Protected posts are published to all, but visitors must know the password to view the post content. Private posts are visible only to you (and to other editors or admins within your site).

[codex.wordpress.org...]
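(Not from the thread, but for illustration: a one-off snippet that flips every published post to 'private' in bulk, so nothing is publicly visible until the move:)

"
// One-off sketch: set every published post to 'private' so only
// logged-in editors/admins can view them.
$ids = get_posts( array(
    'post_status' => 'publish',
    'numberposts' => -1,
    'fields'      => 'ids',
) );
foreach ( $ids as $id ) {
    wp_update_post( array( 'ID' => $id, 'post_status' => 'private' ) );
}
"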

Davidcjmad1

10:58 pm on Aug 4, 2011 (gmt 0)



Thanks lorax, I will take those suggestions on board.
 
