Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt

Sitemaps, Meta Data, and robots.txt Forum

How do I block the homepage ONLY?
Perhaps an unusual request

 3:09 am on Sep 21, 2012 (gmt 0)

Hi all, messing up one's robots.txt can have really bad repercussions, so I would love to hear from people with more experience before completely destroying the site's SE traffic.

I have this rather unusual situation where I need to block the homepage, and only the homepage, of a site from indexing. I am moving a site penalized by Google (I'm going under the assumption it's for bad links) and I don't want the bad links to follow (as in passing PR). Almost 100% of the bad links are to the homepage. There's a large number of good links to other pages, which I would like to salvage if at all possible.

So, is it even within the scope of the robots.txt protocol to block a page and not all the pages down the tree from it? Unless some trick exists, it appears to me that anything I do to the homepage in robots.txt will then be inherited by the rest of the site, so it will all vanish.

Can anyone comment on dealing with the homepage in robots.txt?
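For what it's worth, Google and Bing (though not every crawler) honour two pattern extensions in robots.txt: `*` matches any run of characters, and a trailing `$` anchors the rule to the end of the URL. Under that assumption, `Disallow: /$` matches the root URL only, whereas `Disallow: /` matches every path on the site. The matcher below is a simplified, hypothetical sketch of that matching (Disallow lines only, no Allow precedence), not any search engine's actual code:

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Supports the '*' wildcard and a trailing '$' end anchor (extensions
    honoured by Google and Bing); everything else is a literal prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then rejoin them with '.*' where '*' was.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_rules: list[str]) -> bool:
    """Return True if any Disallow pattern matches the URL path (plus query)."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


homepage_only = ["/$"]  # blocks "/" and nothing else
whole_site = ["/"]      # plain prefix: blocks everything under the root

for path in ["/", "/about.html", "/blog/post-1"]:
    print(path, is_blocked(path, homepage_only), is_blocked(path, whole_site))
```

Because the pattern is matched against the path plus any query string, a variant such as `/?ref=footer` would not be caught by `/$` and would still be crawlable.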




 3:16 am on Sep 21, 2012 (gmt 0)

I forgot to add: I am looking at messing with robots.txt instead of adding a meta robots noindex tag to the homepage because

#1 I would prefer to redirect the old (bad) homepage to the new via a 301

#2 Google lists only two remedies to bad links' influence on a page:

(from [support.google.com...] )

* Adding a rel="nofollow" attribute to the <a> tag
* Redirecting the links to an intermediate page that is blocked from search engines with a robots.txt file

the first is not possible - I don't control the links, long story (all in my earlier posts here at WebmasterWorld)

the second - I would like to make the bad homepage that intermediate page for the new homepage (if it makes sense)

So, it sounds like I would have to disallow it in robots.txt if it's even possible.


 6:07 am on Sep 21, 2012 (gmt 0)

robots.txt will not prevent the page from being indexed. It will only prevent it from being crawled-- which means that if you've made any changes for the better, g can't see them. What you need to do is

#1 set up the redirect
#2 go into gwt and delete the page from the current cache and index. Next time g### tries to crawl the page, it will meet the 301 and it will carry on from there.

That's assuming for the sake of discussion that the redirect approach is the right way to go. Don't look at me; I don't speak SEO.

Putting a meta noindex on the old page won't do any harm-- but won't do any good either, if the redirect is starting right now. And "nofollow" is obviously pointless on your front page-- it would only prevent google from following links away from the page into the rest of your site, which is exactly what you don't want!

bing wmt has a function for "I disown this link categorically and decline to have anything to do with it" but I don't think g### does yet. Unless I blinked and missed something.

You can put up referer blocks in htaccess if there are humans that you don't want physically coming to your site from Those Other Sites. But I kinda doubt this is your problem ;) Otherwise it's a waste of time, because it has no effect on robots. Google doesn't follow links; it makes a shopping list and comes back later. And even if it changed its crawling technique and did meet a referer block, it wouldn't know that that's why it is getting blocked. (A 403 is like calling someone on the phone and getting no answer. You have no way of knowing whether they're really not home, or they've turned off the ringer, or the rats have been chewing on the phone cord, or maybe they glanced at Caller ID and said I don't feel like talking to him right now.)
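A minimal sketch of the referer check, written as a Python function standing in for an .htaccess rule (it is not Apache syntax), shows why it does nothing about robots: the decision keys entirely on the Referer header, which a crawler working through its list of URLs normally does not send. The blocked host names are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical referring hosts whose human visitors should be turned away.
BLOCKED_REFERRERS = {"bad-links.example", "spammy-theme.example"}


def status_for(headers: dict[str, str]) -> int:
    """Return 403 for visitors arriving from a blocked site, else 200.

    A crawler fetching a URL from its own queue normally sends no Referer
    header, so it falls through to 200 and is unaffected by the block.
    """
    referer = headers.get("Referer", "")
    host = urlsplit(referer).hostname or ""
    return 403 if host in BLOCKED_REFERRERS else 200


print(status_for({"Referer": "http://bad-links.example/theme-demo"}))  # prints 403
print(status_for({"User-Agent": "Googlebot/2.1"}))                      # prints 200
```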


 9:07 am on Sep 21, 2012 (gmt 0)

i'm going to hate myself for saying this:
you could redirect all requests for your home page to a new url - say home.html or default.aspx or what have you - and exclude that url from crawling in robots.txt.
it might even be confusing enough if you used a 302 status code.
that meets the "letter of the law" so to speak.
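phranque's workaround can be modelled with Python's standard `urllib.robotparser`: the root URL stays crawlable but answers with a 302 to a separate URL (`/home.html` here, one of the examples above), and only that URL is disallowed. The `respond` function is a hypothetical stand-in for server configuration, and `www.example.com` is a placeholder:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only the redirect target is off-limits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /home.html
"""


def respond(path: str) -> tuple[int, str]:
    """Simplified server: send '/' to /home.html with a 302, serve the rest."""
    if path == "/":
        return 302, "/home.html"
    return 200, path


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/", "/home.html", "/widgets.html"]:
    status, target = respond(path)
    allowed = parser.can_fetch("Googlebot", "http://www.example.com" + path)
    print(path, "crawlable" if allowed else "disallowed", status, target)
```

A crawler that obeys robots.txt can fetch `/`, receives the 302, and is then not permitted to request `/home.html`, so it never reaches a page with content or outgoing links.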

it certainly doesn't address the "completely destroying the site's SE traffic" side of things.
i'm pretty sure that no matter if, where or how you redirect and then block your home page from crawling, doing so turns your home page into a link sink.
i can't see anything good resulting.

are you moving to a new domain or subdomain?


 4:32 pm on Sep 21, 2012 (gmt 0)

@phranque: Thanks for your suggestion! Yes, if I do in fact trust Google on what they advise for bad links handling (and I do hear lucy24's skepticism about that), it looks like I would have to create a page on the OLD site (say, default.html), redirect old homepage to it but have default.html robot-disallowed.

This feels very much like treading on thin ice, for sure, but, again, if Google can be trusted on what they write on their website, that should stop the flow of PR and remedy the negative effect of bad links while allowing actual users to follow links to the homepage and eventually land on the new homepage.

As far as the redirected homepage being a link sink, I don't see that - all OTHER pages will redirect to the new locations, so whatever internal pages were linking to the old homepage, they are now not accessible and the pages that they redirect to are now pointing to the new homepage. So the old homepage appears to be standing on its own in this kind of setup. I hope my logic is not faulty, please advise if you see holes in it.

@lucy24: I actually don't really care if there are humans coming in on bad links. There will not be many anyhow: the bad links are in the footer of WP themes I've sponsored, and that's not the most-clicked location on the page. I am mostly concerned about people coming from other, good sites, on a ton of good links to the homepage. In other words, the old homepage is the destination for good links as well as bad and, while I realize that I have to forgo the positive effect of good links on the homepage, I would like to preserve the positive effect of other good links that point to deeper pages within my site.

The desire to avoid creating an intermediate page (default.html, robot-disallowed) was just to avoid an extra redirect. But it still seems like a slightly safer approach in this slippery situation than messing with the homepage itself.

Thanks for your input, guys! Again, if anyone sees any holes in this logic, please chime in here, I would like to consider all angles before pulling the plug.


 6:22 pm on Sep 21, 2012 (gmt 0)

all OTHER pages will redirect to the new locations

a new domain?


 7:03 pm on Sep 21, 2012 (gmt 0)

a new domain?
Yes, that's the plan. Pages other than the homepage should 301 redirect straight to their new domain versions.
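Combining the plan described in this thread (the old homepage 301s to a robots-disallowed `/default.html`, which then sends visitors on to the new homepage; every other old URL 301s to the same path on the new domain), the redirect map and the difference between a human visitor and a rule-abiding crawler can be sketched as follows. The domains are hypothetical placeholders, and having `/default.html` itself redirect to the new homepage is an assumption based on the goal of users eventually landing there. The sketch models crawl behaviour only; it says nothing about how Google actually credits links:

```python
from urllib.robotparser import RobotFileParser

OLD = "http://old.example"  # hypothetical old domain
NEW = "http://new.example"  # hypothetical new domain

# Old site's robots.txt: only the intermediate page is off-limits to crawlers.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /default.html"])


def old_site_redirect(path: str) -> str:
    """Return the 301 target for a request to `path` on the old domain."""
    if path == "/":
        return OLD + "/default.html"  # homepage -> blocked intermediate page
    if path == "/default.html":
        return NEW + "/"              # humans continue to the new homepage
    return NEW + path                 # every other page maps 1:1 to the new domain


def follow(path: str, crawler: bool) -> list[str]:
    """List the URLs visited; a polite crawler stops before a disallowed URL."""
    url = OLD + path
    hops = [url]
    while url.startswith(OLD):
        url = old_site_redirect(url[len(OLD):])
        if crawler and url.startswith(OLD) and not robots.can_fetch("Googlebot", url):
            break  # robots.txt forbids requesting the next hop
        hops.append(url)
    return hops


print(follow("/", crawler=True))
print(follow("/", crawler=False))
print(follow("/widgets.html", crawler=True))
```

Here `follow("/", crawler=True)` stops at the old homepage, while `follow("/", crawler=False)` passes through `/default.html` to the new homepage, which is the extra redirect hop mentioned earlier in the thread.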

 6:22 am on Sep 28, 2012 (gmt 0)

No. You can't block the homepage, because the homepage is the main page of your website. If you block it via robots.txt, then you can give search engines the wrong impression of your site.

