|How to get just Homepage to be indexed, and not any other pages?|
| 7:06 pm on May 7, 2009 (gmt 0)|
We have a website which provides private label store fronts for small stores in the US. Basically they are given a URL of www.Oursite.com/Store.aspx=450004, where 450004 is the private label store front accessible only to those who have that code.
The stores are dynamically generated for the specific purpose that we don't want Google/Yahoo/etc to index all of our client's sub-stores - the customer is supposed to go to OurSite.com and enter the unique 6 digit code to access the store.
Unfortunately, one of our customers has placed a direct link to his store on his own home page, which lets the user bypass the store login page and go directly to the store: www.Oursite.com/Store.aspx=450004
That's fine, but what we didn't realize was that Google/Yahoo crawlers are now able to index the store. So, when a customer from store X Googles Oursite.com, they will see this particular client's store (say store Y). That is not good.
We have just added a noindex and noarchive meta tag onto all dynamic substore pages, hoping this will reverse the problem and that this store will no longer appear in Google/Yahoo results. Basically we only want Oursite.com to be indexed, not any other pages.
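For reference, the tags described above would be the standard robots meta tag, placed in the &lt;head&gt; of each dynamic substore page:

```html
<!-- Tells compliant crawlers not to index this page or keep a cached copy -->
<meta name="robots" content="noindex, noarchive">
```

Note that a crawler has to be able to fetch the page in order to see this tag, so it only works on pages that are not blocked by robots.txt.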
Is there anything else we can do to ensure this does not happen again? The only way for our site to be successful is for Google NOT to crawl it.
I have read about the rel="nofollow" link attribute and am wondering if that could be the solution - e.g., request that if our customers place the direct store link on their own home page, they must add rel="nofollow" to it. But I haven't had direct experience with that. Any suggestions would be super appreciated.
Thanks and love these forums!
| 3:28 pm on May 8, 2009 (gmt 0)|
How about using the robots.txt to block everything apart from what you want? Also to speed up the de-indexing of the pages that Googlebot crawled you should sign up with 'Google Webmaster Tools' and remove those pages manually.
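Since the example URL above suggests all the store fronts are served from a single script path, one sketch of such a robots.txt (assuming /Store.aspx really is the only path that needs hiding) would be:

```
# Block all compliant crawlers from every store-front URL;
# the home page and other paths stay crawlable.
User-agent: *
Disallow: /Store.aspx
```

Remember that robots.txt path matching is prefix-based and case-sensitive, so the rule must match the URL exactly as it is linked.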
| 3:30 pm on May 9, 2009 (gmt 0)|
@le_gber - thanks for the tip. We first tried the noindex and noarchive tags, but these did not appear to prevent Google or Yahoo from indexing the pages, nor did they make Google/Yahoo drop the archived copies.
So we had to implement a robots.txt to block everything. I submitted a removal request as well through Google Webmaster Tools plus Yahoo site explorer.
However, I am wondering, is there a way in the Robots.txt file to request that the home page itself (e.g., oursite.com) be indexed, but no other pages? I haven't been able to find how to do this - it seems as though the only option is to remove all pages, but I would think there's a way to let the home page only be indexed.
Any suggestions would be appreciated. Hopefully this forum will be useful to others as well.
| 10:50 am on May 11, 2009 (gmt 0)|
Blocking pages via the robots file and removing pages manually from Webmaster Tools aren't the total solution. If crawlers find links to your inner pages, they will index them again.
| 11:45 am on May 11, 2009 (gmt 0)|
The disallow rules can be crafted to block everything except the root. You need one line for each disallow rule. At worst you would use a separate Disallow line for every character a URL path can begin with to block everything, because blocking /a blocks everything that begins with /a, of course.
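Spelled out, that per-character rule set would look something like the following (abbreviated here; in practice you would list every letter and digit, and since robots.txt matching is case-sensitive you may need uppercase variants too, e.g. /S for /Store.aspx):

```
# One Disallow per possible first path character. The root URL
# itself ("/" with nothing after it) matches none of these prefixes,
# so the home page remains crawlable.
User-agent: *
Disallow: /a
Disallow: /b
# ... continue through /z ...
Disallow: /0
# ... continue through /9 ...
```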
Of course, be aware that such URLs can still appear as URL-only entries in the SERPs.
| 12:53 pm on May 11, 2009 (gmt 0)|
I've used the following syntax in the past (for bots with support for advanced syntax and allow) this allows only the homepage to be indexed:
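[The code block itself did not survive in this copy of the thread. The widely documented Googlebot pattern that matches the description - disallow everything, then use Allow with the "$" end-of-URL anchor so only the bare root URL is permitted - would be:]

```
# Googlebot-specific section: Allow and the "$" anchor are
# extensions supported by Google, not by every crawler.
User-agent: Googlebot
Allow: /$
Disallow: /
```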
| 1:07 pm on May 11, 2009 (gmt 0)|
I'd say that syntax is risky, as any bot that doesn't understand "Allow" sees only the Disallow line and blocks everything. OK, I see you specify it only for Google.
I'd guess the OP also wants to consider Yahoo, Live, Ask, SearchMe, etc...