Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
How to get just Homepage to be indexed, and not any other pages?
noindex, noarchive
dgc223




msg:3908866
 7:06 pm on May 7, 2009 (gmt 0)

Hello,
We have a website which provides private label store fronts for small stores in the US. Basically each store is given a URL of www.Oursite.com/Store.aspx=450004, where 450004 is the code for that private label store front, accessible only to those who have the code.

The stores are dynamically generated specifically because we don't want Google/Yahoo/etc. to index our clients' sub-stores - the customer is supposed to go to OurSite.com and enter the unique six-digit code to access the store.

Unfortunately, one of our customers has placed a direct link to his store on his own home page, which lets the user bypass the store login page and go directly to the store: www.Oursite.com/Store.aspx=450004

That's fine, but what we didn't realize was that the Google/Yahoo crawlers are now able to index the store. So, when a customer of store X Googles Oursite.com, they will see this particular client's store (say, store Y). That is not good.

We have just added a noindex and noarchive meta tag onto all dynamic substore pages, hoping this will reverse the problem and that this store will no longer appear in Google/Yahoo results. Basically we only want Oursite.com to be indexed, not any other pages.

Is there anything else we can do to ensure this does not happen again? The only way for our site to be successful is for Google NOT to crawl it.

I have read about the rel="nofollow" link attribute and am wondering if that could be the solution - e.g., request that if our customers use the direct store link on their own home page, they must add rel="nofollow" to it. But I haven't had direct experience with that. Any suggestions would be super appreciated.
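To spell out what I mean, an example of a direct store link wrapped that way (with our illustrative store URL) would be:

```html
<!-- rel="nofollow" asks crawlers not to follow this link or pass credit through it -->
<a href="http://www.Oursite.com/Store.aspx=450004" rel="nofollow">Visit my store</a>
```

Though as I understand it, nofollow is only a hint to crawlers, and it wouldn't help if someone else linked to the store without it.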

Thanks and love these forums!
-Daniel

 

le_gber




msg:3909526
 3:28 pm on May 8, 2009 (gmt 0)

Hi Daniel,

How about using robots.txt to block everything apart from what you want? Also, to speed up the de-indexing of the pages Googlebot has already crawled, you should sign up for 'Google Webmaster Tools' and remove those pages manually.
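A blanket block would look like this - though note it keeps crawlers off the homepage too, which may not be what you want:

```
User-agent: *
Disallow: /
```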

dgc223




msg:3910116
 3:30 pm on May 9, 2009 (gmt 0)

@le_gber - thanks for the tip. We first tried the noindex and noarchive tags, but these did not appear to prevent Google or Yahoo from indexing any pages, nor did they make Google/Yahoo un-archive the pages.

So we had to implement a robots.txt to block everything. I submitted a removal request as well through Google Webmaster Tools plus Yahoo site explorer.

However, I am wondering: is there a way in the robots.txt file to request that the home page itself (e.g., oursite.com) be indexed, but no other pages? I haven't been able to find how to do this - it seems as though the only option is to remove all pages, but I would think there's a way to let only the home page be indexed.

Any suggestions would be appreciated. Hopefully this forum will be useful to others as well.

Thank you,
Daniel

vicky




msg:3910912
 10:50 am on May 11, 2009 (gmt 0)

Hi DGC223,

Blocking pages via the robots.txt file and removing them manually from Webmaster Tools aren't the total solution. If crawlers find links to your inner pages, they may index them again.
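For what it's worth, the meta tag the OP mentioned looks like this. One caveat: crawlers can only see it if robots.txt still lets them fetch the page, so a robots.txt block and a noindex tag can work against each other.

```html
<!-- goes in the <head> of every sub-store page;
     the page must remain fetchable for crawlers to read it -->
<meta name="robots" content="noindex, noarchive">
```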

Thanks
Shailendra

g1smd




msg:3910925
 11:45 am on May 11, 2009 (gmt 0)

The disallow rules can be crafted to block everything except the root. You need one line for each disallow rule.

At worst you would use:

User-agent: *
Disallow: /a
Disallow: /b
Disallow: /c
Disallow: /d
Disallow: /e
Disallow: /f
...
...
Disallow: /z

to block everything, because blocking /a blocks everything that begins with /a, of course. (Note that robots.txt paths are case-sensitive, so a URL starting with an uppercase letter or a digit - like /Store.aspx - would need its own rule.)

Of course, be aware that such URLs can still appear as URL-only entries in the SERPs.
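In the OP's specific case, if every store URL begins with /Store.aspx, a single rule might be enough (assuming that path pattern holds for all sub-stores):

```
User-agent: *
Disallow: /Store.aspx
```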

Receptional Andy




msg:3910943
 12:53 pm on May 11, 2009 (gmt 0)

I've used the following syntax in the past (for bots that support advanced syntax and Allow); this allows only the homepage to be indexed:


User-agent: googlebot
Allow: /$
Disallow: /
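For anyone curious how that resolves, here's a rough Python sketch of the longest-match rule Google documents for Allow/Disallow (hand-rolled, since Python's stdlib robotparser doesn't understand the $ anchor; the tie-breaking is my reading of Google's docs, so treat it as a sketch rather than gospel):

```python
import re

def rule_to_regex(path):
    # Escape regex metacharacters, then restore robots.txt wildcards:
    # '*' matches any run of characters, a trailing '$' anchors the end.
    pattern = re.escape(path).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.compile("^" + pattern)

def is_allowed(url_path, rules):
    # rules: list of (verb, path) tuples in file order,
    # e.g. [("Allow", "/$"), ("Disallow", "/")].
    # The rule with the longest path wins; Allow beats Disallow on a tie.
    best = None
    for verb, path in rules:
        if rule_to_regex(path).match(url_path):
            candidate = (len(path), verb == "Allow")
            if best is None or candidate > best:
                best = candidate
    # No rule matched: crawling is allowed by default.
    return True if best is None else best[1]

rules = [("Allow", "/$"), ("Disallow", "/")]
print(is_allowed("/", rules))                    # homepage: allowed
print(is_allowed("/Store.aspx=450004", rules))   # store page: blocked
```

"Allow: /$" matches only the bare "/" (the $ stops it there), and its path is longer than "Disallow: /", so the homepage wins through while everything else falls to the Disallow.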

g1smd




msg:3910948
 1:07 pm on May 11, 2009 (gmt 0)

I'd say that syntax is risky, as any bot that doesn't understand "Allow" would treat it as blocking everything. OK, I see you specify it only for Googlebot.

I'd guess the OP also wants to consider Yahoo, Live, Ask, SearchMe, etc.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved