The stores are dynamically generated precisely because we don't want Google/Yahoo/etc. to index our clients' sub-stores - the customer is supposed to go to OurSite.com and enter a unique six-digit code to access the store.
Unfortunately, one of our customers has placed a direct link to his store on his own home page, which lets the user bypass the store login page and go directly to the store: www.Oursite.com/Store.aspx=450004
That by itself is fine, but what we didn't realize was that the Google/Yahoo crawlers can now follow that link and index the store. So when a customer from store X searches Google for Oursite.com, they may see this particular client's store (say, store Y) in the results. That is not good.
We have just added a noindex,noarchive robots meta tag to all dynamic sub-store pages, hoping this will reverse the problem and that this store will no longer appear in the Google/Yahoo results. Basically, we only want Oursite.com itself to be indexed, not any other pages.
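For anyone curious, the tag we added to the <head> of each sub-store page is the standard robots meta tag:

<meta name="robots" content="noindex,noarchive">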
Is there anything else we can do to ensure this does not happen again? The only way for our site to be successful is for Google NOT to crawl the sub-stores.
I have read about the rel="nofollow" link attribute and am wondering if that could be the solution - e.g., requiring that if our customers place a direct store link on their own home page, they must add rel="nofollow" to it. I haven't had direct experience with that, though. Any suggestions would be super appreciated.
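If I understand it correctly, the customer's link would then look something like this (hypothetical markup on my part - I haven't actually tried it):

<a href="http://www.Oursite.com/Store.aspx=450004" rel="nofollow">Visit my store</a>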
Thanks and love these forums!
-Daniel
So we had to implement a robots.txt to block everything. I also submitted removal requests through Google Webmaster Tools and Yahoo Site Explorer.
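The robots.txt we implemented is just the standard block-everything file:

User-agent: *
Disallow: /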
However, I am wondering: is there a way in the robots.txt file to request that the home page itself (e.g., oursite.com) be indexed, but no other pages? I haven't been able to find out how to do this - it seems as though the only option is to block all pages, but I would think there is a way to allow just the home page to be indexed.
Any suggestions would be appreciated. Hopefully this forum will be useful to others as well.
Thank you,
Daniel
At worst you would use:
User-agent: *
Disallow: /a
Disallow: /b
Disallow: /c
Disallow: /d
Disallow: /e
Disallow: /f
...
...
Disallow: /z

to block everything, because Disallow: /a blocks every URL whose path begins with /a, and so on through the alphabet, while the home page itself (/) stays crawlable because it doesn't begin with a letter. Be aware that pages blocked this way can still appear as URL-only entries in the SERPs.

For Googlebot specifically, though, there is a cleaner option:
User-agent: googlebot
Allow: /$
Disallow: /
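The $ anchors the pattern at the end of the URL, so Allow: /$ matches only the root (the home page), while Disallow: / blocks every other URL; when rules conflict, Googlebot follows the most specific (longest) match, so the home page stays crawlable. Keep in mind that Allow and $ are extensions honored by Googlebot, not part of the original robots.txt standard, so other crawlers may ignore them.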