| Block the main page of a directory but not its contents?
|
bresso

msg:3255094 | 12:32 am on Feb 17, 2007 (gmt 0) | I'm usually fairly comfortable with the robots.txt file, but I'm stuck on this one. I want to block site.com/en, as this page features the same content as the homepage, but I do not want to block its contents (site.com/en/pagea.html, site.com/en/subdir/pagex.html, ...). Do you guys have any idea how I could achieve that? I've seen somewhat similar situations and solutions, but I'm scared of screwing up and would like to request your help. Thank you!
|
webdoctor

msg:3256883 | 3:33 pm on Feb 19, 2007 (gmt 0) | Why not add the NOINDEX meta tag on the page itself, rather than trying to achieve this in robots.txt?
|
bresso

msg:3257333 | 11:38 pm on Feb 19, 2007 (gmt 0) | Because if I add a NOINDEX meta tag on the page itself, in addition to blocking the page at site.com/en from being indexed, it will also affect the page at site.com/ (as this is the same page). Any other ideas? Thanks.
|
webdoctor

msg:3257530 | 6:26 am on Feb 20, 2007 (gmt 0) | FWIW, if you're using a server-side scripting language (e.g. PHP or ASP), you can detect the URI that caused the script to be executed, find out whether the page is being requested as / or as /en, and output the additional meta tag only in the second case. In pseudocode: if URI == 'http://www.example.com/en' { add the extra NOINDEX meta tag } else { don't add the meta tag }
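To make the pseudocode concrete, here is a minimal sketch in Python rather than PHP or ASP. The function name robots_meta_for and the constant DUPLICATE_PATH are hypothetical names for illustration, and treating /en/ (with a trailing slash) the same as /en is an assumption, not something stated in the thread. How you obtain the requested URI depends on your server setup.

```python
from urllib.parse import urlsplit

# Hypothetical constant: the path that duplicates the homepage.
DUPLICATE_PATH = "/en"


def robots_meta_for(request_uri: str) -> str:
    """Return a NOINDEX meta tag for the duplicate URL, else an empty string."""
    path = urlsplit(request_uri).path
    # Assumption: /en and /en/ are the same page; an empty path means "/".
    normalized = path.rstrip("/") or "/"
    if normalized == DUPLICATE_PATH:
        return '<meta name="robots" content="noindex">'
    # The homepage and everything below /en/ get no extra tag.
    return ""
```

The page template would then print the returned string inside the head element. The check is a single string comparison per request.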
|
bresso

msg:3258034 | 7:01 pm on Feb 20, 2007 (gmt 0) | Thanks for the answer. I thought about that too, but I'm worried this solution will impact the server's performance, as it's a fairly high-traffic site. Is it not possible to do this with robots.txt?
|
bresso

msg:3272299 | 7:39 am on Mar 6, 2007 (gmt 0) | Anyone, please?
|