robots.txt and Google on intranet

robots.txt not working

         

John488

5:55 pm on Jun 7, 2005 (gmt 0)




We are using the Google Search Appliance (GSA) on our intranet. I am attempting to use a robots.txt file (currently as a test) to disallow crawling of one folder.
The site is:
http://example.com:88/
I have two hierarchies of folders in my test:
/JGT01/Level 1/Level 2/Level 3/Level 4A/
/JGT01/Level 1/Level 2/Level 3/Level 4B/
My robots.txt is located at the site level and contains the following:
User-agent: *
Disallow: /JGT01/Level 1/Level 2/Level 3/Level 4B/

My intent is to disallow only the "Level 4B" folder, but the GSA appears to be skipping both the "Level 4A" and "Level 4B" folders. The samples I have found here and on other sites show only a single folder name, not a hierarchy like the one I am trying to use. This is my first try at working with web sites and the Google GSA. Any help would be appreciated.
Thanks,
John

[edited by: ThomasB at 10:55 pm (utc) on June 7, 2005]
[edit reason] examplified [/edit]

Dijkgraaf

10:47 pm on Jun 7, 2005 (gmt 0)




The problem is possibly the spaces in the folder names.
The crawler may read only "Disallow: /JGT01/Level", which tells it not to crawl any path starting with that prefix (which would include both folders).
Either
1) Try creating those folders without spaces
or
2) Replace those spaces in the folder names with %20 in the robots.txt
Disallow: /JGT01/Level%201/Level%202/Level%203/Level%204B/
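To illustrate the guess above, here is a minimal Python sketch. It assumes (this is not confirmed GSA behaviour) that a parser stops reading the Disallow value at the first space, and it uses simple prefix matching, which is how Disallow rules are normally applied. The helper names truncated_rule, is_blocked and encoded_rule are made up for this example.

```python
from urllib.parse import quote

def truncated_rule(path: str) -> str:
    """What a parser that stops at the first space would keep.
    This is an assumed failure mode, not documented GSA behaviour."""
    return path.split(" ", 1)[0]

def is_blocked(rule_prefix: str, url_path: str) -> bool:
    """Disallow rules are matched as path prefixes."""
    return url_path.startswith(rule_prefix)

def encoded_rule(path: str) -> str:
    """Percent-encode the path; quote() keeps "/" and turns spaces into %20."""
    return quote(path)

folder_4a = "/JGT01/Level 1/Level 2/Level 3/Level 4A/"
folder_4b = "/JGT01/Level 1/Level 2/Level 3/Level 4B/"

# Rule cut off at the first space: both folders match the short prefix.
short = truncated_rule(folder_4b)
print(short)                                                       # /JGT01/Level
print(is_blocked(short, folder_4a), is_blocked(short, folder_4b))  # True True

# Percent-encoded rule: only the 4B folder matches.
rule = encoded_rule(folder_4b)
print("Disallow: " + rule)
print(is_blocked(rule, quote(folder_4a)), is_blocked(rule, quote(folder_4b)))  # False True
```

If the truncation guess is right, this would explain why both "Level 4A" and "Level 4B" were skipped, and why the %20 form should block only "Level 4B".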

John488

2:43 pm on Jun 10, 2005 (gmt 0)




Thank you Dijkgraaf.
I will try both approaches and post the results.

g1smd

8:11 pm on Jun 13, 2005 (gmt 0)




Spaces and underscores in URLs are always bad news. Avoid them altogether.

Hyphens and dots never cause problems.