I'm wondering how to disallow subfolders. For example, I have 2 forums of the same type installed on the domain, and let's say I want to disallow using the following:
User-agent: *
Disallow: /adm/
Disallow: /download.php
Now, the adm folder and the download.php file are not in the root; they are inside folders. For example:
The adm path is
domainname.com/forum/adm
and the download.php path is
domainname.com/forum/subfolder/download.php
Will the robots.txt above work, or should I fill in the full path? Of course, my robots.txt is at the root:
domainname.com/robots.txt
I want to keep it at the root. I have the same type of forum installed 3 times, and I want the same folders and files to be disallowed for all 3 forums.
So using:
User-agent: *
Disallow: /adm/
Disallow: /download.php
will disallow these folders and files, and robots will understand automatically that they are not in the root?
Thanks in advance.
So,
Disallow: /adm/
prevents them from crawling "example.com/adm/<nothing or anything at all here>".
If that were the only Disallow line in the file, they would fetch /forum/adm/, but they would not fetch /adm/ or /adm/books or /adm/books/fiction.
Therefore, to Disallow fetching of /forum/adm/ and /forum2/adm/, you'd need
Disallow: /forum/adm/
Disallow: /forum2/adm/
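The prefix rule described above can be tried out with Python's standard urllib.robotparser module, which does plain prefix matching much like the simple robots discussed here. This is only a sketch: example.com and the test paths are placeholders, not the poster's real site, and a real crawler may differ in details.

```python
from urllib.robotparser import RobotFileParser

def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text held in a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser

# Root-only rule from the question: matches paths that START with /adm/.
root_only = build_parser("User-agent: *\nDisallow: /adm/\n")

# Full-path rules: one Disallow line per forum directory.
full_paths = build_parser(
    "User-agent: *\n"
    "Disallow: /forum/adm/\n"
    "Disallow: /forum2/adm/\n"
)

BASE = "http://example.com"  # placeholder domain (assumption)

for path in ("/adm/", "/adm/books", "/forum/adm/", "/forum2/adm/index.php"):
    print(
        path,
        "| root-only rule allows:", root_only.can_fetch("*", BASE + path),
        "| full-path rules allow:", full_paths.can_fetch("*", BASE + path),
    )
```

With the root-only rule, the /forum/adm/ paths should come back as allowed, which is why the full path is needed for each forum.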
However, Google and a few other major search engines support wild-cards. But if you use wild-cards, you cannot use them in a "User-agent: *" policy record, because this would confuse many other robots which do not support wild-cards. So, to support both advanced and simple robots, you'd have to use something like:
User-agent: Googlebot
User-agent: Slurp
Disallow: /*/adm/
User-agent: *
Disallow: /forum/adm/
Disallow: /forum2/adm/
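To get a feel for what a wild-card rule like /*/adm/ would and would not cover, here is a rough Python sketch. The function name wildcard_rule_matches is made up for illustration, and the matching is only an approximation of how wild-card-aware robots are generally described as behaving ('*' matches any run of characters, and the rule is otherwise a prefix match). It is not any search engine's actual code.

```python
import re

def wildcard_rule_matches(rule: str, path: str) -> bool:
    """Return True if `path` is covered by a Disallow rule that may contain '*'.

    Assumption: '*' stands for any run of characters (including none), and
    the rule is matched from the start of the path, like a normal prefix rule.
    """
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    # re.match anchors at the start of the path only, so this stays a prefix match.
    return re.match(pattern, path) is not None

for path in ("/forum/adm/", "/forum2/adm/index.php", "/adm/", "/forum/viewtopic.php"):
    print(path, "blocked by /*/adm/:", wildcard_rule_matches("/*/adm/", path))
```

Note that under this reading, /*/adm/ covers the adm folder of every forum directory but not a root-level /adm/, because the pattern needs something between the first two slashes.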
Since using the wild-cards may not make your robots.txt file shorter, it might be best to use the simplest robots.txt structure possible, and simply Disallow each of the /forum/adm/ subdirectories.
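If the same paths have to be repeated for several forum installs, the simple file can be generated rather than typed by hand. A minimal sketch follows; the directory names in FORUM_DIRS and the helper build_robots_txt are assumptions for illustration, so substitute the real install directories.

```python
# Hypothetical install directories; replace with the real forum folders.
FORUM_DIRS = ["forum", "forum2"]

# Paths to block, relative to each forum directory (taken from the question).
BLOCKED_PATHS = ["adm/", "subfolder/download.php"]

def build_robots_txt(forum_dirs, blocked_paths):
    """Build one 'User-agent: *' record with a full-path Disallow per forum."""
    lines = ["User-agent: *"]
    for forum in forum_dirs:
        for rel in blocked_paths:
            lines.append(f"Disallow: /{forum}/{rel}")
    return "\n".join(lines) + "\n"

print(build_robots_txt(FORUM_DIRS, BLOCKED_PATHS))
```

The output can be saved as robots.txt in the site root, so there is still only one file to maintain.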
Jim
Thank you very much.
You provided me with all the details. Yes, what you said is what I just discovered when I was trying the robots.txt tool in Google Webmaster Tools. I will follow your advice and use the simple robots.txt. I appreciate your help.
About the length of the robots.txt: will it affect anything?
Thanks again.
Now I have restricted the files and folders from being crawled, and everything is working fine, but the problem is that I have started to see these errors in Webmaster Tools: URL restricted by robots.txt.
I have 17000 errors!
I searched all the sitemaps and removed from them all the links I restricted in robots.txt, but Google still gives these errors for search.php and some other files.
So what is the reason? I thought maybe I should wait until Google crawled all the sitemaps after I changed them and removed the things I restricted. Now they have all been crawled, and Google still produces such errors!
Any idea?
Thanks in advance.
- or -
B) Don't worry about it. Google is simply telling you that the pages you Disallowed in robots.txt *are* disallowed and cannot be crawled. So they are telling you that your robots.txt changes worked.
The only thing I'd look into is your search.php Disallows -- Either Disallow search.php completely, or make sure you've got the query string Disallow syntax right in your robots.txt file. It probably *is* right, but it's worth checking. You might want to take steps to prevent *all* search.php URL+query-string variations from being spidered -- Otherwise, it's easy to create an almost infinitely large number of "search" URLs.
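One quick way to check the search.php Disallow is to run a few search URLs through a prefix-matching parser and see which ones are still fetchable. The sketch below uses Python's standard urllib.robotparser; the domain, the query parameters, and the helper name still_crawlable are all made up for the test, and real crawlers may treat query strings slightly differently.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/search.php
"""

# Hypothetical search URLs; the query parameters are invented for illustration.
SEARCH_URLS = [
    "http://example.com/forum/search.php",
    "http://example.com/forum/search.php?keywords=robots",
    "http://example.com/forum/search.php?search_id=newposts&start=25",
    "http://example.com/forum2/search.php?keywords=robots",  # not covered above
]

def still_crawlable(robots_txt, urls, agent="*"):
    """Return the URLs that the given robots.txt text does NOT block."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]

for url in still_crawlable(ROBOTS_TXT, SEARCH_URLS):
    print("still crawlable:", url)
```

Because the rule is a prefix match, the plain Disallow covers the query-string variations under /forum/ without any wild-card, while a second forum directory needs its own line.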
Jim
Thanks very much for your reply. In fact, search.php is a file in my phpBB forum, and I'm using:
Disallow: /forum/search.php
and
Disallow: /forum/search.php*
Logically, Google should only report an error if I'm telling it to follow a link in the sitemaps. It would be saying: I tried to follow the link you told me to follow, but at the same time you are blocking me through your robots.txt!
I downloaded all the sitemaps since I changed the robots file and searched them with Ctrl+F for the word "search"; it is not included in any sitemap. That was to make sure.
I believe what you said is correct; maybe it just needs some time.
Here you can see my robots.txt: < sorry, no personal links >
The forum, community, and how folders are all phpBB 3 forums.
I'm disallowing all the unimportant content since I was facing a slow indexing rate. This step fixed the problem, and now Google regularly indexes my pages, but I still see these robots errors.
Thanks again.
[edited by: tedster at 8:40 pm (utc) on Aug. 30, 2008]
Google Webmaster Tools are not perfect. Google considers it an error when you Disallow *anything* because they want to crawl all of it.
Only "Errors for URLs in Sitemaps" are important in this case.
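To see whether any sitemap URL is actually blocked, rather than searching the files by hand, something like the following Python sketch could be used. It relies only on the standard library; the sample sitemap, the example.com URLs, and the helper name blocked_sitemap_urls are assumptions for illustration, and the matching is simple prefix matching, not Google's own logic.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def blocked_sitemap_urls(sitemap_xml, robots_txt, agent="Googlebot"):
    """Return sitemap <loc> URLs that the robots.txt text disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip()
            for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
            if loc.text]
    return [url for url in urls if not parser.can_fetch(agent, url)]

# Made-up sample data for the demonstration.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /forum/search.php\n"
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/forum/viewtopic.php?t=1</loc></url>
  <url><loc>http://example.com/forum/search.php?keywords=robots</loc></url>
</urlset>"""

for url in blocked_sitemap_urls(SAMPLE_SITEMAP, SAMPLE_ROBOTS):
    print("in sitemap but disallowed:", url)
```

An empty result would suggest the sitemaps are clean and the remaining "restricted" entries come from links found elsewhere.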
Stop changing your site/robots.txt/SiteMap.
Wait 3 months after the last change (do something else profitable while waiting) to let Google crawl the site, get new data and update your GWT report, then check again. :)
You can search Google in milliseconds. But crawling, ranking, updating Toolbar PageRank, and updating GWT reports can take months.
Jim