Staffa

msg:1529430 | 2:37 pm on Mar 30, 2004 (gmt 0) |
Can you not organize your setup so that all files open to robots are in one or more directories, and the files you want to block are in separate directories? Then you can just block the whole directory.
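A minimal sketch of the directory approach, using Python's standard `urllib.robotparser` to show how a crawler that follows robots.txt prefix matching reads a single directory rule. The `/downloads/` directory, the `AnyBot` user agent and the example URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: every file robots should not fetch lives under /downloads/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /downloads/
"""

# Parse the rules from a string instead of fetching robots.txt over HTTP.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# One Disallow line covers every file below the directory, however many there are.
for path in ["/downloads/report.pdf", "/downloads/2004/archive.zip", "/products/widget.html"]:
    allowed = parser.can_fetch("AnyBot", "http://www.example.com" + path)
    print(path, "allowed" if allowed else "blocked")
```

Adding more files under `/downloads/` never makes robots.txt any longer, which sidesteps the size question entirely.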
|
DoppyNL

msg:1529431 | 3:59 pm on Mar 30, 2004 (gmt 0) |
That's an idea, but it would remove the option of "going up in the path" to find the page the file belongs to. I'd like the URL to show the page the file comes from. On top of that, there will be a security check to determine whether the user is allowed to download the file, and that check is tied to the page. If I changed the URL structure, I would also have to pass something along to tell which page to check against. Also, creating the robots.txt file isn't the problem; I can generate it automatically in PHP. The problem is: will search engines cope with large robots.txt files?
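The poster generates the file in PHP. The same idea is sketched here in Python, assuming the protected files are known as a list of URL paths that sit beneath the page they belong to. The paths and the `build_robots_txt` helper are hypothetical.

```python
from collections.abc import Iterable


def build_robots_txt(blocked_paths: Iterable[str]) -> str:
    """Return robots.txt text with one Disallow line per blocked path.

    Paths are de-duplicated and sorted so the output is stable between runs.
    """
    lines = ["User-agent: *"]
    for path in sorted(set(blocked_paths)):
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


# Hypothetical file URLs that live beneath the page they belong to.
files = [
    "/products/widget/manual.pdf",
    "/products/widget/drivers.zip",
    "/support/faq/setup.exe",
]
robots = build_robots_txt(files)
print(robots, end="")
print(len(robots.encode("utf-8")), "bytes")
```

Each entry costs `Disallow: ` (10 bytes), the path and a newline. With paths of about 30 characters that is roughly 40 bytes per file, so 1,000 protected files already come to about 40 KB, which is why the size question matters for this layout.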
|
DoppyNL

msg:1529432 | 11:37 am on Apr 8, 2004 (gmt 0) |
Bump to top. The question I still have: how large can the robots.txt file be? At what size do crawlers start having problems with it?
|
bufferzone

msg:1529433 | 12:02 pm on Apr 8, 2004 (gmt 0) |
I would think the same principles apply as for normal web pages. I read an answer by GoogleGuy saying that you should not go over 15K, under 12K is fine, and under 10K is best. If you look at Brett's robots.txt here at WebmasterWorld, you will see that it is LLLLLLLLong; I haven't checked how many K. My guess is that if you keep yours shorter than Brett's, you should be fine.
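A small sketch that checks a generated robots.txt against the figures quoted in this post. Those 10K/12K/15K figures are forum advice attributed to GoogleGuy, not a published limit, and reading "K" as 1,024 bytes is an assumption. For comparison, RFC 9309 (the Robots Exclusion Protocol) requires crawlers to parse at least 500 KiB, and Google documents that it ignores content beyond 500 KiB. The `assess_size` helper is hypothetical.

```python
KIB = 1024  # assumption: "K" in the thread means 1,024 bytes


def assess_size(robots_txt: str) -> str:
    """Classify robots.txt size against the figures quoted in this thread.

    The 10K/12K/15K figures are forum advice, not a specification.
    500 KiB is the minimum parsing limit RFC 9309 asks crawlers to support.
    """
    size = len(robots_txt.encode("utf-8"))
    if size < 10 * KIB:
        return "best (under 10K)"
    if size < 12 * KIB:
        return "fine (under 12K)"
    if size <= 15 * KIB:
        return "acceptable (up to 15K)"
    if size <= 500 * KIB:
        return "over the quoted 15K"
    return "over 500 KiB; content past the parsing limit may be ignored"


print(assess_size("User-agent: *\nDisallow: /private/\n"))
```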
|
the_nerd

msg:1529434 | 7:40 pm on May 21, 2004 (gmt 0) |
Looking at WW's robots.txt file, I ask myself (and now you, because my own answers weren't good): why not change the standard so it can handle includes as well? We could then block everything and let in only what we want. That might be unfair to new SEs, but it could keep out a couple of suckers that bring down our sites.
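The "block everything, then let in what we want" idea can be expressed with `Allow` lines, which RFC 9309 defines as part of the protocol. This sketch checks such an allow-list with Python's `urllib.robotparser`; the `/public/` and `/articles/` paths and the `NewSearchBot` user agent are hypothetical. `urllib.robotparser` applies the first rule that matches, while RFC 9309 crawlers use the longest match, so the `Allow` lines are listed first to give the same answer under both.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical allow-list: block the whole site, then open two directories.
# Allow lines come first because urllib.robotparser uses the first matching rule;
# longest-match crawlers (RFC 9309) reach the same result for these paths.
ALLOW_LIST = """\
User-agent: *
Allow: /public/
Allow: /articles/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ALLOW_LIST.splitlines())

for path in ["/public/index.html", "/articles/2004/robots.html", "/members/profile.php"]:
    ok = parser.can_fetch("NewSearchBot", "http://www.example.com" + path)
    print(f"{path}: {'allowed' if ok else 'blocked'}")
```

As with any robots.txt rule, this only keeps out crawlers that choose to honor the file.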
|
sidyadav

msg:1529435 | 9:43 pm on May 21, 2004 (gmt 0) |
> We could then block everything and let in what we want.
> Might be unfair for new SEs, but could keep out a couple
> of suckers that bring down our sites.

Yeah, but DoppyNL's purpose for using robots.txt is quite different. Instead of blocking certain robots, DoppyNL wants to block all robots from certain prohibited pages. In this case, I would strongly suggest putting <meta name="robots" content="noindex,follow"> into every HTML page you want to keep out of the index. Note that the tag stops indexing, not visiting: a robot still has to fetch the page to read it. It's not worth the risk of creating a large robots.txt file. Then again, putting those tags into your prohibited pages wouldn't be easy either, but once it's done, it's done.

Sid
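To illustrate why the meta tag controls indexing rather than fetching, here is a minimal sketch of how a crawler might pull robots directives out of a page it has already downloaded, using Python's standard `html.parser`. The `RobotsMetaParser` class and `robots_meta_directives` function are hypothetical, not any search engine's actual code.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for token in attributes.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_meta_directives(html: str) -> set[str]:
    """Return the lowercased robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="noindex,follow"></head><body>...</body></html>'
print(sorted(robots_meta_directives(page)))  # ['follow', 'noindex']
```

The page has to be fetched before the tag can be read at all. Non-HTML downloads such as PDFs have no `<head>` to carry the tag; Google documents an `X-Robots-Tag` HTTP response header for that case.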
|