Newbie robots.txt and forum Q: how to write it?
buksida msg:1527245 8:34 am on Mar 17, 2004 (gmt 0)
I've only just come across the robots.txt topics on this forum and am a real newbie with it, so please excuse my questions if they sound dumb:
1) What do I put in the file?
2) Where does it reside on the server? (My guess is the root dir.)
3) How can I make Google spider my forum?
buksida msg:1527246 8:44 am on Mar 17, 2004 (gmt 0)
Okay, answering my own questions now: a bit of research and I know the answers to 1 and 2 (although I'm not quite sure why certain robots should be disallowed).
I would still like the forum (phpBB) to be spidered though.
DoppyNL msg:1527247 8:47 am on Mar 17, 2004 (gmt 0)
1) Depends on what you want to do, so you should first ask yourself what you want crawlers to do (and not do) on your site. Details on what the file might look like: www.robotstxt.org
2) robots.txt must be placed in the root of your domain.
3) You won't have to do a thing: once a crawler finds your site, it will crawl it if it wants to.
buksida msg:1527248 9:54 am on Mar 17, 2004 (gmt 0)
Thx for the quick reply, here's what I've done:
created robots.txt containing this:
User-agent: *
Disallow: /forum/posting.php
Disallow: /forum/admin
Disallow: /forum/images
Disallow: /forum/privmsg.php
Disallow: /forum/profile.php
Disallow: /forum/memberlist.php
Is there anything else I need to add?
I've read some stuff about having to disable session IDs for *G* to effectively spider the forum pages. Does this need to be done?
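For what it's worth, rules like the ones above can be sanity-checked offline before a crawler ever sees them; a minimal sketch using Python's standard urllib.robotparser (the test paths are just illustrative, not necessarily real pages on your forum):

```python
# Sketch: verify which paths the posted robots.txt rules actually block.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /forum/posting.php
Disallow: /forum/admin
Disallow: /forum/images
Disallow: /forum/privmsg.php
Disallow: /forum/profile.php
Disallow: /forum/memberlist.php
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Admin pages are blocked, but topic pages stay crawlable.
print(rp.can_fetch("*", "/forum/admin/index.php"))    # False
print(rp.can_fetch("*", "/forum/viewtopic.php?t=1"))  # True
```

Note that Disallow matches by path prefix, which is why /forum/admin also covers everything underneath it.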
trillianjedi msg:1527249 11:01 am on Mar 17, 2004 (gmt 0)
You'll need to mod_rewrite the URLs on phpBB so that Google can crawl the pages easily.
There are lots of bots which you might want to block in your robots file. The only bots you really want to let in are the search engines whose indexes you want to be listed in.
Anything else coming in is just a waste of your bandwidth and server load.
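The mod_rewrite idea is to map static-looking URLs onto the query-string ones phpBB actually serves; a minimal .htaccess sketch, assuming Apache with mod_rewrite enabled (the topic-123.html URL scheme here is purely illustrative, not a phpBB default):

```apache
# Hypothetical rewrite: serve /forum/topic-123.html from /forum/viewtopic.php?t=123
RewriteEngine On
RewriteRule ^forum/topic-([0-9]+)\.html$ /forum/viewtopic.php?t=$1 [L]
```

You'd still need the forum templates to emit the static-looking links, otherwise crawlers never see them.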
buksida msg:1527250 3:27 am on Mar 18, 2004 (gmt 0)
I have edited the sessions.php file in the forum, as directed on another message board, to make googlebot skip the session IDs. The forum still works; now I'll have to wait until it starts getting spidered ... if it starts getting spidered! Cheers