
Forum Moderators: goodroi


newbie robots.txt and forum Q

how to write it

     
8:34 am on Mar 17, 2004 (gmt 0)




I've only just come across the robots.txt topics on this forum and I'm a real newbie with it, so please excuse my questions if they sound dumb:

1) What do I put in the file?
2) Where does it reside on the server? (My guess is the root dir.)
3) How can I make Google spider my forum?

Thanks

8:44 am on Mar 17, 2004 (gmt 0)




Okay, answering my own questions now. A bit of research and I know the answers to 1 and 2 (although I'm not quite sure why certain robots should be disallowed).

I would still like the forum (phpBB) to be spidered though.

8:47 am on Mar 17, 2004 (gmt 0)




1)
Depends on what you want to do, so first ask yourself what you want crawlers to do and not do on your site.
Details on what the file might look like: www.robotstxt.org (there's also a minimal sketch after point 3 below).

2)
robots.txt must be placed in the root of your domain, so it's reachable at www.yourdomain.com/robots.txt.

3)
You won't have to do a thing. Once a crawler finds your site, it will crawl it if it wants to.
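
For a concrete picture, a minimal robots.txt might look like this (just a sketch; the /cgi-bin/ path is only an example directory, not something your site necessarily has):

User-agent: *
Disallow: /cgi-bin/

User-agent: * means the record applies to every robot that honours robots.txt, and each Disallow line names a path prefix those robots should keep out of. A record with an empty Disallow (nothing after the colon) allows everything.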

9:54 am on Mar 17, 2004 (gmt 0)




Thanks for the quick reply, here's what I've done:

Created a robots.txt containing this:
User-agent: *
Disallow: /forum/posting.php
Disallow: /forum/admin
Disallow: /forum/images
Disallow: /forum/privmsg.php
Disallow: /forum/profile.php
Disallow: /forum/memberlist.php

Is there anything else I need to add?

I've read some stuff about having to disable session IDs for *G* to effectively spider the forum pages. Does this need to be done?

11:01 am on Mar 17, 2004 (gmt 0)

WebmasterWorld Senior Member trillianjedi



Google will choke on session IDs and they will cane your bandwidth, so you need to lose them. Use cookies for login user info instead.

You'll need to mod_rewrite the URLs on phpBB so that Google can crawl the pages easily.
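
As a rough sketch of what that involves (the topic123.html style URL pattern is an assumed example from one of the common phpBB mods, not something phpBB does out of the box, and the rules alone aren't enough: the mod also has to make phpBB output those URLs in its templates), the .htaccess in your site root would carry rewrite rules mapping the static-looking URLs back to the real scripts:

RewriteEngine On
RewriteRule ^forum/topic([0-9]+)\.html$ /forum/viewtopic.php?t=$1 [L]
RewriteRule ^forum/forum([0-9]+)\.html$ /forum/viewforum.php?f=$1 [L]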

There are lots of bots which you might want to block in your robots file. The only bots you really want to let in are the search engines whose indexes you want to be listed in.

Anything else coming in is just a waste of your bandwidth and server load.
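
For example, the per-bot blocks sit above your existing catch-all section in robots.txt; the bot name below is just a placeholder, so use whatever user-agent strings actually turn up in your logs, and bear in mind this only stops bots polite enough to read the file:

User-agent: SomeUnwantedBot
Disallow: /

User-agent: *
Disallow: /forum/posting.php
(the rest of your existing Disallow lines)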

TJ

3:27 am on Mar 18, 2004 (gmt 0)




I have edited the sessions.php file in the forum, as directed on another message board, to make Googlebot skip the session IDs. The forum still works; now I'll have to wait until it starts getting spidered ... if it starts getting spidered!
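
In case it helps anyone else, the general idea of the edit is roughly this; only a sketch, since the exact file, function and crawler list depend on your phpBB version and the instructions you followed:

// Detect known crawlers by user-agent so the forum can leave the sid
// parameter out of the links it builds for them.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$crawlers = array('Googlebot', 'Slurp', 'msnbot');  // example crawler names
$is_crawler = false;
foreach ($crawlers as $crawler)
{
    if (stristr($agent, $crawler))
    {
        $is_crawler = true;
        break;
    }
}
// When $is_crawler is true, the session ID is not appended to URLs,
// so every visit from the bot hits the same address instead of a new
// session-ID variant each time.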
Cheers
 
