
Sitemaps, Meta Data, and robots.txt Forum

    
newbie robots.txt and forum Q
how to write it
buksida




msg:1527245
 8:34 am on Mar 17, 2004 (gmt 0)

I've only just come across the robots.txt topics on this forum and am a real newbie with it, so please excuse my questions if they sound dumb:

1) What do I put in the file?
2) Where does it reside on the server? (My guess is the root dir.)
3) How can I make Google spider my forum?

Thanks

 

buksida




msg:1527246
 8:44 am on Mar 17, 2004 (gmt 0)

Okay, answering my own questions now: after a bit of research I know the answers to 1 and 2 (although I'm not quite sure why certain robots should be disallowed).

I would still like the forum (phpBB) to be spidered though.

DoppyNL




msg:1527247
 8:47 am on Mar 17, 2004 (gmt 0)

1)
That depends on what you want to do.
Ask yourself first what you want crawlers to do, and not do, on your site.
Details on what the file might look like: www.robotstxt.org

2)
robots.txt must be placed in the root of your domain.

3)
You won't have to do a thing. Once a crawler finds your forum, it will crawl it if it wants to.
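
To illustrate point 2, here is a minimal Python sketch of how a crawler could work out which robots.txt applies to a page: it keeps the scheme and host and swaps the path for /robots.txt. The function name robots_url and the example.com address are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); replace the path,
    # and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A forum page deep in a subdirectory still maps to the root file.
print(robots_url("http://www.example.com/forum/viewtopic.php?t=42"))
```

A file placed at /forum/robots.txt would therefore never be looked at; only the copy in the root counts.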

buksida




msg:1527248
 9:54 am on Mar 17, 2004 (gmt 0)

Thanks for the quick reply. Here's what I've done:

I created a robots.txt containing this:
User-agent: *
Disallow: /forum/posting.php
Disallow: /forum/admin
Disallow: /forum/images
Disallow: /forum/privmsg.php
Disallow: /forum/profile.php
Disallow: /forum/memberlist.php

Is there anything else I need to add?

I've read that session IDs have to be disabled for Google to spider the forum pages effectively. Does this need to be done?
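
One way to sanity-check the robots.txt rules above before uploading them is Python's standard urllib.robotparser module. This is only a sketch: the example.com host, the sample paths, and the helper name allowed are assumptions for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules exactly as posted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/posting.php
Disallow: /forum/admin
Disallow: /forum/images
Disallow: /forum/privmsg.php
Disallow: /forum/profile.php
Disallow: /forum/memberlist.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str, agent: str = "Googlebot") -> bool:
    """True if `agent` may fetch `path` on the (hypothetical) example host."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


# Sample paths (assumed for illustration): topic pages should stay
# crawlable, while the admin area and profile pages should be blocked.
for path in ("/forum/viewtopic.php?t=42",
             "/forum/admin/index.php",
             "/forum/profile.php?mode=viewprofile"):
    print(path, allowed(path))
```

Disallow lines are prefix matches, so "Disallow: /forum/admin" also covers everything below /forum/admin/.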

trillianjedi




msg:1527249
 11:01 am on Mar 17, 2004 (gmt 0)

Google will choke on session IDs, and crawling them will cane your bandwidth, so you need to lose them. Use cookies instead for login user info.
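
To see why session IDs cause trouble, note that every new session makes the same page appear under a different URL. A rough Python sketch, assuming the session parameter is called sid (the name phpBB is generally known to use, but check your own URLs) and using made-up example.com addresses:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: the forum passes its session ID in a query parameter named "sid".
SESSION_PARAMS = {"sid"}


def strip_session_id(url: str) -> str:
    """Return url with any session-ID query parameters removed."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in SESSION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))


# Two visits, two session IDs, but really one and the same page.
first = "http://www.example.com/forum/viewtopic.php?t=42&sid=abc123"
second = "http://www.example.com/forum/viewtopic.php?t=42&sid=def456"
print(strip_session_id(first) == strip_session_id(second))
```

A crawler that does not strip the parameter sees two distinct URLs here and may fetch the same page again and again.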

You'll need to mod_rewrite the URLs on phpBB so that Google can crawl the pages easily.
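
The idea behind rewriting is to publish static-looking addresses and map them back to the real script on the server. The Python sketch below only illustrates that mapping; it is not mod_rewrite syntax, and both the /forum/topic-N.html scheme and the viewtopic.php?t=N target are assumptions chosen for the example.

```python
import re
from typing import Optional

# Hypothetical public URL scheme: /forum/topic-<number>.html
STATIC_TOPIC = re.compile(r"^/forum/topic-(\d+)\.html$")


def to_dynamic(path: str) -> Optional[str]:
    """Map a static-looking path to the script URL, or None if it does not match."""
    match = STATIC_TOPIC.match(path)
    if match is None:
        return None
    # Assumed target: the topic script with the topic number as parameter "t".
    return "/forum/viewtopic.php?t=" + match.group(1)


print(to_dynamic("/forum/topic-42.html"))
```

In a real setup the web server would apply the equivalent rule on each request, and the forum templates would have to emit the static-looking links.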

There are lots of bots which you might want to block in your robots.txt file. The only bots you really want to let in are those of the search engines whose indexes you want to be in.

Anything else coming in is just a waste of your bandwidth and server load.
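
Blocking a particular bot means giving it its own User-agent group in robots.txt. The sketch below checks such a file with Python's urllib.robotparser; "BadBot" is a placeholder name, not a real crawler, and example.com is made up.

```python
from urllib.robotparser import RobotFileParser

# "BadBot" is a hypothetical user-agent used purely for illustration.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /forum/admin
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

topic = "http://www.example.com/forum/viewtopic.php?t=42"

# The unwanted bot is shut out of the whole site...
print("BadBot:", parser.can_fetch("BadBot", topic))
# ...while every other crawler falls through to the "*" group.
print("Googlebot:", parser.can_fetch("Googlebot", topic))
```

Keep in mind that robots.txt is only a request. Well-behaved crawlers honour it, but a bot that ignores the file has to be blocked on the server itself.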

TJ

buksida




msg:1527250
 3:27 am on Mar 18, 2004 (gmt 0)

I have edited the sessions.php file in the forum, as directed on another message board, to make Googlebot skip the session IDs. The forum still works, so now I'll have to wait until it starts getting spidered ... if it starts getting spidered!
Cheers
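
For anyone wondering what such an edit boils down to, here is the general idea in Python. It is not the actual phpBB change (that lives in the PHP file sessions.php and is not shown in this thread); the function names, the crawler list, and the sid parameter name are all assumptions.

```python
# Hypothetical list of substrings that identify crawlers we want to skip.
CRAWLER_TOKENS = ("googlebot",)


def is_crawler(user_agent: str) -> bool:
    """True if the User-Agent header looks like one of the listed crawlers."""
    lowered = user_agent.lower()
    return any(token in lowered for token in CRAWLER_TOKENS)


def append_sid(url: str, session_id: str, user_agent: str) -> str:
    """Add the session ID to url unless the visitor looks like a crawler."""
    if is_crawler(user_agent):
        return url  # crawlers get the clean URL
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}sid={session_id}"


print(append_sid("/forum/viewtopic.php?t=42", "abc123",
                 "Mozilla/5.0 (compatible; Googlebot/2.1)"))
print(append_sid("/forum/viewtopic.php?t=42", "abc123", "Mozilla/4.0"))
```

The User-Agent header is easy to fake, so a check like this is a convenience for crawlers and not a security measure.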


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved