Home / Forums Index / Hardware and OS Related Technologies / Website Technology Issues
Forum Library, Charter, Moderators: phranque

Website Technology Issues Forum

    
Google and phpBB forums
How to index only the important pages
KakenBetaal




msg:670018
 5:38 pm on Feb 1, 2003 (gmt 0)

Anyone here using robots.txt to prevent Google from indexing many of the irrelevant links on phpBB forums? I've added the following section to my robots.txt in the root of my domain:

User-agent: *
Disallow: /images/
Disallow: /forums/admin/
Disallow: /forums/db/
Disallow: /forums/includes/
Disallow: /forums/language/
Disallow: /forums/templates/
Disallow: /forums/custom.php
Disallow: /forums/config.php
Disallow: /forums/groupcp.php
Disallow: /forums/login.php
Disallow: /forums/modcp.php
Disallow: /forums/posting.php
Disallow: /forums/printview.php
Disallow: /forums/privmsg.php
Disallow: /forums/profile.php
Disallow: /forums/search.php
Disallow: /forums/viewonline.php

My aim is to ask Google to index only the relevant topics. All the extra links that GoogleBot normally follows then get excluded, such as the print view, reply pages, private messages, logins, etc. I've also made a number of the forums on my site registered-members-only to keep out search engines, with the overall strategy of getting more of the topics from the important forums indexed. I only have so much PR (PageRank) to spread around in order to get these topics/threads indexed.
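A quick way to sanity-check rules like the ones above before waiting on a crawl is to run them through Python's standard-library robots.txt parser. This is only a rough sketch: the sample paths (such as `viewtopic.php?t=123`) and the helper name `is_blocked` are illustrative assumptions, and Python's parser is not Google's, so it can only approximate how GoogleBot will read the file.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the post above, unchanged.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /forums/admin/
Disallow: /forums/db/
Disallow: /forums/includes/
Disallow: /forums/language/
Disallow: /forums/templates/
Disallow: /forums/custom.php
Disallow: /forums/config.php
Disallow: /forums/groupcp.php
Disallow: /forums/login.php
Disallow: /forums/modcp.php
Disallow: /forums/posting.php
Disallow: /forums/printview.php
Disallow: /forums/privmsg.php
Disallow: /forums/profile.php
Disallow: /forums/search.php
Disallow: /forums/viewonline.php
"""


def is_blocked(rules, path, agent="Googlebot"):
    """Return True if `rules` disallow `path` for `agent` (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch(agent, path)


# Illustrative paths only; substitute URLs from your own forum.
SAMPLE_PATHS = [
    "/forums/viewtopic.php?t=123",
    "/forums/viewforum.php?f=4",
    "/forums/posting.php?mode=reply&t=123",
    "/forums/printview.php?t=123",
    "/forums/admin/index.php",
]

for path in SAMPLE_PATHS:
    status = "blocked" if is_blocked(ROBOTS_TXT, path) else "allowed"
    print(status, path)
```

If a topic or forum page shows up as blocked, or a reply or print page shows up as allowed, the rules need another look.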

I'd like to hear if anyone else is doing something along the above lines, and if there are any good ideas out there that could be useful to me (and anyone else!).

Interestingly, the freshbot hasn't started obeying this robots.txt file yet, in spite of having requested it for the last week or so.

By the way, if you're using phpBB, it's important to do the Google mod that is listed on phpbb.com in order to prevent phpBB from assigning session IDs to GoogleBot. Mods, I hope it's OK to post this link seeing as it's non-commercial, GPL software, and a really useful page for those using phpBB forums:
[Tutorial] Google & phpBB [phpbb.com]
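For anyone wondering what that mod boils down to, the idea can be sketched roughly as follows. This is not the code from the phpbb.com tutorial (the real mod is a PHP change inside phpBB). It is a Python illustration of the principle: skip the session ID when the visitor's user agent looks like a crawler. The names `CRAWLER_TOKENS`, `is_crawler` and `append_sid`, and the crawler list itself, are assumptions made for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of user-agent substrings treated as crawlers.
CRAWLER_TOKENS = ("googlebot", "slurp")


def is_crawler(user_agent):
    """Return True if the user-agent string contains a known crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def append_sid(url, sid, user_agent):
    """Add a sid parameter to `url`, unless the visitor looks like a crawler."""
    if is_crawler(user_agent):
        # Crawlers get the bare URL, so every visit sees the same address.
        return url
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("sid", sid))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(append_sid("viewtopic.php?t=123", "abc123", "Googlebot/2.1"))
print(append_sid("viewtopic.php?t=123", "abc123", "Mozilla/4.0"))
```

Without something like this, a crawler is handed a different URL for the same topic on every visit, which is why the mod matters.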

 

jamie




msg:670019
 3:48 pm on Feb 6, 2003 (gmt 0)

hi,

i have been putting off doing the google mod on our forum (laziness/time) but it (the forum) has become a really good info resource on general stuff related to our site and it would be great to have google spidering it.

the robots text makes perfect sense to me. i'll let you know what happens.

Yidaki




msg:670020
 6:28 pm on Feb 6, 2003 (gmt 0)

KakenBetaal,

- yes, your robots.txt looks much the same as mine ... :)
- yes, the phpbb google session mod is very important!

I'm currently working on a rewrite mod to make the php pages look like .htm and to avoid query parameters in the urls. The mods offered at phpbb.com (not yet official) are really buggy and very confusing ... hopefully they make it a standard for the next release.
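To make the idea concrete, here is a rough Python sketch of the kind of mapping such a rewrite mod has to perform in both directions. It is not the mod described above or any of the phpbb.com mods: the `topic-123.htm` naming scheme and the helper names `to_static` and `to_dynamic` are made up for illustration, and on a live site the incoming direction would normally be handled by the web server's URL rewriting rather than by application code.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical scheme: topic-<id>.htm and forum-<id>.htm.
STATIC_RE = re.compile(r"^(?P<kind>topic|forum)-(?P<id>\d+)\.htm$")
SCRIPTS = {
    "topic": ("viewtopic.php", "t"),
    "forum": ("viewforum.php", "f"),
}


def to_static(dynamic_url):
    """Map e.g. 'viewtopic.php?t=123' to 'topic-123.htm', or None if unsupported."""
    parts = urlsplit(dynamic_url)
    params = parse_qs(parts.query)
    for kind, (script, param) in SCRIPTS.items():
        values = params.get(param, [])
        if parts.path == script and values and values[0].isdigit():
            return f"{kind}-{values[0]}.htm"
    return None


def to_dynamic(static_name):
    """Map e.g. 'topic-123.htm' back to 'viewtopic.php?t=123', or None."""
    match = STATIC_RE.match(static_name)
    if match is None:
        return None
    script, param = SCRIPTS[match.group("kind")]
    return f"{script}?{param}={match.group('id')}"


print(to_static("viewtopic.php?t=123"))
print(to_dynamic("forum-4.htm"))
```

Pages such as `posting.php` deliberately have no static form in this sketch, since those are the ones being kept out of the index anyway.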

<added>Kaken, you don't need the .php suffixes in your robots.txt. It works without them, since a Disallow rule matches any URL that starts with the given path, and I feel safer without!</added>
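The suffix point can be checked with Python's standard-library robots.txt parser, which treats each Disallow value as a path prefix. This is a small sketch under that assumption; the helper name `allowed` and the sample URLs are illustrative, and other crawlers may differ in the details.

```python
from urllib.robotparser import RobotFileParser


def allowed(disallow_paths, url, agent="Googlebot"):
    """Build a one-group robots.txt from `disallow_paths` and test `url` against it."""
    lines = ["User-agent: *"] + [f"Disallow: {path}" for path in disallow_paths]
    parser = RobotFileParser()
    parser.parse(lines)
    return parser.can_fetch(agent, url)


# Rules written without the .php suffix.
short_rules = ["/forums/search", "/forums/profile"]

for url in [
    "/forums/search.php",
    "/forums/profile.php?mode=viewprofile&u=2",
    "/forums/searching-tips.html",  # hypothetical page that shares the prefix
    "/forums/viewtopic.php?t=123",
]:
    print("allowed" if allowed(short_rules, url) else "blocked", url)
```

The trade-off is that a shorter prefix also catches any other path that happens to start the same way, so it is worth checking for name clashes before trimming the rules.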

KakenBetaal




msg:670021
 12:34 pm on Feb 7, 2003 (gmt 0)

Thanks, guys! I've used only a very few of the official mods, and have done the following to my forums to improve the community/spidering:

Added the Google spidering mod for phpBB: no session IDs (sids) for Google and Inktomi.
Added a special rank image mod so that post-count rank images still work for admins and moderators.
Added a site nav bar top and bottom to improve navigation over the whole site, and to siphon PR away from links posted on the forum.
Added a print topic page.
Added some forum disallows to robots.txt to stop Google indexing repeated content.
Added a tell-a-friend script.
Removed {SITENAME} :: from overall_header.tpl for better SEO titles.
Removed the www, YIM, ICQ and other buttons from the main topic display.

GoogleBot ain't paying too much attention to the robots.txt yet. Maybe by the next deepcrawl - I'm guessing there is some kind of delay between picking it up and actually obeying it in the case of a long-established site.

jamie




msg:670022
 1:32 pm on Feb 7, 2003 (gmt 0)

hi kaken,

i don't know whether you have added it, but my favourite mod has been the "view posts since last visit (number)"

it is the first thing most of our users click on when they log into the forums. shows at a glance whether or not the forums have been busy, instead of just giving the link "view posts since last visit"

regards

KakenBetaal




msg:670023
 2:54 pm on Feb 7, 2003 (gmt 0)

I haven't tried that mod - sounds like it might be worthwhile. Have you done much detailed analysis of visitors and their travels through your forums?

jamie




msg:670024
 4:17 pm on Feb 7, 2003 (gmt 0)

hardly at all

i've got most of my feedback by posting topics asking for opinions and changes wanted, etc - gives me a nice hands on feel, and i was pleasantly surprised at the response.

in fact stat analysis is something i should give more attention to, as the forums are one of the most popular items on our site. we use nettracker - which gives great results, but it takes a loooong time to wade through these :)

cheers


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved