
Forum Moderators: phranque

Google and phpBB forums

How to index only the important pages

   
5:38 pm on Feb 1, 2003 (gmt 0)

10+ Year Member



Anyone here using robots.txt to prevent Google from indexing many of the irrelevant links on phpBB forums? I've added the following section to my robots.txt in the root of my domain:

User-agent: *
Disallow: /images/
Disallow: /forums/admin/
Disallow: /forums/db/
Disallow: /forums/includes/
Disallow: /forums/language/
Disallow: /forums/templates/
Disallow: /forums/custom.php
Disallow: /forums/config.php
Disallow: /forums/groupcp.php
Disallow: /forums/login.php
Disallow: /forums/modcp.php
Disallow: /forums/posting.php
Disallow: /forums/printview.php
Disallow: /forums/privmsg.php
Disallow: /forums/profile.php
Disallow: /forums/search.php
Disallow: /forums/viewonline.php

My aim is to ask Google to index only the relevant topics - all the extra links that GoogleBot normally follows then get excluded, such as the print view, reply pages, private messages, logons, etc. I've also made a number of the forums on my site registered members only to keep out search engines, with the overall strategy of getting more of the topics from the important forums indexed. I only have so much PR to spread around in order to get these topics/threads indexed.
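One way to sanity-check rules like these before a crawler picks them up is to feed them to Python's standard-library `urllib.robotparser` and ask which URLs a well-behaved bot would be allowed to fetch. This is only a rough sketch: the sample URLs (`viewtopic.php?t=1` and so on) are assumed examples, not taken from the site, and the standard-library parser does plain prefix matching, which may differ in detail from how any particular search engine interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the post above, unchanged.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /forums/admin/
Disallow: /forums/db/
Disallow: /forums/includes/
Disallow: /forums/language/
Disallow: /forums/templates/
Disallow: /forums/custom.php
Disallow: /forums/config.php
Disallow: /forums/groupcp.php
Disallow: /forums/login.php
Disallow: /forums/modcp.php
Disallow: /forums/posting.php
Disallow: /forums/printview.php
Disallow: /forums/privmsg.php
Disallow: /forums/profile.php
Disallow: /forums/search.php
Disallow: /forums/viewonline.php
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


PARSER = build_parser(ROBOTS_TXT)


def is_allowed(path: str, agent: str = "Googlebot") -> bool:
    """True if the rules let `agent` fetch `path`."""
    return PARSER.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical forum URLs, used only for illustration.
    for path in (
        "/forums/viewtopic.php?t=1",
        "/forums/posting.php?mode=reply&t=1",
        "/forums/printview.php?t=1",
    ):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

With these rules the topic page stays crawlable while the reply and print-view URLs are blocked, which is the split described above.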

I'd like to hear if anyone else is doing something along these lines, and if there are any good ideas out there that could be useful to me (and anyone else!).

Interestingly, the freshbot hasn't started obeying this robots.txt file yet, in spite of having requested it for the last week or so.

By the way, if you're using phpBB, it's important to do the Google mod that is listed on phpbb.com in order to prevent phpBB from assigning session IDs to GoogleBot. Mods, I hope it's OK to post this link seeing as it's non-commercial, GPL software, and a really useful page for those using phpBB forums:
[Tutorial] Google & phpBB [phpbb.com]
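For anyone wondering what "no session IDs for GoogleBot" amounts to, here is a rough Python sketch of the idea. It is not the code from the phpBB mod (which is PHP); the function names, the crawler tokens, and the sample URL are all assumptions made for illustration. The point is simply that the session ID is left off links when the visitor's user agent looks like a known crawler, so the bot sees one stable URL per page instead of a new one on every visit.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed substrings for crawler user agents (illustrative, not exhaustive).
CRAWLER_TOKENS = ("googlebot", "slurp")


def is_crawler(user_agent: str) -> bool:
    """Crude check: does the user agent contain a known crawler token?"""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def add_session_id(url: str, session_id: str, user_agent: str) -> str:
    """Append sid=<session_id> to url, except for crawlers."""
    if is_crawler(user_agent):
        # Crawlers get the clean URL so every visit sees the same address.
        return url
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("sid", session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    browser = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
    print(add_session_id("viewtopic.php?t=42", "abc123", bot))
    print(add_session_id("viewtopic.php?t=42", "abc123", browser))
```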

3:48 pm on Feb 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



hi,

i have been putting off doing the google mod on our forum (laziness/time) but it (the forum) has become a really good info resource on general stuff related to our site and it would be great to have google spidering it.

the robots.txt makes perfect sense to me. i'll let you know what happens.

6:28 pm on Feb 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



KakenBetaal,

- yes, your robots.txt looks much the same as mine ... :)
- yes, the phpbb google session mod is very important!

I'm currently working on a rewrite mod to make the PHP pages look like .htm pages and to avoid query parameters in the URLs. The mods offered at phpbb.com (not yet official) are really buggy and very confusing ... hopefully they'll make it standard in the next release.

<added>Kaken, you don't need the .php suffixes in your robots.txt. Disallow rules match by prefix, so it works without them - and I feel safer without them!</added>
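A quick way to see the prefix behaviour is with Python's standard-library `urllib.robotparser`, which treats each Disallow path as a plain prefix. This is a small sketch only; the helper name `blocked` and the sample paths (including `loginhelp.htm`) are made up for illustration. Note the flip side: a shorter prefix also catches any other path that happens to start the same way.

```python
from urllib.robotparser import RobotFileParser


def blocked(rule_path: str, url_path: str, agent: str = "Googlebot") -> bool:
    """True if a single 'Disallow: rule_path' rule blocks url_path."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule_path}"])
    return not parser.can_fetch(agent, url_path)


if __name__ == "__main__":
    # Rule without the .php suffix still covers the script and its query strings.
    print(blocked("/forums/login", "/forums/login.php"))
    print(blocked("/forums/login", "/forums/login.php?redirect=index.php"))
    # ...but it also covers unrelated paths sharing the prefix (hypothetical file).
    print(blocked("/forums/login", "/forums/loginhelp.htm"))
    # The longer rule does not cover the shorter path.
    print(blocked("/forums/login.php", "/forums/login"))
```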

12:34 pm on Feb 7, 2003 (gmt 0)

10+ Year Member



Thanks, guys! I've used only a very few of the official mods, and have done the following to my forums to improve the community/spidering:

Google spidering mod for phpBB - no session IDs (sids) for Google and Inktomi
Special rank image mod to allow post-count-rank images to still work for admins and moderators.
Added a site nav bar top and bottom to improve navigation over the whole site, and to siphon away any PR for posted links on the forum
Added print topic page
Added some forum disallows to robots.txt to stop Google indexing repeated content.
Added a tell-a-friend script
Removed {SITENAME} :: from overall_header.tpl for better SEO titles.
Removed www, YIM, ICQ and other buttons from the main topic display

GoogleBot ain't paying too much attention to the robots.txt yet. Maybe by the next deepcrawl - I'm guessing there is some kind of delay between picking it up and actually obeying it in the case of a long-established site.

1:32 pm on Feb 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



hi kaken,

i don't know whether you have added it, but my favourite mod has been the "view posts since last visit (number)"

it is the first thing most of our users click on when they log into the forums. shows at a glance whether or not the forums have been busy, instead of just giving the link "view posts since last visit"

regards

2:54 pm on Feb 7, 2003 (gmt 0)

10+ Year Member



I haven't tried that mod - sounds like it might be worthwhile. Have you done much detailed analysis of visitors and their travels through your forums?

4:17 pm on Feb 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



hardly at all

i've got most of my feedback by posting topics asking for opinions and changes wanted, etc - gives me a nice hands on feel, and i was pleasantly surprised at the response.

in fact stat analysis is something i should give more attention to, as the forums are one of the most popular items on our site. we use nettracker - which gives great results, but it takes a loooong time to wade through these :)

cheers

 
