Forum Moderators: phranque


Google and phpBB forums and Robots.txt

         

Frank_Rizzo

11:13 pm on Jun 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



After 4 months of waiting I finally got Google to index a phpBB forum.

I'm worried, though, because Google is not obeying the robots.txt. I have lots of disallows similar to the ones in:

[webmasterworld.com...]

but Google is indexing profiles and other stuff I don't want it to!

Any reason why Google is ignoring the robots.txt? It's been up there for months.

Jenstar

11:21 pm on Jun 24, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I know there have been issues with Googlebot and phpBB boards because of the dynamic content. Did you go to phpbb.com and head to their community? There are a lot of discussions (as well as mods/hacks) on how best to handle Googlebot so that it indexes what you want it to and ignores what you do not want indexed. I know many people are having problems with phpBB and Googlebot, especially with the session IDs in the last couple of releases.

Frank_Rizzo

9:37 am on Jun 25, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To be honest, I don't think this is a phpBB problem. Google is not obeying robots.txt.

I have non-forum directories and files disallowed and Google obeys those rules, but for the forum, Google is indexing everything.

I double-checked the paths and file names. They are all correct.

Is this some kind of pattern matching problem? E.g. robots.txt has

disallow: /forum/profile.php

so that Google will not read that exact filename but it will read

/forum/profile.php?u=100

due to not being an exact match?
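For illustration, the exact-match question can be checked against a parser that follows the original robots.txt convention, where a Disallow value is treated as a path prefix. This is only a sketch using Python's standard `urllib.robotparser`; it shows how that parser reads the rule, not how Googlebot itself behaves. The `example.com` host, the `is_allowed` helper, and the `viewtopic.php` URL are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt containing only the rule quoted in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/profile.php
"""


def is_allowed(url, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # example.com is a placeholder host, not the poster's site.
    for url in (
        "http://example.com/forum/profile.php",
        "http://example.com/forum/profile.php?u=100",
        "http://example.com/forum/viewtopic.php?t=1",
    ):
        print(url, "allowed" if is_allowed(url) else "blocked")
```

Under prefix matching, the rule covers `/forum/profile.php?u=100` as well as the bare filename, so a missing exact match would not by itself explain the indexed profile pages.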

Jenstar

12:21 am on Jun 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To be honest, I don't think this is a phpbb problem. Google is not obeying robots.txt

It is probably related to the way phpbb outputs dynamic URLs, which is the problem.

When did you create your robots.txt? Googlebot does not grab it with every crawl. If it was added in the last month or so, you might need to wait until Googlebot decides to check your robots.txt file again.

rogerd

4:13 am on Jun 26, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



A couple of possibilities:

1) Due to session IDs, Google isn't recognizing the URL as one that is banned.

2) It was recently pointed out to me in another thread that Google will INDEX any URL that is linked to, even if it does not SPIDER it due to the robots.txt file.
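As a rough way to probe the first possibility, the sketch below checks whether a session ID changes the outcome of a prefix-style Disallow rule. It uses Python's standard `urllib.robotparser`, which is an assumption about matching behaviour and not a model of Googlebot. The `sid` values, the `example.com` host, and the path-embedded session URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set; the real site's robots.txt is not shown in the thread.
RULES = [
    "User-agent: *",
    "Disallow: /forum/profile.php",
]


def blocked(url, agent="Googlebot"):
    """Return True if the rules above forbid `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(RULES)
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Session ID appended as a query parameter: the path prefix is unchanged.
    query_sid = "http://example.com/forum/profile.php?u=100&sid=abc123"
    # Session ID embedded in the path (hypothetical layout): the prefix differs.
    path_sid = "http://example.com/forum/sid-abc123/profile.php?u=100"
    print("query sid blocked:", blocked(query_sid))
    print("path sid blocked:", blocked(path_sid))
```

With this parser, a session ID in the query string leaves the rule in force, while one placed inside the path would slip past it. That would make the second possibility, indexing of linked but uncrawled URLs, the more likely explanation whenever the session ID only appears after the `?`.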

Frank_Rizzo

4:45 pm on Jun 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It is probably related to the way phpbb outputs dynamic URLs, which is the problem

If that were the case, then others would have the same problem and not just me, I guess.

I'm sure I put the robots.txt file up there about 2 months ago, when I applied the phpBB / Google mod.

(Side note: will we be allowed to discuss phpBB when Best BBS is released? Being a competitor and such.)

rogerd

Google will INDEX any URL that is linked to, even if it does not SPIDER it due to the robots.txt file.

I don't understand that, but I think you are right.

I have an .htaccess-protected members area which is disallowed in the robots.txt, and yet Google is indexing the links even though it can't read the pages.

rogerd

5:13 pm on Jun 26, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Frank, I tracked down jdMorgan's cogent explanation of this issue:

[webmasterworld.com...]

DaveAtIFG

5:28 pm on Jun 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



(side note, will we be allowed to discuss phpbb when Best BBS is released? Being a competitor and such. )

Moderator's note: This is still a help forum. Until that changes, technical help issues can be discussed freely. "Is program A better than program B?" type discussions have been discouraged for a long time, and that won't change. Invariably each program's author shows up to defend his product and the discussion becomes unproductive...

Frank_Rizzo

5:55 pm on Jun 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



rogerd, that link looks like the solution. Many thanks.