Forum Moderators: Robert Charlton & goodroi
Having been a keen lurker of this forum for some time now, I thought it was time to register and say hello. I've read numerous posts from this site which have helped me so far.
I am currently running a number of sites, one of which is based on phpBB2 Plus. The site is about 3 months old but has had numerous facelifts to get it to its current state.
The site itself is going well and I have done the following to help with Google.
1) Added a sitemap that adjusts dynamically to the forum content ( http://www.example.com/_sitemap.xml ) and submitted it to Google Sitemaps. It all seems to have validated correctly.
2) Used URL rewriting to serve all forums and posts as .html
3) Set up robots.txt to disallow all unnecessary subfolders and files
I see from my logs that I am getting hit as follows by Google and this is happening daily (Aug 06 examples below):
06 Aug 2006 05:02:13 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) 66.249.72.1 /
06 Aug 2006 19:51:03 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) 66.249.72.1 /fpost82.html
06 Aug 2006 19:51:26 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) 66.249.72.1 /fpost166.html
06 Aug 2006 19:51:39 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) 66.249.72.1 /fpost193.html
06 Aug 2006 19:51:41 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) 66.249.72.1 /ptopic37.html
06 Aug 2006 19:52:07 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) 66.249.72.1 /ftopic35.html
06 Aug 2006 19:52:08 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) 66.249.72.1 /ftopic46.html
06 Aug 2006 19:52:21 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) 66.249.72.1 /ftopic50.html
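For anyone wanting to tally crawler hits like the ones above, here is a minimal Python sketch. It assumes each log line follows exactly the layout shown (timestamp, user agent, IP address, requested path); the names `parse_line` and `googlebot_paths` are made up for illustration, and other log formats would need a different pattern. Note that the `%b` month parsing assumes an English locale.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout: "DD Mon YYYY HH:MM:SS <user agent> <IPv4> <path>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2}) "
    r"(?P<agent>.+) "
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<path>\S+)$"
)


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return {
        "time": datetime.strptime(match.group("ts"), "%d %b %Y %H:%M:%S"),
        "agent": match.group("agent"),
        "ip": match.group("ip"),
        "path": match.group("path"),
    }


def googlebot_paths(lines):
    """Count requests per path for lines whose user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None and "Googlebot" in entry["agent"]:
            hits[entry["path"]] += 1
    return hits


SAMPLE = [
    "06 Aug 2006 05:02:13 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) 66.249.72.1 /",
    "06 Aug 2006 19:51:03 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) 66.249.72.1 /fpost82.html",
    "06 Aug 2006 19:52:07 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) 66.249.72.1 /ftopic35.html",
]

if __name__ == "__main__":
    for path, count in sorted(googlebot_paths(SAMPLE).items()):
        print(path, count)
```

Matching on the user agent string alone does not prove a request really came from Google, since any client can send that string; the sketch only groups what the log claims.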
Yet when I visit www.google.co.uk I only see two pages, http://www.example.com/ and http://www.example.com/index.php
I seem to recall that during the transition of the site, other pages did start to appear but then vanished (probably due to 404s and changing content etc).
I have this morning made a slight change to the robots.txt file to disallow /*.php$ seeing as the only content I want indexing is the .html rewrites.
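To see which URLs a rule like /*.php$ would actually cover, here is a rough Python sketch of Google-style wildcard matching, where `*` stands for any run of characters and a trailing `$` anchors the end of the URL. It is a simplification that only looks at Disallow patterns and ignores Allow rules and per-agent sections; the function names are hypothetical, and `/viewtopic.php?t=35` is only an assumed example of an un-rewritten forum URL.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path pattern using * and a trailing $ into a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape literal pieces and let each * match any run of characters.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path, disallow_rules):
    """Return True if any non-empty Disallow pattern matches the path."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules if rule)


RULES = ["/*.php$"]

if __name__ == "__main__":
    for path in ["/index.php", "/ftopic35.html", "/", "/viewtopic.php?t=35"]:
        print(path, "blocked" if is_blocked(path, RULES) else "allowed")
```

Under this reading the rule blocks /index.php but leaves the .html rewrites and the bare / alone. Because of the trailing `$`, a .php URL that carries a query string would not match either, so dropping the `$` may be worth considering if those should be excluded too.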
I just wondered if anyone could throw any light on the issues I'm seeing. From other posts, it's fairly obvious that deeper crawling can take a while. As long as I haven't done anything silly, I'm happy to sit back and wait, but I just wanted another opinion.
Thanks
s00tie
Although Googlebot is visiting your pages, it may take a while before they are actually included in the index.
It is a subject of great debate here as to how long it takes for pages to appear in the index, especially pages deeper than the home page.
My own experience tells me that some pages are included very quickly, while others seem to take forever.