
Google News Archive Forum

    
My PHP forum
Help
Glovebox




msg:215790
 10:16 pm on Jul 25, 2003 (gmt 0)

Hello everyone, I've been lurking around these parts for a few weeks now and have only just built up the courage to post. I hope you'll be able to help me.

Only my front page is currently listed in Google. My question is: will Google spider PHP sites? If not, why not? :-)
All I ever see in the logs is Googlebot fetching the main page:
"GET / HTTP/1.0" 200 46011 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
and the robots.txt.

<snip>

Thanks a lot (hope that made sense)
Adam

[edited by: Marcia at 12:12 am (utc) on July 26, 2003]
[edit reason] No individual site checks, please. [/edit]

 

GrinninGordon




msg:215791
 12:48 am on Jul 26, 2003 (gmt 0)

Glovebox

The problem may be one of several things.

1) If the site is new, it may be too early for other pages to get spidered.

2) If the PR is low, Google may not want to spider the other pages.

3) If you use session IDs in the URLs, Google may decide not to spider other pages in case they are a dead end where its bot gets stuck / lost and does not come home for dinner.

4) Some PHP-driven forums are not that cleverly designed SEO-wise. Some use the damn same title for every one of their dynamic pages. If Googlebot sees same-same, it says no-no.

My suggestions:

1) Get someone to move any on page JS into separate files.
2) Get rid of any session codes
3) Find a way to edit the title (and other meta areas, although these are nowhere near as important) of the dynamic pages, so that, for example, each page returns the subject of the posting as its title and heading.
4) Make sure you cross-link your internal pages as much as possible.
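
To illustrate suggestion 3, here is a minimal sketch of building a distinct title from each topic's subject. The function name `build_page_title`, the site name, and the 60-character limit are all assumptions made for this example; none of them come from any particular forum package, and the forum's own template code would need to call something like it.

```php
<?php
// Hypothetical helper (not part of phpBB or any other forum software):
// builds a per-page <title> string from the subject of a topic.
function build_page_title(string $topicSubject, string $siteName, int $maxLength = 60): string
{
    // Drop any markup and collapse runs of whitespace into single spaces.
    $subject = trim(preg_replace('/\s+/', ' ', strip_tags($topicSubject)));

    // Fall back to the site name when the topic has no usable subject.
    if ($subject === '') {
        return htmlspecialchars($siteName, ENT_QUOTES, 'UTF-8');
    }

    // Truncate very long subjects (byte-based, so multibyte text would need mb_substr).
    if (strlen($subject) > $maxLength) {
        $subject = rtrim(substr($subject, 0, $maxLength - 3)) . '...';
    }

    // Escape the result so it is safe to print inside <title>...</title>.
    return htmlspecialchars($subject . ' - ' . $siteName, ENT_QUOTES, 'UTF-8');
}
```

A template could then print `<title><?php echo build_page_title($subject, 'Example Board'); ?></title>`, so each dynamic page gets its own title instead of sharing one.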

kamikaze Optimizer




msg:215792
 12:57 am on Jul 26, 2003 (gmt 0)

Glovebox:

You said it is a PHP forum?

There is code you can put into your .htaccess file, along with files you put into your root directory, that will present all of your PHP pages as HTML just for Googlebot when it comes to the site. It also makes all pages appear just one level below the index rather than buried.

I use it on a site with thousands of posts and Googlebot loves the site. The results are wonderful.

Sticky me if you want more info.

Gus_R




msg:215793
 5:27 pm on Jul 26, 2003 (gmt 0)

Hi, get as close to a static linking structure as possible, and avoid linking to pages through forms or JS.

Gus
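
One way to picture a static-looking link structure is a pair of helpers that turn a topic ID into a plain path and back again. This is only a sketch: the `/topic-<id>.html` scheme and both function names are made up for illustration, and the web server would still need its own rule to route such paths to the forum script.

```php
<?php
// Hypothetical URL scheme: /topic-<id>.html stands in for a dynamic
// URL such as viewtopic.php?t=<id>. Both helpers are illustrative only.

// Build the static-looking path for a topic.
function static_topic_url(int $topicId): string
{
    return '/topic-' . $topicId . '.html';
}

// Recover the topic ID from a static-looking path, or null if it does not match.
function parse_static_topic_url(string $path): ?int
{
    if (preg_match('#^/topic-(\d+)\.html$#', $path, $matches) === 1) {
        return (int) $matches[1];
    }
    return null;
}
```

Links written this way are ordinary anchors with no query string, form, or script involved.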

vincevincevince




msg:215794
 5:29 pm on Jul 26, 2003 (gmt 0)

silly question .... but i presume you DID check the robots.txt is valid?

Glovebox




msg:215795
 9:43 pm on Jul 26, 2003 (gmt 0)

silly question .... but i presume you DID check the robots.txt is valid?
---------------------------------------------------------
Yeah, I checked; it's fine.

What I did do was add the following code to sessions.php:

global $SID, $HTTP_SERVER_VARS;

if (!empty($SID) && !eregi('sid=', $url)
    && !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'], 'Googlebot')
    && !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'], 'slurp@inktomi.com;'))

Apparently this will stop Google from being given session IDs. Unfortunately Googlebot hasn't been back since I added it, so I guess I just have to sit back and wait.

Thanks for your advice, guys.

Adam

Stujoe




msg:215796
 9:47 pm on Jul 26, 2003 (gmt 0)

phpBB, I presume?

I added that code a month or so ago and Google began to like my site quite a bit again. It will work.

Yidaki




msg:215797
 9:52 pm on Jul 26, 2003 (gmt 0)

Glovebox,

You're using the phpBB forum software, and you should find a lot of information in their support forums. It's been discussed there, believe me. ;)

If you are using the proper code, the cause of your trouble must be something else.
This is the proper code you should use:
global $SID, $HTTP_SERVER_VARS;

if (!empty($SID) && !eregi('sid=', $url)
    && !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'], 'Googlebot')
    && !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'], 'FAST-WebCrawler')
    && !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'], 'Slurp@inktomi')
    && !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'], 'Scooter')
    && !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'], 'MicrosoftPrototypeCrawler'))
{
    // Compare with !== false so that a '?' at position 0 is still detected.
    $url .= ((strpos($url, '?') !== false) ? (($non_html_amp) ? '&' : '&amp;') : '?') . $SID;
}


You can add as many USER-AGENTs as you want.
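
For anyone adding more user agents, the same check can be sketched with the signatures kept in a list. This is an illustration under assumptions, not phpBB's own code: the names `CRAWLER_SIGNATURES`, `is_known_crawler` and `append_session_id` are invented here, and the sketch uses `stripos` (case-insensitive) in place of `strstr`/`eregi`, so it behaves slightly differently from the snippet above.

```php
<?php
// Assumed list: the user-agent substrings mentioned in this thread.
const CRAWLER_SIGNATURES = [
    'Googlebot',
    'FAST-WebCrawler',
    'Slurp@inktomi',
    'Scooter',
    'MicrosoftPrototypeCrawler',
];

// Hypothetical helper: true when the user agent contains any listed signature.
// stripos() is case-insensitive, so 'slurp@inktomi' and 'Slurp@inktomi' both match.
function is_known_crawler(string $userAgent, array $signatures = CRAWLER_SIGNATURES): bool
{
    foreach ($signatures as $signature) {
        if (stripos($userAgent, $signature) !== false) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: appends the session string (for example 'sid=abc123')
// to a URL unless there is no session, the URL already carries one,
// or the visitor looks like a crawler.
function append_session_id(string $url, string $sid, string $userAgent, bool $nonHtmlAmp = false): string
{
    if ($sid === '' || stripos($url, 'sid=') !== false || is_known_crawler($userAgent)) {
        return $url;
    }
    // Use '?' for the first parameter, otherwise an ampersand (HTML-escaped by default).
    $separator = (strpos($url, '?') !== false) ? ($nonHtmlAmp ? '&' : '&amp;') : '?';
    return $url . $separator . $sid;
}
```

In a request handler this might be called as `append_session_id($url, $SID, $_SERVER['HTTP_USER_AGENT'] ?? '')`. Adding another crawler then only means adding one string to the list.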

You should also check the recent posts about session IDs and Google [webmasterworld.com], since they contain many statements that help to reduce any paranoia. ;)

<edit>Ohhps :^</edit>

Glovebox




msg:215798
 9:31 pm on Sep 4, 2003 (gmt 0)

Just a quick update, and thanks to you all for your help. My site now has over 2,000 pages listed in Google and ranks number 1 in the SERPs for a popular related keyword. This is due mainly to disabling sessions for bots, using the code shown above by myself and Yidaki.

Thanks Everyone!
Adam


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved