Googlebot behaviour

Googlebot not visiting dynamically generated pages

MJones

10:22 am on Oct 28, 2002 (gmt 0)

Hi all

Just a quick Google question. I'm hoping (fingers crossed) that I'll get listed in Google after the next update, and I have been keeping an eye on the server logs to watch for visits from Googlebot. My site has been visited a few times, but only as far as the first page; the spider never follows any of the links. All the links contain a ? because the pages are dynamically generated, and they are of the form:

/shop/?page=pagename

rather than:

/shop/index.php?page=pagename

Would this prevent the spider from following the links? Also, I do not have any robot-specific meta tags such as <meta name="robots" content="index,follow">. Will this make a difference? I understood that these meta tags are still not used by the majority of search engines.

Any thoughts/advice appreciated.

Thanks.

thunderpaste

10:24 am on Oct 28, 2002 (gmt 0)

If you are talking about a new site it is possible Google will only index a small part until you have some incoming links.

creative craig

10:28 am on Oct 28, 2002 (gmt 0)

Google will still look at the meta tag though if it is there. It is only one line of code and I would put it in, it may make all the difference.

Of course, if there are no links pointing at your site at the moment, I would work on that as well, since that is a major factor in a good listing in the big G!

Craig

MJones

10:30 am on Oct 28, 2002 (gmt 0)

Yep, it is a new site. I do have a few incoming links but nothing to write home about. However, I just wondered if I was doing anything to preclude the Google spider, especially as the next update is imminent.

Grumpus

11:54 am on Oct 28, 2002 (gmt 0)

Nah, it's pretty normal during "off crawl" times like this for new sites to get tagged a couple of times by the freshbot, but in my experience it rarely looks very deeply into them (regardless of how you do your URL setup). I'm using ASP and my /directory/?ID=X type URLs seem to work fine.

HarryM

12:04 pm on Oct 28, 2002 (gmt 0)

For what it's worth, I too have a new site, and currently I only have 2 incoming links, neither very significant. Yet all my pages were crawled by Google earlier this month.

Now my nerves are frazzled waiting for the update. :)

nell

12:05 pm on Oct 28, 2002 (gmt 0)

I've had problems getting .php pages ranked high with Google. I wound up copying the source code from selected pages (after PHP generated them) and made them into .htm pages.
My menu, when it used links to .php pages, also seemed to be ignored. (I did get some .php pages into Google but they were buried.)

MJones

1:21 pm on Oct 28, 2002 (gmt 0)

Thanks for the replies! I've added the robot meta tags just in case this has any bearing and think that perhaps I just need to be patient. Although that is a hard thing to be at the moment!

warumauchnicht

1:27 pm on Oct 28, 2002 (gmt 0)

@MJones

I had the same problem a month ago. Google crawled only www.mydomain.com/ and didn't follow any links, even though I had installed a robots.txt and META tags.

But now my site is completely spidered. The trick:
mod_rewrite in .htaccess. I mean something like this:

URL: www.shop.com/shop/page/firstpage/
Rewrite to www.shop.com/index.php?page=firstpage

If it's possible for you to create your own .htaccess file, I can help you with the syntax.

regards,
tino

bjseiler

1:48 pm on Oct 28, 2002 (gmt 0)

If you do not want to deal with mod rewrite, there is an easy way to create html pages from php pages. If your content changes frequently, you may want to do this a couple times a month or put it into a cron script to do it for you.

From telnet or SSH -

>lynx -source [yoursitehere.com...] > index.html

(all on one line)

That will create index.html which will be the exact html output that you get when you click on index.php.
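
For anyone without lynx, the same idea can be sketched in Python. This is only a rough illustration under some assumptions: the base URL, the page names and the output directory are placeholders, and the ?page= form is taken from the URLs described at the top of the thread.

```python
import os
import urllib.request
from urllib.parse import quote


def fetch_url(url):
    """Download a URL and return the response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def snapshot_pages(base_url, page_names, out_dir, fetch=fetch_url):
    """Save the generated HTML of each dynamic page as a static .html file.

    base_url, page_names and out_dir are hypothetical inputs; "fetch" can be
    swapped out, which makes the function easy to try without a live site.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for name in page_names:
        # Request the dynamic page exactly as a browser would.
        html = fetch(f"{base_url}?page={quote(name)}")
        path = os.path.join(out_dir, f"{name}.html")
        with open(path, "wb") as handle:
            handle.write(html)
        written.append(path)
    return written
```

Run from a cron job, something like snapshot_pages("http://www.example.com/shop/", ["firstpage"], "static") would refresh the static copies on whatever schedule suits the content.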

ruserious

2:23 pm on Oct 28, 2002 (gmt 0)

Google cannot tell the difference between dynamically generated (.php) and static (.html) files, and it would not care either. A filename extension can easily be manipulated through rewriting.

The problem with .php is that these sites often have some kind of user tracking via sessions. If session IDs get appended to GET URLs, Google will not crawl those links.

See
[searchengineworld.com...] - Q: [Indexing] Does Google index dynamic content?

and

[webmasterworld.com...] - for the discussion
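
A quick way to check whether your own links carry a session ID is to look at the query string. The sketch below is a minimal example, not a complete audit tool: PHPSESSID is PHP's default session name, while the other parameter names in the set are guesses that you would need to adjust for your site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# PHPSESSID is PHP's default; "sid" and "sessionid" are assumed examples.
SESSION_PARAMS = {"phpsessid", "sid", "sessionid"}


def has_session_id(url):
    """Return True if the URL's query string carries a session parameter."""
    query = urlsplit(url).query
    return any(key.lower() in SESSION_PARAMS
               for key, _ in parse_qsl(query, keep_blank_values=True))


def strip_session_params(url):
    """Return the URL with any session parameters removed from the query."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in SESSION_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Running has_session_id over the links in a page fetched without cookies (roughly what a spider sees) shows whether session IDs are leaking into the URLs.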

MJones

2:26 pm on Oct 28, 2002 (gmt 0)

> warumauchnicht

Unfortunately my ISP doesn't allow the use of mod_rewrite in .htaccess files. This leaves me with the option of implementing my own URL parsing trick so that I can use links without the ?, or generating a load of static pages that will be spidered (as suggested by bjseiler). However, I thought this was a technique that could lead to sites being blacklisted by some search engines.

The reason I've not gone down either of these routes yet is that from everything I've read so far I understood that Google was quite happy to follow dynamic links. This issue is obviously not as clear cut as it first appears and perhaps it's time to look at my options again.
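
For illustration, the URL parsing trick mentioned above could look roughly like this. It is a sketch in Python rather than the site's own PHP, and the /shop/page/ prefix and the /index.php?page= target are simply borrowed from warumauchnicht's example; the function names are made up.

```python
from urllib.parse import quote


def page_from_path(path, prefix="/shop/page/"):
    """Return the page name encoded in a path-style URL, or None.

    Example: "/shop/page/firstpage/" yields "firstpage".
    """
    if not path.startswith(prefix):
        return None
    name = path[len(prefix):].strip("/")
    # Reject empty names and anything with further path segments.
    if not name or "/" in name:
        return None
    return name


def internal_target(path):
    """Map a path-style URL to the query-string URL the script expects."""
    name = page_from_path(path)
    if name is None:
        return None
    return f"/index.php?page={quote(name)}"
```

The script would then serve the page named by page_from_path directly, so the links on the site never need to contain a ?.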