I wonder why search bots like Googlebot only fetch the first page and do not crawl any deeper.
access.log:
64.68.82.169 - - [28/Oct/2003:09:01:48 +0100] "GET /robots.txt HTTP/1.0" 200 81 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.169 - - [28/Oct/2003:09:01:57 +0100] "GET / HTTP/1.0" 200 3006 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.168 - - [28/Oct/2003:11:47:46 +0100] "GET /en/main/welcome HTTP/1.0" 200 3006 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
...and gone.
The same happens with other bots.
robots.txt validates OK.
I use Apache's mod_rewrite to make the pages look static and do not use sessions (I did in the beginning, and those URLs are still in Google's archive).
The site in question is thequod.de. From there you'll be redirected (with status 302) according to your language, probably to [thequod.de...] - and that's where the robots don't go any deeper.
Is this because of the absolute links in href (like "/en/main/othercategory"), with bots only climbing down, not up?
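For reference, standard URL resolution treats an href that starts with "/" as relative to the host root, so it should not matter how deep the current page sits. A minimal sketch using Python's urllib.parse.urljoin, assuming the page fetched was http://thequod.de/en/main/welcome (host and path taken from the log above; the helper name resolve is only for illustration):

```python
from urllib.parse import urljoin

# Page the crawler fetched (assumed from the host and the logged path).
base = "http://thequod.de/en/main/welcome"

def resolve(href: str) -> str:
    """Resolve an href against the page URL using standard URL resolution."""
    return urljoin(base, href)

# Root-relative link, as used on the site.
root_relative = resolve("/en/main/othercategory")
# Document-relative link, for comparison.
doc_relative = resolve("othercategory")

print(root_relative)
print(doc_relative)
```

Both forms resolve to the same URL here, which suggests the link style alone is unlikely to be what stops a crawler.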
Or is it the "DC.Identifier" meta tag, which does not refer to the URL itself but to the executing PHP script? (I just noticed that, but cannot imagine it would prevent bots from crawling the rest.)
Please have a look at this..
I recommend you run your pages through a bot simulator to see what the bots are getting when they access your site. That should show if there is anything in particular hanging them up. [searchengineworld.com...]
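If you want a rough idea of what such a simulator does, here is a minimal sketch in Python. It is not the linked tool, just an illustration: it parses a page, collects the links a crawler could follow, and reports any robots meta tag. The names LinkCollector, simulate and fetch are made up for this example, the sample HTML is hypothetical, and the user-agent string is copied from the access log above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# User-agent string copied from the access log in the question.
BOT_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

class LinkCollector(HTMLParser):
    """Collects followable links and the robots meta tag from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.meta_robots = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve each href against the page URL, as a crawler would.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.meta_robots = attrs.get("content")

def simulate(html, base_url):
    """Return (links, robots meta content) found in the given HTML."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links, parser.meta_robots

def fetch(url):
    """Fetch a page while identifying as the bot (needs network access)."""
    req = Request(url, headers={"User-Agent": BOT_UA})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

# Hypothetical page content, used here instead of a live request.
sample = """<html><head><meta name="robots" content="index,follow"></head>
<body><a href="/en/main/othercategory">Other</a>
<a href="welcome">Welcome</a></body></html>"""

links, robots = simulate(sample, "http://thequod.de/en/main/welcome")
print(links)
print(robots)
```

To check a live page, pass the result of fetch(url) to simulate instead of the sample string. If the list of links comes back empty, or the robots meta tag says nofollow, that would explain a bot stopping at the first page.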
FYI, posting personal URLs is against the TOS here.
Here's a good post with pointers on other aspects of building a site: [webmasterworld.com...]