Forum Moderators: phranque

Why do crawlers only get the first page and go? (302 redirect)

blueyed

3:53 am on Nov 4, 2003 (gmt 0)

10+ Year Member



Glad to be in the forums; I've found a lot of answers here while asking Google my PHP questions. :)

I wonder why search bots like Googlebot only fetch the first page and don't crawl any deeper.

access.log:
64.68.82.169 - - [28/Oct/2003:09:01:48 +0100] "GET /robots.txt HTTP/1.0" 200 81 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.169 - - [28/Oct/2003:09:01:57 +0100] "GET / HTTP/1.0" 200 3006 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.168 - - [28/Oct/2003:11:47:46 +0100] "GET /en/main/welcome HTTP/1.0" 200 3006 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

...and then they're gone.
The same thing happens with other bots, too.
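
To see at a glance which URLs each bot actually requested, it can help to filter the access log by user agent. A minimal Python sketch, assuming Apache's combined log format (which the lines above appear to use); `bot_requests` and the regex are my own, not part of any existing tool:

```python
import re

# Apache combined log format:
# host ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def bot_requests(lines, agent_substring="Googlebot"):
    """Return (time, path, status) for each request whose user agent contains agent_substring."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if m and agent_substring in m.group("agent"):
            hits.append((m.group("time"), m.group("path"), int(m.group("status"))))
    return hits

# The three log lines quoted above.
SAMPLE = [
    '64.68.82.169 - - [28/Oct/2003:09:01:48 +0100] "GET /robots.txt HTTP/1.0" 200 81 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '64.68.82.169 - - [28/Oct/2003:09:01:57 +0100] "GET / HTTP/1.0" 200 3006 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '64.68.82.168 - - [28/Oct/2003:11:47:46 +0100] "GET /en/main/welcome HTTP/1.0" 200 3006 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
]

for when, path, status in bot_requests(SAMPLE):
    print(when, status, path)
```

Run against the three lines quoted above, it lists /robots.txt, / and /en/main/welcome, all answered with status 200 (no 302 appears in this excerpt).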

My robots.txt validates OK.
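
Validating the syntax is one check; another is asking whether a given path is actually allowed for a given user agent. A minimal sketch using Python's standard urllib.robotparser, assuming a hypothetical robots.txt (the real 81-byte file isn't shown in the thread):

```python
from urllib import robotparser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from memory instead of fetching

# /en/comp/ and /cgi-bin/test are hypothetical paths.
for path in ["/", "/en/main/welcome", "/en/comp/", "/cgi-bin/test"]:
    allowed = parser.can_fetch("Googlebot/2.1", path)
    print(f"{path:20} {'allowed' if allowed else 'blocked'}")
```

If the deeper paths come back as allowed here (with the real file pasted in), robots.txt is unlikely to be what stops the crawl.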

I use Apache's mod_rewrite to make the pages look static, and I don't use sessions (I did in the beginning; that version is still in Google's archive).
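
The actual rewrite rules aren't posted, so purely to illustrate what this kind of internal rewrite does, here's a Python sketch of a hypothetical mapping from a static-looking /lang/category/page path onto a PHP script's query string. The script name index.php and the parameters lang, cat and page are made up:

```python
import re
from urllib.parse import urlencode

# Hypothetical rule, roughly what an .htaccess line such as
#   RewriteRule ^([a-z]{2})/([^/]+)/([^/]+)$ index.php?lang=$1&cat=$2&page=$3 [L]
# would do. Here we match the full URL path, so the pattern starts with "/".
PATTERN = re.compile(r"^/([a-z]{2})/([^/]+)/([^/]+)$")

def rewrite(path):
    """Return the internal (hypothetical) script URL for a pretty path, or None if no match."""
    m = PATTERN.match(path)
    if m is None:
        return None
    lang, cat, page = m.groups()
    return "/index.php?" + urlencode({"lang": lang, "cat": cat, "page": page})

print(rewrite("/en/main/welcome"))  # /index.php?lang=en&cat=main&page=welcome
print(rewrite("/robots.txt"))       # None
```

Because a rule like this (without the [R] flag) rewrites internally, the crawler only ever sees the pretty URL and never the script behind it.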

The site in question is thequod.de [thequod.de]. From there you'll be redirected (with status 302) according to your language, probably to [thequod.de...], and that's the point beyond which the robots don't go.
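
Language-based redirects like this usually key off the Accept-Language request header. A simplified Python sketch of that decision, assuming a hypothetical language list (en, de) with en as the default; the site's real logic isn't shown in the thread, and this version ignores q-values:

```python
# Hypothetical configuration, for illustration only.
SUPPORTED = ("en", "de")
DEFAULT = "en"

def choose_language(accept_language):
    """Pick the first supported language listed in an Accept-Language header value.

    Simplified: languages are taken in listed order and q-values are ignored.
    """
    if accept_language:
        for part in accept_language.split(","):
            tag = part.split(";")[0].strip().lower()
            primary = tag.split("-")[0]  # "de-de" -> "de"
            if primary in SUPPORTED:
                return primary
    return DEFAULT

def redirect_for_root(accept_language):
    """Return (status, Location) for a request to "/"; 302 means a temporary redirect."""
    lang = choose_language(accept_language)
    return 302, f"/{lang}/main/welcome"

print(redirect_for_root("de-DE,de;q=0.9,en;q=0.8"))  # (302, '/de/main/welcome')
print(redirect_for_root(None))                        # (302, '/en/main/welcome')
```

One consequence worth checking: a client that sends no Accept-Language header always lands on the default language, so a bot following the redirect would only ever see that one language's entry page.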

Is this because the hrefs are absolute links (like "/en/main/othercategory") and bots will only climb down the directory tree, not up?
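
For what it's worth, URL resolution itself has no notion of climbing "up" or "down": a root-relative href (one starting with /) is resolved against the host, following the reference-resolution rules in RFC 3986. Python's urllib.parse.urljoin follows those rules; the relative hrefs below are made-up examples, and /en/comp/ is a hypothetical path:

```python
from urllib.parse import urljoin

base = "http://thequod.de/en/main/welcome"

# Root-relative hrefs resolve against the host; plain relative ones against the
# current "directory" (/en/main/). Either way the result is just another URL.
for href in ["/en/main/othercategory", "othercategory", "../comp/", "/"]:
    print(f"{href:25} -> {urljoin(base, href)}")
```

Both "/en/main/othercategory" and "othercategory" resolve to the same absolute URL here, so a standards-following crawler sees no difference between them.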

Or is it the "DC.Identifier" meta tag, which refers not to the URL itself but to the executing PHP script? (I only just noticed that, but I can't imagine it would prevent bots from crawling the rest.)
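
One way to check what the tag actually says on each page is to pull the meta tags out of the HTML and compare the identifier with the page's URL. A minimal sketch with Python's built-in html.parser; the page snippet and the index.php script name are hypothetical stand-ins for the real page:

```python
from html.parser import HTMLParser

class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name = attrs.get("name")
            if name is not None:
                self.meta[name] = attrs.get("content")

# Hypothetical page head; index.php is a made-up script name.
PAGE_URL = "http://thequod.de/en/main/welcome"
HTML = """<html><head>
<meta name="DC.Identifier" content="http://thequod.de/index.php">
<title>welcome</title>
</head><body></body></html>"""

collector = MetaCollector()
collector.feed(HTML)
identifier = collector.meta.get("DC.Identifier")
print(identifier, "matches" if identifier == PAGE_URL else "differs from", PAGE_URL)
```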

Please have a look at this.

BlueSky

9:07 am on Nov 4, 2003 (gmt 0)

10+ Year Member



Welcome to WebmasterWorld.

I recommend you run your pages through a bot simulator to see what the bots are getting when they access your site. That should show whether anything in particular is hanging them up. [searchengineworld.com...]
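
At its core, a bot simulator fetches a page and reports which links a spider could follow from it. An offline sketch of that core step in Python, working on an HTML string instead of a live fetch; the page body and paths are hypothetical, and real crawlers do considerably more than this:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Gather href values from <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

def internal_links(page_url, html):
    """Return absolute same-host links found in html, in order, without duplicates."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    seen = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        if urlparse(absolute).netloc == host and absolute not in seen:
            seen.append(absolute)
    return seen

# Hypothetical page body for illustration.
HTML = ('<a href="/en/main/othercategory">more</a> '
        '<a href="/en/comp/">comp</a> '
        '<a href="http://example.com/">elsewhere</a>')

for link in internal_links("http://thequod.de/en/main/welcome", HTML):
    print(link)
```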

FYI, posting personal URLs is against the TOS here.

blueyed

12:30 pm on Nov 4, 2003 (gmt 0)

10+ Year Member



The spider simulator you suggested runs fine and finds all the links that should lead deeper.
I can't see anything that would hang them up.

BlueSky, see private message regarding TOS.

BlueSky

2:19 pm on Nov 4, 2003 (gmt 0)

10+ Year Member



If the bot simulator had no problems accessing the links, then that's great, and your robots.txt is okay. I think they may not be going deeper because your site lacks content. Take away whatever is the same on every page and what do you have left? Not much. You have to feed the bots words and get some incoming links. They know your site is there, so give them more to chomp on.
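
The "take away whatever is the same on every page" test can be roughed out in code: collect each page's words and subtract the ones every page shares. This is only a crude heuristic for eyeballing your own pages, not how search engines measure content; the page texts below are hypothetical:

```python
import re

def words(text):
    """Lowercased set of words in a page's visible text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def unique_content(pages):
    """For each page, return the words NOT shared by every page (i.e. minus the common template)."""
    word_sets = {name: words(text) for name, text in pages.items()}
    shared = set.intersection(*word_sets.values())
    return {name: ws - shared for name, ws in word_sets.items()}

# Hypothetical page texts: the same navigation on each page plus a little page-specific text.
TEMPLATE = "thequod home main comp contact imprint"
pages = {
    "/en/main/welcome": TEMPLATE + " welcome to my site",
    "/en/main/othercategory": TEMPLATE + " some other category",
}

for name, left in unique_content(pages).items():
    print(name, sorted(left))
```

If what's left per page is only a handful of words, that matches the "not much" above.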

Here's a post with some good pointers on other aspects of building a site: [webmasterworld.com...]

blueyed

3:37 pm on Nov 4, 2003 (gmt 0)

10+ Year Member



You're right about the lack of content, of course. But how would a spider know that if it only sees the first page? If it went into my comp section, it would find some content.

For now, I'll repair the DC.Identifier meta tag. But I don't think that's the issue.