Googlebot not crawling

Seeks index page, then leaves

     
6:12 am on Feb 12, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 3, 2001
posts:424
votes: 0


Googlebot visits often. It requests the index page, but doesn't crawl any deeper. This happens two or three times a day.

The MediaBot crawls deeper into the site without issue. The site runs AdSense.

Could there be anything in the server config that is causing this? It isn't robots.txt. The index page is lo-fi and xenu crawls it fine, as does the searchengineworld sim spider.
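For anyone who wants to rule out robots.txt the same way, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt text and the paths are made up for illustration, and the parser follows the original robots exclusion rules, which may not match every detail of how Googlebot itself reads the file.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


# Hypothetical robots.txt, for illustration only (not taken from any site in this thread).
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /cgi-bin/

User-agent: Googlebot
Disallow: /articles/
"""

# Paths are hypothetical too; swap in real URLs from your own site.
print(blocked_paths(SAMPLE_ROBOTS, ["/", "/articles/one.html", "/cgi-bin/x"]))
```

An empty list for the pages you expect to be crawled suggests robots.txt is not what is keeping the bot out.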

Any ideas?

2:26 am on Mar 4, 2004 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 2, 2002
posts:87
votes: 0


I personally have to agree with MetaGod. Here's what I've been seeing since Austin. Googlebot comes and grabs 20-30 pages every day (pages that were up pre-Austin). Since Austin I have added, let's say, another 20-30 pages, all linked internally in one way, shape, or form, but these new pages don't get crawled. I suspected the same thing as MetaGod, so I got some external links to some of these "deeper" internal pages, and bingo.

So, umm yeah I agree.
Just my 2 cents tho.

-phish

2:40 am on Mar 4, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 3, 2001
posts:424
votes: 0


With respect, we are off-topic.

I have access to numerous sites that demonstrate the opposite of what you and metagod are saying. So whatever you're seeing isn't universal.

What I'm talking about is a Googlebot that grabs the index and robots.txt, then goes away again. It repeats this behaviour a few times a day on one particular site.
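A rough way to confirm this pattern from raw logs is to group Googlebot requests by day and flag the days on which it asked for nothing beyond the entry points. The sketch below assumes Apache's combined log format, assumes that "Googlebot" in the user-agent string is enough to identify the bot, and uses a guessed set of entry-point URLs. The sample lines are invented for illustration.

```python
import re
from collections import defaultdict

# Assumes Apache "combined" log format; adjust the pattern for other formats.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"[^"]*" "(?P<agent>[^"]*)"?'
)

# Assumed entry-point URLs; a site may use a different index file name.
SHALLOW_PATHS = {"/", "/index.html", "/robots.txt"}


def shallow_crawl_days(log_lines, bot_token="Googlebot"):
    """Return the days (in log order) on which the bot fetched only entry points."""
    paths_by_day = defaultdict(set)
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match and bot_token in match.group("agent"):
            paths_by_day[match.group("day")].add(match.group("path"))
    return [day for day, paths in paths_by_day.items() if paths <= SHALLOW_PATHS]


# Invented sample lines, for illustration only.
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
SAMPLE_LINES = [
    f'64.68.82.57 - - [08/Mar/2004:10:39:18 +0000] "GET /robots.txt HTTP/1.0" 200 120 "-" "{GOOGLEBOT_UA}"',
    f'64.68.82.57 - - [08/Mar/2004:10:39:19 +0000] "GET / HTTP/1.0" 200 31393 "-" "{GOOGLEBOT_UA}"',
    f'64.68.82.57 - - [09/Mar/2004:11:00:00 +0000] "GET / HTTP/1.0" 200 31393 "-" "{GOOGLEBOT_UA}"',
    f'64.68.82.57 - - [09/Mar/2004:11:00:05 +0000] "GET /products/a.html HTTP/1.0" 200 5000 "-" "{GOOGLEBOT_UA}"',
    '10.0.0.1 - - [09/Mar/2004:12:00:00 +0000] "GET /products/b.html HTTP/1.1" 200 5000 "-" "Mozilla/4.0"',
]

print(shallow_crawl_days(SAMPLE_LINES))
```

Because the match is on "Googlebot", the AdSense MediaBot and ordinary browsers are ignored, so its deeper crawling will not mask a shallow Googlebot visit.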

I see from the thread that other people have also noticed this, which leads me to believe it isn't a technical problem with the site itself. The fact that Googlebot crawled the site (eventually) pretty much confirms it.

Obviously, the more links you have to deep content, the better, but that is not the topic of this thread.

[edited by: feeder at 2:58 am (utc) on Mar. 4, 2004]

2:58 am on Mar 4, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 4, 2002
posts:1687
votes: 0


To add another perspective: Our site is getting crawled well daily and showing frequently updated freshtags on many of the PR5 pages.

That being said, I've noticed that the pages added over the last week haven't been hit yet although they're all linked from several pages that are visited daily by the bot.

The only point of my post is to say, "Don't Panic". There has been a bit of a lull recently but this will change.

3:04 am on Mar 4, 2004 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 2, 2002
posts:87
votes: 0


Okie, I was just agreeing with what he was saying, is all.
So back to the "topic"... I also have about 30 or so sets of logs at my fingertips. At least on the sites I deal with in the commercial market, I have attributed the problem you're referencing to whatever these new filters or algo changes are that are dropping sites for certain keywords. Example: one site I watch has pr7, gets crawled deep daily, (100 pages or so a day) then it's dropped for its main keyword to like #500. During this time that it's dropped, GBot comes, grabs robots and index, then jets till the next day. This happens for about 4 weeks, same thing every day. All of a sudden algo tweak puts the site back at #3 where the site has been for 3 years, what happens next? You guessed it: GBot goes back to crawling all 100 pages a day, every day. So anyway, I can only speak for myself, but hey, this is what I'm seeing in black and white. Why? I dunno.

-phish

3:07 am on Mar 4, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 16, 2003
posts:746
votes: 0


I just uploaded a brand new site and have requested two links. One is from DMOZ in a Cat that seems to have an editor.

We'll see what happens. I bet googlebot will visit the day after the link goes live.

Hey, who cares about Google! I've just made Preferred Member. So who's going to offer me a beer?

Hey me too!

3:13 am on Mar 4, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 3, 2001
posts:424
votes: 0


Example: one site I watch has pr7, gets crawled deep daily, (100 pages or so a day) then it's dropped for its main keyword to like #500. During this time that it's dropped, GBot comes, grabs robots and index, then jets till the next day. This happens for about 4 weeks, same thing every day. All of a sudden algo tweak puts the site back at #3 where the site has been for 3 years, what happens next?

Thanks. That's what I'm interested in :)
My site has very strong multiple inbound linking (PR7 and 6) at different levels.

There's been a change in crawl activity of late, so I can pretty much discount on-site technical problems.

8:47 am on Mar 4, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member steveb is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 20, 2002
posts:4652
votes: 0


New preferred members BUY the beer.

Glad to have cleared that up for you guys.

11:59 am on Mar 4, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 21, 2002
posts:1051
votes: 0


What I'm talking about is a Googlebot that grabs the index and robots.txt, then goes away again

I feel sure that Google has some sort of trigger that controls whether a site is deep crawled or not. This is related to PR. For a new site the trigger may be set because Googlebot discovered the site by following a high value link, although the visible PR in the toolbar may still be PR0.

Without this trigger Googlebot will crawl deeper only if it finds the index page has changed, or it has followed a new deep link. But it stops after a few pages if it finds nothing else is new. Also, if nothing ever changes, Googlebot stops crawling and pages start losing their snippets and may eventually disappear from the index.

If the trigger is set, and stays set due to high PR/high value inbound links, then even the most obsolete site will get crawled regularly and stay in the index.

Adding new pages to a site does nothing unless Google stumbles over them. One way around this is to temporarily link new pages directly from the index page. Google then sees the index page has changed and will almost certainly follow the new links within a day or two.
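As an illustration of the "temporarily link new pages from the index page" idea, here is a minimal sketch that builds an HTML list of the most recently added pages, ready to paste into an index page. The function name, the page tuples, and the limit are all hypothetical, and whether such a block actually changes Googlebot's behaviour is only this poster's working model.

```python
from datetime import date
from html import escape


def new_pages_block(pages, limit=10):
    """Build an HTML list linking to the most recently added pages.

    pages: iterable of (url, title, added) tuples, where added is a sortable
    value such as a datetime.date.
    """
    newest = sorted(pages, key=lambda page: page[2], reverse=True)[:limit]
    items = [
        f'  <li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for url, title, _added in newest
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Hypothetical pages, for illustration only.
SAMPLE_PAGES = [
    ("/a.html", "A", date(2004, 3, 1)),
    ("/b.html", "B & C", date(2004, 3, 5)),
    ("/c.html", "C", date(2004, 2, 1)),
]

print(new_pages_block(SAMPLE_PAGES, limit=2))
```

Dropping the block again once the pages are crawled keeps the link temporary, as described above.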

Another way to encourage Googlebot is to freshen the whole site by making a small change on every page - easy with PHP.

I don't know whether this is a true rendition of what really happens, but it's the model that works for me.

8:18 pm on Mar 5, 2004 (gmt 0)

New User

10+ Year Member

joined:Mar 5, 2004
posts:3
votes: 0


Hello,
I am getting the same thing on my site. I submitted to Google about 6-8 weeks ago. It does the same thing every time it comes to my site: it goes to robots.txt, then to the home page, and then leaves. It does this sometimes multiple times per day.

ed

9:52 pm on Mar 5, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member steveb is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 20, 2002
posts:4652
votes: 0


How many times a day do you change your homepage?
11:18 pm on Mar 5, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 24, 2000
posts:1747
votes: 4


I am experiencing the same thing on a newer site.
The site sat for a month, and googlebot would hit the homepage and exit.

The homepage would appear and disappear in google.
I moved the site to a different server thinking it could be a server configuration problem.

Googlebot continued to hit the homepage and exit.
After another month of this, I have now dumped a test site on the domain that I know has been indexed in the past to see if it is something internally wrong with the site.

The site has some good PR links pointing to it that in the past should have given it a PR6.

In the past the same type of site with similar links would be fully indexed within a week.

This is not the only site I know of that is experiencing the same.

M i n n a p p l e

3:50 am on Mar 6, 2004 (gmt 0)

New User

10+ Year Member

joined:Mar 5, 2004
posts:3
votes: 0


Can't say that I change my home page every day, much less multiple times per day. Should I be?

Ed

4:07 am on Mar 6, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 4, 2002
posts:1687
votes: 0


No.

Added: Sorry, welcome to WW bnmwebmaster. Changes to pages are good but not as important, perhaps, as adding brand new pages to the site, with lots of pertinent content, instead. Tweaking an index page often won't substitute for the regular adding of more content, on more pages, that link back to the main page/s.

6:51 am on Mar 6, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 22, 2003
posts:1483
votes: 0


Confirmed. Googlebot is exhibiting pre-FL traits with regard to indexing newer sites beyond the home page. How quickly we forget how it used to be...

Must also confirm a surprising mackdaddy, CNN-fresh, status on established sites. Not crawling?!

9:58 am on Mar 6, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member steveb is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 20, 2002
posts:4652
votes: 0


"Should I be?"

Definitely, if you are not getting the rest of the site crawled! You have to encourage the bot to go further, and a way to do that is to change the page. Multiple times a day is an exaggeration, but with the bot coming multiple times a day it sure makes sense to change it every day. Right now it comes and sees nothing has changed, so it goes away. Perfectly reasonable reaction. If it sees changes it will dig deeper sooner.

Don't bore Googlebot with the same old reruns. It'll be happy with new programs to check out each visit.

10:49 am on Mar 6, 2004 (gmt 0)

Full Member

10+ Year Member

joined:Feb 17, 2004
posts:261
votes: 0


I've had much better crawls since I uploaded a page with a great big jpeg of a scantily clad Googlebot. Googlebot now visits every day :)
2:47 pm on Mar 6, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 16, 2003
posts:746
votes: 0


We all focus a lot of energy and analysis on the search results. It looks like there has been some sort of a tweak to the crawling algo.

It seems like we're having good success getting the home page crawled and indexed. Perhaps Google is doing some maintenance/tweaks to the crawler and they have limited resources at the moment. The crawl is probably priority based for the moment.

For those who are having problems, is it freshbot that keeps visiting every day (or is freshbot so 2002)?

3:03 pm on Mar 6, 2004 (gmt 0)

Full Member

10+ Year Member

joined:Feb 17, 2004
posts:261
votes: 0


It's unfashionable to state this - particularly on this forum - but Google isn't immune from technical problems. Heck - we have just been through 4 months of them :)
4:35 pm on Mar 6, 2004 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 20, 2004
posts:45
votes: 0


I have a question about Googlebot. Today I checked the logs and finally saw "Googlebot/2.1 (+http://www.googlebot.com/bot.html)".
The IP is 64.68.82.168.
My question is this: it only hit the main page and nothing else. I have a PR3, but most of the links I have acquired are not registering as yet. What does this mean? Will it come back? My other point is that before they made all their changes I ranked from #1 to #10 on any search words remotely related to my site. At that time I had no document tag, no content tag, no meta tags, and the site had not been optimized because I didn't know about any of this. Now I have corrected all errors, every page validates with the W3C Validator, and I am not found. Can anyone explain this? I just don't get it. I have gotten a lot of links from sites that now list me as a link (vacation type sites), and they are all related to my industry. But when I did a link check, I found that one of the links showing is one I did not request, and it is only somewhat related, in that it is more or less a search site for every type of industry. I have read this board until I am almost blind (LOL) and I still don't get it. Sorry!
7:58 pm on Mar 6, 2004 (gmt 0)

Junior Member

10+ Year Member

joined:May 24, 2002
posts:75
votes: 0


Feeder,

I have 100+ websites, most of them with PR5. These sites are updated regularly and new pages are added every now and then.

This time, since Google began its new crawl with those dates appearing beneath the URLs, none of my sites have been crawled as such. The stats show Googlebot visiting, but the cache of the homepages is still the same.

I have also added thousands of backward links to some of these sites, and they show no sign of improvement, since Googlebot is not deep crawling those sites either (I think).

By all appearances, Googlebot is simply ignoring many of those sites, including ours. It doesn't seem to be a server error at our end, but at Google's end, of course.

We have done all the thinking and are still drawing a blank on this. www2 and www3 show new search results, but with mostly the same sites as www, i.e. Google has not visited the fresh sites.

Some of the SERPs are illogical, listing no decent sites with any proper optimization or backward links. Google is sleeping, or maybe we are in for a major update, my friend.

Let's keep our fingers crossed.

All the Best

12:30 pm on Mar 8, 2004 (gmt 0)

New User

10+ Year Member

joined:Jan 24, 2005
posts:7
votes: 0


I have the following lines in my log file and the same problem which was discussed under this topic
<------------------------
64.68.82.57 - - [08/Mar/2004:10:39:19 +0000] "GET / HTTP/1.0" 200 31393 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html
------------------------>
Google has indexed only the main page of my website, and for about 2 weeks it has come and grabbed only the index page and robots.txt. Nothing more. This website is new and doesn't have a PageRank at all, but as far as I know PageRank is not considered at this point. All my pages are dynamic, and most of them take a single variable (e.g. index.php?ln=en), so there should be no problem. Now let's look back at the log line above. After the 200 there are 5 digits. What do they mean? They change after each Googlebot visit. Is it some kind of Googlebot identifier or something? Do you have any advice on how to act in this situation? Should I wait a few months for Googlebot to index all of my pages?! Please take a look at beadsky.com, maybe you will know what to do :(
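On the log question: assuming the server writes Apache's standard common or combined log format, the number after the status code is the size of the response body in bytes, not a Googlebot identifier. A dynamically generated page can easily vary in size between requests, which would explain why the value changes. The sketch below splits the posted line into named fields; the field names are chosen here for illustration, and the line is kept truncated exactly as it was posted.

```python
import re

# Assumes Apache "combined" log format. The closing quote of the user agent is
# not required, because the line as posted is cut off before it.
COMBINED = re.compile(
    r'^(?P<client_ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes_sent>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)'
)


def parse_line(line):
    """Split one combined-format log line into a dict of named fields."""
    match = COMBINED.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # Apache logs "-" when no body was sent.
    fields["bytes_sent"] = 0 if fields["bytes_sent"] == "-" else int(fields["bytes_sent"])
    return fields


# The log line from the post above, truncated as posted.
LINE = (
    '64.68.82.57 - - [08/Mar/2004:10:39:19 +0000] "GET / HTTP/1.0" 200 31393 '
    '"-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html'
)

for name, value in parse_line(LINE).items():
    print(f"{name}: {value}")
```

Here the status field is 200 (the request succeeded) and bytes_sent is 31393, i.e. roughly 31 KB of HTML was returned for the index page.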
12:53 pm on Mar 8, 2004 (gmt 0)

Senior Member from FR 

WebmasterWorld Senior Member leosghost is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Feb 15, 2004
posts:6717
votes: 230


I've got "gbot 2" coming by twice a day since 10 days now ..eating up my robots text file and every page I've got each time it comes ....love my logs!...still doesn't do any good at all for indexing tho
..its only showing "index.html" page as the "result page" for any search on any keyword or phrase of mine that I do.
..last visit yesterday evening
..all pages spidered ( all content is different and 50% of pages are in another language ..all pages have different titles , text, alts , metas , pics , ) the only common element is my scripts( for image delivery ) and my css and nav buttons....
it knows about the other pages cos I checked it today and it shows all of them in its "more pages" area ...but whatever keyword or words I search ..it still bases all its results on just the first page!
...imagine trying to optimise everything via just one page for a site that offers dozens of items!
....I'm fortunate enough to have what is a very "spammy" ( google say it isn't tho..even I think it is )looking index page ..but it shouldn't be ranking higher than my others for "their own" keyword .....
What has "g" been drinking (or smoking?) this time?..
ps this isn't sour grapes ..
I'm still at #1 for virtually each term ...but all off the one page!
pps ...recipes for trick pages for "g" for sale ...( only joking!)
12:56 pm on Mar 8, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member sem4u is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Dec 18, 2002
posts:3061
votes: 0


I had a big crawl on one of my new sites yesterday. Took a load of dynamic pages for the first time :)
1:50 pm on Mar 8, 2004 (gmt 0)

New User

10+ Year Member

joined:Nov 11, 2003
posts:22
votes: 0


sem4u - That's great. When was that site launched and how many incoming links does it have?

I'm still waiting for a deepcrawl on a site launched 3 weeks ago... :(

2:04 pm on Mar 8, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member sem4u is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Dec 18, 2002
posts:3061
votes: 0


The site went up at the end of January so Google has taken its time to index the dynamic pages, but I am pleased that it has :)

There are a number of PR4 & PR5 links pointing to the site, but I am still waiting for the PR to update.

2:06 pm on Mar 8, 2004 (gmt 0)

New User

10+ Year Member

joined:Nov 11, 2003
posts:22
votes: 0


Thanks for the info.

I guess I'll have to be a bit more patient ;)

6:01 pm on Mar 8, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 4, 2000
posts:1324
votes: 0


Wish GG would get in here with some insight into this issue. But the one thing I am sure of is Google knows what they are doing and this will either clear or we will know soon enough what's going on...

-s-

6:31 pm on Mar 8, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 21, 2002
posts:1051
votes: 0


One way around this is to temporarily link new pages directly from the index page. Google then sees the index page has changed and will almost certainly follow the new links within a day or two

I posted that earlier, but have to retract it. It worked fine until last week when instead of the 200 or so pages Googlebot normally takes, it only took 8, just robots.txt and the index page. And that was despite having added 2 groups of 10 new pages each, with each group linked from the index page.

So what's up with Google? My site is hosted in the UK and I had more pages indexed last week by Baidu. :)

Incidentally does anyone know what this is? It's from a Google IP address, and also took a few pages.

"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

6:46 pm on Mar 8, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 21, 2002
posts:1051
votes: 0


Please ignore the question about the Googlebot identification. Have just discovered there is a thread about it.

Dayo_UK

8:46 am on Mar 9, 2004 (gmt 0)

Inactive Member
Account Expired

 
 


Still seeing very little Googlebot activity.

Last time some of my sites were touched was Early Feb (4th,5th,6th). I know that some of the higher PR sites still are getting crawled regularly - but over a month since a deep crawl for most sites is a long time compared to recent schedules/turnaround :(

This 182 message thread spans 7 pages: 182
 
