Forum Moderators: open
The MediaBot crawls deeper into the site without issue. The site runs AdSense.
Could there be anything in the server config that is causing this? It isn't robots.txt. The index page is plain, low-tech HTML, and Xenu crawls it fine, as does the SearchEngineWorld sim spider.
Any ideas?
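One quick sanity check along those lines (not something from the thread, just a sketch): Python's standard library can parse a robots.txt and tell you whether Googlebot would be blocked from a given path. The rules and the example.com URLs below are placeholders; substitute your own robots.txt content.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt content -- paste your real file here.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Would Googlebot be allowed to fetch these (hypothetical) URLs?
print(rp.can_fetch("Googlebot", "https://example.com/page.html"))       # allowed
print(rp.can_fetch("Googlebot", "https://example.com/private/x.html"))  # blocked
```

If this says the deep pages are allowed, the problem is somewhere other than robots.txt, which matches what the poster found.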
So, umm yeah I agree.
Just my 2 cents tho.
-phish
I have access to numerous sites that demonstrate the opposite of what you and metagod are saying. So whatever you're seeing isn't universal.
What I'm talking about is a Googlebot that grabs the index and robots.txt, then goes away again. It repeats this behaviour a few times a day on one particular site.
I see from the thread that other people have also noticed this, which leads me to believe it isn't a technical problem with the site itself. The fact that Googlebot crawled the site (eventually) pretty much confirms it.
The more links you have to deep content, the better. That's obvious, but it is not the topic of this thread.
[edited by: feeder at 2:58 am (utc) on Mar. 4, 2004]
That being said, I've noticed that the pages added over the last week haven't been hit yet although they're all linked from several pages that are visited daily by the bot.
The only point of my post is to say, "Don't Panic". There has been a bit of a lull recently but this will change.
-phish
We'll see what happens. I bet googlebot will visit the day after the link goes live.
Hey, who cares about Google! I've just made Preferred Member. So who's going to offer me a beer?
Hey me too!
Example: one site I watch has PR7 and gets crawled deep daily (100 pages or so a day); then it's dropped for its main keyword to around #500. During the time it's dropped, GBot comes, grabs robots and index, then jets till the next day. This goes on for about 4 weeks, the same thing every day. All of a sudden an algo tweak puts the site back at #3, where it has been for 3 years, and what happens next?
Thanks. That's what I'm interested in :)
My site has very strong multiple inbound linking (PR7 and 6) at different levels.
There's been a change in crawl activity of late, so I can pretty much discount on-site technical problems.
What I'm talking about is a Googlebot that grabs the index and robots.txt, then goes away again
I feel sure that Google has some sort of trigger that controls whether a site is deep crawled or not. This is related to PR. For a new site the trigger may be set because Googlebot discovered the site by following a high value link, although the visible PR in the toolbar may still be PR0.
Without this trigger, Googlebot will crawl deeper only if it finds the index page has changed, or it has followed a new deep link. But it stops after a few pages if it finds nothing else is new. Also, if nothing ever changes, Googlebot stops crawling, and pages start losing their snippets and may eventually disappear from the index.
If the trigger is set, and stays set due to high PR/high value inbound links, then even the most obsolete site will get crawled regularly and stay in the index.
Adding new pages to a site does nothing unless Google stumbles over them. One way around this is to temporarily link new pages directly from the index page. Google then sees the index page has changed and will almost certainly follow the new links within a day or two.
Another way to encourage Googlebot is to freshen the whole site by making a small change on every page - easy with PHP.
I don't know whether this is a true rendition of what really happens, but it's the model that works for me.
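The "freshen every page" trick mentioned above is described as easy with PHP; here is a minimal sketch of the same idea in Python, assuming server-side page generation. The `freshen` helper and the sample page are hypothetical; the point is just that a trivially small, automatically changing string makes every page look modified on each generation.

```python
import datetime

def freshen(html: str) -> str:
    """Insert a date-stamped HTML comment before </body> so the page
    differs slightly on each generation day (the 'small change on
    every page' trick from the post above)."""
    stamp = f"<!-- generated {datetime.date.today().isoformat()} -->"
    return html.replace("</body>", stamp + "\n</body>", 1)

page = "<html><body><p>Hello</p></body></html>"
print(freshen(page))
```

Whether an invisible comment is enough to count as a "change" for the crawler is exactly the kind of thing the thread is speculating about; a visible dated element would be the more conservative version of the same trick.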
The homepage would appear and disappear in google.
I moved the site to a different server thinking it could be a server configuration problem.
Googlebot continued to hit the homepage and exit.
After another month of this, I have now dumped a test site on the domain that I know has been indexed in the past to see if it is something internally wrong with the site.
The site has some good PR links pointing to it that in the past should have given it a PR6.
In the past, the same type of site with similar links would be fully indexed within a week.
This is not the only site I know of that is experiencing the same.
M i n n a p p l e
Added: Sorry, welcome to WW bnmwebmaster. Changes to pages are good but not as important, perhaps, as adding brand new pages to the site, with lots of pertinent content, instead. Tweaking an index page often won't substitute for the regular adding of more content, on more pages, that link back to the main page/s.
Definitely, if you are not getting the rest of the site crawled! You have to encourage the bot to go further, and one way to do that is to change the page. Changing it multiple times a day would be overkill, but with the bot coming multiple times a day it sure makes sense to change it every day. Right now it comes, sees nothing has changed, and goes away. Perfectly reasonable reaction. If it sees changes, it will dig deeper sooner.
Don't bore Googlebot with the same old reruns. It'll be happy with new programs to check out each visit.
It seems like we're having good success getting the home page crawled and indexed. Perhaps Google is doing some maintenance/tweaks to the crawler and they have limited resources at the moment. The crawl is probably priority-based for now.
For those who are having problems, is it freshbot that keeps visiting every day (or is freshbot so 2002)?
I have over 100 websites, most of them with PR5. These sites are updated regularly, and new pages are added every now and then.
This time, since Google began its new crawl with those dates appearing beneath the URLs, none of my sites have been crawled as such. Although the stats show Googlebot visiting, the cache of the homepages is still the same.
I have also added thousands of backward links to some of these sites, and they show no sign of improvement, since Googlebot is not deep crawling those sites either (I think).
By all means, Googlebot is simply ignoring many of those sites, including ours. It doesn't seem to be a server error at our end, but something at Google's end, of course.
We have done all the thinking and still draw a blank on this. www2 and www3 show new search results, but with mostly the same sites as www, i.e. Google has not visited the fresh sites.
Some of the SERPs are illogical, with no decent sites showing any proper optimization or backward links. Google is sleeping, or maybe we are in for a major update, my friend.
Let's keep our fingers crossed.
All the Best
One way around this is to temporarily link new pages directly from the index page. Google then sees the index page has changed and will almost certainly follow the new links within a day or two
I posted that earlier, but have to retract it. It worked fine until last week when instead of the 200 or so pages Googlebot normally takes, it only took 8, just robots.txt and the index page. And that was despite having added 2 groups of 10 new pages each, with each group linked from the index page.
So what's up with Google? My site is hosted in the UK and I had more pages indexed last week by Baidu. :)
Incidentally, does anyone know what this is? It's from a Google IP address, and it also took a few pages.
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
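Since anyone can spoof that user-agent string, one standard way (not from the thread) to check whether a visitor claiming to be Googlebot really is one: reverse-DNS the IP, require a googlebot.com or google.com hostname, then forward-resolve that hostname and confirm it maps back to the same IP. A sketch, with the resolver functions injectable so it can be tried without live DNS; the IPs and hostnames in the demo are purely illustrative.

```python
import socket

def is_real_googlebot(ip, resolver=socket.gethostbyaddr,
                      forward=socket.gethostbyname):
    """Reverse-DNS the IP, require a googlebot.com/google.com host,
    then forward-resolve that host and confirm it maps back to the IP."""
    try:
        host = resolver(ip)[0]
    except OSError:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return forward(host) == ip
    except OSError:
        return False

# Demo with fake resolvers (no network); the names/IPs are made up.
fake_good = lambda ip: ("crawl-66-249-66-1.googlebot.com", [], [ip])
fake_bad = lambda ip: ("spoofed.example.com", [], [ip])
print(is_real_googlebot("66.249.66.1", resolver=fake_good,
                        forward=lambda h: "66.249.66.1"))  # passes both checks
print(is_real_googlebot("10.0.0.1", resolver=fake_bad,
                        forward=lambda h: "10.0.0.1"))     # wrong domain, rejected
```

With the default arguments it does real lookups, so it answers the original question for any IP in your logs.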
The last time some of my sites were touched was early Feb (4th, 5th, 6th). I know that some of the higher PR sites are still getting crawled regularly - but over a month since a deep crawl for most sites is a long time compared to recent schedules/turnaround :(