I personally have to agree with MetaGod. From my experience since Austin, here's what's going on with me: Googlebot comes and grabs 20-30 pages every day (pages that were up pre-Austin). Since Austin I have added, let's say, another 20-30 pages, all linked internally in one way, shape, or form, but these new pages don't get crawled. I suspected the same thing as MetaGod, so I got some external links to some of these "deeper" internal pages, and bingo.
So, umm yeah I agree.
Just my 2 cents tho.
With respect, we are off-topic.
I have access to numerous sites that demonstrate the opposite of what you and MetaGod are saying. So whatever you're seeing isn't universal.
What I'm talking about is a Googlebot that grabs the index and robots.txt, then goes away again. It repeats this behaviour a few times a day on one particular site.
I see from the thread that other people have also noticed this, which leads me to believe it isn't a technical problem with the site itself. The fact that Googlebot crawled the site (eventually) pretty much confirms it.
Obviously the more links you have to deep content, the better. That fact is obvious, but it is not the topic of this thread.
[edited by: feeder at 2:58 am (utc) on Mar. 4, 2004]
To add another perspective: Our site is getting crawled well daily and showing frequently updated freshtags on many of the PR5 pages.
That being said, I've noticed that the pages added over the last week haven't been hit yet, although they're all linked from several pages that the bot visits daily.
The only point of my post is to say, "Don't Panic". There has been a bit of a lull recently but this will change.
Okay, I was just agreeing with what he was saying, is all.
So, back to the "topic"... I also have about 30 or so sets of logs at my fingertips, and the problem you're referencing, at least on the sites I deal with in the commercial market, I have attributed to whatever these new filters or algo changes are that are dropping sites for certain keywords. Example: one site I watch has PR7 and gets crawled deep daily (100 pages or so a day); then it's dropped for its main keyword to around #500. During the time that it's dropped, Googlebot comes, grabs robots.txt and the index page, then jets till the next day. This happens for about 4 weeks, same thing every day. All of a sudden an algo tweak puts the site back at #3, where it has been for 3 years, and what happens next? You guessed it: Googlebot goes back to crawling all 100 pages a day, every day. So anyway, I can only speak for myself, but hey, this is what I'm seeing in black and white. Why? I dunno.
I just uploaded a brand new site and have requested two links. One is from DMOZ in a Cat that seems to have an editor.
We'll see what happens. I bet googlebot will visit the day after the link goes live.
|Hey, who cares about Google! I've just made Preferred Member. So who's going to offer me a beer? |
Hey me too!
|Example: one site I watch has PR7 and gets crawled deep daily (100 pages or so a day); then it's dropped for its main keyword to around #500. During the time that it's dropped, Googlebot comes, grabs robots.txt and the index page, then jets till the next day. This happens for about 4 weeks, same thing every day. All of a sudden an algo tweak puts the site back at #3, where it has been for 3 years, and what happens next? |
Thanks. That's what I'm interested in :)
My site has very strong multiple inbound linking (PR7 and 6) at different levels.
There's been a change in crawl activity of late, so I can pretty much discount on-site technical problems.
New preferred members BUY the beer.
Glad to have cleared that up for you guys.
|What I'm talking about is a Googlebot that grabs the index and robots.txt, then goes away again |
I feel sure that Google has some sort of trigger that controls whether a site is deep crawled or not. This is related to PR. For a new site the trigger may be set because Googlebot discovered the site by following a high value link, although the visible PR in the toolbar may still be PR0.
Without this trigger, Googlebot will crawl deeper only if it finds the index page has changed, or if it has followed a new deep link. But it stops after a few pages if it finds nothing else is new. Also, if nothing ever changes, Googlebot stops crawling, and pages start losing their snippets and may eventually disappear from the index.
If the trigger is set, and stays set due to high PR/high value inbound links, then even the most obsolete site will get crawled regularly and stay in the index.
Adding new pages to a site does nothing unless Google stumbles over them. One way around this is to temporarily link new pages directly from the index page. Google then sees the index page has changed and will almost certainly follow the new links within a day or two.
Another way to encourage Googlebot is to freshen the whole site by making a small change on every page - easy with PHP.
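The "small change on every page" trick just means emitting one dynamic element in a shared template, such as the current date in a footer. The poster mentions PHP; here's the same idea as a minimal Python sketch (the footer markup and function name are my own invention, purely illustrative):

```python
from datetime import date

FOOTER = '<p class="reviewed">Last reviewed: {stamp}</p>'

def freshen(page_html):
    """Append a footer containing today's date, so each day's copy of
    the page differs from the previous day's by at least one line."""
    return page_html + FOOTER.format(stamp=date.today().isoformat())
```

Run through a shared template or a server-side include, one line like this makes every page on the site look changed on every crawl.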
I don't know whether this is a true rendition of what really happens, but it's the model that works for me.
I am getting the same thing on my site. I submitted to Google about 6-8 weeks ago. It does the same thing every time it comes to my site: goes to robots.txt, then to the home page, and then leaves. It does this, sometimes multiple times per day.
How many times a day do you change your homepage?
I am experiencing the same thing on a newer site.
The site sat for a month, and googlebot would hit the homepage and exit.
The homepage would appear and disappear in google.
I moved the site to a different server thinking it could be a server configuration problem.
Googlebot continued to hit the homepage and exit.
After another month of this, I have now dumped a test site on the domain that I know has been indexed in the past to see if it is something internally wrong with the site.
The site has some good PR links pointing to it that, in the past, should have given it a PR6.
In the past, the same type of site with similar links would be fully indexed within a week.
This is not the only site I know of that is experiencing the same.
M i n n a p p l e
Can't say that I change my home page every day, much less multiple times per day. Should I be?
Added: Sorry, welcome to WW bnmwebmaster. Changes to pages are good but not as important, perhaps, as adding brand new pages to the site, with lots of pertinent content, instead. Tweaking an index page often won't substitute for the regular adding of more content, on more pages, that link back to the main page/s.
Confirmed. Googlebot is exhibiting pre-FL traits with regard to indexing newer sites beyond the home page. How quickly we forget how it used to be...
Must also confirm a surprising mackdaddy, CNN-fresh, status on established sites. Not crawling?!
"Should I be?"
Definitely if you are not getting the rest of the site crawled! You have to encourage the bot to go further, and a way to do that is change the page. Multiple times a day is an exaggeration, but with the bot coming multiple times a day it sure makes sense to change it every day. Right now it comes and sees nothing has changed so it goes away. Perfectly reasonable reaction. If it sees changes it will dig deeper sooner.
Don't bore Googlebot with the same old reruns. It'll be happy with new programs to check out each visit.
I've had much better crawls since I uploaded a page with a great big jpeg of a scantily clad Googlebot. Googlebot now visits every day :)
We all focus a lot of energy and analysis on the search results. It looks like there has been some sort of a tweak to the crawling algo.
It seems like we're having good success getting the home page crawled and indexed. Perhaps Google is doing some maintenance/tweaks to the crawler and they have limited resources at the moment. The crawl is probably priority-based for now.
For those who are having problems, is it freshbot that keeps visiting every day (or is freshbot so 2002?)
It's unfashionable to state this - particularly on this forum - but Google isn't immune from technical problems. Heck - we have just been through 4 months of them :)
I have a question about Googlebot. Today I checked the logs and finally saw "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
The IP is 22.214.171.124
My question is this: they only hit the main page, nothing else. I have a PR3, but most of the links I have acquired are not registering yet. What does this mean? Will they come back?

My other point is that before they made all their changes, I was at #1-#10 on any search words remotely related to my site. At that time I had no document tag, no content tag, no meta tags, and the site had not been optimized, because I didn't know about any of this. Now I have corrected all errors, every page validates with the W3C Validator, and I am not found. Can anyone explain this? I just don't get it.

I have gotten a lot of links from sites that now list me (vacation-type sites), all related to my industry, but when I did a link check I found that one of the links showing is something I did not request, and it is only somewhat related, in that it is more or less a search site for every type of industry. I have read this board until I am almost blind (LOL) and I still don't get it. Sorry!
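For what it's worth, one way to sanity-check a hit that claims to be Googlebot is a reverse DNS lookup: the real crawler's IPs resolve to hostnames under googlebot.com. A minimal Python sketch (the function names are my own; the lookup itself needs network access, the suffix check does not):

```python
import socket

def hostname_for(ip):
    """Reverse-resolve an IP address; None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None

def looks_like_googlebot(hostname):
    """True when a reverse-resolved hostname sits under googlebot.com."""
    return hostname is not None and hostname.rstrip(".").endswith(".googlebot.com")
```

Usage would be `looks_like_googlebot(hostname_for(ip))` against the IP from your log line; a forward lookup of the returned hostname should then give back the same IP.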
I have 100+ websites, most of them with PR5. These sites are updated regularly, and new pages are added every now and then.
This time, since Google began its new crawl with those dates appearing beneath the URLs, none of my sites have been crawled as such. Although the stats show Googlebot, the cache of the homepages is still the same.
I have also added thousands of backward links to some of these sites, and they show no sign of improvement, since Googlebot is not deep crawling those sites either (I think).
By all means, Googlebot is simply ignoring many of those sites, including ours. It doesn't seem to be a server error at our end, but at Google's end, of course.
We have done all the thinking and are still blank on this. www2 and www3 show new search results, but with most of the same sites as www, i.e. Google has not visited the fresh sites.
Some of the SERPs are illogical, with no decent sites with any proper optimization or backward links. Google is sleeping, or maybe we are in for a major update, my friend.
Let's keep our fingers crossed.
All the Best
I have the following line in my log file, and the same problem that was discussed under this topic:
126.96.36.199 - - [08/Mar/2004:10:39:19 +0000] "GET / HTTP/1.0" 200 31393 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
Google has indexed only the main page of my website, and for about 2 weeks now it has come and grabbed only the index page and robots.txt. Nothing more. This website is new and doesn't have a PageRank at all, but as far as I know PageRank is not considered at this point. All my pages are dynamic and most of them take one variable (e.g. index.php?ln=en), so there should be no problem.

Now let's take a look back at that line up there. After the 200 there is a 5-digit number. What does it mean? After each visit of Googlebot it changes. Is it some kind of Googlebot identifier or something? Do you have any advice on how to act in this situation? Should I wait a few months for Googlebot to index all of my pages?! Please take a look at beadsky.com, maybe you will know what to do :(
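The number after the 200 isn't an identifier: in Apache's common/combined log format, the field after the HTTP status code is the size of the response body in bytes, so it changes whenever the served page changes size. A rough Python sketch that pulls those fields out of a log line like the one quoted (the regex is a simplified illustration, not a complete log parser):

```python
import re

# Apache common/combined log format: after the quoted request come the
# HTTP status code and the size of the response body in bytes.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_hit(line):
    """Return (ip, status, response size in bytes) for one log line."""
    m = LOG_RE.match(line)
    if not m:
        return None
    size = m.group("bytes")
    return (m.group("ip"), int(m.group("status")),
            0 if size == "-" else int(size))
```

Applied to the line above, it yields status 200 and 31393 bytes, i.e. the index page was about 31 KB on that visit.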
I've got "gbot 2" coming by twice a day since 10 days now ..eating up my robots text file and every page I've got each time it comes ....love my logs!...still doesn't do any good at all for indexing tho
..its only showing "index.html" page as the "result page" for any search on any keyword or phrase of mine that I do.
..last visit yesterday evening
..all pages spidered ( all content is different and 50% of pages are in another language ..all pages have different titles , text, alts , metas , pics , ) the only common element is my scripts( for image delivery ) and my css and nav buttons....
it knows about the other pages cos I checked it today and it shows all of them in its "more pages" area ...but whatever keyword or words I search ..it still bases all it's results on just the first page!
...imagine trying to optimise everything via just one page for a site that offers dozens of items!
....I'm fortunate enough to have what is a very "spammy" ( google say it isn't tho..even I think it is )looking index page ..but it shouldn't be ranking higher than my others for "their own" keyword .....
What has ""g" been drinking (or smoking?)this time?..
ps this isn't sour grapes ..
I'm still at #1 for virtually each term ...but all off the one page!
pps ...recipes for trick pages for "g" for sale ...( only joking!)
I had a big crawl on one of my new sites yesterday. Took a load of dynamic pages for the first time :)
sem4u - That's great. When was that site launched and how many incoming links does it have?
I'm still waiting for a deepcrawl on a site launched 3 weeks ago... :(
The site went up at the end of January so Google has taken its time to index the dynamic pages, but I am pleased that it has :)
There are a number of PR4 & PR5 links pointing to the site, but I am still waiting for the PR to update.
Thanks for the info.
I guess I'll have to be a bit more patient ;)
Wish GG would get in here with some insight into this issue. But the one thing I am sure of is Google knows what they are doing and this will either clear or we will know soon enough what's going on...
|One way around this is to temporarily link new pages directly from the index page. Google then sees the index page has changed and will almost certainly follow the new links within a day or two |
I posted that earlier, but I have to retract it. It worked fine until last week, when instead of the 200 or so pages Googlebot normally takes, it took only 8: just robots.txt and the index page. And that was despite my having added 2 groups of 10 new pages each, with each group linked from the index page.
So what's up with Google? My site is hosted in the UK and I had more pages indexed last week by Baidu. :)
Incidentally does anyone know what this is? It's from a Google IP address, and also took a few pages.
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Please ignore the question about the Googlebot identification. Have just discovered there is a thread about it.
Still seeing very little Googlebot activity.
Last time some of my sites were touched was Early Feb (4th,5th,6th). I know that some of the higher PR sites still are getting crawled regularly - but over a month since a deep crawl for most sites is a long time compared to recent schedules/turnaround :(
How do you differentiate "freshbot" and "deepbot"?
Before, if the IP was 64.x.x.x then it was freshbot;
if the IP was 216.x.x.x then it was deepbot.
Are these still valid?
What is the behaviour of "freshbot" versus "deepbot"?
Will freshbot only get one or two levels of pages in a site, while deepbot crawls almost all the pages?
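Nobody in the thread confirms whether those ranges still hold, but the first-octet test people were applying to their logs is trivial to express. A purely illustrative Python sketch of that rule of thumb (not an authoritative way to identify Google's crawlers):

```python
def classify_bot(ip):
    """Apply the first-octet folklore quoted above: 64.x.x.x was said
    to be freshbot, 216.x.x.x deepbot; anything else is unknown."""
    first_octet = ip.split(".", 1)[0]
    return {"64": "freshbot", "216": "deepbot"}.get(first_octet, "unknown")
```

Any IP outside those two ranges comes back as "unknown", which is exactly the situation the question is asking about.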
| This 182 message thread spans 7 pages: < < 182 ( 1  3 4 5 6 7 ) > > |