Have had no suggestions, so have been trying to work this out by myself.
The majority of my pages were indexed by Google when the old deepcrawl/monthly update procedure was standard. But of the pages added subsequently, only the level 1 pages have been indexed (toolbar PR1) and none of the level 2 pages (greyed out).
Does Google now have a PR or level cutoff point below which it does not index?
Depends on your site. Some sites get crawled every day, others only once a month. Google indexes pretty fast lately, because of the continuous update.
Get more links to your site and raise your PR. The higher your PR the more you will get crawled. And passing on PR from a PR6 page is easier than from a PR1 page.
Thanks Wired_Suzanne for the reply. But it doesn't solve my problem.
Google visits my site every week, and possibly more frequently, but my stats only report weekly figures. It typically takes over 30 pages. Also pages that are already indexed are re-indexed in a few days after they have been changed.
Since adding the pages I mentioned, Googlebot has taken my index page, my site map, and the level 1 page of the group, all of which have links to the remaining pages - but Googlebot doesn't ask for these.
This is not an isolated incident. The 11 pages I mentioned were just an example. I have also re-instated a group of pages which were previously indexed but removed as they might have been considered contentious during the invasion of Iraq. Only the level 1 page of this group has been indexed, and the level 2 pages again ignored.
You say "passing on PR from a PR6 page is easier than from a PR1 page." I don't know what you mean by that, as I am not talking about pages that have a PR0, but pages that are greyed out and never indexed.
Yes, I would love to obtain more PR through links, but mine is not a commercial site and there are limited opportunities. To overcome this to some extent, my site is optimised so that incoming PR is directed away from the index page to the frontline troops. My index page is PR2, my level 1 pages (about 40) are PR2 or PR1, and almost all my indexed level 2 pages (about 150) have a PR1. (Toolbar approximations.)
What I am asking is: has there been a change in the way Google works? Does it ignore pages that are only linked from a PR1? In which case lots of little sites like mine are in dead schtuck.
Not ignoring; the more PR, the more attention.
Getting a good backlink (PR5+) would help.
Obviously I haven't made myself clear. Google visits me regularly and takes 30+ pages per week. I also have a PR5 back link plus a couple of PR3s. A large number of my pages have high search rankings in Google.
I need to know why certain specific pages are not taken since the change in Google working practices.
Index page PR is distributed to inner pages, so when it increases, it affects the deep ones too.
Assuming you don't have problems with your linking structure,
if your index is PR2 you can easily improve.
You are misunderstanding the question. I am looking for a specific cause why certain of my pages do not get indexed. It is a question about Google working practices.
I think Gus_R is perfectly right on this. You need higher PR to get deeper spidering. Or you need deep inbound links (preferably from high PR pages)
Deep spidering is not the issue. There are only three levels on my site. And the pages that Google is currently not taking are no deeper or different in type than the ones it already has taken using deepcrawl. So it would appear that something has changed in the way Google works since it dropped deepcrawl. Has Google introduced a cut-off point for spidering based on PR? Is Google now limiting the number of pages it is taking into its database? Or has its spidering become less efficient?
Prior to the demise of deepcrawl Google took ALL my pages every month, but now it is being very picky about what it takes, and as I mentioned in an earlier message, has not taken some reinstated pages that it had accepted earlier.
I would say No to all your questions above.
I would just give it some more time and work on your PR with more backlinks.
|I would say No to all your questions above |
Do you know that for a fact, or is that just an opinion? What I was rather hoping to get out of this was someone who could explain the change in the way Google operates.
I am not experiencing any of the above on any of my sites. One change I see is Google has gotten better at indexing all my pages and in less time. I really can't speak for others. With the one page only having a PR of 1, it just seems kind of low. I would not expect the sub pages to be greater than a PR of one, which means PR0 or "greyed". If you raise that PR1 page up I bet Google would index the subpages faster. So until you get that PR up you will just have to wait a little longer. I really feel you should work on your PR, but I don't think that's the answer you want to hear.
We have around 250K detail pages in Google that are approximately 6 clicks from the homepage. Due to branding issues and redirected URLs being distributed by the user group as opposed to distributing the actual URL Google indexes, we have ended up with an estimated PR0 for every dang one of those 250K, so I can respond to someone's comment on higher page rank bringing in deeper crawls -- I don't think Google cares, and if they did they would be introducing bias into their operation.
The number of indexed pages we have is not decreasing and the PR0s are not dropped. More time may lead to deeper spidering, or perhaps on your homepage you could create a new section that you could use to "highlight" direct links to a few of those pages that you want to get Google's attention.
Check your logs to determine Google's path through your site and you may be able to determine a problem page or pages.
HarryM could you add a few inbound links from pages outside of your domain to the inner pages that Google isn't indexing? That would be interesting to see.
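For anyone who wants to try the log check suggested above, here is a rough sketch in Python. It assumes the host hands over raw logs in the Apache "combined" format and that the bot's user-agent string contains "Googlebot"; the sample lines, IP addresses, and page paths are made up for illustration, and the pattern would need adjusting for other log formats.

```python
import re

# Assumption: Apache/NCSA "combined" log format. Other formats need a different pattern.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines):
    """Return (time, path, status) for every request whose user agent mentions Googlebot."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and "googlebot" in m.group("agent").lower():
            hits.append((m.group("time"), m.group("path"), int(m.group("status"))))
    return hits

def never_requested(hits, expected_paths):
    """Return the expected paths that the bot never asked for, sorted."""
    fetched = {path for _, path, _ in hits}
    return sorted(set(expected_paths) - fetched)

# Hypothetical sample lines; replace with the lines of a real log file.
sample = [
    '192.0.2.10 - - [10/Oct/2003:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '192.0.2.10 - - [10/Oct/2003:13:55:40 +0000] "GET /sitemap.html HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '192.0.2.55 - - [10/Oct/2003:14:01:02 +0000] "GET /group/page1.html HTTP/1.1" 200 900 "http://example.com/" "Mozilla/4.0"',
]

hits = googlebot_hits(sample)
missing = never_requested(hits, ["/index.html", "/sitemap.html", "/group/page1.html"])
for when, path, status in hits:
    print(when, path, status)
print("Never requested by the bot:", missing)
```

Listing the hits in time order shows the path the bot took, and the "never requested" list shows which linked pages it skipped. A non-200 status on a page that links to the missing ones would be a likely problem page.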
How bloody annoying are you? I think the replies to your questions are all obviously opinions; it's all trial and error.
Google works in mysterious ways these days
As I said:
|Get more links to your site and raise your PR. The higher your PR the more you will get crawled. And passing on PR from a PR6 page is easier than from a PR1 page. |
Just get more links. To your index page, to your other pages. Anywhere!
Raise the PR of your pages by getting backward links (from a high PR site), or raise it with a link from an index page with a high PR.
Is there anyone in this world with facts (other than a very few at Google)? NO
Apologies for not responding earlier, but I live in a different timezone to most.
Thanks for a couple of useful ideas.
|HarryM could you add a few inbound links from pages outside of your domain to the inner pages that Google isn't indexing? That would be interesting to see. |
Unfortunately this is a personal non-commercial site and it is extremely difficult to get incoming links. Would that I could! Also these particular pages are being hosted for an interest group and may be replaced.
|Check your logs to determine Google's path through your site and you may be able to determine a problem page or pages |
Will do at end of week. I don't get the logs normally, just stats, but can get them on demand.
Many thanks to the above. However I am less enamoured of the people who continue to say that I should throw PR at the problem. Perhaps they don't understand how insulting that sounds - sort of teaching a grandmother to suck eggs.
|However I am less enamoured of the people who continue to say that I should throw PR at the problem. Perhaps they don't understand how insulting that sounds - sort of teaching a grandmother to suck eggs. |
In fairness to those posters, they are actually correctly identifying the problem which does actually answer the question that you asked originally.
How deeply Google crawls your site depends on PR.
So, logic dictates that the only way you can get those pages crawled is to link to them from the higher PR pages of your site. You say you have a PR5 and a few PR3 inbound links. If you have a decent linking structure you will have some PR4 pages in there.
Is there any way you can link from those, rather than the PR1 page you are trying to link from?
I would guess your site map is (or at least should be) a PR4. That would do the trick.
On one of our lower PR sites, we get all the deep pages crawled by just shoving them in the site map.
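To illustrate the idea of linking deep pages from a stronger page, here is a toy sketch using the textbook PageRank iteration. This is an assumption-laden illustration only: it is not Google's actual implementation, the scores are not on the toolbar's scale, and the four-page site (`home`, `sitemap`, `level1`, `deep`) is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical site: the deep page is linked only from a level 1 page.
site_before = {
    "home": ["sitemap", "level1"],
    "sitemap": ["home", "level1"],
    "level1": ["home", "deep"],
    "deep": ["home"],
}
# Same site, but the site map now links to the deep page as well.
site_after = dict(site_before, sitemap=["home", "level1", "deep"])

before = pagerank(site_before)
after = pagerank(site_after)
print("deep page score before:", round(before["deep"], 3))
print("deep page score after: ", round(after["deep"], 3))
```

In this toy model the deep page's score rises once the site map links to it directly, which is the effect described above. Whether the real crawler reacts the same way is a separate question.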
I think the point is that even for a personal website, a home page of PR2 is pretty ordinary. I have a personal website with some pages PR5 and the rest PR4.
Getting links isn't THAT hard, even for a personal website. Surely your pages are about something? There are tons of directories and web pages providing links to just about everything under the sun - even directories of personal websites. Try searching for "subject of page" and "add link" or "add url" or "submit site" and associated variations. Works for me.
HarryM, I didn't mean to insult you, sorry about that.
>> So it would appear that something has changed in the way Google works since it dropped deepcrawl.
Yes, that's exactly it. The Deepbot got sacked. At the same time, the Freshbot got promoted to "Deepfreshbot". It's all in the threads and confirmed by a Google employee, eg in Msg #209 here: [webmasterworld.com...]
The name was coined in msg #43 here: [webmasterworld.com...]
>> Has Google introduced a cut off point for spidering based on PR?
The Deepbot was simply another bot than Freshbot - it crawled deep "by default". Freshbot does not crawl deep by default, but it has been promoted now, so it can crawl deep as well.
The deep pages that you got indexed by Deepbot will of course not disappear from the index as Google values index size, but pages that were not there after Deepbot got sacked will need to be relevant to the new bot in order to get spidered.
Please read posts #13 and #31 (first two lines, page three) from this thread carefully and make your own conclusions: [webmasterworld.com...]
There's another one here (#24):
- it might be hard to interpret it out of context, but note these statements: "I noticed a few pages indexed because it looked as though they have their own links" and "my main advice is still to get a few more links"
You'll find a few more recent posts by the same author emphasizing the value of inbound links if you search thoroughly. As links=PR (well, sort of) you get the advice that you need to increase PR. To do that, you need to get links.
|The Deepbot was simply another bot than Freshbot - it crawled deep "by default". Freshbot does not crawl deep by default |
That was exactly the sort of information I was looking for. I had searched WW World but hadn't come up with anything. I suspect it is my fault this thread has gone on for so long by not making my original question more specific.
It was suggested by trillianjedi
|You say you have a PR5 and a few PR3 inbound links. If you have a decent linking structure you will have some PR4 pages in there. |
I have a decent linking structure which admirably suited the situation before the demise of deepcrawl and gave me excellent page rankings. All pages were themed, no page was more than 2 levels deep, all pages were linked from sitemaps, no sitemap was over 100 links (as Googleguy suggested), and great care had been taken to optimise the PR at the deepest level - which is where it used to count.
Now that I know how things have changed I can alter my linking structure to push the PR back to the linking pages. No doubt Google will be doing something different in a few months' time and I will once again be trying to compensate. :)
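The structural rules mentioned above (no page more than 2 levels deep, every page reachable through links, no page with more than 100 links) can be checked mechanically. Below is a minimal sketch, assuming the site's links have already been collected into a dictionary; the page names and the `audit` helper are hypothetical, and the limits of 2 levels and 100 links are simply the figures quoted in this thread, not documented Google thresholds.

```python
from collections import deque

def click_depths(links, start):
    """Breadth-first search: number of clicks needed to reach each page from start."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

def audit(links, start, max_depth=2, max_links=100):
    """Report pages that break the depth, reachability, or link-count rules."""
    depths = click_depths(links, start)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    return {
        "too_deep": sorted(p for p, d in depths.items() if d > max_depth),
        "unreachable": sorted(all_pages - set(depths)),
        "too_many_links": sorted(p for p, ts in links.items() if len(ts) > max_links),
    }

# Hypothetical link map {page: [pages it links to]}.
site = {
    "index": ["sitemap", "a"],
    "sitemap": ["a", "b"],
    "a": ["a1"],
    "a1": ["a2"],
    "orphan": ["index"],
}

report = audit(site, "index")
print(report)
```

Pages listed under `too_deep` or `unreachable` are the obvious candidates for a direct link from the home page or a site map.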
Well, as I'm not real big on pursuing everything Google does... I tend to just tweak my site all the time and add things for my users, and I know my traffic grows. My main page is PR5.
I just checked Google for my website. The last time I checked I had a #3 link when people searched for 'jobs for web designers'. Now I can't find my site anywhere... except if I search directly on the domain name, then mostly my sitemap shows up. Sigh. I never was all that good at the keyword/description/SEO thing, but was doing ok.
I'm kind of bummed. Guess I'll get over it. Maybe it's time to hire someone to do my SEO for me. I have over 425 pages, so I've tended to use SSI for the same header and footer on most of them (since I tweak so much, I don't want to change 400 pages every time!).
Maybe this is all a result of the loss of the G deepbot, though someone said you don't lose pages because G likes the large index. I'm open to suggestions.
I know how you feel. My site was going along fine with a lot of traffic until Deepbot was replaced. Now my home page has reduced from PR3 to PR2 and I have the problem that new pages are not getting indexed.
With Deepbot I could guarantee that all my pages would be taken every month; now it's only 30 pages a week.
But I think I have a solution for the new pages that are not getting indexed because they are linked from a page with low PR. I am temporarily linking them from the home page to see if that helps.
I have a PR4 index linking to internal pages and none of these are being crawled now... I think getting links from external sites for each page in my site is impossible...
So what's the trick now?
My index page and robots.txt get fetched daily but my new pages are completely ignored... even those with PR1...
Is there any other solution than getting incoming links to every page in my site?
I don't think you need external links to every single page, but I found getting some deep links has encouraged the bot to spider pretty much all of my internal pages regularly. If you create an index on a page that has a higher PR, the bot should pick up the new pages within a reasonable time after adding them.
What I find kinda interesting is Harry is complaining he cannot get links because this is a personal site not commercial. Others have said they cannot get links because their sites are commercial not personal.
|What I find kinda interesting is Harry is complaining he cannot get links because this is a personal site not commercial. |
No, I am not complaining, just stating a perception. Yes, I can get links from sites that approach me to promote their products, but none are relevant - and could be dodgy. Most of the bigger sites don't link to personal sites, and frequently the sites that will link have low PR.
I have no problem with low PR per se. My pages get good results from search engines, frequently in the number one spot, and I get healthy traffic. PR is not as useful as a good keyword related page.
The problem I have now is that Googlebot is ignoring my new pages and I need more PR to correct that situation. In the last few days I have updated every page in the site, but when these changes will get into the database - if ever - is completely unknown.
From an initial look at my logs (not yet completed) it seems as if the new Googlebot works completely haphazardly. It takes pages apparently at random. They are at different levels, in different themed areas, and frequently not linked to each other. On one day it took the same page twice.
Ah, for the good old days when Deepcrawl prowled the web. :)
|If you create an index on a page that has a higher PR, the bot should pick up the new pages within a reasonable time after adding them. |
I have a commercial site with PR4 on the index page and PR1 on some internal pages. The bot is not crawling anything except robots.txt and the index, and sometimes some old files that are no longer on the server...
What is a reasonable time? I have been seeing this behaviour for 3 weeks.
|I think getting links from external sites for each page in my site is impossible... |
I don't know if this would help, but I have a (free) private Link Exchange on my site (the page has a PR4). It would at least be another external link to your site.
|I'm kind of bummed. Guess I'll get over it. |
I'm over it! I had some great revelations about my pages after posting that... making some major changes....
"I don't know if this would help, but I have a (free) private Link Exchange on my site (the page has a PR4). It would at least be another external link to your site."
I would be interested in this. Could you send me a PM on where I can find your site?
"I'm over it! I had some great revelations about my pages after posting that... making some major changes.... "
What kind of changes did you make? Did you increase your page rank by getting more links from relevant sites?