The MediaBot crawls deeper into the site without issue. The site runs AdSense.
Could there be anything in the server config that is causing this? It isn't robots.txt. The index page is lo-fi and xenu crawls it fine, as does the searchengineworld sim spider.
Any ideas?
As tantalus said, the other problem could be that your index page hasn't been modified. My solution is to add a link from the index page to any new pages. The next time Google visits, it sees the index page has been modified and usually comes back within a few hours to pick up the new pages. I remove the links when the toolbar PR for the new pages shows white.
I usually get about 70% of my pages indexed every week, but I don't know whether this is because of my PR or because Google perceives the site as active.
Harry
Googlebot crawls higher PR pages with a greater frequency than lower PR pages. If your new site has an index page PR of 4, and inner pages that are less than that, then the bot won't hit the inner pages very often.
On a personal note: I got back to the internet several days ago after having been in places where digital, at best, means counting on your fingers. I missed Austin entirely... it didn't seem to make a lot of difference for us, but we have some minor kw combos bouncing in and out of the SERPs with every other search. I can either dig into the WW archives for Austin to figure this out, or just assume that Google has become slightly schizophrenic (not that there's anything wrong with that). Anyone who feels like giving me a quick run-down on what happened gets a free underground tour in Jamaica.
I'm seeing this also. Googlebot visits once a day, picks up robots.txt and the index page, and then leaves. This is a relatively new thing (two or three weeks; I haven't had time to plough back through my logs) and coincides with the sites in question being dropped from the SERPs.
I worried for a while that this may be caused by a poison word. I have a folder called redirects with pages that do a meta refresh to an outside page. Perhaps redirects is a poison word. I disallowed this for a while in robots.txt. I've just gone and stripped this down to just one line.
User-agent: *
And I've changed the home page and that directory name to something less obvious. I then spent much of yesterday submitting pages that have a link to this domain to Google submit, in the vain hope that Googlebot might follow the backlinks and think "this site's worth crawling". The PR's low, only 3, but pages from this site were previously #1 for what I would call secondary three-word terms.
Best wishes
Sid
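For anyone wanting to rule out robots.txt the way Sid did, here is a minimal sketch using Python's standard urllib.robotparser. The rule set, the /redirects/ path and the example.com URLs are assumptions for illustration only; Sid's actual file is not shown in full in this thread.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: block only the /redirects/ folder for every crawler.
BLOCK_REDIRECTS = """\
User-agent: *
Disallow: /redirects/
"""

if __name__ == "__main__":
    for path in ("/index.html", "/redirects/out.html"):
        url = "http://www.example.com" + path
        print(path, is_allowed(BLOCK_REDIRECTS, "Googlebot", url))
```

A check like this only confirms what the file permits. It says nothing about whether Googlebot will choose to crawl deeper.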
feeder - What is your site's PR?
I'm not sure how that's relevant. New sites don't show PR, but that doesn't stop them getting crawled.
The site has strong inbound linking.
Update: the site has been crawled, and pages included in the index. Googlebot's behaviour hasn't changed, however. It arrives, grabs the index page, leaves. Once in a blue moon it will crawl half the site.
Odd.
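One way to put numbers on this pattern is to tally what Googlebot requests each day from the access log. The sketch below assumes the Apache "combined" log format and identifies Googlebot by a user-agent substring only; the sample lines, IP addresses and paths are made up for illustration.

```python
import re
from collections import defaultdict

# Matches the Apache "combined" log format (an assumption about the server).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_paths_by_day(lines):
    """Map each day to the list of paths Googlebot requested on that day."""
    visits = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            visits[match.group("day")].append(match.group("path"))
    return dict(visits)


# Invented sample lines: two Googlebot visits plus one ordinary browser hit.
BOT = '"-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"'
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Feb/2004:06:12:01 +0000] "GET /robots.txt HTTP/1.0" 200 24 ' + BOT,
    '192.0.2.10 - - [10/Feb/2004:06:12:03 +0000] "GET / HTTP/1.0" 200 5120 ' + BOT,
    '198.51.100.7 - - [10/Feb/2004:09:30:00 +0000] "GET /widgets/blue.html HTTP/1.1" '
    '200 7340 "http://www.example.com/" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '192.0.2.10 - - [11/Feb/2004:06:15:40 +0000] "GET /robots.txt HTTP/1.0" 200 24 ' + BOT,
    '192.0.2.10 - - [11/Feb/2004:06:15:42 +0000] "GET / HTTP/1.0" 304 - ' + BOT,
]

if __name__ == "__main__":
    for day, paths in googlebot_paths_by_day(SAMPLE_LOG).items():
        print(day, paths)
```

If every day shows only /robots.txt and /, that matches the "grabs the index page, leaves" behaviour described above. A user-agent match alone can be spoofed, so treat the counts as a rough guide.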
"I'm not sure how that's relevant." (re: site PR)
I have a theory that Google won't deep-crawl your website unless your sub pages have incoming links from external sources.
My take on this question is that these are two aspects of the same issue. PR is very relevant.
The number of pages Google crawls is probably proportional to the PR value of the index page. If the index page is PR0 because Google has not yet determined its appropriate value, then there is little chance of being crawled.
Even when the index page is more than PR0, the PR decreases as Google penetrates deeper, and at a certain point Google stops crawling. The presence of deep links boosts the PR for those pages, and so Google continues crawling. However, once the index page's PR reaches a reasonable value, deep links are not so necessary, at least as far as crawling goes.
I suspect another factor is that if Google does not discover any changed or new pages part way through its crawl then it abandons the crawl.
All IMHO, and I disclaim any responsibility for being wrong. :)
Harry
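Harry's point about PR falling off with depth can be illustrated with the published PageRank formula on a toy site. This is only a sketch: the page names, the link structure, the damping factor of 0.85 and the single "external" page are all assumptions, and nothing here shows how Googlebot actually decides what to crawl.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Plain power-iteration PageRank over a dict of page -> outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page with no outgoing links shares its rank with every page.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical site: index -> level1 -> level2 -> level3, with links back to index.
SITE = {
    "external": ["index"],
    "index": ["level1"],
    "level1": ["level2", "index"],
    "level2": ["level3", "index"],
    "level3": ["index"],
}

# Same site, but the external page also links straight to level3 (a deep link).
SITE_WITH_DEEP_LINK = dict(SITE, external=["index", "level3"])

if __name__ == "__main__":
    for name, graph in (("no deep link", SITE), ("deep link", SITE_WITH_DEEP_LINK)):
        ranks = pagerank(graph)
        print(name, {page: round(value, 3) for page, value in ranks.items()})
```

In this toy graph the score drops at each level below the index page, and the deep link raises the score of the deepest page. Whether crawl depth really follows those scores is exactly what is being argued in this thread.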
do you concur now?
oh and feeder, just how rude are you? I'm WRONG? have some respect for other people...
feeder, if you guys even cared to read through my post properly you would see that I said an EXTERNAL link... without the external link your page is not worth viewing because no-one else is voting for your page...do you concur now?
No :)
I've been a full-time SEM since 2001, I know what a link is and how Google crawls. My question relates to a recent CHANGE in crawl activity. As I said in my posts, the linking structures of the site in question, both internal and external, are strong.
oh and feeder, just how rude are you? I'm WRONG? have some respect for other people...
I wasn't talking to you, I was talking to Harry :)
I was pointing out that I thought he was wrong in this instance. I know this because I have access to countless client sites' logs that demonstrate otherwise. I can, and do, get sites crawled with one inbound link to an index page.