Forum Moderators: open
The MediaBot crawls deeper into the site without issue; the site runs AdSense. Could there be anything in the server config that is causing this? It isn't robots.txt. The index page is lo-fi, and Xenu crawls it fine, as does the SearchEngineWorld sim spider.
Any ideas?
How do you differentiate "freshbot" and "deepbot"?
Previously, an IP in the 64.x.x.x range meant freshbot,
and an IP in the 216.x.x.x range meant deepbot.
Are these ranges still valid?
What is the behaviour of "freshbot" versus "deepbot"?
Will freshbot only fetch one or two levels of pages on a site, while deepbot crawls almost all the pages?
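For anyone eyeballing their own logs for this, here is a minimal sketch of sorting hits by IP prefix. It assumes the 64.x / 216.x ranges mentioned above, which are historical folklore and may well no longer be accurate:

```python
# Rough classifier using the IP prefixes discussed above.
# These ranges (64.x = freshbot, 216.x = deepbot) are an assumption
# from the thread, not anything Google has documented.
def classify_googlebot(ip: str) -> str:
    if ip.startswith("64."):
        return "freshbot"
    if ip.startswith("216."):
        return "deepbot"
    return "unknown"

# Example log IPs (made up for illustration).
for ip in ["64.68.82.12", "216.239.46.5", "10.0.0.1"]:
    print(ip, "->", classify_googlebot(ip))
```

In practice you would also want to confirm the user-agent string says Googlebot before trusting the IP prefix at all.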
I launched a site around 6th Feb (1280 static pages), then sat down to keep an eye on my logs for visits from Googlebot. Time and time again it looked for a robots.txt (404ing, as there isn't one), then would pick up index.htm, index it, and go no further into the site. This has happened nearly every day since 6 Feb. These visits by Googlebot are thanks to a small number of inbound links I set up to the site, from a few random locations on other sites I have.
Finally, today, after weeks of waiting, the bot did a deep crawl of not just the home page, but levels 1 & 2 below, and had a really good look around. I've not made any changes to this site for a while, as I thought I'd hang fire to see if the thing ever did get crawled.
I'm mighty relieved to see it calling by; after all, it's over a calendar month now since launch. Quite when the results of its crawl will make it into the live index is anyone's guess, but really I just wanted to reassure anyone who is worried that, assuming you have a few inbound links set up and easily navigable pages in place on your new site, you *should* see a deep crawl, just not straightaway.
Now it just needs to get my more recent sites on its radar! Time for a beer.
DoU
I just do not understand what Google is doing.
Well, my 2 cents worth is this, Google deep crawls my site every day or so - so 'I'm alright Jack.'
But I still think their SERPs in certain areas are rubbish. And I think their crawlers are experiencing technical problems. Why crawl an index page and not follow / index the links? It's just incompetence, isn't it?
The index page is updated all the time, new pages are added to the site all the time, and nothing is being crawled. This has been going on for a month now.
If Googlebot isn't following links or indexing new pages, why use Google anymore?
I could search through archive.org if I wanted to find old information with a lack of new content.
Maybe it's a problem, but it's more likely to be a feature. What about all those dynamically generated SEO pages full of spammy words? What if lots of those sites had a completely different link structure every time you would reload the site?
Maybe that's what the bot is supposed to do, see if a site really has a genuine link structure (I know of no legit sites in which ALL URLs change daily) by requesting the index file frequently during the first weeks. It could check links to other pages, and if more than 50% of the links have changed (just an example), it will flag the site as "SEO/spam" or something and not deepspider it. If it finds the site legit, then it will deepspider it after all.
Besides the fact I have no real proof for this theory, I'm pretty sure the people at Google are becoming more selective about what to index and what not to. They must be able to handle the expansion of the web, and at the current rate it's growing pretty fast. What if they run out of storage or processing power and are unable to keep upgrading at the fast pace of the web's growth? Even if my theory proves wrong, I know they are aware of this and will be taking action soon.
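Purely to illustrate that theory (it is speculation in this thread, not a documented Google mechanism), a crawler could compare the outbound link sets from two fetches of the same index page and skip the deep crawl if too many links changed:

```python
# Hypothetical sketch of the "changing link structure" check described
# above. The 50% threshold and the URLs are made up for illustration;
# this is not a documented Google algorithm.
def changed_fraction(links_then: set, links_now: set) -> float:
    """Fraction of current links that weren't present on the last visit."""
    if not links_now:
        return 0.0
    return len(links_now - links_then) / len(links_now)

first_visit = {"/about.html", "/products.html", "/contact.html", "/news.html"}
second_visit = {"/about.html", "/x9f2.html", "/q7b1.html", "/z3k8.html"}

# 3 of 4 current links are new (0.75 > 0.5), so under this theory the
# site would be flagged as suspicious and not deep-spidered.
suspicious = changed_fraction(first_visit, second_visit) > 0.5
print(suspicious)  # True
```

A legit site re-fetched over a few weeks would score near zero here, while a site regenerating random URLs on every load would score near one.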
Obviously I'm very new to all this, but today I think Google has deep crawled in a way. It went to every link on my homepage but no deeper. I just hope that at some point it actually looks at my sitemap, something I didn't have on my site until 4 weeks ago (I never knew how Google worked and relied on a Java menu for navigation).
Perhaps one day my whole site will be spidered.
I think your theory is plausible... but the only thing I have changed in the past month is the position of the link to the other pages in index.php, and that can't be a problem(?). The next pages are permanently the same: they are generated from a MySQL database, always the same. My link structure ends after 3 clicks in a static HTML file. G spidered all these HTML pages on March 7... I don't know why G now shows only a cached version from January... my site structure is like millions of other pages.
Yes, that's the right word. "reluctantly".
Googlebot is reluctantly visiting my sites. Googlebot has become lethargic. Googlebot is, well, dragging its butt!
What's the problem? Have we insulted the bot? Has someone abused the critter? Passed out too many tricks and not enough treats? Has Googlebot not been awarded the "crawler of the year" award at the Plex?
Let's all give Googlebot a morale boost. Tell Googlebot how he/she's been kicking Slurp's butt all over the web. Let's hear it for our favorite bot! YEAH GOOGLEBOT! GO GOOGLEBOT, GO!
Just in case it was my own problem, I took a look at the links to my new pages. As the pages were in different encodings, I had used gifs so that the links would display correctly to all users. But I found I had inadvertently left the alt tags empty. I added some alt text, and bingo! Googlebot came along and indexed the lot. Perhaps Googlebot ignores image links which do not have an alt or title text.
However, this could be just a coincidence. IMHO there has been a lull in normal Googlebot activity, possibly connected to the test spiders Google has been running, and this now seems to be over.
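If you want to check your own pages for the situation described above (image links with empty alt text), here is a quick sketch using Python's standard-library HTML parser. The sample markup is invented for illustration:

```python
# Find <a> links whose image has no alt text -- a self-check for the
# empty-alt situation described above. The sample HTML is made up.
from html.parser import HTMLParser

class ImageLinkChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_link = None       # href of the <a> we're currently inside
        self.missing_alt = []     # hrefs whose <img> lacks alt text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self.in_link = attrs.get("href")
        elif tag == "img" and self.in_link:
            if not attrs.get("alt"):
                self.missing_alt.append(self.in_link)

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = None

page = ('<a href="/en/"><img src="en.gif"></a> '
        '<a href="/fr/"><img src="fr.gif" alt="French"></a>')
checker = ImageLinkChecker()
checker.feed(page)
print(checker.missing_alt)  # ['/en/']
```

Whether Googlebot actually skips such links is unproven, as the post says, but empty alt text hurts accessibility regardless, so it's worth fixing either way.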
Googlebot comes in once or twice a day.
It hits robots.txt, then index.html.
On the index.html page, however, it gets stuck in a "noscript" tag that redirects the robot to a "you need JavaScript" landing page.
I can't get rid of the JavaScript validation because my checkout procedure and shopping cart rely on it, but I'd like to have Google crawl my site!
What should I do here?
What to do:
A lot of e-commerce software uses JavaScript, and Google doesn't always follow these links (I know this to my cost: my site does OK, but many of the 'product detail' pages are only linked to by JavaScript, and Google doesn't even know about them). I could easily outrank my competitors if these detailed pages were included in the G index.
So, what to do is this:
Change your e-comm software: perhaps something to consider in the medium to long term.
But what you can do right now is 'hand edit' your index page after uploading, and make all those JS links nice simple HTML links that point to your product sections / individual product pages.
It's a bit of a pain to do, but if you save these links in a .txt file you can always cut and paste them in future to speed things up.
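If the hand-editing gets tedious, the rewrite above can even be scripted. This is a hypothetical sketch: the `onclick="window.location='...'"` markup pattern is an assumption, and you would need to adapt the regex to whatever your cart software actually emits:

```python
# Hypothetical sketch of the 'hand edit' step above: rewrite anchors
# that navigate via JavaScript into plain HTML links a crawler can
# follow. The onclick pattern matched here is an assumed example,
# not what any particular e-comm package produces.
import re

JS_LINK = re.compile(
    r"""<a\s+href="#"\s+onclick="window\.location='(?P<url>[^']+)'[^"]*">""")

def to_plain_links(html: str) -> str:
    """Replace JS-navigation anchors with simple href links."""
    return JS_LINK.sub(lambda m: '<a href="%s">' % m.group("url"), html)

page = ('<a href="#" onclick="window.location='
        "'/products/widget.html'; return false\">Widget</a>")
print(to_plain_links(page))
# <a href="/products/widget.html">Widget</a>
```

Run it over a saved copy of the page before uploading, and keep the original in case the substitution catches something it shouldn't.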
Best of luck ;)
Googlebot is not crawling my index page, which has very good PageRank. No freshbot indication for 3 weeks. Googlebot HAS been crawling my home page and more every day, but for the last two weeks I have not had a fresh tag, although I always had one before. This started when the new crawler started coming around.