Forum Moderators: Robert Charlton & goodroi
I have noticed in my logfiles for the past few months that every time Googlebot visits my site it only makes 2 hits. This happens almost twice a month. I have a few questions; if anyone has the time to provide me with some insight I would greatly appreciate it. Not sure that this matters, but MSN and Slurp (Yahoo) index my site like crazy. Why doesn't Google like me?
Is there a reason that it only makes 2 hits on my site per visit?
Is my site not 'special' enough yet for it to index the whole site?
What will it take to get google to go past my front page?
I wonder what would cause the cached link to go away? They must have some data cached because the results page has some old content from my site.
Any guesses as to what would make the cached link disappear? Just curious.
I got that stupid 500 error to go away. You have no idea what I went through, though. I will spare you the details, but it seems that it was server specific and not the Zope platform I was using. Even some of my straight static sites with no dynamic anything, no Zope, just straight web account folders with .html files in them, were returning that 500 error. It seemed to be server specific: I did a reverse DNS lookup and put them through Poodle, and blocks of them would return the error, while on some of my other server machines even the Zope ones were fine.
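Since all of this was diagnosed from the logfiles, here is a rough sketch of how you could tally Googlebot's hits and the status codes it got. This assumes an Apache-style combined log format; the regex and user-agent string are my assumptions, not anything from the thread:

```python
import re

# Assumed Apache "combined" log format; adjust the regex if your
# server logs differently.
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(log_lines):
    """Return (path, status) for every request whose user-agent
    mentions Googlebot, in log order."""
    hits = []
    for line in log_lines:
        m = LOG_RE.match(line)
        if m and "Googlebot" in m.group("agent"):
            hits.append((m.group("path"), int(m.group("status"))))
    return hits
```

Running this over a day's log makes the "only 2 hits" pattern, and any 500s handed to the bot, immediately visible.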
Anyway I got that fixed then I held my breath and waited for googlebot to come along. So yesterday it came and guess what. I still only got 2 hits to the front page then nothing. ARG!
To further peeve me off, I used a new domain a client had registered with me as a test. I put him in a Zope and set him up about 3 months ago. I checked his logfiles and he had never been indexed by Google. Then about a week ago he got the 2 hits from Google and it went away, same as on my site, except Google came back a few days later and crawled his whole site, and it continues to index him daily now. This drives me nuts because his site is the same as mine for site layout, platform, and server specs, and it was returning those stupid 500 errors just like mine was, until a couple of days ago when I got it fixed.
What the what? Can it be I have been blacklisted? I still can't find another example of Google removing the 'Get Google's cached version' link, like it did to mine.
I still can't find another example of Google removing the 'Get Google's cached version' link, like it did to mine.
I don't understand that sentence.
The poodle predictor looks great now.
Just need to figure out what happened now.
I see your homepage cached in google
UNDER CONSTRUCTION 28 Mar 2004
With the new date it should pick up on it now. Sometimes the Googlebots work in harmony with each other (Googlebot and Freshbot): it may have seen the new date and gone to fetch the other bot.
It could still happen.
When you put in my URL directly into Google it gives the following styled results:
Google can show you the following information for this URL:
* Find web pages that are similar to www.exampleURL.com
* Find web pages that link to www.exampleURL.com
* Find web pages from the site www.exampleURL.com
* Find web pages that contain the term "www.exampleURL.com"
Normally there is also a link that says
* Show Google's cache of www.exampleURL.com
This link does not appear with my results, and I haven't seen it missing from any other URL I have tried.
2 hits to the front page about a week ago, then it went away. I am not sure how reliable this info is, but someone who uses my site often claims that on the day Googlebot paid me a visit last month, Google's cache was updated for a day. He told me that he put in our URL and it was showing him the new content. He even emailed me in excitement that Google had updated the cache. So I looked at my logfiles and saw Google did come and see me, even though it was just the 2 hits. Then I went to look at the Google cache and it was the same old content.
Now I am not sure that he really saw what he claims to have seen, but he was adamant about it, and he did email me the day after Googlebot came to see me (which is about every 40 days, so it seems like good timing on his part) to tell me it had been updated. He was excited, as he has been following the non-caching of our site with intrigue.
If what he says is true, then Google came to my site, made 2 hits, updated the cache, then the next day reverted back to the super old cache, which still appears without the 'Show Google's cache' link. Does this sound like something that could happen?
I still see 'under construction' with no cache.
I did notice on the Poodle tool - the header checker - that your character set is showing up twice.
Maybe the page doesn't need that META tag because that is the default for the server?
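To see whether the charset really is declared twice, you could compare the HTTP Content-Type header against any META tags in the page. This is only an illustrative sketch, not a real header checker; the function name and parsing are my own:

```python
import re

# Find every charset declaration for a page: one may come from the
# HTTP Content-Type header, others from http-equiv META tags in the HTML.
CHARSET_RE = re.compile(r"charset\s*=\s*([\w-]+)", re.IGNORECASE)

def charset_declarations(content_type_header, html):
    """Return all declared charsets, header first, then any in the HTML.
    More than one entry is the duplication the header checker flags."""
    charsets = []
    if content_type_header:
        charsets += CHARSET_RE.findall(content_type_header)
    charsets += CHARSET_RE.findall(html)
    return charsets
```

If this returns two entries, dropping the META tag (when the server header already declares the charset) removes the duplicate.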
If I were you I would try 2 things.
1. Submit a reinclusion request to Google, but they will probably tell you the site is already listed.
What, do you have an empty robots.txt file?
check it with this
[searchengineworld.com...]
If you just want to allow everything, here is the code:
User-agent: *
Disallow:
It is really important that you have a valid robots.txt file before I tell you what your other option is - just making your robots.txt validate may fix the problem.
I know an empty robots.txt is supposed to be OK, but this is non-standard, and just giving Googlebot something to go by (on a dynamic site) may be all it needs.
One more thing.
I don't know much about dynamic websites or Zope, but it makes sense to me that /index.html should not return a 404; instead I would make it a 301 redirect to /index_html/.
That may be a dumb statement, like I said I don't know, just thought I would point that out.
Or make sure googlebot is asking for / and not /index.html for some reason. (it usually does the former)
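The redirect idea above can be sketched as a tiny path-resolution rule. The paths and the alias table are illustrative, assuming "/" is the canonical homepage URL:

```python
# Answer aliases of the homepage with a 301 pointing at the canonical
# "/" instead of letting them 404. Everything else is served normally.
ALIASES = {"/index.html": "/", "/index.htm": "/"}

def resolve(path):
    """Return (status, location) for a requested path."""
    if path in ALIASES:
        return (301, ALIASES[path])   # permanent redirect to canonical URL
    return (200, None)                # no redirect; serve the page
```

A 301 tells the crawler the canonical location permanently, so it stops asking for the alias.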
I still can't find another example of Google removing the 'Get Google's cached version' link, like it did to mine.
I wouldn't be too concerned with it other than that it seems google has dropped the cache for this page.
Might be something you did in the dynamic content issue? or with the header dates?
We probably fixed the 'outdated cache' problem.
Anyway I would put a bet on the robots.txt validation issue.
Your robots.txt is invalid: it is a dynamically generated page.
Here is the source of your robots.txt:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1"></HEAD>
<BODY></BODY></HTML>
empty is as if there isn't one (all robots allowed)
but I would rather see it validate.
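A quick way to catch this kind of broken robots.txt is to check that the body looks like plain "Field: value" rules rather than an HTML page. This is only a rough sketch, not the full robots exclusion standard:

```python
def looks_like_valid_robots(body):
    """Crude plausibility check for a robots.txt body: reject anything
    that starts with markup, and require every non-blank, non-comment
    line to be a 'Field: value' pair. An empty body passes, since empty
    is treated as 'all robots allowed'."""
    if body.lstrip().startswith("<"):   # HTML/XML leaked into robots.txt
        return False
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                    # skip blanks and comments
        if ":" not in line:
            return False                # not a Field: value rule
    return True
```

The dynamically generated HTML page shown above fails this check immediately, while the two-line allow-everything file passes.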
Well another issue down.
What do you think about Google reverting back from an updated cache to the old one that is up? Does that sound possible?
What do you think about Google reverting back from an updated cache to the old one that is up? Does that sound possible?
Like I said googlebot is a greedy little bugger.
I think when the file date got fixed google figured out that it had an old cache and dropped it but was not able to crawl the site (robots.txt problem) so it could have gone back and grabbed the old cache again.
Once everything works it should straighten it all out in no time.
I really would like to be able to return the favor.
You already have. I'm learning about this stuff and you provided an excellent troubleshooting example. My site has never had a problem with Googlebot.
MySQL - I have looked at it but have not found a need for it on my site.
BTW I'm still seeing the same html code in your robots.txt
Well, I am glad to hear that you are not doing this all for naught.
So I got back from a trip out to the mountains and I peeked in on the site, and what do you know: Googlebot has come to see me and made 5 hits to the site.
I checked Google's cache and lo and behold, I am indexed! I checked my robots.txt and it is of type text/plain, so I think all is well.
Thanks again for your help, Reid. Please, please think of me if you ever need a few lines of script or custom SQL - I would be so happy to help you in these areas.
Take care
Demaestro
I've been having the same problem as Demaestro. Googlebot visits almost daily, requests the robots.txt and the homepage, then promptly leaves. This has been going on for a couple of weeks (patience has never been one of my virtues). I've run the validator on W3 and it returns 3 errors, one of which is the doctype; I cannot seem to add the doctype into my header. I've also run the Poodle Predictor, and it returned a warning that "No h1, h2 or h3 Headings were found" - however, it returns this same warning for Google itself.
The robots.txt allows all, and the If-Modified-Since handling is working. I also have a site map.
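By "since modified" I take it the poster means HTTP's If-Modified-Since conditional GET. A minimal sketch of that logic (the 200/304 status codes are standard HTTP; the function itself is hypothetical):

```python
from email.utils import parsedate_to_datetime

def conditional_status(last_modified, if_modified_since):
    """Decide a conditional GET: if the client's If-Modified-Since date
    is at or after the page's Last-Modified date, return 304 Not
    Modified; otherwise return 200 and resend the page. Dates are
    HTTP-date strings, e.g. 'Sun, 28 Mar 2004 10:00:00 GMT'."""
    if if_modified_since is None:
        return 200                    # unconditional request
    lm = parsedate_to_datetime(last_modified)
    ims = parsedate_to_datetime(if_modified_since)
    return 304 if lm <= ims else 200
```

Crawlers lean on this to skip re-fetching unchanged pages, which is why a working Last-Modified date matters for crawl behavior.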
I'm stumped. Any thoughts? Go easy on me, I'm new to all this.
You don't necessarily need the doctype or h1, h2 etc. Plenty of sites get crawled without these.
User-agent: *
Disallow:
That is the entire file. No headers or anything in the robots.txt. I do not see anything in the HTML that obstructs them. I have some META keywords, but that's about it.
Is this behavior usual for AskJeeves, MSN, Become, LookSmart, etc., in addition to Google? All of these have visited and seem to do the same thing.