What do you mean by [ this is how far it goes:
HEAD / HTTP/1.1" 200 254
]?
To explain why I ask:
{
HEAD / HTTP/1.1
Host: www.example.com
User-Agent: Loony with Putty or similar telnet style program using HTTP specific settings.
Accept: text/xml,text/html,text/plain
Accept-Charset: ISO-8859-1,utf-8
Connection: close
}
is enough of a 'request' to get the header details from a server using that protocol. If you exchange 'HEAD' for 'GET', you are asking for the document proper. Incidentally, '200' is the server's status code saying that the request succeeded and it can deliver that page.
When you type 'http://www.example.com/this_url/thanks.html' into your browser's address bar, it 'looks up' the IP address of www.example.com, connects to the HTTP service at that IP address, and sends a request starting with
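For anyone who would rather script this than type it into a telnet session, here is a minimal sketch of the same idea in Python. The helper names (build_request, status_code) are made up for illustration, and nothing is actually sent over the network: the sketch only builds the request bytes and reads the status out of a response you already have.

```python
CRLF = "\r\n"


def build_request(method, path, host, extra_headers=None):
    """Return the raw bytes of a minimal HTTP/1.1 request (hypothetical helper)."""
    headers = {"Host": host, "Connection": "close"}
    headers.update(extra_headers or {})
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line terminates the header section.
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")


def status_code(raw_response):
    """Pull the numeric status out of the first line of a raw response."""
    status_line = raw_response.split(b"\r\n", 1)[0].decode("ascii")
    # "HTTP/1.1 200 OK" splits into protocol, code, reason phrase.
    return int(status_line.split(" ", 2)[1])


# The only difference between asking for the headers and asking for the
# document proper is the method at the start of the request line.
head = build_request("HEAD", "/", "www.example.com")
get = build_request("GET", "/", "www.example.com")
```

Writing the bytes in head to a TCP connection on port 80 should get you the headers only; the bytes in get should get you the headers plus the document.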
{
GET /this_url/thanks.html HTTP/1.1
Host: www.example.com
User-Agent: <whatever you are using>
etc
etc
}
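Roughly how the address in the bar turns into that request line, as a sketch in Python. The function name request_parts is hypothetical, and the actual DNS lookup is left out so this runs offline (in Python that step would be something like socket.getaddrinfo(host, port)).

```python
from urllib.parse import urlsplit


def request_parts(url):
    """Split a URL into the host to look up, the port, and the path to request."""
    parts = urlsplit(url)
    host = parts.hostname
    # No explicit port in the URL means the scheme's default.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return host, port, path


host, port, path = request_parts("http://www.example.com/this_url/thanks.html")

# The host goes into the DNS lookup and the Host: header;
# only the path ends up on the request line.
request_line = f"GET {path} HTTP/1.1"
```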
So I really can't figure out where you are reading that string from, unless it's in your httpd logs. In that case: if you are saying that the log entries with Googlebot as the 'cs-user-agent' only ever 'HEAD' documents off your server, then yup, until Googlebot uses 'GET' on your documents you are pretty surely out of the SERPs.
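If you want to check that from the logs rather than by eye, something like the sketch below would count which methods a given user agent has been using. It assumes Apache-style "combined" log lines, which is only a guess based on the string you quoted; the sample lines are made up for illustration, and the names LOG_LINE and methods_by_agent are hypothetical.

```python
import re
from collections import Counter

# Assumed layout: ... "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def methods_by_agent(lines, agent_substring="Googlebot"):
    """Count request methods on lines whose user agent contains agent_substring."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and agent_substring in (match.group("agent") or ""):
            counts[match.group("method")] += 1
    return counts


# Made-up sample lines, not taken from any real log.
SAMPLE = [
    '192.0.2.1 - - [01/Jan/2006:00:00:01 +0000] "HEAD / HTTP/1.1" 200 254 "-" "Googlebot/2.1"',
    '192.0.2.1 - - [01/Jan/2006:00:00:02 +0000] "HEAD /about.html HTTP/1.1" 200 254 "-" "Googlebot/2.1"',
    '192.0.2.9 - - [01/Jan/2006:00:00:03 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

counts = methods_by_agent(SAMPLE)
```

A result with plenty of HEAD and no GET for Googlebot would be the situation described above.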
Regards,
robsoles.
The 254 should be the content length in bytes of the response that would have been sent if it were an HTTP GET request rather than an HTTP HEAD request.
I would find an appropriate contact form buried away there in Google somewhere and ask them what I could do if I had your situation.
Regards,
robsoles.
Now I'm fascinated. I haven't read the rules of this place well enough to be positive, but I think you must be allowed to sticky-mail me the domain name so I can throw a few tools at it and see if I can reveal why Googlebot hates it.
I'll tell you if I see anything I know breaks Google-TOS or anything else that makes it look bad.
Regards,
robsoles.
Yes, all of the tools can be found, at some level of implementation, scattered all over the internet. Many of them give only a limited level of detail, enough to draw you in to buy the product behind them. There are a few gems out there, like w3, that simply host tools of great worth without seeming to ask more than that you use them to make a decent job of your site.
I'm writing a crawler of my own because, to get everything I feel one program should be able to detail for you, you currently have to buy two programs available on the market and use a few of the free hosted tools as well. It is terribly useful already, but I am not ready to release it; it needs to do plenty more in my opinion.
Among other great search terms for finding free hosted tools, this is probably the best one:
seo tools
Regards,
robsoles.
I threw a few tools at the domain. Yahoo has info about your site and Google does not, which is almost invariably a bad sign. The 'GET's that Googlebot is performing on pages on your site may indicate the end of the problem of not even being in Google's index, but it may just be confirming whatever it didn't like enough to switch to just 'HEAD'ing documents off your site in the first place.
in Google:
site:<insert-your-domain>
in Yahoo:
linkdomain:<insert-your-domain> -site:<insert-your-domain>
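If you want to run those two checks for more than one domain, a tiny helper saves retyping. This is only a sketch: the function name index_check_queries and the dictionary keys are made up, and it just builds the query text (plus a URL-encoded copy you could paste into a search URL) without contacting either search engine.

```python
from urllib.parse import quote_plus


def index_check_queries(domain):
    """Build the two search-box queries shown above for a given domain."""
    return {
        # Pages from the domain that Google has in its index.
        "google_indexed_pages": f"site:{domain}",
        # Pages elsewhere that link to the domain, according to Yahoo.
        "yahoo_inbound_links": f"linkdomain:{domain} -site:{domain}",
    }


queries = index_check_queries("www.example.com")

# URL-encoded versions, in case the query is going into a query string.
encoded = {name: quote_plus(query) for name, query in queries.items()}
```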
A general domain checking service detailed a few areas of concern I'd fix if I was running your show. One or two statistical sites couldn't cope with requests about your domain.
I found an email address in the contact page of the site and will send my crawler's output with some explanation in the email and links to whatever else I find of interest about your site to that address.
Regards,
robsoles.
Don't stress, although...
The details my crawler pulled out of the host worry me a bit, but while I was composing the email I am sending, a simple enough Google search occurred to me. Typing
blacklist "<insert-domain-here>"
into the Google search box brings the site up in a blacklist context. It's a clickable link in the email.
You are welcome, and Good Luck!
robsoles.