| 8:00 am on Feb 7, 2008 (gmt 0)|
What do you mean by "this is how far it goes":
HEAD / HTTP/1.1" 200 254
To explain why I ask:
HEAD / HTTP/1.1
User-Agent: Loony with Putty or similar telnet style program using HTTP specific settings.
is enough of a 'request' to get the header details from a server using that protocol; if you exchange 'HEAD' for 'GET' then you are asking for the document proper. Coincidentally, '200' is the server's response code to say that it can deliver that page.
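For anyone who wants to see the HEAD/GET difference without telnet, here is a minimal Python sketch. It is only an illustration: the tiny local server, the `DemoHandler` and `fetch` names, and the `PAGE` body are all made up for the demo and have nothing to do with the site being discussed.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>thanks</body></html>"  # hypothetical document body


class DemoHandler(BaseHTTPRequestHandler):
    """Stand-in server (an assumption for the demo): answers HEAD and GET."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_HEAD(self):
        self._send_headers()      # headers only, no body

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)    # headers plus the document proper

    def log_message(self, format, *args):
        pass                      # keep the demo quiet


def fetch(method, port, path="/"):
    """Send one request; return (status, Content-Length header, body bytes)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, path, headers={"User-Agent": "demo-client"})
    resp = conn.getresponse()
    body = resp.read()
    result = (resp.status, resp.getheader("Content-Length"), body)
    conn.close()
    return result


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

head_result = fetch("HEAD", port)
get_result = fetch("GET", port)
server.shutdown()
server.server_close()

print("HEAD:", head_result)
print("GET: ", get_result)
```

Both requests come back with status 200 and the same Content-Length header; only the GET actually carries the body.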
When you type 'http://www.example.com/this_url/thanks.html' into your browser's address bar it 'looks up' the IP address of www.example.com, connects to the HTTP service at that IP address and sends a request starting with
GET /this_url/thanks.html HTTP/1.1
User-Agent: <whatever you are using>
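As a rough sketch of that step, the Python below turns a typed URL into the host to look up and the request text to send. The `build_request` helper and the `demo-client` user agent are hypothetical names for illustration; the `Host:` line is included because HTTP/1.1 requires it, and nothing is actually sent over the network here.

```python
from urllib.parse import urlsplit


def build_request(url, method="GET", user_agent="demo-client"):
    """Return (hostname, port, request text) for a plain http:// URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {parts.netloc}",          # mandatory in HTTP/1.1
        f"User-Agent: {user_agent}",
        "Connection: close",
        "",                               # blank line ends the headers
        "",
    ]
    return parts.hostname, parts.port or 80, "\r\n".join(lines)


host, port, request_text = build_request(
    "http://www.example.com/this_url/thanks.html"
)
print("look up and connect to:", host, port)
print(request_text)
```

A real client would then resolve `host` (for example with `socket.getaddrinfo`), open a TCP connection to that address and port, and write `request_text` to it.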
So I really can't figure out where you are reading that string from, unless it's in your httpd logs, in which case: if you are saying that the log entries that have Googlebot as the 'cs-user-agent' only ever 'HEAD' documents off your server, then yup, until Googlebot uses 'GET' on your documents you are pretty surely out of the SERPs.
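If the string does come from the logs, a quick way to check the "only ever HEAD" claim is to count request methods per user agent. The sketch below assumes Apache-style combined log lines; the sample lines, addresses and the `googlebot_methods` helper are invented for illustration, and a server that logs in a different format would need a different pattern.

```python
import re
from collections import Counter

# Hypothetical sample lines in combined log format (not from the real site).
SAMPLE_LOG = """\
192.0.2.10 - - [07/Feb/2008:08:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
192.0.2.10 - - [07/Feb/2008:08:00:01 +0000] "HEAD / HTTP/1.1" 200 254 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
192.0.2.77 - - [07/Feb/2008:08:05:12 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows; U)"
192.0.2.10 - - [07/Feb/2008:09:00:01 +0000] "HEAD / HTTP/1.1" 200 254 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
"""

LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_methods(log_text):
    """Count methods used by Googlebot on anything other than /robots.txt."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue                      # skip lines that do not parse
        if "Googlebot" not in match["agent"]:
            continue
        if match["path"] == "/robots.txt":
            continue
        counts[match["method"]] += 1
    return counts


methods = googlebot_methods(SAMPLE_LOG)
only_heads = bool(methods) and set(methods) == {"HEAD"}
print(dict(methods), "HEAD only:", only_heads)
```

Keep in mind the user agent string is easy to fake, so this only tells you what claims to be Googlebot is doing.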
| 1:23 pm on Feb 7, 2008 (gmt 0)|
What I mean is: The Google robot won't go farther than that.
All it asks for is robots.txt and the header.
What does the 254 represent? ( HEAD / HTTP/1.1" 200 254 )
Thank you very much
| 1:50 pm on Feb 7, 2008 (gmt 0)|
welcome to WebmasterWorld [webmasterworld.com], visa666!
the 254 should be the content length in bytes of the response that would have occurred if it was an HTTP GET request rather than an HTTP HEAD request.
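To make the pieces of that log fragment explicit, here is a small Python sketch that splits it into named fields. The `split_fragment` helper and the field names are hypothetical, and exactly which bytes the last number counts depends on the server and its log format, so treat the `size` label as an assumption.

```python
import re

FRAGMENT = 'HEAD / HTTP/1.1" 200 254'   # the fragment quoted in this thread

FRAGMENT_RE = re.compile(
    r'(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>HTTP/[\d.]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def split_fragment(fragment):
    """Split 'METHOD path PROTOCOL" status size' into a dict of fields."""
    match = FRAGMENT_RE.match(fragment)
    if match is None:
        raise ValueError("fragment does not look like a request log entry")
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # Some servers log "-" when no size is recorded.
    fields["size"] = None if fields["size"] == "-" else int(fields["size"])
    return fields


fields = split_fragment(FRAGMENT)
print(fields)
```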
| 7:08 pm on Feb 7, 2008 (gmt 0)|
Thank you phranque
| 3:14 am on Feb 8, 2008 (gmt 0)|
if your robots.txt doesn't read as below then could you please post it?
just worried you've banned yourself with a 'poor' directive in your robots.txt file is all.
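One way to test for that kind of self-inflicted ban is Python's standard `urllib.robotparser`. This is only a sketch: the two robots.txt bodies below are hypothetical examples (not the file discussed in this thread), and `allows_googlebot` is a made-up helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies for illustration only.
PERMISSIVE = """\
User-agent: *
Disallow:
"""

SELF_BANNED = """\
User-agent: *
Disallow: /
"""


def allows_googlebot(robots_txt, url="http://www.example.com/"):
    """Return True if the given robots.txt text lets Googlebot fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


print("permissive file allows Googlebot:", allows_googlebot(PERMISSIVE))
print("'Disallow: /' file allows Googlebot:", allows_googlebot(SELF_BANNED))
```

Google's own parser may treat some edge cases differently from the standard library, so a pass here is a sanity check rather than a guarantee.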
| 12:26 pm on Feb 8, 2008 (gmt 0)|
I took it out a few times, modified it a few times....
| 12:55 pm on Feb 8, 2008 (gmt 0)|
fair enough, it's a fine and fair robots.txt file, so it can only be that Google has banned the site. I've not heard of Googlebot only HEADing documents off servers, but then none of my mates have been blacklisted (or have mentioned it to me anyway.)
I would find an appropriate contact form buried away there in Google somewhere and ask them what I could do if I had your situation.
| 5:00 pm on Feb 8, 2008 (gmt 0)|
Since March 2007 I have bugged them almost every month asking for reinclusion or a clue about the ban. Still in the dark….
Thank you for your suggestions, robsoles
| 12:30 am on Feb 9, 2008 (gmt 0)|
now I'm fascinated - I haven't read the rules of this place well enough to be positive but I think you must be allowed to sticky-mail me the domain name so I can throw a few tools at it to see if I can reveal why Googlebot hates it.
I'll tell you if I see anything I know breaks Google-TOS or anything else that makes it look bad.
| 4:18 am on Feb 9, 2008 (gmt 0)|
Are those tools something that could be of use for anyone, regardless of site’s index status?
| 5:05 am on Feb 9, 2008 (gmt 0)|
Yes, all of the tools can be found at some level of implementation scattered all over the internet. Many of them give just enough detail to draw you in to buy the product behind them, but there are a few gems out there like w3 that simply host tools of great worth without seeming to ask more than that you use them to make a decent pig of your site.
I'm writing a crawler of my own because it seems that, to get everything I feel one program should be able to detail for you, you have to buy two programs currently available on the market and use a few of the free hosted tools. It's terribly useful already but I am not ready to release it; it needs to do plenty more in my opinion.
Among other great search terms for finding free hosted tools this is probably the best one;
| 4:48 pm on Feb 9, 2008 (gmt 0)|
Googlebot got 1 page yesterday and one more today! This is the first time in months that it has got past the robots.txt
Probably they just heard I was talking to you.
I'll pm you the domain.
| 1:25 am on Feb 10, 2008 (gmt 0)|
I threw a few tools at the domain. Yahoo has info about your site and Google does not, which is invariably a bad sign. The 'GET's that Googlebot is performing on pages on your site may indicate the end of the problem of not even being in Google's index, but it may just be confirming whatever it disliked enough to switch to just 'HEAD'ing documents off your site in the first place.
A general domain checking service detailed a few areas of concern I'd fix if I was running your show. One or two statistical sites couldn't cope with requests about your domain.
I found an email address in the contact page of the site and will send my crawler's output with some explanation in the email and links to whatever else I find of interest about your site to that address.
| 1:53 am on Feb 10, 2008 (gmt 0)|
actually visa666, it's occurred to me that maybe that's not your email address on that contact page and it could even be rude of me to send my email there. I'll reply to your SM with a request for your preferred email address.
| 2:44 am on Feb 10, 2008 (gmt 0)|
Sorry, didn’t mean to …Typing mistake.
You can use that e-mail, but I'll PM you a different one
Thank you again rObsoles
| 3:01 am on Feb 10, 2008 (gmt 0)|
Don't stress, although...
The details my crawler pulled out of the host worry me a bit but while I was composing the email I am sending, a simple enough Google search occurred to me and
in Google search box, brings the site up with a blacklist context, it's a clickable thing in the email.
You are welcome, and Good Luck!