Forum Library, Charter, Moderators: Receptional & mademetop

Website Analytics - Tracking and Logging Forum

    
Google Ban
visa666




msg:3567911
 7:26 pm on Feb 6, 2008 (gmt 0)

I have a site probably banned from Google. I am trying to get it reindexed but this is how far it goes:

HEAD / HTTP/1.1" 200 254

Does anybody know what it means?

Thank you

 

robsoles




msg:3568335
 8:00 am on Feb 7, 2008 (gmt 0)

Hi visa666,

what do you mean by [ this is how far it goes:

HEAD / HTTP/1.1" 200 254

]?

To explain why I ask:

{
HEAD / HTTP/1.1
Host: www.example.com
User-Agent: Loony with Putty or similar telnet style program using HTTP specific settings.
Accept: text/xml,text/html,text/plain
Accept-Charset: ISO-8859-1,utf-8
Connection: close

}

is enough of a 'request' to get the header details from a server using that protocol. If you exchange 'HEAD' for 'GET' then you are asking for the document proper. The '200' is the server's status code saying the request succeeded, i.e. that it can deliver that page.
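To see the HEAD/GET difference without touching a live site, here is a minimal sketch that starts a throwaway server on localhost and sends it both kinds of request. The handler, the `PAGE` bytes and the function names are made up for the demo; a real server may fill in its headers differently.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>hello</body></html>"  # stand-in document, purely illustrative


class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_HEAD(self):
        # HEAD: same status line and headers as GET, but no body follows.
        self._send_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(method, port):
    """Return (status, Content-Length header, bytes of body actually received)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, "/")
        resp = conn.getresponse()
        body = resp.read()
        return resp.status, resp.getheader("Content-Length"), len(body)
    finally:
        conn.close()


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 = any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("HEAD", port), fetch("GET", port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    head, get = run_demo()
    print("HEAD:", head)
    print("GET: ", get)
```

Both requests come back with status 200 and the same Content-Length header, but only the GET actually delivers the document bytes.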

When you type 'http://www.example.com/this_url/thanks.html' into your browser's address bar, the browser 'looks up' the IP address of www.example.com, connects to the HTTP service at that IP address and sends a request starting with

{
GET /this_url/thanks.html HTTP/1.1
Host: www.example.com
User-Agent: <whatever you are using>
etc
etc

}
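As a rough sketch of that step, this is how a URL can be split into the host to connect to and the request text to send. `build_request` and the `example-client/0.1` user agent are hypothetical names for illustration; a real browser sends many more headers.

```python
from urllib.parse import urlsplit


def build_request(url, method="GET", user_agent="example-client/0.1"):
    """Return (host, port, request_text) for a plain-HTTP URL."""
    parts = urlsplit(url)
    host = parts.hostname          # the name a client would resolve to an IP address
    port = parts.port or 80        # default HTTP port when the URL gives none
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {parts.netloc}",
        f"User-Agent: {user_agent}",
        "Connection: close",
        "",  # an empty line ends the header block
        "",
    ]
    return host, port, "\r\n".join(lines)


if __name__ == "__main__":
    host, port, text = build_request("http://www.example.com/this_url/thanks.html")
    print(host, port)
    print(text)
```

Passing `method="HEAD"` gives the header-only variant of the same request.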

So I really can't figure out where you are reading that string from, unless it's in your httpd logs. In that case, if you are saying that the log entries that have Googlebot as the user agent ('cs-user-agent') only ever 'HEAD' documents off your server, then yup: until Googlebot uses 'GET' on your documents you are pretty surely out of the SERPs.
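If the string does come from an access log, a quick tally of which methods a given user agent used can confirm the HEAD-only pattern. This is only a sketch: it assumes the Apache "combined" log layout, the sample entries are made up, and the user-agent field can be faked, so a match on "Googlebot" is a hint rather than proof.

```python
import re
from collections import Counter

# Assumes the Apache "combined" layout; adjust if your server logs differently.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def methods_for_agent(lines, agent_substring="Googlebot", ignore_paths=("/robots.txt",)):
    """Count request methods used by one user agent, skipping robots.txt fetches."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not in the expected layout
        if agent_substring.lower() not in m.group("agent").lower():
            continue
        if m.group("path") in ignore_paths:
            continue
        counts[m.group("method")] += 1
    return counts


# Made-up sample entries, not taken from any real log.
SAMPLE = [
    '192.0.2.1 - - [06/Feb/2008:19:20:01 +0000] "GET /robots.txt HTTP/1.1" 200 26 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.1 - - [06/Feb/2008:19:20:02 +0000] "HEAD / HTTP/1.1" 200 254 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [06/Feb/2008:19:21:10 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows; U)"',
]

if __name__ == "__main__":
    print(dict(methods_for_agent(SAMPLE)))
```

A result containing only HEAD for the Googlebot entries would match the situation described here: robots.txt is fetched, but no page is ever requested with GET.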

Regards,
robsoles.

visa666




msg:3568492
 1:23 pm on Feb 7, 2008 (gmt 0)

Thanks robsoles

What I mean is: the Google robot won't go farther than that.
All it asks for is robots.txt and the header.

What does the 254 represent? ( HEAD / HTTP/1.1" 200 254 )

Thank you much

phranque




msg:3568512
 1:50 pm on Feb 7, 2008 (gmt 0)

welcome to WebmasterWorld [webmasterworld.com], visa666!

the 254 should be the content length in bytes of the response that would have occurred if it was an HTTP GET request rather than an HTTP HEAD request.

visa666




msg:3568867
 7:08 pm on Feb 7, 2008 (gmt 0)

Thank you phranque

robsoles




msg:3569261
 3:14 am on Feb 8, 2008 (gmt 0)

Hey visa666,

if your robots.txt doesn't read as below then could you please post it?

{
User-Agent: *
Disallow:

}

just worried you've banned yourself with a 'poor' directive in your robots.txt file is all.

Regards,
robsoles.

visa666




msg:3569584
 12:26 pm on Feb 8, 2008 (gmt 0)


I took it out a few times, modified it a few times...
It didn't work!

User-agent: *
Disallow: /cgi-bin/

robsoles




msg:3569606
 12:55 pm on Feb 8, 2008 (gmt 0)

fair enough, it's a fine and fair robots.txt file, so it can only be that Google has banned the site. I've not heard of Googlebot only HEADing documents off servers, but then none of my mates have been blacklisted (or have mentioned it to me, anyway).
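For anyone wanting to double-check a robots.txt like the one above, here is a small sketch using Python's standard `urllib.robotparser`. The `allowed` helper and the example.com URLs are placeholders; Google's own parser may treat edge cases differently, so take it as a sanity check only.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, user_agent="Googlebot"):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The file posted above.
ROBOTS = """\
User-agent: *
Disallow: /cgi-bin/
"""

# The allow-everything form: an empty Disallow blocks nothing.
ROBOTS_OPEN = """\
User-Agent: *
Disallow:
"""

if __name__ == "__main__":
    print(allowed(ROBOTS, "http://www.example.com/"))
    print(allowed(ROBOTS, "http://www.example.com/cgi-bin/test.pl"))
    print(allowed(ROBOTS_OPEN, "http://www.example.com/cgi-bin/test.pl"))
```

With the posted file, the home page is allowed and only paths under /cgi-bin/ are blocked, which fits the view that robots.txt is not what is keeping Googlebot out.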

If I were in your situation, I would find an appropriate contact form buried away in Google somewhere and ask them what I could do.

Regards,
robsoles.

visa666




msg:3569782
 5:00 pm on Feb 8, 2008 (gmt 0)

Since March 2007 I have bugged them almost every month with a reinclusion request or for a clue about the ban. Still in the dark...

Thank you for your suggestions robsoles

robsoles




msg:3570158
 12:30 am on Feb 9, 2008 (gmt 0)

Hey visa666,

now I'm fascinated - I haven't read the rules of this place well enough to be positive but I think you must be allowed to sticky-mail me the domain name so I can throw a few tools at it to see if I can reveal why Googlebot hates it.

I'll tell you if I see anything I know breaks Google-TOS or anything else that makes it look bad.

Regards,
robsoles.

smallcompany




msg:3570245
 4:18 am on Feb 9, 2008 (gmt 0)

robsoles:

Are those tools something that could be of use for anyone, regardless of site’s index status?

Thanks

robsoles




msg:3570261
 5:05 am on Feb 9, 2008 (gmt 0)

Hi smallcompany,

Yes, all of the tools can be found at some level of implementation scattered all over the internet. Many of them give just enough detail to get you in and buy the product behind them, but there are a few gems out there, like w3, that simply host tools of great worth without seeming to ask more than that you use them to make a decent pig of your site.

I'm writing a crawler of my own because, to get everything I feel one program should be able to detail for you, you currently have to buy two programs on the market and use a few of the free hosted tools as well. It is terribly useful already, but I am not ready to release it; it needs to do plenty more in my opinion.

Among other great search terms for finding free hosted tools this is probably the best one;

seo tools

Regards,
robsoles.

visa666




msg:3570455
 4:48 pm on Feb 9, 2008 (gmt 0)

Hi rabsoles

Guess What?

Googlebot got one page yesterday and one more today! This is the first time in months that it has gone past robots.txt.
Probably they just heard I was talking to you.
I'll pm you the domain.

Thanks again

robsoles




msg:3570687
 1:25 am on Feb 10, 2008 (gmt 0)

visa666,

I threw a few tools at the domain. Yahoo has info about your site and Google does not, which is invariably a bad sign. The 'GET's that Googlebot is performing on pages on your site may indicate the end of the problem of not even being in Google's index, but it may just be confirming whatever it disliked enough to switch to just 'HEAD'ing documents off your site in the first place.

in Google:
site:<insert-your-domain>

in Yahoo:
linkdomain:<insert-your-domain> -site:<insert-your-domain>

A general domain checking service detailed a few areas of concern I'd fix if I was running your show. One or two statistical sites couldn't cope with requests about your domain.

I found an email address in the contact page of the site and will send my crawler's output with some explanation in the email and links to whatever else I find of interest about your site to that address.

Regards,
robsoles.

robsoles




msg:3570706
 1:53 am on Feb 10, 2008 (gmt 0)

actually visa666, it's occurred to me that maybe that's not your email address on that contact page, and it could even be rude of me to send my email there. I'll reply to your SM with a request for your preferred email address.

robsoles.

visa666




msg:3570721
 2:44 am on Feb 10, 2008 (gmt 0)

Sorry, didn’t mean to …Typing mistake.

You can use that e-mail but I PM you a different one

Thank you again rObsoles

robsoles




msg:3570726
 3:01 am on Feb 10, 2008 (gmt 0)

Hey visa666,

Don't stress, although...

The details my crawler pulled out of the host worry me a bit, but while I was composing the email I am sending, a simple enough Google search occurred to me. Typing

blacklist "<insert-domain-here>"

into the Google search box brings the site up in a blacklist context; it's a clickable link in the email.

You are welcome, and Good Luck!
robsoles.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved