Google News Archive Forum

This 33 message thread spans 2 pages.
Google asking to include my web site
I received an email from Google today

 11:22 am on Jul 25, 2003 (gmt 0)

I received an email from Google today - they would like to spider one of my web sites and include it in the Google index.

The web site in question has a "robots.txt" file which disallows all spidering (I don't want it in the SEs at the moment). The email gives instructions on how to change my robots.txt so that Googlebot is allowed to spider my web site (using Google's very special "allow" tag).
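For illustration, here is a rough sketch of what such a change could look like and how to check its effect. The text of the email is not reproduced in this thread, so the robots.txt below is only an assumed layout (Googlebot allowed everywhere, every other robot blocked), and the URL and the "SomeOtherBot" name are made up. The check uses Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt (not taken from the email): Googlebot may crawl
# everything, all other robots are kept out of the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URL and bot names, used only to exercise the rules.
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "http://www.example.com/page.html"))
```

With these rules the Googlebot group matches first for Googlebot, while every other robot falls through to the "*" group and stays blocked.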

Ok, I admit, it feels good to have this game the other way round - Google asking *me* to have my web site included. I feel flattered. :)

Back to reality, I don't think they love my site so much that they selected it by hand. So Google probably sends out these emails automatically and in large numbers. The bad "s"-word comes to my mind. This would be 100% spam, in my opinion.

My first thought was that the email was a fake, not from Google at all. But it looks pretty real, and I don't really see the benefit a third party would have from faking these emails. The email comes from "crawl-coverage@google.com" and is signed by a real person from Google's "Business Development".

Second thought - I am running AdWords for the web site in question. Maybe they only send these emails to advertisers whose robots.txt keeps Googlebot from spidering. Google could justify this, IMO, and it makes some sense - in most cases, if you run AdWords for a certain web site, you also want it to be spidered.

But if this were true, it would be the first proven connection between advertising on Google and having your web site spidered by Googlebot. Anybody else got these emails?



 2:29 pm on Jul 25, 2003 (gmt 0)

Anybody else got these emails?

Did Google Ask Me to Rewrite My robots.txt? - A strange email. [webmasterworld.com]


 2:33 pm on Jul 25, 2003 (gmt 0)

I'm missing something. If your site is robots.txt'd out to Google, how can the AdSense spider access it? Or do you already have an "allow" for just that spider?


 2:42 pm on Jul 25, 2003 (gmt 0)

You're missing something.

He never mentioned AdSense.


 2:42 pm on Jul 25, 2003 (gmt 0)

Reading the other thread (thanks for the link, must have missed that post), it seems this really is a single, personal mail from someone at Google. So, I just feel a little honored and forget about my spam accusation. :)


 2:47 pm on Jul 25, 2003 (gmt 0)

ah thanks bolitto. i got adwords and adsense mixed up again!


 3:08 pm on Jul 25, 2003 (gmt 0)

Takagi just pointed me to another web site; the text of these "please-change-your-robots.txt-for-google" emails is available there (I think posting the text here is not permitted). Looks just the same as mine (in German).


In the other thread mentioned above, GoogleGuy says an email like this should be regarded as a single event, some sort of a personal decision from someone at Google. Surely we are the smart guys and don't believe everything the Google guys want us to. :)

GG's version of how and why such an email might have been sent is a little hard to follow - there is a fixed text for these emails, so sending them out was obviously planned, and planned in larger numbers. You wouldn't sit down and write a template for these emails (and even translate them) if you just wanted to send one or two emails, right?

The evil "s"-word is back in town.


 3:34 pm on Jul 25, 2003 (gmt 0)

Hi Sudden,

You thought it was an honor to receive such an email, and now I spoiled your party. Sorry.

I understand your site is in German. Well, if you want to boycott Google, I also have some good news for you. In the last 13 months the percentage of German pages in the index of FAST/AllTheWeb grew from 6.6% to 8.6% (see this [webmasterworld.com] thread).


 5:37 pm on Jul 28, 2003 (gmt 0)

I really think that calling this email spam is inappropriate, and talking about boycotting Google is just ludicrous.

First, just because two webmasters have received similar emails (actually I think there was a third example discussed some time back), does not mean that emails are being sent out en masse.

Second, this is not a commercial email except in the remotest sense. In fact, while there are lots of valid reasons to exclude Googlebot from a site, I'm sure most people here will agree that Google is potentially doing this webmaster a huge favor.


 5:52 pm on Jul 28, 2003 (gmt 0)

Hey sudden, I do think this email was sent because someone at Google thought your site was good, and that we'd like to get that content into our index. I'll be happy to ask around about it.


 7:44 pm on Jul 28, 2003 (gmt 0)

I would not be the least bit surprised that there might be a form letter available for requesting that someone allow google to crawl their site. It makes sure that they say everything that they want to say without making the mistakes that can come up when writing individual e-mails.

Just because it is a form letter does not mean that they are not sent out on a case by case basis. It just means that they want to be careful about the wording.


 4:58 am on Jul 29, 2003 (gmt 0)

Are you able to view the log of your mailserver?

If so, check the IP of the mailserver that mail came from. It is very possible that someone just spoofed the google.com domain, but why?
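If you only have the raw message rather than the server log, the Received headers carry similar information. Below is a minimal sketch, assuming the relaying servers record the peer address in square brackets (a common convention, but not guaranteed). The sample message, host names and addresses are invented for illustration.

```python
import re
from email import message_from_string
from typing import List

# Matches an IPv4 address written in square brackets, e.g. [192.0.2.25].
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_ips(raw_message: str) -> List[str]:
    """Return the bracketed IPv4 addresses from all Received headers, top to bottom."""
    msg = message_from_string(raw_message)
    ips = []
    for header in msg.get_all("Received", []):
        ips.extend(BRACKETED_IPV4.findall(header))
    return ips


# Invented sample message; the addresses come from documentation ranges.
SAMPLE = """\
Received: from mail.example.net (mail.example.net [203.0.113.7])
    by mx.example.org with ESMTP; Fri, 25 Jul 2003 11:20:01 +0000
Received: from sender.example.com (sender.example.com [192.0.2.25])
    by mail.example.net with ESMTP; Fri, 25 Jul 2003 11:19:58 +0000
From: someone@example.com
To: webmaster@example.org
Subject: test

body
"""

if __name__ == "__main__":
    print(received_ips(SAMPLE))
```

Each relay adds its Received header at the top, so the first entry is the one written by your own server and the only one you can fully trust; entries further down can be forged by the sender.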

If a bot wanted to crawl your site so badly, why would it need permission in the robots.txt? Any bot that wanted to crawl a site (to look for email addresses, etc.) would just do it, wouldn't it? If it was a spam situation, and the sender for some reason wants your site listed in Google (?), he probably already used some sort of bot to get your email address off your page.

If a spammer was responsible for writing these letters to people, for a reason that I can't really understand, why would they take the time to hand-pick email addresses from websites? All of this leads me to think it was someone legit.

Hey GoogleGuy, can you "ask around the plex" to make sure everyone is finding my site in the SERPs easily? :)


 5:47 am on Jul 29, 2003 (gmt 0)

I've been writing spiders or netbots since 1997. Spiders may choose to honor the robots.txt file if they want. But, it is in fact hard to block spiders from downloading web pages from a web site.

Google is just being nice (in this case), paying attention to the robots.txt file, and sending targeted email.

Made In Sheffield

 7:27 am on Jul 29, 2003 (gmt 0)

I don't have time to read the whole thread (I did notice you took it back) but it always amazes me how people are so quick to jump on the "This Is Spam" bandwagon.

Even if Google were sending these mails automatically to everyone who blocks them in robots.txt, I do not see how it is spam. It's a specific email telling you something is wrong and how to fix it; it's not trying to sell you anything or waste your time with something you don't want. OK, it might take some time to read it and you might already know, but do we really want companies to feel like they can't do things like this?

Yes, spam's annoying, but don't tar everything with the same brush, in my opinion.


 9:08 am on Jul 29, 2003 (gmt 0)

Hey GoogleGuy, may I quote you? Would be nice to mention Google's love for my web site on my front page! :)

Ok, I admit I am a little sensitive when it comes to spam (in fact I am drowning in it, the ultimate test for any anti-spam software), but I don't mind personally about this email from Google. If my web site was hand-picked, it's charming, isn't it..?

I started this thread to hear if others got these emails and to hear opinions. As there are not hundreds of other webmasters raising their hands, saying they got these emails as well, I agree with the previous posts and wouldn't call this spam.


 9:54 am on Jul 29, 2003 (gmt 0)

.. talking about boycotting Google is just ludicrous.

Sure it is at this moment. I wasn't really seriously suggesting sudden should do so.

It's a specific email telling you something is wrong and how to fix it ..

There is nothing wrong with having a robots.txt file disallowing a spider! In my understanding the topic is not a parsing problem of the file, it is about allowing Googlebot to crawl (some of) the pages.


 2:20 pm on Jul 29, 2003 (gmt 0)

Some people use the spam word WAY too much.

This is the reason why Google could never institute a system (even if they wanted to, which I'm sure they don't) where they automatically sent a webmaster an email notifying them that their site is banned and how to fix the problem.

So many people want to scream "spam!"

Even if that letter was a template and sent to hundreds of sites, I don't see where the problem is.

It doesn't appear to be an auto-message sent all across the 'net to any site blocking Google, because then it would be pretty common. So if you received that email, then most likely someone liked your site and is wondering why you're blocking it from getting into the index.

The only way I can see someone not from Google sending that email is if the person was a link partner and wants your site in Google for PageRank purposes. Most of the other conspiracy theories about this are pretty silly.

It's just a harmless email.


 2:35 pm on Jul 29, 2003 (gmt 0)

Just because it is a form letter does not mean that they are not sent out on a case by case basis. It just means that they want to be careful about the wording.

Exactly. We have several prewritten emails and other documents at my company which are used as templates, some of which may be sent out only every couple of months. Usually it's because I wrote them and have someone else send them out, and want to be sure the message is communicated uniformly and accurately.

I'm sure it's a very common practice.


 2:36 pm on Jul 29, 2003 (gmt 0)

OK, I would add my impartial comments. I have followed at least 3 cases of such letters from Google. The original people who received the letter felt the same intrigue. But I think in each case there was a real person from Google who responded at the reply email address. So I do not think there is any spin attached when GG says that someone at Google might have liked your site. HTH :)


 3:03 pm on Jul 29, 2003 (gmt 0)

Hi sudden,

Perhaps AdWords "context" ads could be the answer.

In reviewing our AdWords results using their new reporting format, I see "total content targeting" is used in place of keywords for several destination URLs.

Possibly AdWords is using the whole page to match against AdSense publisher pages - so it needs to crawl your pages?

My 2 clams.


 3:12 pm on Jul 29, 2003 (gmt 0)

I got the same e-mail except there was a person's name, including a signature (as below) with an e-mail address.

Persons Name
Business Development
Google Inc.
2400 Bayshore Parkway
Mountain View, CA 94043

I think it's great, I'm flattered.


 3:31 pm on Jul 29, 2003 (gmt 0)

Some more examples [google.com]


 4:11 pm on Jul 29, 2003 (gmt 0)

Just wondering... why would you disallow spidering and use adwords at the same time?

It seems an odd combo, and I am curious.


 6:34 am on Jul 30, 2003 (gmt 0)

They could be doing research on a new site to see how effective AdWords is. I can think of a few other reasons too. (Although I don't agree with any of the reasons.)

But you're right, very odd combo.


 6:40 am on Jul 30, 2003 (gmt 0)

>>Just wondering... why would you disallow spidering and use adwords at the same time?
It seems an odd combo, and I am curious.

Just a guess for one situation... someone may have sites designed for the same keywords, one designed to close the sale, which they advertise through AdWords. The other one is more of a soft-sell info site that is better suited for the main listings, and they don't really want to compete with their own site for the info searchers.

This way they can have one listing in the buying section (AdWords), one in the info section (on the left), and as a bonus they don't have to worry about duplicate content or cross-linking issues for the info site(s) caused by their commercial site.


 8:43 am on Jul 30, 2003 (gmt 0)

It's not as fancy as that - Google indexed some pages we most definitely did not want to be indexed - internal stuff with no inbound links. I really don't know how it happened - probably the toolbar. As the web site was not designed with SEO in mind, we never got any visitors from the SEs (without paying for them). So, having nothing to lose and being a little lazy, I blocked all spiders from the whole web site.

Sorry, no genius marketing or SEO ideas involved. :)


 12:52 am on Aug 3, 2003 (gmt 0)

GoogleBot is pretty aggressive about indexing stuff, reaching into images and OCRing them and more.

Sometimes parts of my web pages had examples that I didn't want indexed, yet GoogleBot went in there anyway. Some of these pages were eventually banned by human editors, because of Google's own mistakes.

So, I found that I could exclude GoogleBot from this kind of non-core content through an included JavaScript file. Just document.write("CONTENT") in the .js file.

<script type="text/javascript" src="yourfile.js"></script>

This trick keeps overly-testosteronous GoogleBot from indexing parts of your web page that are distractors from your main message. Then, you don't get banned by "human editors": minimum wage slackers at Google that can barely read. You are at their mercy.


 3:24 am on Aug 3, 2003 (gmt 0)

Yeah, Google tends to index the entire web page. I don't think most people consider that to be unexpected or overly aggressive, though.

They do index images, but there are several ways to prevent this and I've never seen evidence they apply OCR to them.


 7:36 am on Aug 3, 2003 (gmt 0)

That's right, we don't do OCR in images. I'm not aware of any bugs in robots.txt within the last several months (maybe years?), so consider using robots.txt to keep us from crawling pages from specific parts of your site. There's also the "noindex" and "nofollow" meta tags, password protecting web pages with .htaccess, etc. There's lots of tools to herd those little googlebots and keep 'em fenced in so they don't go where you don't want 'em. :)
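As a rough way to see which "noindex" / "nofollow" directives a page actually carries, the robots meta tag can be read back with Python's standard html.parser module. This is only a sketch: the class and function names are hypothetical, the sample page is invented, and it looks at the generic name="robots" tag only.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated and case-insensitive.
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots meta directives present in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Invented sample page for illustration.
PAGE = (
    '<html><head><meta name="robots" content="noindex, nofollow">'
    "</head><body>internal stuff</body></html>"
)

if __name__ == "__main__":
    print(sorted(robots_directives(PAGE)))
```

Keep in mind that a robot has to be able to fetch a page in order to see its meta tags, so a page that is blocked in robots.txt never gets its "noindex" read.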


 11:17 am on Aug 3, 2003 (gmt 0)

Speaking of letting Googlebot in, here's my actual robots.txt:

# Hello Dear Googlebot, nice to see you again.
# Please get in and take it as your own home
# You can crawl as deep as you want, no problem.
# I will be in the kitchen , just whistle if you need something

User-agent: Mr Googlebot
allow: /Absolutely ALL

User-agent: Mr Googlebot
allow: /Wife and bedroom if you like, no prob.

By the way, that's not too much bootlicking, is it?

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved