Google asking to include my web site

I received an email from Google today

     
11:22 am on Jul 25, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 14, 2002
posts:152
votes: 0


I received an email from Google today - they would like to spider one of my web sites and include it in the Google index.

The web site in question has a "robots.txt" file which disallows all spidering (I don't want it in the SEs at the moment). The email gives instructions on how to change my robots.txt so that Googlebot is allowed to spider my web site (using Google's very special "allow" tag).
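For anyone who wants to check what such a change would do before uploading it, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `is_allowed` helper, the bot name "SomeOtherBot" and the example.com URL are all assumptions for illustration; the exact wording Google suggested in the email is not reproduced here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block every crawler except Googlebot.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://www.example.com/index.html"  # placeholder URL
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

This only shows how one standard parser reads the rules. A spider is free to interpret (or ignore) the file differently.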

Ok, I admit, it feels good to have this game the other way round - Google asking *me* to have my web site included. I feel flattered. :)

Back to reality, I don't think they love my site so much that they selected it by hand. So Google probably sends out these emails automatically and in large numbers. The bad "s"-word comes to my mind. This would be 100% spam, in my opinion.

My first thought was that the email was a fake, not from Google at all. But it looks pretty real, and I don't really see the benefit a third party would have from faking these emails. The email comes from "crawl-coverage@google.com" and is signed by a real person from Google's "Business Development".

Second thought - I am running AdWords for the web site in question. Maybe they only send these emails to advertisers whose robots.txt keeps Googlebot from spidering. Google could justify this, IMO, and it makes some sense - in most cases, if you run AdWords for a certain web site, you also want it to be spidered.

But if this were true, it would be the first proven connection between advertising on Google and having your web site spidered by Googlebot. Anybody else got these emails?

2:29 pm on July 25, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 24, 2002
posts:1130
votes: 0


Anybody else got these emails?

Did Google Ask Me to Rewrite My robots.txt? - A strange email. [webmasterworld.com]

2:33 pm on July 25, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member chiyo is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 21, 2000
posts:3170
votes: 0


I'm missing something. If your site is robots.txt'd out of Google, how can the AdSense spider access it? Or do you already have an "allow" for just that spider?
2:42 pm on July 25, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:June 6, 2003
posts:67
votes: 0


You're missing something.

He never mentioned AdSense.

2:42 pm on July 25, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 14, 2002
posts:152
votes: 0


Reading the other thread (thanks for the link, must have missed that post), it seems this really is a single, personal mail from someone at Google. So, I just feel a little honored and forget about my spam accusation. :)
2:47 pm on July 25, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member chiyo is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 21, 2000
posts:3170
votes: 0


Ah, thanks bolitto. I got AdWords and AdSense mixed up again!
3:08 pm on July 25, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 14, 2002
posts:152
votes: 0


Takagi just pointed me to another web site where the text of these "please-change-your-robots.txt-for-google" emails is available (I think posting the text here is not permitted). It looks just the same as mine (in German).

Hmmm..

In the other thread mentioned above, GoogleGuy says an email like this should be regarded as a single event, some sort of a personal decision from someone at Google. Surely we are the smart guys and don't believe everything the Google guys want us to. :)

GG's version of how and why such an email might have been sent is a little hard to follow - there is a fixed text for these emails, so sending them out was obviously planned, and planned in larger numbers. You wouldn't sit down and write a template for these emails (and even translate them) if you just wanted to send one or two emails, right?

The evil "s"-word is back in town.

3:34 pm on July 25, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 24, 2002
posts:1130
votes: 0


Hi Sudden,

You thought it was an honor to receive such an email, and now I spoiled your party. Sorry.

I understand your site is in German. Well, if you want to boycott Google, I also have some good news for you. In the last 13 months the percentage of German pages in the index of FAST/AllTheWeb grew from 6.6% to 8.6% (see this [webmasterworld.com] thread).

5:37 pm on July 28, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member jomaxx is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 6, 2002
posts:4768
votes: 0


I really think that calling this email spam is inappropriate, and talking about boycotting Google is just ludicrous.

First, just because two webmasters have received similar emails (actually I think there was a third example discussed some time back), does not mean that emails are being sent out en masse.

Second, this is not a commercial email except in the remotest sense. In fact, while there are lots of valid reasons to exclude Googlebot from a site, I'm sure most people here will agree that Google is potentially doing this webmaster a huge favor.

5:52 pm on July 28, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member googleguy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Oct 8, 2001
posts:2882
votes: 0


Hey sudden, I do think this email was sent because someone at Google thought your site was good, and that we'd like to get that content into our index. I'll be happy to ask around about it.
7:44 pm on July 28, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member bigdave is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 19, 2002
posts:3454
votes: 0


I would not be the least bit surprised if there were a form letter available for requesting that someone allow Google to crawl their site. It makes sure that they say everything that they want to say without making the mistakes that can come up when writing individual e-mails.

Just because it is a form letter does not mean that they are not sent out on a case by case basis. It just means that they want to be careful about the wording.

4:58 am on July 29, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:July 7, 2003
posts:442
votes: 0


Are you able to view the log of your mailserver?

If so, check the IP of the mailserver that mail came from. It is very possible that someone just spoofed the google.com domain, but why?
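If you don't have the server log handy, the message's own "Received" headers carry similar information. Below is a minimal sketch, assuming the raw message is available as text; the sample message, host names and IP addresses (taken from documentation ranges) are made up for illustration, and `received_ips` is a hypothetical helper name.

```python
import re
from email import message_from_string

# Made-up sample message; hosts and IPs are placeholders, not real Google servers.
RAW_MESSAGE = """\
Received: from mail.example.net (mail.example.net [192.0.2.10])
\tby mx.example.org with ESMTP; Fri, 25 Jul 2003 11:00:00 +0000
Received: from sender.example.com (sender.example.com [198.51.100.7])
\tby mail.example.net with ESMTP; Fri, 25 Jul 2003 10:59:58 +0000
From: crawl-coverage@google.com
Subject: sample

body text
"""

# Matches an IPv4 address written in square brackets, as mail servers usually log it.
IP_PATTERN = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_ips(raw_message: str) -> list:
    """Return bracketed IPv4 addresses from Received headers, newest hop first."""
    message = message_from_string(raw_message)
    ips = []
    for header in message.get_all("Received", []):
        ips.extend(IP_PATTERN.findall(header))
    return ips


if __name__ == "__main__":
    print(received_ips(RAW_MESSAGE))
```

Only the topmost header, the one added by your own receiving server, can really be trusted; anything below it may have been forged by the sender. A reverse DNS lookup on that first address is the usual next step.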

If a bot wanted to crawl your site so badly, what does it need permission in the robots.txt for? Any bot that would want to crawl a site (look for email, etc.) would just do it, wouldn't it? If it was a spam situation, and the sender for some reason wants your site listed in Google (?), he probably already used some sort of bot to get your email address off your page.

If a spammer was responsible for writing these letters to people, for a reason that I can't really understand, why would they take the time to hand-pick email addresses from websites? All of this leads me to think it was someone legit.

Hey GoogleGuy, can you "ask around the plex" to make sure everyone is finding my site in the SERPs easily? :)

5:47 am on July 29, 2003 (gmt 0)

New User

10+ Year Member

joined:Mar 6, 2003
posts:9
votes: 0


I've been writing spiders or netbots since 1997. Spiders may choose to honor the robots.txt file if they want. But, it is in fact hard to block spiders from downloading web pages from a web site.

Google is just being nice (in this case), paying attention to the robots.txt file, and sending targeted email.

7:27 am on July 29, 2003 (gmt 0)

Junior Member from GB 

10+ Year Member

joined:Oct 16, 2002
posts:162
votes: 0


I don't have time to read the whole thread (I did notice you took it back) but it always amazes me how people are so quick to jump on the "This Is Spam" bandwagon.

Even if Google were sending these mails automatically to everyone who blocks them in robots.txt, I do not see how it is spam. It's a specific email telling you something is wrong and how to fix it; it's not trying to sell you anything or waste your time with something you don't want. OK, it might take some time to read it and you might already know, but do we really want companies to feel like they can't do things like this?

Yes, spam's annoying, but don't tar everything with the same brush, in my opinion.

9:08 am on July 29, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 14, 2002
posts:152
votes: 0


Hey GoogleGuy, may I quote you? Would be nice to mention Google's love for my web site on my front page! :)

Ok, I admit I am a little sensitive when it comes to spam (in fact I am drowning in it, the ultimate test for any anti-spam software), but I don't mind personally about this email from Google. If my web site was hand-picked, it's charming, isn't it..?

I started this thread to hear if others got these emails and to hear opinions. As there are not hundreds of other webmasters raising their hands, saying they got these emails as well, I agree with the previous posts and wouldn't call this spam.

9:54 am on July 29, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 24, 2002
posts:1130
votes: 0


.. talking about boycotting Google is just ludicrous.

Sure it is at this moment. I wasn't really seriously suggesting sudden should do so.

It's a specific email telling you something is wrong and how to fix it ..

There is nothing wrong with having a robots.txt file disallowing a spider! In my understanding the topic is not a parsing problem of the file, it is about allowing Googlebot to crawl (some of) the pages.
2:20 pm on July 29, 2003 (gmt 0)

Full Member

10+ Year Member

joined:Apr 24, 2003
posts:216
votes: 0


Some people use the spam word WAY too much.

This is the reason why Google could never institute a system (even if they wanted, which I'm sure they don't) where they automatically sent a webmaster an email notifying them that their site is banned and how to fix the problem.

So many people want to scream "spam!"

Even if that letter was a template and sent to hundreds of sites, I don't see where the problem is.

It doesn't appear to be an auto-message sent all across the 'net to any site blocking Google, because then it would be pretty common. So if you received that email, then most likely someone liked your site and is wondering why you're blocking it from getting into the index.

The only way I can see someone not from Google sending that email is if the person was a link partner and wants your site in Google for PageRank purposes. Most of the other conspiracy theories about this are pretty silly.

It's just a harmless email.

2:35 pm on July 29, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 5, 2001
posts:724
votes: 0


Just because it is a form letter does not mean that they are not sent out on a case by case basis. It just means that they want to be careful about the wording.

Exactly. We have several prewritten emails and other documents at my company which are used as templates, some of which may be sent out only every couple of months. Usually it's because I wrote them and have someone else send them out, and want to be sure the message is communicated uniformly and accurately.

I'm sure it's a very common practice.

2:36 pm on July 29, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 25, 2003
posts:970
votes: 0


OK, I would add my impartial comments. I have followed at least 3 cases of such letters from Google. The original people who received the letter felt the same intrigue. But I think in each case there was a real person from Google who responded at the reply email address. So I do not think there is any spin attached when GG says that someone at Google might have liked your site. HTH :)
3:03 pm on July 29, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 22, 2003
posts:118
votes: 0


Hi sudden,

Perhaps AdWords "context" ads could be the answer.

In reviewing our AdWords results using their new reporting format, I see "total content targeting" is used in place of keywords for several destination URLs.

Possibly AdWords is using the whole page to match against AdSense publisher pages - so it needs to crawl your pages?

My 2 clams.

3:12 pm on July 29, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 28, 2002
posts:89
votes: 0


I got the same e-mail, except there was a person's name, including a signature (as below) with an e-mail address.

Persons Name
Business Development
Google Inc.
2400 Bayshore Parkway
Mountain View, CA 94043
personsname@google.com

I think it's great, I'm flattered.

3:31 pm on July 29, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 24, 2002
posts:1130
votes: 0


Some more examples [google.com]
4:11 pm on July 29, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:July 29, 2003
posts:149
votes: 0


Just wondering... why would you disallow spidering and use AdWords at the same time?

It seems an odd combo, and I am curious.

6:34 am on July 30, 2003 (gmt 0)

Full Member

10+ Year Member

joined:Apr 24, 2003
posts:216
votes: 0


They could be doing research on a new site to see how effective AdWords is. I can think of a few other reasons too. (Although I don't agree with any of the reasons.)

But you're right, very odd combo.

6:40 am on July 30, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member chiyo is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 21, 2000
posts:3170
votes: 0


>>Just wondering... why would you disallow spidering and use adwords at the same time?
It seems an odd combo, and I am curious.
<<

Just a guess for one situation... someone may have sites designed for the same keywords, one designed to close the sale which they advertise through AdWords. The other one is more of a soft-sell info site that is better suited for the main listings, and they don't really want to compete with their own site for the info searchers.

This way they can have one listing in the buying section (AdWords), one in the info section (on the left) and as a bonus they don't have to worry about duplicate content or cross-linking issues for the info site(s) caused by their commercial site.

8:43 am on July 30, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 14, 2002
posts:152
votes: 0


It's not as fancy as that - Google indexed some pages we most definitely did not want to be indexed - internal stuff with no inbound links. I really don't know how it happened - probably the toolbar. As the web site was not designed with SEO in mind, we never got any visitors from the SEs (without paying for them). So, having nothing to lose and being a little lazy, I blocked all spiders from the whole web site.

Sorry, no genius marketing or SEO ideas involved. :)

12:52 am on Aug 3, 2003 (gmt 0)

New User

10+ Year Member

joined:Mar 6, 2003
posts:9
votes: 0


GoogleBot is pretty aggressive about indexing stuff, reaching into images and OCRing them and more.

Sometimes parts of my web pages had examples that I didn't want indexed, yet GoogleBot went in there anyway. Some of these pages were eventually banned by human editors, because of Google's own mistakes.

So, I found that I could exclude GoogleBot from this kind of non-core content through an included JavaScript file. Just document.write("CONTENT") in the .js file.

<script type="text/javascript" src="yourfile.js"></script>

This trick keeps overly-testosteronous GoogleBot from indexing parts of your web page that are distractors from your main message. Then, you don't get banned by "human editors": minimum wage slackers at Google that can barely read. You are at their mercy.

3:24 am on Aug 3, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member jomaxx is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 6, 2002
posts:4768
votes: 0


Yeah, Google tends to index the entire web page. I don't think most people consider that to be unexpected or overly aggressive, though.

They do index images, but there are several ways to prevent this and I've never seen evidence they apply OCR to them.

7:36 am on Aug 3, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member googleguy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Oct 8, 2001
posts:2882
votes: 0


That's right, we don't do OCR in images. I'm not aware of any bugs in robots.txt within the last several months (maybe years?), so consider using robots.txt to keep us from crawling pages from specific parts of your site. There's also the "noindex" and "nofollow" meta tags, password protecting web pages with .htaccess, etc. There's lots of tools to herd those little googlebots and keep 'em fenced in so they don't go where you don't want 'em. :)
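To double-check that a page really carries the meta tags mentioned above, here is a minimal sketch using Python's standard `html.parser`. The sample page and the names `RobotsMetaParser` and `robots_directives` are assumptions for illustration; this only inspects the markup and says nothing about how any particular spider treats it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, nofollow".
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives present in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Made-up sample page for illustration.
SAMPLE_PAGE = (
    '<html><head><meta name="robots" content="noindex, nofollow">'
    "</head><body>internal stuff</body></html>"
)

if __name__ == "__main__":
    print(sorted(robots_directives(SAMPLE_PAGE)))
```

Keep in mind that a spider has to be allowed to fetch the page in order to see the meta tag at all, so the tag and a robots.txt block for the same URL do not combine well.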
11:17 am on Aug 3, 2003 (gmt 0)

New User

10+ Year Member

joined:July 28, 2003
posts:16
votes: 0


Speaking of letting Googlebot in, here's my actual robots.txt:

#
# Hello Dear Googlebot, nice to see you again.
# Please get in and take it as your own home
# You can crawl as deep as you want, no problem.
# I will be in the kitchen , just whistle if you need something

User-agent: Mr Googlebot
allow: /Absolutely ALL

User-agent: Mr Googlebot
allow: /Wife and bedroom if you like, no prob.

By the way, that's not too much bootlicking, is it?
