|Looks Like Google DOES Spider Gmail for the Index|
I noticed in another forum that the idea of Google spidering Gmail is really hard for people to swallow. So... I'm posting the thought here.
Google DOES spider Gmail.
First of all, if it didn't, how could it know which AdWords ads to target?
2. YES... the links in Gmail DO affect your Google rank.
I know this from three unique experiences.
1. I have a large, loyal email list. Every time I send out an email that encourages being forwarded... my site pops one slot higher...
2. I sent an email with a url link that is NOT linked ANYWHERE ELSE from my yahoo to my gmail.com account and that site DID show up in Google.
3. I asked a Google engineer at an O'Reilly conference and he said... Google looks at all the data it has to determine communication trends with its algorithm.
4. It's logical. It's the way a computer science PhD would think. If I click a link in my Gmail... GOOGLE KNOWS that link is worth looking at. If I DON'T click the link, OR if I don't even open the email and just send it to spam... then it's obviously spam.
BUT if I launch a really cool new blog post or website... people DO email it to their friends. PROBABLY MORE often than they would blog about it. No?
[edited by: Brett_Tabke at 3:50 pm (utc) on April 16, 2009]
Hello sparkah, and welcome to the forums.
Ads in gmail are clearly matched to the email content - so you make a solid point there. We know that the email text automatically gets read.
And I once saw a hint that Google might have used the link in an email for discovery - but I wouldn't definitively say that this is true. For example, that url might also have appeared in an open server log somewhere. Or it might have been typed into someone's Google toolbar, or even into the Google search box.
I've been regularly sending one particular link through gmail for the past two months, and I know for certain that link has been followed by over 150 people. But the url that the link points to is not in the Google index, even today. Even more, googlebot has not visited the page even once over those two months.
That experience (and several others like it) also does not disprove anything, just as seeing such a new url in the index didn't prove anything.
What I've never seen is a sequence of events where an unindexed url in an email is followed quickly by a googlebot visit to that url. That would be highly suggestive, but even then, still not definitive unless it were repeatable - and it's not. Just try it and you'll see: an email link is not a googlebot whistle.
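For anyone who wants to try that test, here is a minimal sketch of the log check, in Python. It assumes a combined-format access log where the user agent is the last quoted field; the function name, the regex and the sample path are all made up for illustration. Keep in mind that a user agent string can be faked, so a hit here is only a hint, not proof that the real googlebot came by.

```python
import re

# Assumption: combined log format, with the request in the first quoted
# field and the user agent in the last quoted field of each line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" .* "(?P<agent>[^"]*)"$'
)

def googlebot_hits(log_lines, path):
    """Return the log lines where a Googlebot user agent requested `path`.

    `path` must match the logged request path exactly, query string included.
    """
    hits = []
    for line in log_lines:
        match = LOG_LINE.search(line.strip())
        if not match:
            continue  # line is not in the assumed format, skip it
        agent = match.group("agent").lower()
        if match.group("path") == path and "googlebot" in agent:
            hits.append(line)
    return hits

# Hypothetical sample lines, only to show the call.
sample = [
    '1.2.3.4 - - [05/Mar/2009:10:00:00 +0000] "GET /secret-page.html HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '5.6.7.8 - - [05/Mar/2009:10:05:00 +0000] "GET /secret-page.html HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (Windows; U) Firefox/3.0"',
]
print(len(googlebot_hits(sample, "/secret-page.html")))
```

Send the link, wait, and run something like this against the real log. An empty result over several weeks matches what I described above.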
So I find it unlikely that ranking changes are also affected by links that are sent around through gmail. I haven't even seen clear evidence of url discovery through that mechanism - and I've been looking for it.
I'm dreaming of a bot net. A bot net that opens gmail accounts, ones that I can send an email to with a link in it, and the botnet automatically opens emails and clicks on the links. :)
Actually, if all the talk of bounce rates and visits is true, I'm surprised there's not a bot net out there hiding behind a cool toolbar that does google searches and clicks, already. Maybe there is, I don't follow that stuff.
Yeah... Tedster, agreed that just sending an email to a gmail account is not a panacea for getting into the google index... I mean after all... You can even put up a link in digg.com and not end up in the index.
[edited by: tedster at 11:18 pm (utc) on Jan. 11, 2009]
[edit reason] sorry, no search terms [/edit]
|2. I sent an email with a url link that is NOT linked ANYWHERE ELSE from my yahoo to my gmail.com account and that site DID show up in Google. |
Is it possible you have the Google toolbar and this was responsible for it being found? Just a thought. Also, if you are logged in to your Google account (generally), does it not note your crawl history and use that to find new sites?
[edited by: Simsi at 10:54 pm (utc) on Jan. 11, 2009]
You may have just found another facet of Google's indexing algorithm!
But in my case.. no... no toolbar installed.
Google uses every little BIT (literally) of data it has control over to calculate a website's relevance.
I bet ya... REALLY REALLY BET YA... that Google will use its Grandcentral.com and G1 visual voice mail for indexing as soon as it figures out how to do speech recognition AND/OR when it lets you TAG your own voicemails!
As I mentioned in the thread about faster indexing, you don't even need a toolbar. Some people will copy the url in an email and paste it into the Google SEARCH box. I've watched that being done!
Interesting, links in emails carry some value. I'm inclined to think that instead of carrying link value they receive "one point", meaning the page gets credit for "one email link", and only Google knows how much value that would have.
Emails aren't already ranked, so the value is probably minimal. Who knows if there is a tiered effect too, meaning 50 emails = PR2, 500 emails = PR3 etc.
Toss this in the same group with the number of bookmarks a page receives. It's very clear that in order to rank well a page needs to be discussed more than other pages deemed to be in the same category. Email links and bookmarks indicate just that.
Worth looking into for sure. Thanks.
You don't even need a toolbar installed. Just a pagerank indicator of some sort. That sends every page you visit to Google.
> And I once saw a hint that Google might have used the link in an email for discovery - but I wouldn't definitively say that this is true. For example, that url might also have appeared in an open server log somewhere. <
Note that there are a number of sites out there which essentially "publish emails." That is, the site operator signs up for a variety of mailing lists (mostly discussion groups, but also many others), and the emails are automatically posted as content on the web site. Usually this is used as a strategy to "feed content" into Google so that traffic can be drawn from search traffic (and of course the site operator posts advertising, such as AdSense ads, on these sites).
This is certainly one way that you might find a URL indexed by Google, even though you only mentioned it in an email sent to a relatively small group of subscribers or customers.
I know this is a slightly older thread, but I had an issue today where I was writing an authentication script for someone to register for a voting site. Basically, it sends an email to the user with a hash code to confirm their registration.
The next day, I had an error message generated by the system. I had deleted the test user, so when the link was "clicked", an empty recordset was returned.
So, the email itself was only sent to myself, by a test email server to my Google Apps email address.
When looking at the error log, the link that was followed had the querystring hash code in it and the user agent was:
"HTTP_USER_AGENT: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
So, hmmm... this confirms to me that GoogleBot does indeed index links in Gmail messages. So, for confirmation links... what's this going to do? It presents a nightmare for web site owners, eh?
I can write an URL rewrite function in htaccess that will prevent googlebot from indexing, but will I have to go back and reprogram older scripts that use authentication links?
What do we do?
PS... here's another site I found in my searching for clues on this. So, I might have been an anomaly, but here is a second confirmation of the issue:
< url removed >
Further Notes, added a few minutes after posting original:
1) I did not have the google toolbar installed on my testing browser.
2) I did not click the link IN gmail interface, it was fetched via pop3 in the thunderbird email reader.
3) It was clicked on and link went straight to page. There were no redirects.
This indicates that the email was "read" by the googlebot upon delivery in the folder, without a user even looking at it, and that the googlebot visited the link within 24 hours on a domain that I had created the day before.
[edit reason: See Forum Charter [webmasterworld.com]]
[edited by: tedster at 7:23 pm (utc) on Mar. 5, 2009]
|I can write an URL rewrite function in htaccess that will prevent googlebot from indexing, but will I have to go back and reprogram older scripts that use authentication links? |
Well, as a first step, you can ban that page (or even the whole directory holding it) in your robots.txt.
Not foolproof, of course, but a good start!
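If you want to double-check a rule before relying on it, here is a rough sketch using Python's standard `urllib.robotparser`. The `/confirm/` directory and the example.com URLs are made up for illustration, and the check only shows how a well-behaved crawler should read the file. It does not enforce anything on a bot that ignores robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that bans the directory holding the
# confirmation script for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /confirm/
"""

def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# The confirmation link is off limits, the rest of the site is not.
print(is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com/confirm/abc123"))
print(is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com/index.html"))
```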
I've read in many places that the robots.txt file doesn't stop GoogleBot from looking at the content of the page (requesting the URL), it just prevents it from adding that content to its index.
In this case, given that it's a confirmation URL... what do we do? If googleBot requests the URL in any way, it's going to confirm the user.
For now, in my confirmation script, I have chosen to look at the user agent field, and if it's got googlebot in it, then I treat it as though it's a 404 not found.
Other suggestions would be appreciated.
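To make the user agent check concrete, here is a stripped-down sketch in Python. It is not my actual script: the function name, the in-memory set of pending tokens and the bare status codes are all simplifications for illustration. It also only catches bots that announce themselves as googlebot, so it is a stopgap rather than a real fix.

```python
def confirm_registration(user_agent, token, pending_tokens):
    """Return an HTTP status code for a confirmation-link request.

    `pending_tokens` is a hypothetical set of unconfirmed hash codes,
    standing in for whatever lookup the real script does.
    """
    # Anything that identifies itself as googlebot gets a 404 and the
    # registration is left untouched.
    if "googlebot" in (user_agent or "").lower():
        return 404
    # A real visitor with a valid code is confirmed exactly once.
    if token in pending_tokens:
        pending_tokens.remove(token)
        return 200
    # Unknown or already used codes also look like a missing page.
    return 404

pending = {"3f2a9c"}
bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(confirm_registration(bot, "3f2a9c", pending))  # bot request, code stays pending
print(confirm_registration("Mozilla/5.0 Firefox/3.0", "3f2a9c", pending))
```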
But for any SEO'r it brings to light a lot of questions:
How will double opt-in email confirmations work? Will we get a lot of false positives? Will our spam rates increase because people are "automatically" signed up? What percentage of users will this affect? Will it be left to the programmer to fix, or to Google?
Then, when you think about software like Joomla and Community Builder registrations, what will happen there? Or even the Webmaster World community... when I registered here, I was sent an email confirmation link. Then I was able to log in to my account. No further steps after registering.
Thanks for the input, I just don't know that robots.txt will solve the bigger problem.
If you have the resources to do things like agent checking (most won't - sounds like you do :)) then, certainly, go for it.
But for most people the robots.txt entry is all they can have and a lot of the time it will work.
Google doesn't always obey the robots entry (dammit!) but 99% (unresearched number) of the time it does.
|I've read in many places that the robots.txt file doesn't stop GoogleBot from looking at the content of the page (requesting the URL), it just prevents it from adding that content to its index. |
This sounds like bad information to me. Once googlebot spiders your robots.txt file and processes it, then those url requests should stop. Googlebot will spider urls (make that MUST spider urls) that use a noindex meta tag - possibly that's where the confusion is coming in.
Google occasionally blows it and indexes pages which robots tells it not to - I have had to go into WMT a couple of times and remove directories where the robots.txt was correctly blocking them.
But it's very, very rare.