
Google News Archive Forum

This 56 message thread spans 2 pages.
Is Google crawling IRC?
Is Google logging IRC channels?
punta




msg:109616
 10:57 am on Nov 5, 2003 (gmt 0)

This site has reports of some people seeing Google machines on their IRC channels [manero.org...]
This is interesting. Perhaps Google are planning to log Internet Relay Chats so people can search them, like they can with USENET threads.

It's quite an interesting development. What do people think to this?

 

JasonHamilton




msg:109646
 3:01 pm on Nov 17, 2003 (gmt 0)

maybe not MOTD, but what about channel topics?

I already do that.

Google gets a listing of all channels and topics on an IRC network. When it sees a URL in the topic of a channel, it enters the channel. Once it's in, it gets a list of all users.

It then performs analysis on this information to determine the importance of the link. If a channel has just one person in it, then it could easily be spam so it won't have much importance assigned to it.

The channel list will show the number of users within the channel, so joining the channel will not provide any additional information. Add to that the fact that large channels are already larger than what IRC was originally designed for: a robot that joins every time it looks at the list will not likely be welcome unless it also promotes that channel in some way (and even then, some channels don't want any promotion).
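Jason's point that the channel list already carries the user count can be seen in the raw reply a server sends for a LIST command. A minimal sketch, assuming the RPL_LIST (numeric 322) line format from RFC 1459/2812 (`<channel> <# visible> :<topic>`); the function names, the URL regex and the `min_users` cut-off for discounting near-empty channels are hypothetical, and none of this describes what Google actually did.

```python
import re

# Raw RPL_LIST (numeric 322) line, e.g.
# ":irc.example.net 322 mynick #chan 42 :topic text"
# Parameters per RFC 1459/2812: <channel> <# visible> :<topic>
RPL_LIST_RE = re.compile(
    r"^:(?P<server>\S+) 322 (?P<me>\S+) (?P<channel>\S+) (?P<users>\d+) :(?P<topic>.*)$"
)
URL_RE = re.compile(r"https?://\S+")


def parse_list_reply(line):
    """Return (channel, visible_users, topic), or None if not a 322 reply."""
    m = RPL_LIST_RE.match(line.rstrip("\r\n"))
    if m is None:
        return None
    return m.group("channel"), int(m.group("users")), m.group("topic")


def channels_with_urls(lines, min_users=2):
    """Yield (channel, users, urls) for channels whose topic contains a URL.

    min_users is a hypothetical cut-off for discounting near-empty channels.
    No JOIN is needed: the user count is already in the LIST reply.
    """
    for line in lines:
        parsed = parse_list_reply(line)
        if parsed is None:
            continue
        channel, users, topic = parsed
        urls = URL_RE.findall(topic)
        if urls and users >= min_users:
            yield channel, users, urls


if __name__ == "__main__":
    sample = [
        ":irc.example.net 322 me #webdev 57 :News at http://www.example.com/",
        ":irc.example.net 322 me #lonely 1 :Visit http://spam.example.org/",
        ":irc.example.net 322 me #chat 120 :No links here",
    ]
    print(list(channels_with_urls(sample)))
    # [('#webdev', 57, ['http://www.example.com/'])]
```

Note that the count is of *visible* users only; members with user mode +i are not included in it.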

Also, I think that assuming a larger channel equates to having more important topics is a false assumption. Only the channel operators get to change the topic, which limits who has access to topic changes, and there are many mismanaged channels with large numbers of users.

IRC messages are messages sent to a server and then distributed to all the other servers in that network

IRC messages are routed only to servers that have users within a channel, plus the servers along the path between them. It is quite different from newsgroups, where every server can get every message and any user can "join" a newsgroup at any time after the fact and read the messages. On IRC, what you say is limited in scope to those in the channel at that specific time.
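The routing described above can be modelled with a small sketch. RFC 1459 describes an IRC network as a spanning tree of servers; the code assumes that tree plus a known set of servers with local channel members, and computes which server-to-server links a channel message crosses. It is a simplified model (in practice each server already knows, from the JOINs it has seen, which of its links lead to channel members), and the function and server names are hypothetical.

```python
from collections import defaultdict


def links_used(edges, origin, member_servers):
    """Return the set of (from, to) server links a channel message crosses.

    edges: (a, b) pairs forming a tree of servers.
    origin: the server the sender is connected to.
    member_servers: servers with at least one local member of the channel.
    """
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    used = set()

    def subtree_has_member(node, parent):
        # Depth-first search: does the part of the tree reached through
        # `node` (moving away from `parent`) contain a channel member?
        found = node in member_servers
        for nxt in adj[node]:
            if nxt != parent and subtree_has_member(nxt, node):
                # Only links leading towards members carry the message.
                used.add((node, nxt))
                found = True
        return found

    subtree_has_member(origin, None)
    return used


if __name__ == "__main__":
    tree = [("hub", "A"), ("hub", "B"), ("hub", "C"), ("A", "A1")]
    # Sender on A1, members on A1 and B: server C never sees the message.
    print(sorted(links_used(tree, "A1", {"A1", "B"})))
```

Unlike a Usenet feed, a server with no members behind a link never receives the line at all, so there is nothing for it to store afterwards.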

It's understood that all IRC messages, even private messages, are sent across an open network and can be intercepted by nosy network administrators

Yes, but that's true of anything on the internet. That doesn't mean users will throw on a happy face and be thankful for someone logging their discussions. Keep in mind that while the IRC protocol does not encrypt communications (neither do the web, POP3, etc.), there are SSL-type modifications available.

Surely you mean arrogant, not technical?

<Disclaimer: I'm aware file sharing on IRC is outside the scope of the IRC protocol, but this is a good example of technical vs arrogant> You should see the number of Kazaa users who give up on IRC after finding out you can't just right-click on files to download them. It takes a while to learn all the nuances of IRC.

USENET used to be just as hard (if not harder), before Google broke the clique and made it easy to use.

You mean Dejanews, right? It's definitely easier to search through newsgroups for information. However, participating and posting remain fundamentally the same. NG programs like Forte Agent have never been all that difficult to use; the hardest part is finding out your local ISP's news server name :)

Now you're just showing your ignorance. There are plenty of support groups on USENET, and it's easier to be anonymous on USENET than it is on IRC.

I never said there were no support groups in NGs. However, thinking that only the person(s) you are talking to in an NG are reading your posts would be quite foolish. Mrs. Henderson (from my previous example) would not have (and should not have) expected her questions to be published elsewhere. While everyone should be aware that communications on the internet can be intercepted, that is quite different from methodically intercepting communications and posting them on websites, cached forever.

punta




msg:109647
 3:25 pm on Nov 17, 2003 (gmt 0)

I'll stop here before this turns into a silly flame. This'll be my last post on the matter.

The main point is this.

* USENET users were against the idea of logging just as IRC users are today. The fact is that what Deja News (now Google Groups) and other similar services at the time were offering was useful to the users. Logging of news is now fully accepted, and it's very easy to opt out. If Google were to log conversations in IRC, and it were to prove useful in the same way as USENET, then I think it could catch on.

That's a big if. I doubt they are logging IRC, but just because some start-up with $10m of champagne money (venture capital) failed, that does not mean Google will fail too.

punta




msg:109648
 3:32 pm on Nov 17, 2003 (gmt 0)

Google gets a listing of all channels and topics on an IRC network. When it sees a URL in the topic of a channel, it enters the channel. Once it's in, it gets a list of all users.
It then performs analysis on this information to determine the importance of the link. If a channel has just one person in it, then it could easily be spam so it won't have much importance assigned to it.

The channel list will show the number of users within the channel, so joining the channel will not provide any additional information. Add to that the fact that large channels are already larger than what IRC was originally designed for: a robot that joins every time it looks at the list will not likely be welcome unless it also promotes that channel in some way (and even then, some channels don't want any promotion).

Also, I think that assuming a larger channel equates to having more important topics is a false assumption. Only the channel operators get to change the topic, which limits who has access to topic changes, and there are many mismanaged channels with large numbers of users.

Interesting points. Let's say that Google takes more into account than just users. After all, they take lots of factors into account when determining the importance of a web document.

What if they're looking at the ops/users ratio, the amount of conversation going on in the channel, the number of spammers, and the language being used in the channel?

The thing is that Google seems to be joining channels very briefly. What information can Google get from brief visits? What can Google do with this information?
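Part of the answer to that question: when a client joins a channel, the server sends it the channel's names list as RPL_NAMREPLY (numeric 353) lines, where operators carry an `@` prefix and voiced users a `+`. A minimal sketch, assuming the RFC 2812 line format (RFC 1459 servers may omit the channel-type symbol, which the code also accepts), that counts those prefixes and computes the ops/users ratio suggested above. The function names are hypothetical, nothing here reflects what Google actually measured, and a nick sent with several prefixes (e.g. `@+nick`) is classified by its first one.

```python
def parse_names_reply(line):
    """Parse an RPL_NAMREPLY (353) line into (channel, ops, voiced, regular).

    Returns None if the line is not a 353 reply.
    """
    head, _, names = line.rstrip("\r\n").partition(" :")
    parts = head.split()
    if len(parts) < 4 or parts[1] != "353":
        return None
    # RFC 2812 adds a channel-type symbol ("=", "*" or "@") before the
    # channel name; RFC 1459 servers may leave it out.
    if parts[3] in ("=", "*", "@") and len(parts) > 4:
        channel = parts[4]
    else:
        channel = parts[3]
    ops = voiced = regular = 0
    for nick in names.split():
        if nick.startswith("@"):
            ops += 1
        elif nick.startswith("+"):
            voiced += 1
        else:
            regular += 1
    return channel, ops, voiced, regular


def op_ratio(ops, voiced, regular):
    """Share of listed users who are channel operators (0.0 for an empty list)."""
    total = ops + voiced + regular
    return ops / total if total else 0.0


if __name__ == "__main__":
    reply = ":irc.example.net 353 me = #chat :@alice +bob carol dave"
    channel, ops, voiced, regular = parse_names_reply(reply)
    print(channel, ops, voiced, regular, op_ratio(ops, voiced, regular))
    # #chat 1 1 2 0.25
```

A brief visit is enough for this: the names list arrives right after the JOIN, so a bot could record it and part immediately.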

Nova Reticulis




msg:109649
 3:45 pm on Nov 17, 2003 (gmt 0)

I have heard somewhere that what Google is planning on doing is real time monitoring of chat subjects on IRC so that people can quickly join a channel that interests them after doing a topic search on Google. Don't remember where exactly, I can look it up if anyone cares.

punta




msg:109650
 3:50 pm on Nov 17, 2003 (gmt 0)

Yeah, look it up. It'll be good to see.

However, it doesn't match what people have been reporting. They have been saying that Google has been entering channels. They don't need to do that to get the topic.

JasonHamilton




msg:109651
 4:07 pm on Nov 17, 2003 (gmt 0)

There is another company that is planning to monitor channels so they can do targeted ads. They launched hundreds of robots onto freenode and proceeded to log all the channels on that network. They got banned shortly after that. I think they have since been unbanned, but they were presenting themselves as non-commercial and GPLing their software... however, from everything I've been able to find out about them, they appear to be anything but.

I'd provide the URL, but I think that's against the ToS on this board, so if you want it, PM me.

mary




msg:109652
 5:27 pm on Nov 17, 2003 (gmt 0)

Nova, ChatScan did that two years ago. Google "chatscan banned", first listing, /.

Mozart




msg:109653
 2:06 am on Nov 21, 2003 (gmt 0)

Hmmm, this is an interesting topic. Sorry for being the opposite of technical; my IRC experience is rather limited, although I have played with it off and on during the past 10 years. So I will give my impressions here without claiming they are the ultimate truth, or even that they hold up at all.

IRC (I fully agree here with Jason) is quite a bit more technical than Usenet. There is the logging on to different servers, joining channels (topics), listening or chatting, those DCC sends and receives etc.

There also seem to be a lot of conversations going on in many channels, much of it in the style of "u r s00 h0t" or trivia games ("gimme the last letter").

This very technical barrier and mass of communication (i.e. information flow) would tell me, if I were Google, that opening it up would bring me everlasting fame. If only I could do that somehow...

Hmmm... logging masses of information of a very temporary nature would chew up my resources. Logging the interesting stuff, though, would create a new resource, since it would no longer be fleeting and temporary. Google recently bought a company specialising in natural language recognition, so wouldn't that help in analysing it and separating the wheat from the chaff?

I have not read the article about googlebots being in IRC, so I don't know how long they stay in channels. But if I were Google I'd enter the channels and observe the amount of people in the channel, the ratio of operators to guests, invisibles etc., lurkers and active participants and most importantly I would try to analyse the conversation to find out what the real topic is. Not the advertised or channel topic, but what is really spoken about. This I would monitor not for all channels but only those passing a certain threshold of users and amount of conversation.

I would then create a new tab on Google, perhaps titled Chat, where users can search by topic of interest, giving Ms Henderson a chance to find a channel for battered partners which has averaged 32 users in the past and can therefore help her right now.

Okay, here's the perhaps silly question at the end for all those who read the whole long blurb: Does IRC actually have a URI scheme similar to Usenet's? Something like irc://server5.ircworld.net/mozarts-channel-of-the-day or so? If not, maybe we need one!

Does anyone else see P2P network logging coming soon? Hmmm....

Mozart

PS: So in short, I don't think Google would want to keep the IRC info forever, only to analyse the topics, the "theme" of channels.

JasonHamilton




msg:109654
 2:27 am on Nov 21, 2003 (gmt 0)

Yes.

irc://server:port/channel

But you need a client like mIRC or Klient installed to make use of it.
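For the example URL above, a sketch of splitting an `irc://server:port/channel` URL with Python's `urllib.parse`. The default port 6667 and the rule that a bare name like `mychan` means `#mychan` are assumptions (a literal `#` has to be written as `%23`, because `#` would otherwise start the URL fragment); `parse_irc_url` is a hypothetical helper, not part of mIRC, Klient or any other client.

```python
from urllib.parse import unquote, urlsplit

DEFAULT_IRC_PORT = 6667  # conventional plain-text IRC port (assumed default)


def parse_irc_url(url):
    """Split irc://server:port/channel into (host, port, channel).

    Assumption: a bare channel name gets a "#" prefix; "&" (local)
    and "#" channels written as %26/%23 are kept as they are.
    """
    parts = urlsplit(url)
    if parts.scheme != "irc":
        raise ValueError("not an irc:// URL: " + url)
    channel = unquote(parts.path.lstrip("/"))
    if channel and channel[0] not in "#&":
        channel = "#" + channel
    return parts.hostname, parts.port or DEFAULT_IRC_PORT, channel


if __name__ == "__main__":
    print(parse_irc_url("irc://server5.ircworld.net/mozarts-channel-of-the-day"))
    # ('server5.ircworld.net', 6667, '#mozarts-channel-of-the-day')
```

Handing such a URL to the browser only works if an IRC client has registered itself as the handler for the `irc` scheme, which is why a client has to be installed.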

panic




msg:109655
 2:56 am on Nov 21, 2003 (gmt 0)

Has ANYONE ever stopped to think that maybe someone used that hostname as a prank?

-p

punta




msg:109656
 9:51 am on Nov 21, 2003 (gmt 0)

Has ANYONE ever stopped to think that maybe someone used that hostname as a prank?

Obviously not you by the looks of it.

It's a valid Google IP. It's more than likely that it was a Google employee messing around. However, it's interesting to discuss these things in case they do turn out to be things that Google are planning.

JasonHamilton




msg:109657
 3:38 pm on Nov 21, 2003 (gmt 0)

Not only a valid Google IP, but there has been more than one source who has gotten replies from Google confirming that it is Google and not users spoofing Google on IRC.

When in doubt, ask for yourself.

panic




msg:109658
 5:27 pm on Nov 21, 2003 (gmt 0)

Has ANYONE ever stopped to think that maybe someone used that hostname as a prank?

It couldn't be that they spoofed a real IP, now could they? It just CAN'T be them spoofing a real IP, like they constantly do with Microsoft, now can it?

Of course not.

-panic

gg

JasonHamilton




msg:109659
 5:35 pm on Nov 21, 2003 (gmt 0)

It's relatively easy to spoof an IP address. What isn't so easy is to spoof a reply from Google that validates that they are doing tests.

panic




msg:109660
 6:05 pm on Nov 21, 2003 (gmt 0)

A wise man once told me:

When searching for a homeless man...

Don't look in a house.

Putting that into practice answers a lot of suspicion/questions.

-p

punta




msg:109661
 9:23 am on Nov 24, 2003 (gmt 0)

Are you a friend of Eric Cantona by any chance :-)

Nova Reticulis




msg:109662
 4:43 pm on Nov 24, 2003 (gmt 0)

I have to withdraw my claim about Google because frankly I can't look up the page where I saw the discussion about Google doing exactly what ChatScan wanted to do.

P.S. "IP spoofing" is not something that is easy to achieve. Due to the stateful and bidirectional nature of TCP, it's next to damn impossible to spoof meaningfully. However, anyone with access to the DNS server the IRC server relies on, or to the traffic between that DNS server and the IRC server, can do fancy tricks with hostnames. I don't think that is the case here, as the response from Google clearly admits they are doing it and that they know what they're doing.
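The hostname tricks mentioned above are why many IRC servers (an assumption about common practice, not something stated in this thread) check forward-confirmed reverse DNS before displaying a hostname: look up the PTR name for the connecting IP, then resolve that name and accept it only if it maps back to the same IP. A minimal sketch using the standard `socket` resolvers; the function name is hypothetical, and the resolvers are parameters only so the check can be tried without network access.

```python
import socket


def forward_confirmed_rdns(ip, reverse=socket.gethostbyaddr,
                           forward=socket.gethostbyname_ex):
    """Return the PTR hostname for `ip` only if that name resolves back to `ip`.

    A forged PTR record alone fails this check unless the forger also
    controls forward DNS for the domain it claims.
    """
    try:
        hostname, _aliases, _addrs = reverse(ip)             # PTR lookup
        _name, _aliases, forward_addrs = forward(hostname)   # A lookup
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        return None
    return hostname if ip in forward_addrs else None
```

The same check is what anyone can run by hand on a suspicious "Google" hostname: if the name does not resolve back to the connecting address, the hostname shown on IRC proves nothing.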

JasonHamilton




msg:109663
 4:49 pm on Nov 24, 2003 (gmt 0)

<<bidirectional nature of TCP>>

That is the key. I stand by my statement that it is easy to spoof an IP. However, if you wish to spoof an IP *and* read incoming packets sent to that IP, it's another matter entirely.
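The distinction between sending spoofed packets and reading the replies can be shown with a toy model of the TCP three-way handshake. The server answers a SYN with a SYN-ACK carrying its own initial sequence number (ISN), sent to the claimed source address, and the connection completes only when the final ACK acknowledges ISN + 1. A blind spoofer never sees the SYN-ACK and has to guess a 32-bit value. This is a simulation under the assumption of an unpredictable ISN (older stacks with predictable ISNs were weaker), not real packet handling, and `ToyServer` is hypothetical.

```python
import secrets

ISN_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide


class ToyServer:
    """Toy model of the server side of a TCP three-way handshake."""

    def __init__(self):
        self.pending = {}  # claimed source address -> server ISN

    def on_syn(self, claimed_src):
        # The SYN-ACK carrying this ISN goes to claimed_src, i.e. the real
        # owner of the address, not to whoever forged the SYN. It is
        # returned here only so the demo can play the legitimate client.
        isn = secrets.randbelow(ISN_SPACE)
        self.pending[claimed_src] = isn
        return isn

    def on_ack(self, claimed_src, ack_number):
        # The handshake completes only if the ACK acknowledges ISN + 1.
        isn = self.pending.pop(claimed_src, None)
        return isn is not None and ack_number == (isn + 1) % ISN_SPACE


if __name__ == "__main__":
    server = ToyServer()
    # Legitimate client: it receives the SYN-ACK, so it can echo ISN + 1.
    isn = server.on_syn("198.51.100.7")
    print(server.on_ack("198.51.100.7", (isn + 1) % ISN_SPACE))  # True
    # Blind spoofer: never sees the SYN-ACK, so it can only guess;
    # a single guess succeeds with probability 1 in 2**32.
    server.on_syn("203.0.113.5")
    print(server.on_ack("203.0.113.5", secrets.randbelow(ISN_SPACE)))
```

Since an IRC session needs the handshake to complete before the client can even register a nick, spoofing the source address alone is not enough to appear on a channel.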

DNS poisoning really doesn't count though.

[edited by: JasonHamilton at 4:50 pm (utc) on Nov. 24, 2003]

punta




msg:109664
 4:50 pm on Nov 24, 2003 (gmt 0)

I agree with you Nova: connecting to IRC when your return packets are going to someone else (in this case Google) would be nigh on impossible.

I think these guys are referring to something else. I believe that there are some vulnerabilities in some versions of certain IRC servers that would allow someone to choose what IP address and hostname are displayed to other IRC users.

panic




msg:109665
 5:35 pm on Nov 24, 2003 (gmt 0)

if you wish to spoof an IP *and* read incoming packets sent to that IP, it's another matter entirely

What if packets weren't sent to that IP? What if no one /msg'ed nor /ctcp'ed them?

-p

punta




msg:109666
 9:37 am on Nov 25, 2003 (gmt 0)

What if packets weren't sent to that IP? What if no one /msg'ed nor /ctcp'ed them?

I haven't got a clue what you're babbling on about. As Jason said, it's incredibly unlikely that it was anybody other than Google as they have responded by email confirming that it was them.

panic




msg:109667
 5:23 pm on Nov 25, 2003 (gmt 0)

What if packets weren't sent to that IP? What if no one /msg'ed nor /ctcp'ed them?

When you /msg or /ctcp someone, you're essentially sending them ICMP packets. They then respond with another ICMP packet with the information you requested (for /ctcp). Or, if they /msg you back, it carries the contents of the /msg.

...

-panic

[edited by: ciml at 7:32 pm (utc) on Nov. 25, 2003]

Miraenda




msg:109668
 5:55 pm on Nov 25, 2003 (gmt 0)

Although I have rarely used IRC (because when I went into chatrooms I found that I either couldn't quite grasp what was going on or didn't understand the lingo being used), I can easily see the benefit of a "Chat" section of Google devoted to finding the most relevant and most used IRC channels. If I were able to find channels on topics that interest me and were more likely to find good discussions, then I would certainly use such a tool. Since the numbers appear to be quite high for IRC usage, this seems only logical for Google to consider. I do not know what Google is doing, or if Google as an entity intends to do anything with IRC. It could just be a Google employee looking around and experimenting. I think that these same discussions have been held about Usenet, forums, and blogs: why search engines would even consider crawling them, indexing them, and so on. Obviously, the more information a search engine has, the more targeted and appropriate results can be made for the user.

JasonHamilton




msg:109669
 6:29 pm on Nov 25, 2003 (gmt 0)

When you /msg or /ctcp someone, you're essentially sending them ICMP packets. They then respond with another ICMP packet with the information you requested (for /ctcp). Or, if they /msg you back, it carries the contents of the /msg.

IRC has nothing to do with ICMP.

As an IRC user, anything you do on IRC deals only with the IRC server you are connected to. When you send a /msg, it goes to that server, which then sends it either directly to the target user (if the user is on your server) or to the hub/servers between your server and the target user's server, which then relay it to the intended user.

CTCP is simply a PRIVMSG whose text is wrapped in Control-A (^A) characters.

...

Outside the scope of the discussion is stuff like DCC, but that is initiated with a PRIVMSG (^ADCC SEND) that sends the target user your longip:port info; the transfer itself is client-side, not server-side.
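The CTCP and DCC framing described above, as a sketch: a CTCP request is an ordinary PRIVMSG whose text is wrapped in Control-A (`\x01`), and a DCC SEND offer carries the sender's IPv4 address as a single unsigned integer (the "longip") plus a port. The helper names are hypothetical, and the trailing file size in the DCC SEND offer is a widely used convention rather than something stated in this thread.

```python
import ipaddress

CTCP_DELIM = "\x01"  # Control-A, the CTCP delimiter


def ctcp_privmsg(target, command, *args):
    """Wrap a CTCP request inside an ordinary IRC PRIVMSG line."""
    body = " ".join((command,) + args)
    return f"PRIVMSG {target} :{CTCP_DELIM}{body}{CTCP_DELIM}\r\n"


def ip_to_long(ip):
    """Encode a dotted-quad IPv4 address as the unsigned integer DCC uses."""
    return int(ipaddress.IPv4Address(ip))


def dcc_send_offer(target, filename, ip, port, size):
    """Build the PRIVMSG that offers a file via DCC SEND.

    Only this offer travels through the IRC servers; the file itself goes
    over a direct client-to-client TCP connection to ip:port.
    """
    return ctcp_privmsg(target, "DCC", "SEND", filename,
                        str(ip_to_long(ip)), str(port), str(size))


if __name__ == "__main__":
    print(repr(ctcp_privmsg("alice", "VERSION")))
    print(repr(dcc_send_offer("alice", "notes.txt", "192.168.0.1", 5000, 1024)))
```

Because both are plain PRIVMSG lines, they follow exactly the server-to-server routing described above; no ICMP is involved at any point.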

[edited by: ciml at 6:48 pm (utc) on Nov. 25, 2003]
[edit reason] please see sticky [/edit]

crankin




msg:109670
 8:50 pm on Nov 25, 2003 (gmt 0)

>>Obviously, the more information a search engine has, the more targeted and appropriate results can be made for the user.

Considering the fact that the vast majority of IRC 'conversation' has to do with cybersexing and warez, I fail to see how eavesdropping on that freakshow will improve search results.

berli




msg:109671
 6:17 pm on Nov 26, 2003 (gmt 0)

What about the fact that many URLs offered on IRC are intended as private URLs (such as private FTP servers)?

Will I have to make my friends log in for *every* private URL I give them? (I figure giving them FTP logins is inconvenience enough, albeit a necessary one, since that gives them rw permissions. Having to log in for HTTP too is ridiculous, but not if my private URL suddenly becomes public.)

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved