Search Engine Spider and User Agent Identification Forum

    
Google Feedfetcher
aristotle
msg:4687045, 8:40 pm on Jul 11, 2014 (gmt 0)

Today I noticed Google Feedfetcher getting a page from one of my sites:
Host: 66.249.90.63
/Page.html
Http Code: 200 Date: Jul 11 15:09:39 Http Version: HTTP/1.0 Size in Bytes: 19650
Referer: -
Agent: Mozilla/5.0 (compatible) Feedfetcher-Google;(+http://www.google.com/feedfetcher.html)

I'm a bit puzzled because I've never had any kind of feed on any of my sites, so I'm wondering why Google Feedfetcher would want this page. Does anyone have an explanation?

Note: This is a static HTML page that hasn't been touched in years. It has two images, but apparently they weren't fetched.

 

aristotle
msg:4687397, 6:40 pm on Jul 13, 2014 (gmt 0)

Well, this Google Feedfetcher is still showing up about every 8-10 hours, always fetching the same page.

So now I'm wondering if someone could have created a feed that includes my page. Is that possible? If so, why would anyone do it?

dstiles
msg:4687399, 6:56 pm on Jul 13, 2014 (gmt 0)

Not sure, but I THINK Feedfetcher is triggered by a human who wants to keep tabs on your page(s). I get a few such hits, but because the bot shows up on multi-function IPs with ambiguous rDNS (in this case, a proxy), I usually block it.

I suppose "proxy" is another way of representing this, but it is G. :(

not2easy
msg:4687401, 7:07 pm on Jul 13, 2014 (gmt 0)

I have had problems with referer spam from this Google range, but I see that many of their tools, such as PageSpeed Insights and JavaScript optimization, also use that proxy. I'm watching it to decide whether blocking it or allowing it is worse. I have

Host google-proxy-66-249-80-232.google.com
NetRange: 66.249.64.0 - 66.249.95.255
CIDR: 66.249.64.0/19

for it, but no clear idea of who or what uses it.
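For illustration, here is a minimal Python sketch (standard `ipaddress` module only) that confirms the /19 above matches the quoted NetRange and tests whether an address falls inside it. The constant and function names are hypothetical. Note what the addresses in this thread show: the Feedfetcher hit (66.249.90.63), the google-proxy host (66.249.80.232), and crawl-66-249-65-156.googlebot.com (66.249.65.156) all sit in the same block, so range membership says "Google" but not which service.

```python
import ipaddress

# The block not2easy quotes from WHOIS: NetRange 66.249.64.0 - 66.249.95.255.
GOOGLE_66_249_RANGE = ipaddress.ip_network("66.249.64.0/19")


def in_range(ip: str, network: ipaddress.IPv4Network = GOOGLE_66_249_RANGE) -> bool:
    """Return True if the address string falls inside `network`."""
    return ipaddress.ip_address(ip) in network


if __name__ == "__main__":
    # First and last addresses of the /19, to compare against the NetRange line.
    print(GOOGLE_66_249_RANGE.network_address, GOOGLE_66_249_RANGE.broadcast_address)
    # Addresses taken from the log entries and lookups in this thread.
    for addr in ("66.249.90.63", "66.249.80.232", "66.249.65.156"):
        print(addr, in_range(addr))
```

Because all three print True, a range check is only a first filter; telling the proxy apart from the crawler needs the hostname.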

wilderness
msg:4687406, 7:37 pm on Jul 13, 2014 (gmt 0)

There's a similar recent thread [webmasterworld.com]

aristotle
msg:4687527, 1:08 pm on Jul 14, 2014 (gmt 0)

Since we're talking about Google, I would like to ask about another recent log entry that puzzles me:
Host: 66.249.65.156
/
Http Code: 200 Date: Jul 14 02:20:00 Http Version: HTTP/1.1 Size in Bytes: 44818
Referer: http://example.com/Page.html
Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
. . . . .
IP: 66.249.65.156
Hostname: crawl-66-249-65-156.googlebot.com
ISP: Googlebot
Organization: Googlebot
Services: None detected
Type: Corporate
Assignment: Static IP
Country: United States
State/Region: California
City: Mountain View

The referer (elmi.aliexirs.ir) appears to be an Iranian website with a directory structure filled with scraped copies of pages from other websites.

What I'm thinking is that this could be referer spam using a fake Googlebot user agent, but the IP puzzles me. Can anyone elucidate?
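One common way to settle "fake or genuine" is forward-confirmed reverse DNS, which is the method Google describes for verifying Googlebot: look up the PTR name for the IP, check that it ends in googlebot.com or google.com, then resolve that name and confirm it maps back to the same IP. A spoofer can put "Googlebot" in the UA string, and can even set a PTR record claiming googlebot.com, but cannot control Google's forward records. Below is a minimal Python sketch using the standard `socket` module; the function names are hypothetical, the suffix list is an assumption you may need to adjust, and the resolvers are injectable so the logic can be exercised without live DNS.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

# Domains treated as Google-operated. Assumption based on Google's published
# Googlebot verification guidance; adjust if that guidance changes.
GOOGLE_SUFFIXES: Tuple[str, ...] = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """PTR lookup (IP -> hostname) via the system resolver."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> Iterable[str]:
    """A-record lookup (hostname -> IPv4 addresses) via the system resolver."""
    return socket.gethostbyname_ex(hostname)[2]


def verify_google_host(
    ip: str,
    rdns: Callable[[str], str] = reverse_lookup,
    fdns: Callable[[str], Iterable[str]] = forward_lookup,
    suffixes: Tuple[str, ...] = GOOGLE_SUFFIXES,
) -> Optional[str]:
    """Return the forward-confirmed hostname if `ip` belongs to Google, else None."""
    try:
        host = rdns(ip).rstrip(".").lower()
    except OSError:  # socket.herror / socket.gaierror: no PTR record or lookup failed
        return None
    # Step 1: the PTR name must end in a Google domain. Requiring the leading
    # dot also rejects look-alikes such as "fakegooglebot.com".
    if not host.endswith(suffixes):
        return None
    # Step 2: that name must resolve back to the same IP.
    try:
        addresses = fdns(host)
    except OSError:
        return None
    return host if ip in addresses else None
```

Calling `verify_google_host("66.249.65.156")` performs live lookups, so the result depends on current DNS. A confirmed name beginning with `crawl-` matches the Googlebot hostname shown above, while a `google-proxy-` name (like the one not2easy quotes) is Google-operated but not the crawler; that prefix convention is inferred from the hostnames in this thread, not a documented guarantee.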

wilderness
msg:4687537, 1:54 pm on Jul 14, 2014 (gmt 0)

Somebody mentioned a while back that Google was going to begin showing some referers on crawled pages.

dstiles
msg:4687662, 7:21 pm on Jul 14, 2014 (gmt 0)

Could it be a genuine Googlebot running under a "test as googlebot" service, i.e., a true Google service but run under external control?

If this IS the case it's a rather terrifying loophole.

If it's merely G adding an arbitrary referer then G has some serious answers to make to some serious questions!

keyplyr
msg:4687727, 1:25 am on Jul 15, 2014 (gmt 0)

I've seen many referrers in legit Googlebot requests. Why Googlebot sometimes includes referrers is a mystery. In the situation above, since this is a valid Googlebot IP, I tend to think the UA is authentic. It IS Googlebot, and you've luckily been informed that a website has scraped your content (and stupidly left the links in). Now the next step is to figure out what you're going to do about it, given the place of origin.

aristotle
msg:4688036, 11:51 pm on Jul 15, 2014 (gmt 0)

But if this is a genuine Googlebot visit, that raises the question of why it provided a referer in this case when it doesn't in the vast majority of cases. My impression is that Googlebot doesn't need a referer and normally comes on its own, which is why the logs of its visits normally don't show one. Yes, I know that it supposedly follows links to find new pages, but after it finds a page, it can come back on its own. So why did it show a referer in this case?

keyplyr
msg:4688038, 12:02 am on Jul 16, 2014 (gmt 0)

Googlebot DOES follow links from other sites. One indication of this is the plethora of incoming broken links (404s) reported in GWT. Link following is also a factor in determining PageRank. So yes, Googlebot does crawl organically as well as by following incoming links found on remote websites.

As I said, why Googlebot occasionally includes the referring link is a mystery. It could be by (yet to be determined) design, or a complete fluke. I don't think anyone really knows. Again, I see it a few times each week at a few sites I manage.

wilderness
msg:4688041, 12:10 am on Jul 16, 2014 (gmt 0)

aristotle,
I wouldn't be too concerned unless it happens repeatedly (as keyplyr suggested).

I post widget reference links in widget forums, and Google (and others) pick them up pretty fast and request the page.

aristotle
msg:4688044, 12:31 am on Jul 16, 2014 (gmt 0)

Thanks for the replies. But I'm really not concerned about any of this, and only brought it up because I wasn't sure whether it was a fake Googlebot, referer spam, or what. And if it's genuine, that doesn't bother me either, since the pages on this site have already been scraped numerous times, so once more won't make any difference.

lucy24
msg:4688058, 1:49 am on Jul 16, 2014 (gmt 0)

Google was going to begin showing some referers on crawled pages

Yikes. Do you mean, narrowly and specifically, pages? They often give referers for non-page requests (lately, most often with stylesheets), but I've never seen them send a referer with a page request.

:: detour to check, thank you very much TextWrangler ::

Nope. Never.
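A check like lucy24's can be sketched in Python. This assumes the Apache/nginx "combined" log format (not the log-viewer layout quoted earlier in the thread) and a hypothetical definition of a "page" as a path ending in `/`, `.html`, `.htm`, or `.php`, so stylesheets and images are deliberately excluded. It matches Googlebot by user-agent string only, which can be spoofed, so any IPs it reports still need verifying. Function and class names are illustrative.

```python
import re
import sys
from typing import Iterable, Iterator, NamedTuple

# Apache/nginx "combined" log format (an assumption; adapt to your LogFormat).
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical definition of a "page" request (query strings are stripped first).
PAGE_SUFFIXES = ("/", ".html", ".htm", ".php")


class Hit(NamedTuple):
    ip: str
    path: str
    referer: str


def googlebot_page_hits_with_referer(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield page requests whose UA claims Googlebot and which carry a referer."""
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines in some other format
        if "Googlebot" not in m["agent"]:
            continue
        if m["referer"] in ("", "-"):
            continue  # no referer sent
        path = m["path"].split("?", 1)[0]
        if path.endswith(PAGE_SUFFIXES):
            yield Hit(m["ip"], path, m["referer"])


if __name__ == "__main__":
    # Usage: python scan_log.py /path/to/access.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            for hit in googlebot_page_hits_with_referer(f):
                print(hit.ip, hit.path, hit.referer)
```

Dropping the `PAGE_SUFFIXES` test would instead list every Googlebot request with a referer, which is one way to see the stylesheet pattern lucy24 describes.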

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved