Search Engine Spider and User Agent Identification Forum

Google Feedfetcher

 8:40 pm on Jul 11, 2014 (gmt 0)

Today I noticed Google Feedfetcher getting a page from one of my sites:
Http Code: 200 Date: Jul 11 15:09:39 Http Version: HTTP/1.0 Size in Bytes: 19650
Referer: -
Agent: Mozilla/5.0 (compatible) Feedfetcher-Google;(+http://www.google.com/feedfetcher.html)

I'm a bit puzzled because I've never had any kind of feed on any of my sites, so I'm wondering why Google Feedfetcher would want this page. Does anyone have an explanation?

Note: This is a static html page that hasn't been touched in years. It has two images, but apparently they weren't fetched.



 6:40 pm on Jul 13, 2014 (gmt 0)

Well, Google Feedfetcher is still showing up about every 8-10 hours, always fetching the same page.
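To check an "every 8-10 hours" pattern against the raw log, you could compute the gaps between successive Feedfetcher hits. Below is a minimal Python sketch. It assumes timestamps in the `Jul 11 15:09:39` form from the log excerpt in the first post. That form carries no year, so the year is passed in as a parameter. The helper name `visit_intervals_hours` is hypothetical.

```python
from datetime import datetime


def visit_intervals_hours(timestamps, year=2014):
    """Return the gaps, in hours, between consecutive log timestamps.

    Timestamps are assumed to look like "Jul 11 15:09:39" (no year),
    so the year is prepended before parsing.
    """
    parsed = sorted(
        datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")
        for ts in timestamps
    )
    # Pair each visit with the next one and measure the elapsed time.
    return [
        (later - earlier).total_seconds() / 3600
        for earlier, later in zip(parsed, parsed[1:])
    ]
```

Gaps that cluster tightly suggest automated polling on a schedule rather than one-off fetches.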

So now I'm wondering if someone could have created a feed that includes my page. Is that possible? If so, why would anyone do it?


 6:56 pm on Jul 13, 2014 (gmt 0)

I'm not sure, but I THINK Feedfetcher is triggered by a human who wants to keep tabs on your page(s). I get a few such hits, but because the bot shows up on multiple-function IPs with an ambiguous rDNS (in this case a proxy), I usually block the bot.
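If you decide to block the bot, one option is to refuse requests by User-Agent at the application level. Below is a minimal sketch written as WSGI middleware, assuming a Python WSGI application. It matches on the substring `Feedfetcher-Google`, taken from the UA string in the first post. That match is an assumption about how the bot will keep identifying itself. The name `block_feedfetcher` is hypothetical.

```python
def block_feedfetcher(app, marker="Feedfetcher-Google"):
    """Wrap a WSGI app so requests whose User-Agent contains `marker` get a 403."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if marker in user_agent:
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everything else passes through to the wrapped application untouched.
        return app(environ, start_response)
    return middleware
```

The match uses the User-Agent alone, so it blocks the bot whichever IP it arrives from. This sidesteps the ambiguous-rDNS problem, but it will not catch anything that fetches under a different UA.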

I suppose "proxy" is another way of representing this but it is G. :(


 7:07 pm on Jul 13, 2014 (gmt 0)

I have had problems with referer spam coming from this Google range, but I see that many of their tools, such as PageSpeed Insights and JavaScript optimization, also use that proxy. I'm watching it to decide whether it is worse to block it or allow it. I have

Host google-proxy-66-249-80-232.google.com
NetRange: -

for it, but no clear idea of who or what uses it.
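The two Google hostnames quoted in this thread follow a visible pattern: a role prefix, the IP with dashes, then the domain. Examples are `google-proxy-66-249-80-232.google.com` here and `crawl-66-249-65-156.googlebot.com` further down. Below is a minimal sketch that labels such names. The pattern is inferred only from those two examples and is an assumption, not a documented Google naming scheme. The function name is hypothetical.

```python
import re

# Assumed naming pattern, inferred from the two hostnames quoted in this thread:
#   crawl-66-249-65-156.googlebot.com      (Googlebot crawler)
#   google-proxy-66-249-80-232.google.com  (Google proxy)
_GOOGLE_HOST = re.compile(
    r"^(?P<role>crawl|google-proxy)-"
    r"(?P<ip>\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3})\."
    r"(?P<domain>googlebot\.com|google\.com)\.?$"
)


def classify_google_host(hostname):
    """Return (role, embedded_ip) for a recognised Google rDNS name, else None."""
    match = _GOOGLE_HOST.match(hostname.lower())
    if match is None:
        return None
    role = "crawler" if match["role"] == "crawl" else "proxy"
    # Turn "66-249-80-232" back into dotted-quad form.
    return role, match["ip"].replace("-", ".")
```

Whoever controls an IP's PTR record can set its reverse-DNS name, so this only labels hosts. To trust the name, combine it with a forward DNS lookup that resolves back to the same IP.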


 7:37 pm on Jul 13, 2014 (gmt 0)

There's a similar recent thread [webmasterworld.com]


 1:08 pm on Jul 14, 2014 (gmt 0)

Since we're talking about Google, I would like to ask about another recent log entry that puzzles me:
Http Code: 200 Date: Jul 14 02:20:00 Http Version: HTTP/1.1 Size in Bytes: 44818
Referer: http://example.com/Page.html
Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
. . . . .
Hostname: crawl-66-249-65-156.googlebot.com
ISP: Googlebot
Organization: Googlebot
Services: None detected
Type: Corporate
Assignment: Static IP
Country: United States
State/Region: California
City: Mountain View

The referer (elmi.aliexirs.ir) appears to be an Iranian website with a directory structure filled with scraped copies of pages from other websites.

What I'm thinking is that this could be referer spam using a fake googlebot agent, but the IP puzzles me. Can anyone elucidate?


 1:54 pm on Jul 14, 2014 (gmt 0)

Somebody mentioned a while back that Google was going to begin showing some referers on crawled pages.


 7:21 pm on Jul 14, 2014 (gmt 0)

Could it be a genuine Googlebot running under a "test as googlebot" service, i.e. a true Google service but run under external control?

If this IS the case it's a rather terrifying loophole.

If it's merely G adding an arbitrary referer, then G has some serious answers to make to some serious questions!


 1:25 am on Jul 15, 2014 (gmt 0)

I've seen many referrers in legit Googlebot requests. Why Googlebot sometimes includes referrers is a mystery. In the above situation, I tend to think that since this is a valid Googlebot IP, the UA is authentic. It IS Googlebot, and you've luckily been informed that a website has scraped your content (and stupidly left the links). Now the next step is to figure out what you're going to do about it, given the place of origin.
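The inference "valid Googlebot IP, therefore authentic UA" can be checked rather than assumed. Google's described method is forward-confirmed reverse DNS, in three steps:

1. Reverse-resolve the IP to a hostname.
2. Check that the name is under googlebot.com or google.com.
3. Forward-resolve that name and confirm it maps back to the same IP.

Below is a minimal Python sketch using the standard `socket` module. It handles IPv4 only, and the function names are hypothetical.

```python
import socket

# Domains accepted for Googlebot reverse-DNS names (checked as suffixes).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_hostname(hostname):
    """True if a reverse-DNS hostname ends in one of Google's crawler domains."""
    return hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES)


def verify_googlebot(ip):
    """Forward-confirmed reverse DNS check for a claimed Googlebot IP (IPv4).

    1. Reverse-resolve the IP to a hostname.
    2. Require the hostname to be under googlebot.com or google.com.
    3. Forward-resolve that hostname and require the original IP in the result.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except OSError:  # covers socket.herror / socket.gaierror (no PTR record, etc.)
        return False
    if not is_google_hostname(hostname):
        return False
    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in addresses
```

Calling `verify_googlebot("66.249.65.156")` needs live DNS, and the result depends on the records at the time you run it. The suffix check alone is not enough, because a spoofer can set any PTR name. The forward lookup is what ties the name back to Google's own DNS.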


 11:51 pm on Jul 15, 2014 (gmt 0)

But if this is a genuine Googlebot visit, that raises the question of why it provided a referer in this case when it doesn't in the vast majority of cases. My impression is that Googlebot doesn't need a referer and normally comes on its own, which is why the logs of its visits normally don't show one. Yes, I know it supposedly follows links to find new pages, but after it finds a page, it can come back on its own. So why did it show a referer in this case?


 12:02 am on Jul 16, 2014 (gmt 0)

Googlebot DOES follow links from other sites. One indication of this is the plethora of incoming broken links (404s) reported in GWT (Google Webmaster Tools). Link following is also a factor in determining PageRank. So we know that, yes, Googlebot crawls organically as well as by following incoming links found on remote websites.

As I said, why Googlebot occasionally includes the referring link is a mystery. It could be by (yet to be determined) design, or a complete fluke. I don't think anyone really knows. Again, I see it a few times each week at a few sites I manage.


 12:10 am on Jul 16, 2014 (gmt 0)

I wouldn't be too concerned unless it happens repeatedly (as keyplr suggested).

I post widget reference links in widget forums, and Google (and others) pick them up pretty fast and request the page.


 12:31 am on Jul 16, 2014 (gmt 0)

Thanks for the replies. But I'm really not concerned about any of this; I only brought it up because I wasn't sure whether it was a fake Googlebot, referer spam, or something else. And if it's genuine, that doesn't bother me either, since the pages on this site have already been scraped numerous times, so once more won't make any difference.


 1:49 am on Jul 16, 2014 (gmt 0)

google was going to begin showing some refers on crawled pages

Yikes. Do you mean, narrowly and specifically, pages? They often give referers for non-page requests (lately most often with stylesheets), but I've never seen them send a referer with a page request.

:: detour to check, thank you very much TextWrangler ::

Nope. Never.
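This check (do Googlebot requests ever carry a referer, and for which files?) can be scripted. Below is a minimal Python sketch. It assumes the server writes the common Apache/Nginx "combined" log format, not the field-per-line format quoted earlier in the thread. Treating `.html`, `.htm` and directory URLs as "pages" is a hypothetical rule. The script matches on the UA string only, so pair it with IP verification if fake Googlebots are a concern.

```python
import re

# Apache/Nginx "combined" log format (an assumption; adjust for your server):
# host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical rule for what counts as a "page": HTML files or directory URLs.
PAGE_SUFFIXES = (".html", ".htm", "/")


def googlebot_referers(lines):
    """Count Googlebot-UA requests that carried a referer, split into pages vs other files."""
    counts = {"page": 0, "other": 0}
    for line in lines:
        m = COMBINED.match(line)
        if not m or "Googlebot" not in m["agent"]:
            continue
        if m["referer"] in ("", "-"):
            continue  # no referer sent, the usual case for Googlebot
        path = m["path"].split("?", 1)[0]  # ignore query strings
        kind = "page" if path.endswith(PAGE_SUFFIXES) else "other"
        counts[kind] += 1
    return counts
```

You can feed it an open log file directly, e.g. `googlebot_referers(open("access.log"))`. The file name is a placeholder.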


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved