Spiders and referral information

     
1:32 pm on Dec 14, 2007 (gmt 0)

New User

5+ Year Member

joined:Nov 24, 2007
posts: 9
votes: 0


In one of the old threads (from 2004) [webmasterworld.com], I found a quote from jdMorgan saying:

That only works if there is one and only one link to your page. Otherwise, they'd have to re-fetch your page every time they found a link to it in order to "give you a chance" to reject each and every incoming link referrer...

That's why spiders don't do this. They work from a database that may contain dozens to tens of thousands of link referrers to your one page. How would they know which one you won't like without trying all of them? :(

This is an interesting point, but I am not sure I completely understand the logic behind it. Wouldn't it actually be beneficial to have the referrer info from spider visits? Googlebot will fetch my page every time it finds a link to my site (will it?). It will not index or cache the page each time, but it will visit. What would be the logical problem with spiders passing referral information?

1:39 pm on Dec 14, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


> Googlebot will fetch my page every time it finds a link to my site (will it?)

No, it almost certainly will not re-fetch your page every time... What if your site were CNN or Amazon.com, with hundreds of thousands of incoming links to your homepage? It would make no sense to re-fetch the homepage every time one of those links (many of them stale) was found.

Google and the others are not going to want to pay for that wasted bandwidth either, and you can be sure they de-duplicate their URL lists to save bandwidth, money, and time.

Jim

1:51 pm on Dec 14, 2007 (gmt 0)

New User

5+ Year Member

joined:Nov 24, 2007
posts: 9
votes: 0


So how do they decide which links they are going to use to visit my site? And where are the 20-30 (or 200-300) daily Googlebot visits to my site coming from? The same link every time?

2:30 am on Dec 21, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


The Google system has many parts.

One part parses previously cached pages and adds newly discovered URLs found in those pages to the list of URLs to fetch.

Another part of the system uses that list to fetch the content of those URLs.

The system is NOT like a browser, directly following links from page to page and site to site.

The system fetches the pages working from a pre-compiled list of pages to collect.

They are not interested in "using particular links to visit a site", they are interested in fetching the content from as many URLs as possible.
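
The two-part design described above can be illustrated with a minimal sketch. This is not Google's implementation; the `Frontier` class, its method names, and the example URLs are all hypothetical. It only shows why a fetcher working from a de-duplicated list has no single referrer to send: one queued URL can map to many source pages.

```python
from collections import defaultdict, deque


class Frontier:
    """Hypothetical de-duplicated list of URLs waiting to be fetched."""

    def __init__(self):
        self.queue = deque()               # URLs still to fetch, in discovery order
        self.seen = set()                  # every URL ever queued
        self.referrers = defaultdict(set)  # URL -> pages that link to it

    def add_links(self, source_url, links):
        """Parser side: record links found in an already cached page."""
        for url in links:
            self.referrers[url].add(source_url)
            if url not in self.seen:       # queue each URL only once
                self.seen.add(url)
                self.queue.append(url)

    def next_url(self):
        """Fetcher side: take the next URL, with no notion of 'the' referrer."""
        return self.queue.popleft() if self.queue else None


frontier = Frontier()
frontier.add_links("http://a.example/", ["http://target.example/"])
frontier.add_links("http://b.example/", ["http://target.example/", "http://other.example/"])
frontier.add_links("http://c.example/", ["http://target.example/"])

fetch_order = []
while (url := frontier.next_url()) is not None:
    fetch_order.append(url)

print(fetch_order)
print(sorted(frontier.referrers["http://target.example/"]))
```

In this sketch the target page is linked from three pages but is queued and fetched once, so there is no one "correct" referrer for the fetcher to report.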

3:13 am on Dec 21, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


But to add to that, and to address one aspect of the question: Google uses thousands of hosts in possibly hundreds of datacenters for spidering, and these are not always in perfect synchronization; therefore you will indeed see multiple fetches, even though these machines may all be working from the same de-duplicated list of URLs. However, you will usually not see fetches for the same page from the same (or similar) IP address, unless a crawl is restarted due to a problem.

This "many-servers" aspect is also the reason why you may often get different search results from hour to hour or even from minute to minute; Google uses round-robin load-balancing DNS, so your first search may be handled in Chicago, and your second search by a server in San Diego. If these machines are not working from the same index, you may see different search results -- different pages listed in different order.
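
As a rough illustration of the round-robin effect, here is a toy sketch. The datacenter names, index contents, and the `search` function are made up for the example, and real DNS load balancing is more involved than a simple rotation.

```python
from itertools import cycle

# Hypothetical datacenters whose indexes are slightly out of sync.
datacenters = {
    "chicago": ["page-a", "page-b", "page-c"],
    "san-diego": ["page-b", "page-a", "page-d"],
}

# Round-robin: each request goes to the next datacenter in turn.
rotation = cycle(datacenters)


def search(query):
    """Return (datacenter, results) for whichever server handles this request."""
    dc = next(rotation)
    return dc, datacenters[dc]


first = search("widgets")
second = search("widgets")
print(first)
print(second)
```

The same query gets two different result lists simply because consecutive requests were handled by servers holding different copies of the index.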

Jim

 
