
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Crawling by Google
How Google's crawling algorithm works
hayden




msg:4663061
 7:25 am on Apr 15, 2014 (gmt 0)

Can anyone please tell me how Google actually crawls the thousands of websites worldwide? How does the crawling algorithm work? Does it crawl websites category by category?
I ask because when we look at the cache for a website, it differs from site to site. So my main question is how Google crawls websites and how the algorithm behind it works.

 

aristotle




msg:4663181
 3:45 pm on Apr 15, 2014 (gmt 0)

Googlebot tends to crawl some sites and some pages more often than others. Examples of reasons why a page would be crawled more often include:

-- the content of the page tends to change frequently

-- the page (or site) gets a lot of traffic

-- the page ranks near the top for a high-volume search term

Pages will be crawled less often if they don't get much traffic, don't change very often, and/or have a noindex meta tag.

hayden




msg:4663390
 5:37 am on Apr 16, 2014 (gmt 0)

Thanks for the answer, aristotle. I understood it. However, my question was how all the websites are crawled. Are they categorized? Can you tell me how the algorithm behind it works?

lucy24




msg:4663395
 7:13 am on Apr 16, 2014 (gmt 0)

You want to know exactly how Google works?

So do we all ;)

hayden




msg:4663430
 9:42 am on Apr 16, 2014 (gmt 0)

You are right lucy ;)

So do you know how the different websites are crawled?

keyplyr




msg:4663583
 5:01 pm on Apr 16, 2014 (gmt 0)

Hi hayden,

Google, and all bots, follow links from other sites on the web to discover your site. The more sites linking to your site, the more you'll get crawled.
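As a rough illustration of discovery by following links, here is a minimal sketch in Python. It is not how Googlebot is actually implemented; the link graph, the `.example` URLs, and the `discover` function are all made up for the example. It walks outward from a seed page and counts how many inbound links each page picks up along the way.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to (all URLs are made up).
LINKS = {
    "a.example/": ["b.example/", "c.example/"],
    "b.example/": ["c.example/", "yoursite.example/"],
    "c.example/": ["yoursite.example/"],
    "yoursite.example/": [],
    "orphan.example/": [],  # nobody links here, so it is never discovered
}

def discover(seeds, links):
    """Breadth-first walk over links.

    Returns the pages in the order they were visited, plus a count of
    inbound links seen for each page.
    """
    order = []
    inbound = {}
    queue = deque(seeds)
    queued = set(seeds)
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            inbound[target] = inbound.get(target, 0) + 1
            if target not in queued:
                queued.add(target)
                queue.append(target)
    return order, inbound

order, inbound = discover(["a.example/"], LINKS)
print(order)
print(inbound)
```

In this toy graph the page with no inbound links is never reached, which is the point of the paragraph above: a bot that only follows links cannot find a page nobody links to.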

If you want to control some aspects of this, then open an account at Google Webmaster Tools [google.com]

Another way of influencing when Google crawls your site is to create and maintain a Sitemap [sitemaps.org], which gives the search engines a clue about when your web pages have changed so they can re-crawl them.
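For anyone who has not seen one, a sitemap is just an XML file listing URLs, optionally with a last-modified date. The sketch below builds a small one with Python's standard library. The `build_sitemap` helper and the example URLs and dates are assumptions for illustration; the namespace and the `urlset`, `url`, `loc`, and `lastmod` element names come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build sitemap XML from a list of (url, last_modified) tuples.

    last_modified is a date string such as '2014-04-16'.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

# Made-up pages for the example.
xml = build_sitemap([
    ("http://www.example.com/", "2014-04-16"),
    ("http://www.example.com/about.html", "2014-01-02"),
])
print(xml)
```

The `lastmod` value is only a hint. Nothing in the protocol obliges a search engine to re-crawl a page on that date.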

The third thing I recommend is to read, read, read the endless threads here at Webmaster World.

lucy24




msg:4663599
 6:06 pm on Apr 16, 2014 (gmt 0)

If your domain is within ARIN (North America plus parts of the Caribbean) its existence is public record. Search engines will find and crawl it unless you expressly ask them not to.

If you are in other parts of the world, things may behave differently.
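The usual way to "expressly ask them not to" is a robots.txt file. As a small sketch, Python's standard `urllib.robotparser` module can show how a well-behaved bot would read one. The rules, the `allowed` helper, and the URLs below are made up for the example, and compliance is voluntary on the bot's side.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content (hypothetical rules).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/index.html"))
print(allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/private/notes.html"))
```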

roshaoar




msg:4663829
 2:53 pm on Apr 17, 2014 (gmt 0)

I think there are several ways in which Google crawls sites. That is, Google builds up knowledge about your site and puts pages in different buckets, and each bucket has a different visit frequency:

a) - These pages change a lot so I'll crawl them often
b) - This is the homepage so I'll crawl it hourly
c) - Nobody links here so I'll look in once a month
d) - I saw this page linked from bigSite X so let's have a look
e) - According to sitemaps this page exists so let's have a look before I retire
f) - This was mentioned in the press, let's have a look today! Quick!
g) - goddam! the filesize has changed since last I looked so maybe I need to come back more often
h) - webmaster dude resubmitted page in GWT, I'd better go see
i) - Hey, I'm the google newsbot guy! I want a piece of the action too!
j) - I'm the baby pretend to be mobile guy. I do as I'm told by the big guy

And so on. So when you look through your weblogs, whilst the comings and goings of Google might look a bit random, I tend to think it's lots of different little lists in Google's brain all coming together.
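The bucket idea can be sketched in a few lines of Python. This is only a toy model of the guess above, not anything Google has published. The "hourly" and "once a month" intervals come from the list; the other intervals, the bucket names, and the `next_visit` and `due_pages` helpers are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Revisit interval per bucket. "homepage" (hourly) and "unlinked" (monthly)
# follow the list above; the other two values are made-up placeholders.
BUCKET_INTERVALS = {
    "homepage": timedelta(hours=1),
    "changes_often": timedelta(hours=6),
    "in_sitemap": timedelta(days=7),
    "unlinked": timedelta(days=30),
}

def next_visit(last_crawled, buckets):
    """A page in several buckets is revisited on the shortest interval."""
    interval = min(BUCKET_INTERVALS[b] for b in buckets)
    return last_crawled + interval

def due_pages(pages, now):
    """pages maps url -> (last_crawled, [buckets]).

    Returns the URLs that are due for a visit, most overdue first.
    """
    due = [(next_visit(last, buckets), url)
           for url, (last, buckets) in pages.items()]
    return [url for when, url in sorted(due) if when <= now]

# Hypothetical crawl history for one site.
pages = {
    "/": (datetime(2014, 4, 17, 13, 0), ["homepage"]),
    "/news": (datetime(2014, 4, 17, 7, 0), ["changes_often", "in_sitemap"]),
    "/old": (datetime(2014, 4, 1, 0, 0), ["unlinked"]),
}
print(due_pages(pages, now=datetime(2014, 4, 17, 15, 0)))
```

Because each page's schedule depends on whichever of its lists comes due first, the visits in a log can look random even though every individual list is regular.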

hayden




msg:4664040
 7:03 am on Apr 18, 2014 (gmt 0)

Thanks keyplyr, lucy and roshaoar for your answers. I have my answer now and understand how it actually works. As roshaoar mentioned, there are several factors that Google considers while crawling, so there is no single factor.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved