
Forum Moderators: Ocean10000 & incrediBILL

Crawling by Google

How google crawling algo works

   
7:25 am on Apr 15, 2014 (gmt 0)



Can anyone please tell me how Google actually crawls the thousands of websites worldwide? How does the crawling algorithm work? Does it crawl websites category by category?
I ask because the cache date differs from one website to another. So my main question is how Google crawls websites and how the algorithm behind it works.
3:45 pm on Apr 15, 2014 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Googlebot tends to crawl some sites and some pages more often than others. Examples of reasons why a page would be crawled more often include:

-- the content of the page tends to change frequently

-- the page (or site) gets a lot of traffic

-- the page ranks near the top for a high-volume search term

Pages will be crawled less often if they don't get much traffic, don't change very often, and/or have a noindex metatag.
5:37 am on Apr 16, 2014 (gmt 0)



Thanks for the answer, aristotle. I understand your point. However, my question was how all the websites are crawled. Are they categorized? Can you tell me how the algorithm behind it works?
7:13 am on Apr 16, 2014 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



You want to know exactly how Google works?

So do we all ;)
9:42 am on Apr 16, 2014 (gmt 0)



You are right, lucy ;)

So do you know how the different websites are crawled?
5:01 pm on Apr 16, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Hi hayden,

Google, and all bots, follow links from other sites on the web to discover your site. The more sites linking to your site, the more you'll get crawled.
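
To make the link-following idea above concrete, here is a rough sketch in Python. It is not how Googlebot is actually implemented (nobody outside Google knows that); it only shows breadth-first discovery over a made-up link graph. The `TOY_WEB` data, the `.example` hostnames, and the `discover` function are all invented for illustration.

```python
from collections import deque

# Hypothetical toy web: page -> pages it links to. No real sites, no network access.
TOY_WEB = {
    "bigsite.example/": ["bigsite.example/news", "yoursite.example/"],
    "bigsite.example/news": ["yoursite.example/article"],
    "yoursite.example/": ["yoursite.example/article", "yoursite.example/about"],
    "yoursite.example/article": [],
    "yoursite.example/about": [],
    "orphan.example/": [],  # nobody links here, so link-following never finds it
}


def discover(seeds, links):
    """Breadth-first link following: return pages in the order they are found."""
    seen = set(seeds)
    queue = deque(seeds)
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:  # queue each page only once
                seen.add(target)
                queue.append(target)
    return order


found = discover(["bigsite.example/"], TOY_WEB)
print(found)
```

Starting from the one known page, the crawl reaches everything that is linked, directly or indirectly, while the unlinked `orphan.example/` page is never visited. That is the reason inbound links matter for getting discovered at all.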

If you want to control some aspects of this, then open an account at Google Webmaster Tools [google.com]

Another way of influencing when Google crawls your site is to create and maintain a Sitemap [sitemaps.org], which gives the search engines a hint about when your web pages have changed so they can re-crawl them.

The third thing I recommend is to read, read, read the endless threads here at WebmasterWorld.
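
On the sitemap point above: a sitemap is just an XML file listing your URLs, optionally with a `lastmod` date for each. Below is a minimal sketch that builds one with Python's standard library. The `build_sitemap` helper, the example.com URLs, and the dates are placeholders I made up; the element names (`urlset`, `url`, `loc`, `lastmod`) and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, last-modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # lastmod is the hint that tells crawlers when the page last changed
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs and dates, for illustration only.
sitemap_xml = build_sitemap([
    ("https://www.example.com/", "2014-04-15"),
    ("https://www.example.com/about", "2014-01-02"),
])
print(sitemap_xml)
```

Keep in mind that `lastmod` is only a hint. The search engine still decides for itself when to come back.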
6:06 pm on Apr 16, 2014 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



If your domain is within ARIN (North America plus parts of the Caribbean) its existence is public record. Search engines will find and crawl it unless you expressly ask them not to.

If you are in other parts of the world, things may behave differently.
2:53 pm on Apr 17, 2014 (gmt 0)



I think there are several ways in which Google crawls sites. I.e., Google builds up knowledge about your site and puts pages in different buckets, and each bucket has a different visit frequency:

a) - These pages change a lot so I'll crawl them often
b) - This is the homepage so I'll crawl it hourly
c) - Nobody links here so I'll look in once a month
d) - I saw this page linked from bigSite X so let's have a look
e) - According to sitemaps this page exists so let's have a look before I retire
f) - This was mentioned in the press, let's have a look today! Quick!
g) - Goddam! The file size has changed since I last looked, so maybe I need to come back more often
h) - Webmaster dude resubmitted the page in GWT, I'd better go see
i) - Hey, I'm the Google News bot guy! I want a piece of the action too!
j) - I'm the baby pretend-to-be-mobile guy. I do as I'm told by the big guy

And so on. So when you look through your weblogs, whilst the comings and goings of Google might look a bit random, I tend to think it's lots of different little lists in Google's brain all coming together.
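
For what it's worth, the bucket idea above can be sketched in a few lines of Python. This is purely my guess at the general shape, not Google's real scheduler: the bucket names, the revisit intervals in `BUCKET_INTERVAL_HOURS`, and the `schedule` function are all assumptions made up for illustration.

```python
import heapq

# Hypothetical revisit intervals in hours, loosely following the buckets above.
BUCKET_INTERVAL_HOURS = {
    "homepage": 1,
    "changes_often": 6,
    "sitemap_only": 24 * 7,
    "no_inbound_links": 24 * 30,
}


def schedule(pages, horizon_hours):
    """Return (hour, url) crawl visits up to horizon_hours, in time order."""
    # Every page is due at hour 0; the heap always yields the next page due.
    heap = [(0, url, bucket) for url, bucket in pages]
    heapq.heapify(heap)
    visits = []
    while heap and heap[0][0] <= horizon_hours:
        due, url, bucket = heapq.heappop(heap)
        visits.append((due, url))
        # Re-queue the page at its bucket's next revisit time.
        heapq.heappush(heap, (due + BUCKET_INTERVAL_HOURS[bucket], url, bucket))
    return visits


pages = [
    ("/", "homepage"),
    ("/blog", "changes_often"),
    ("/old-page", "no_inbound_links"),
]
visits = schedule(pages, horizon_hours=12)
for hour, url in visits:
    print(f"hour {hour:3d}: crawl {url}")
```

Over a 12-hour window the homepage gets hit every hour, the blog three times, and the unlinked page once. Interleave a few of those lists and you get visits that look a bit random in the logs.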
7:03 am on Apr 18, 2014 (gmt 0)



Thanks keyplyr, lucy and roshaoar for your answers. I understand now how it actually works. As roshaoar mentioned, there are several factors that Google considers when crawling, so there is no single factor.
 
