
Does a pop-up affect crawl & index speed?


newsnshop

8:34 am on Sep 18, 2012 (gmt 0)



Can a pop-up on a site slow down the crawling and indexing speed of bots?

tedster

1:07 pm on Sep 18, 2012 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



A pop-up has its own URL, and that address will be spidered separately. So it won't affect the crawl speed for the page from which it is called.

We should think about a googlebot crawl differently from a conventional crawler. First there is URL discovery - just "what URLs exist". Those get put into a crawl list and then googlebot gets set to work through that list. So it's not like, on each crawl, googlebot is sort of sprawling out, downloading a page and then following every link on the page.

RegDCP

4:05 pm on Sep 18, 2012 (gmt 0)

5+ Year Member



We should think about a googlebot crawl differently from a conventional crawler. First there is URL discovery - just "what URLs exist". Those get put into a crawl list and then googlebot gets set to work through that list. So it's not like, on each crawl, googlebot is sort of sprawling out, downloading a page and then following every link on the page.


How did you come by this information, tedster?

tedster

10:13 pm on Sep 19, 2012 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



By paying attention to articles and other information about Google's "Crawl Team". I do this both for my own SEO purposes and because I want to offer well-sourced information to this forum. Crawling has been done with some variant of this approach for many years, so I haven't quickly located a definitive source. But I will keep trying, since how crawling works is a relatively common discussion here.

You can also see how this approach almost needs to be the case by considering what it takes to crawl all the URLs on the whole web on a frequent basis - especially considering the wide variety of sources Google uses for URL discovery.

So as I see it, the crawl team builds a URL list and sends it to one of their googlebot servers to crawl. Those pages are then retrieved and examined for URL discovery, internal linking, etc. The next crawl list can then be built based on the new data. This approach would certainly be cheaper in computational resources than trying to decide on the next URL to request in real time.

However, if a brand new URL is discovered on a re-crawl of a known page - then there may well be a special routine that kicks in and gets that new URL crawled ASAP. It's just that not every URL found on a crawl of an existing page would need to trigger an immediate new request, the way crawling seemed to work in the 1990s.
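The batch approach described above can be sketched in a few lines. This is only a toy illustration of the idea - a fixed crawl list worked through in rounds, with newly discovered URLs collected for the *next* list rather than followed recursively - not Google's actual implementation. The link graph and `fetch` helper are made up for the example.

```python
# Toy link graph standing in for the web; "fetching" a URL just
# returns the URLs linked from it. Purely illustrative.
LINK_GRAPH = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": [],
    "d": ["e"],
    "e": [],
}

def fetch(url):
    """Simulated download: returns the URLs discovered on the page."""
    return LINK_GRAPH.get(url, [])

def batch_crawl(seed_urls, max_rounds=10):
    """Two-phase crawl: work through a fixed crawl list, collect newly
    discovered URLs, then build the next crawl list from them -
    rather than recursively following links as they are found."""
    known = set(seed_urls)
    crawl_list = list(seed_urls)
    crawled_order = []
    for _ in range(max_rounds):
        if not crawl_list:
            break
        discovered = []
        for url in crawl_list:          # work through the current list
            crawled_order.append(url)
            for new_url in fetch(url):  # URL discovery happens here
                if new_url not in known:
                    known.add(new_url)
                    discovered.append(new_url)
        crawl_list = discovered         # next round's list
    return crawled_order

print(batch_crawl(["a"]))  # ['a', 'b', 'c', 'd', 'e']
```

A real system could fast-track brand-new URLs by pushing them onto a separate high-priority list instead of waiting for the next round, which matches the "special routine" idea above.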