Forum Moderators: Robert Charlton & goodroi

Surge in Redirected requests.... Google finds an old shopping list

         

lucy24

10:37 pm on Mar 20, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Has anyone else noticed unusual Googlebot behavior recently?

At the very end of 2013 I moved most of my material to a brand-new site. In addition, I've got assorted redirects dating back to 2011 and 2012--updated for the new site, so everything gets done in a single step. Like any search engine, Google continues to spot-check periodically, requesting a few random files each day. Then yesterday...

Whew. Redirected requests went through the ceiling. Easily four times as many as usual for the time period. So many that the day's raw HTTP log file is 2-3 times bigger than usual. (The site itself is now HTTPS, so nothing but 403s and 301s get logged on the HTTP side. Another rough but useful metric: The text file in which I keep a record of the past 30 days' redirects leaped up about 30% in size thanks to this one day's activity. A still rougher but perhaps still more useful metric: Google's redirects outnumbered Bing's. Stop The Presses.)
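(For anyone who wants to script a similar tally rather than eyeballing logs: a rough sketch, assuming an Apache/nginx combined-format access log; the file name and bot substrings are placeholders.)

import re
from collections import Counter

# Matches the request, status, and user-agent fields of a combined-format line.
LINE = re.compile(r'"[A-Z]+ \S+ HTTP/[\d.]+" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"')

redirects = Counter()
with open("access_http.log") as log:  # placeholder file name
    for line in log:
        m = LINE.search(line)
        if not m or m.group("status") != "301":
            continue
        bot = next((b for b in ("Googlebot", "bingbot") if b in m.group("ua")), "other")
        redirects[bot] += 1

for bot, count in redirects.most_common():  # e.g. Googlebot outnumbering bingbot
    print(bot, count)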

-- It wasn't an overall spike in requests; the directories that stayed behind were unaffected.

-- Some of the requested URLs were very, very old. But only if they were redirected to the new site; there was no upsurge in 410s.

-- Of particular interest was the huge number of duplicate directory requests: not just multiple /directory/ but plenty of both /directory and /directory/index.html. (A rough way to tally these equivalent forms is sketched just below.)
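Here's that sketch: minimal Python, assuming only directory-style paths like the ones above (real code would need to leave file paths such as /page.html alone), which maps all three spellings to one key:

from collections import Counter

def canonical(path):
    # /directory, /directory/, and /directory/index.html all name the same
    # resource; reduce each spelling to the trailing-slash form.
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    if not path.endswith("/"):
        path += "/"
    return path

seen = ["/directory", "/directory/", "/directory/index.html"]
print(Counter(canonical(p) for p in seen))  # Counter({'/directory/': 3})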

Naturally I'm wondering: does Google randomly cycle through websites, pulling up old redirects every few years when it gets to be your turn? Or are they working on some new algorithm?

Robert Charlton

5:07 am on Mar 21, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



lucy24, I've lengthened your title for clarity, adding "Surge in Redirected requests" to your ingenious and amusing title: "Google finds an old shopping list".

Your shopping list comment is a metaphor for how Google sometimes crawls old data, but search engines don't do well with metaphors (and often people don't either). I'm thinking that yours wasn't the usual 404-error question; it was rather specifically about Redirected requests and Google crawling, looking back over a good many years. Hence the addition.

In the past I've felt that such massive Google reviews of legacy data were a sign that Google was doing something for which it eventually wanted a "clean" URL list, perhaps an update. We've discussed that in two threads I'm citing below...

Massive jumps in GSC legacy crawl errors - who sees this?
9/8/2016
https://www.webmasterworld.com/google/4817870.htm [webmasterworld.com]

GWT Sudden Surge in Crawl Errors for Pages Removed 2 Years Ago?
May 17, 2013
https://www.webmasterworld.com/google/4575982.htm [webmasterworld.com]

Note that during the course of the 2016 thread, Simon_H, who'd started it, joined the Sep 9, 2016 Google Hangout to ask John Mueller about the crawl errors we were discussing. The exchange was reported that day by Barry on SERoundtable...

Google: The Increase In Crawl Errors Are Nothing To Worry About; Just Hungry Crawlers
Sep 9, 2016 - by Barry Schwartz
https://www.seroundtable.com/google-increase-in-crawl-errors-22669.html [seroundtable.com]

To quote Barry at greater length than I normally would, his article cites the thread Simon was asking about...
Over the past week or so, some webmasters have been reporting an increase in crawl errors. It is mostly documented in this WebmasterWorld thread.

Some suspect it has to do with an algorithm update - but we've covered not just once but twice that these crawl changes in Google Search Console have no relation to upcoming algorithm updates - at least that is what Google told us.

John Mueller was asked about this by Simon in the Google Hangout.... John essentially said that he looked into some of these reports and yes, it is unrelated to any algorithm thing....

Simon later that day posted in our thread...
I'm not convinced this is just coincidence. From a statistical point of view, having multiple sites flagging massive jumps in crawl errors around the same time (we're actually receiving warning emails from GSC it's so severe) is unlikely to be pure coincidence, plus the last time people saw this happen, Penguin 2 hit a week later.
He posted a comment to the same effect on SER.

I should add that, following the discussions above, on Sept 13, 2016, MozCast reported the largest changes it had ever recorded up to that date; Penguin 4.0 was then announced on Sept 23 and rolled out slowly over subsequent weeks. In retrospect, there appear to have been nested updates. Here, for reference, is the WebmasterWorld thread on Penguin 4.0....

Penguin: Core, realtime and updated today
Sept 23, 2016
[webmasterworld.com...]

So, back to the OP here...
Does Google randomly cycle through websites, pulling up old redirects every few years when it gets to be your turn? Or are they working on some new algorithm?

I'd vote for the algorithm, though I'm sure Google won't confirm that. It may also be that Google cycles through websites, but I don't know how "random" the datasets are. And it may be that on some of the cycles John looked at, there was no update pending, or at least none that required a clean list or index, and therefore no old data (aka shopping list) was being used. Clear? ;)

It's interesting to me that you're seeing Redirected request errors instead of 404 errors, suggesting that Google might be looking at yet a different kind of update, though I'm not sure what that might be.

I'm thinking there might be further clues in some of your observations, which perhaps suggest that Google is checking out an update that is recursive and involves a different area of the index, one that covers prior redirects. (Note that this is conjecture, and that I'm not a Google search engineer.)

not2easy

7:31 am on Mar 21, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



These are not redirected pages in my case, but many old, long-gone pages. Google was looking at some (330) really old 404s on one of my sites. The site changes daily, and since I was in there trying to get them to update the sitemaps they use, I happened to look at robots.txt. The version shown in the (old) GSC was from mid-February; I'm certain they've requested it since then, but I guess they liked the older version. In the new GSC I had resubmitted the robots.txt a few weeks ago because they want images that they claim are on the sitemap, though I can't see them there. I really dislike the feeling of wasting my time in GSC, but they don't seem to update to new sitemaps without prodding. And it looks like the old and new versions of GSC aren't on speaking terms.
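For what it's worth, a quick sanity check is to fetch the live robots.txt and compare its Last-Modified date against the stale copy GSC displays. A standard-library Python sketch, with a placeholder domain:

import urllib.request

# Placeholder domain; not every server sends Last-Modified, but when it
# does, a mismatch confirms GSC is showing a cached copy.
req = urllib.request.Request("https://example.com/robots.txt", method="HEAD")
with urllib.request.urlopen(req, timeout=10) as resp:
    print(resp.status, resp.headers.get("Last-Modified"))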

keyplyr

7:44 am on Mar 21, 2018 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What you're seeing in your logs is likely just collateral damage from reindexing for the ongoing Mobile-First Index [webmasterworld.com] update.

I've seen escalated Googlebot activity for the last few weeks. On one site Googlebot requested every archaic link.

You have redirects. Googlebot is evaluating them for the new algo. If you can get by without them, I would recommend removing (or reducing) them, as they seem to be a factor.
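If you do keep some of them, it's worth confirming that each old URL answers with a single 301 straight to its final destination rather than a chain. A rough standard-library Python sketch; the host and paths are placeholders:

import http.client

OLD_HOST = "example.com"                             # placeholder legacy host
OLD_PATHS = ["/old/page1.html", "/old/page2.html"]   # placeholder paths

def first_hop(host, path):
    # Request without following redirects; report the status and target.
    conn = http.client.HTTPConnection(host, timeout=10)
    conn.request("HEAD", path)
    resp = conn.getresponse()
    target = resp.getheader("Location", "")
    conn.close()
    return resp.status, target

for path in OLD_PATHS:
    status, target = first_hop(OLD_HOST, path)
    # A clean single-step setup shows one 301 pointing directly at the
    # final HTTPS URL on the new site.
    print(f"{path} -> {status} {target}")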

Robert Charlton

10:15 pm on Mar 26, 2018 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



It appears to have been preparation for mobile-first indexing....

Google rolling out Mobile First index
Monday, March 26, 2018
https://www.webmasterworld.com/google/4893387.htm [webmasterworld.com]

Today we’re announcing that after a year and a half of careful experimentation and testing, we’ve started migrating sites that follow the best practices for mobile-first indexing.....