Thank you @Pontificus Maximux, that's very useful information. I'm going to re-run my tests and make sure to exclude YouTube videos and the like.
But if you feel that adding one more occurrence of "Red Socks" is going to make the difference, then go ahead, and good luck.
Perhaps I could have been clearer: I'm largely removing keywords because of a suspected Penguin penalty. I have a lot of natural exact-match backlinks because of the way people share my site. But TF-IDF *is* a ranking factor.
I don't believe a word Google says about their algorithm anymore. They put out misinformation all the time.
I'm going to indulge in a bit of carefully thought out speculation here.
Google is stupid and can't tell the difference between legitimate sites and spam sites (that's why they have manual actions). Penguin assesses the trustworthiness of each of your links individually, based on a confluence of factors, including the page they came from, your current on-page keyword percentage, and other trust signals. The people at Google don't care about collateral damage. Matt Cutts was probably the only person who acted as a voice for webmasters.
A lot of long-established sites have been in a long, slow decline since either late 2016, when Penguin 4.0 was introduced, or early 2017, which is when they seem to have added more quality signals to Penguin.
I am convinced that Penguin is affecting a large percentage of sites these days. Google will deign to pass you the power of your backlinks only if they think you deserve it, and this year that means only if you look like a corporation. If your site is in decline and you can't fully explain the decline through snippets or other factors, you should look into this.
I'm also convinced that Penguin actively demotes sites, rather than just ignoring their links. Why have some fresh, thin-content sites started ranking well? Because they got lucky with a few on-page trust signals and keyword percentages, and they don't have any backlinks, so they can't be demoted by Penguin. Why have thin-content pages from corporations started ranking well? Because corporate signals prevent Penguin penalties.
I also suspect they won't rank you properly in the neural matching ("super synonyms") algorithm if the percentage of your links hit by Penguin is too high, no matter how relevant your page is. That is probably part of the reason why the algorithm doesn't work properly.
A lot of people here seem to have been struggling since March.
We know they did something to Penguin in March, because that's when exact-match domains (EMDs) came out of nowhere, teleported to the top of niches and were even rewarded with sitelinks. Maybe they removed some legacy code by accident, or maybe they made Penguin even more aggressive without being able to properly distinguish between EMDs and brands.
Remember how the early versions of Penguin penalised a bunch of brand sites and individuals with a high percentage of exact-match backlinks? Google has trouble telling brands and keyword-targeted sites apart. That's why they keep telling us to focus on building our brands. They recognise a brand's website as an entity, and they have to give exact-match links pointing at it a free pass so they don't penalise companies like Amazon by mistake. Because they can't properly tell EMDs from brands, the EMDs are getting a free ride, while everyone else is getting varying percentages of their backlinks penalised. That's why brands and EMDs are doing so well this year.
Why is it taking so long to get new sites indexed at the moment? Why have there been so many apparently link-based updates recently? Because they know Penguin is broken and they're trying to fix it, and every time they tweak the algorithm they have to recrawl the entire web and reassess everyone's links against their target pages one at a time. The server load must be insane.