Understanding Data Refreshes
What do you know about them?
I have been trying to better understand data refreshes, as they seem to be the things that affect me more than anything else. I used to believe these were updates, but after reading multiple threads and so on, I believe these are separate from a traditional update. I understand what they are, but do not understand why they have such drastic effects. I was hoping that some of the more experienced posters and also the new posters could have one place to discuss how a data refresh affected them. I think that if we all got together on this, we may be able to have a better understanding of them and have more insight as to what they do.
I am more than happy to share my experiences over the past several months regarding them.
For me, I seem to notice these data refreshes close to the end of the month or at the beginning of the month. This is not exact, but for me, that is when it happens. It is not every month; it seems to happen at the beginning or end of every business quarter. Then another data refresh comes along about one month after that one, and things stay that way until the end of that business quarter (which is VERY soon). I am not saying that this has anything to do with Google business operations, as I really don't think they would do that, but for me, that's when it happens.
When this does happen, one of 2 things occurs.
1) My very well-aged (5+ years) web sites will fall to a ranking of 9999 or more for all but 4 keywords. The 4 pages that still rank are very confusing to me, and I have never been able to understand why they remain while the others drop. This started to happen during the first installment of the new "Big Daddy" infrastructure and continues to date. When Big Daddy rolled out, I took the advice Matt Cutts suggested about 301 redirects. I have them in place. What is interesting in my case is that in one instance I have 3 domains, and 2 of those domains are 301s to the other domain.
www.abcdefg.com (this is the primary domain with a 301 for www only)
www.abcdefg.net (this domain has a 301 to the primary domain www)
www.abcdefg.org (this domain has a 301 to the primary domain www)
I feel as if I cannot change this because these 3 domains used to be mapped and over 5 years have received hundreds and hundreds of good crusty old links for their given domain name. With this being said, what I notice when these data refreshes occur on the negative side is that for the 4 pages that still rank well in Google, a normal keyword search will show the URL as www.abcdefg.com/pagethatstays.html, which is what it is supposed to be. However, let's say I want to experiment with the site operator and run that page URL for www.abcdefg.org/pagethatstays.html: it shows up as that URL, even though that 301 is and has been in place for a while now. This happens only to 4 pages, and they are never affected by any data refreshes at all. I have looked over those 4 pages many times and do not see any difference in comparison to the other pages that do drop. Any ideas what to look for? I have run through these over and over again and still don't see it.
2) The second thing that happens during these data refreshes is that the pages all recover and hold what I would like to believe are the appropriate rankings. Let's say that out of 100 keywords targeted, 30% return to the top 3 in Google, another 20% return to page one, and the rest are on page 2 and beyond. The above-mentioned 301 issue is resolved and a non-issue; every page (including the 4 pages that always remain) shows the correct URL in the site operator. I try to see what is different when this happens, and outside of that site operator command, nothing is different. This will last until the next big data refresh and then return to the first example I have provided. Why? I have no idea, but I know it happens.
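For anyone who wants to rule out the redirects themselves, here is a minimal sketch of how the 301 setup described above could be verified from the server side. It is an illustration only: the function names are made up for this example, the http:// scheme is an assumption, and the abcdefg domains are the placeholders from the post.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status_and_location(url, timeout=10):
    """Send a HEAD request without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def is_permanent_redirect_to(status, location, expected_url):
    """True only for a 301 whose Location header matches expected_url exactly."""
    return status == 301 and location == expected_url


# Example usage (needs network access; placeholder domains from the post):
# status, location = fetch_status_and_location(
#     "http://www.abcdefg.org/pagethatstays.html")
# print(is_permanent_redirect_to(
#     status, location, "http://www.abcdefg.com/pagethatstays.html"))
```

If this reports a 302, or a Location that points somewhere other than the matching page on the primary domain, that would be one thing to fix before blaming the refresh. It says nothing about what Google has stored for the old URL.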
I have daily historical keyword rankings on this for the past year, and this is a recurring issue. Obviously I cannot share that data, as I would not feel comfortable with that, but I do have this data that shows me with good rankings, then for a while rankings that are below 9999 (that's as low as DP will go). Every time this occurs (good or bad) I start to look into it on the boards/blogs, and every time I do, I hear it is a data refresh. So it makes me conclude that the data refreshes are the reason that it happens to me.
I do understand that I am affected by data refreshes, but I do not blame everything on the data refreshes. I just notice this. I am certain that I need to be looking for something, and I just have no clue whatsoever what that could be. So I post the following questions to anyone who is interested in trying to better understand data refreshes:
1) What has been your experience and understanding of a data refresh?
2) Have you seen similar things happening to your web site?
3) Have you recovered in a similar fashion?
4) What kind of other things have you noticed when this happens?
5) If this used to happen and no longer is an issue, what did you do?
6) Any other ideas?
I would really appreciate if people who have had this happen to them try and provide some more details, so the people that have been affected in this community can get a better understanding why this happens. I also think it is relevant to share that MSN and Yahoo have been giving me consistent SERP rankings overall for the past year if that makes a difference or not.
Thanks for reading and I look forward to the responses.
I guess it is also important to know that I have been positively affected since June 27th.
One thing we have in common is that a site of mine is like a rollercoaster in Google.
Up and down as you describe. Except a certain amount of pages/kw's never budge in the SERPs. They remain and hold their positions. They don't even nudge up and down a few places.
This site hit the skids last September, and even during that period a very small number of pages remained in the SERPs.
The site is 5+ years old, 1200 pages.
Sorry I can't offer you more, as I don't bother with what Google does with it; I just know when I get a spike in my stats that Google has been turning knobs again.
If you want to ask any particular questions, I'll certainly look for you.
The recent changes don't seem to be so much data refreshes as a reversion to older site info, especially the title.
I'm not sure about everyone else's niche, but in my world, I am seeing all my competitors' pages reverting to a site title that is several years old. If you view the source of the sites that still rank, I'd bet the title tag does not match what Google is currently displaying.
I am seeing the title of my PR 5 site reverting back to what Google initially picked up when the site was first indexed 6 years ago, which is simply my non-descriptive company name. Those were the days when SEO was not as big an issue as it is today. The end result is that my index page has lost all its keywords and has dropped.
Any sub pages that I recently created and were picked up by Google have not changed or have changed very little in their position.
At least for the present, it seems Google has forced a reversal of all the tweaking we have done over the years.
It's like making us stick by the first thing that came out of our mouth and any changes from that are considered an attempt to improve our ranking.
This is creating some pretty poor search results. However, that may be the point. The new filter seems like the "whack a mole" game. They are allowing the newer sites to get a peak, then they may whack (filter) the trash sites and then combine the results with the previous "established" sites. Of course this is all merely my hallucination.
In the end, I guess the way to avoid this is to plan your site pages properly in advance and, once they're picked up by Google, not change them (or the keywords they contain) drastically.
Purely speculation but I think this is how they implement penalties/filters. I had a site in this status for quite some time. The only ranking would be for exact searches for text that was on the original "coming soon... under construction" type page. The site would rank number one for useless content. In fact the cache matched perfectly with the Wayback Machine's first cache. I struggled with no success and finally bit the bullet, banned Googlebot, and used the Google URL removal tool. Six months later I came back with better rankings than ever. It was a painful decision, but I have no regrets since I'm not so sure I had any actual net loss of traffic. It corrected numerous problems, including supplemental pages etc. I now zap supplementals this way regularly. In the scheme of things, 6 months is not that long considering many problems can persist for much longer periods.
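The post does not say how Googlebot was banned. One common way is a robots.txt rule, and that is the assumption behind the sketch below. It uses Python's standard urllib.robotparser to confirm that such a rule blocks Googlebot while leaving other crawlers alone. The example.com URL, the "Slurp" agent string, and the can_crawl helper are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: block Googlebot from the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example usage with a placeholder URL:
# can_crawl(ROBOTS_TXT, "Googlebot", "http://www.example.com/page.html")
# can_crawl(ROBOTS_TXT, "Slurp", "http://www.example.com/page.html")
```

This only checks what the file says. It does not remove anything that is already indexed, which is what the removal tool step in the post was for.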
"Purely speculation but I think this is how they implement penalties/filters."
I think so too. I guess a "Data Refresh" is the deployment of per-page or per-site penalties on a monthly basis (at present).
Exactly the method I was going to suggest, but it was so painful to even write, well, thanks for saying it.
I guess I could try registering a new domain and placing the old content in it, but unless the old domain is zapped first, I'll be penalized right out of the box for duplicate content.
I dominated the search for 6 years: 1500+ unique visitors per day, nearly 2% daily sales conversion, BBB Online reliability member. Now my traffic logs and sales figures are barren. Given up to scraper sites, directories and non-relevant search results. For those of you thinking "boo hoo"...stick it in your ear.
I plan on shutting everything down and doing the removal tool thing.
Hell of a way to have to run a business, thanks Google!
Sorry, but I'm thoroughly disgusted.
Sorry I'm back so soon...
I may have found a simpler answer.
I went and checked DMOZ, and they seem to be the source of my problems.
Google occasionally is using snippets from the ODP, so a simple
<META NAME="ROBOTS" CONTENT="NOODP"> may cure it...I'll let you know.
I see Matt's Blog uses the same, so there must be something to it...
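To confirm the tag actually made it onto the live pages, something like the following could be run against the saved HTML of each page. It is a minimal sketch using Python's standard html.parser; the class and function names are invented for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> matches.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def has_noodp(html):
    """Return True if the page carries a robots meta tag with the NOODP directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noodp" in parser.directives
```

For example, has_noodp('<META NAME="ROBOTS" CONTENT="NOODP">') returns True, while a page with no robots meta tag returns False.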
I feel your pain. We all push the envelope trying to make a buck. In a perfect world we could just email Google and get a reply something like.... spammy-page-xyz.html seems to violate quality guideline abc. Please make appropriate changes and we will reconsider its status. If they really and truly just want quality compliance, I for one would say ahhh, busted again, I better fix that. Wouldn't the end result be a better internet for users? OK, back to reality: it just doesn't work that way. Too much to ask I suppose... Too many gray areas open to interpretation. One site's successful promotion is another site's spam. So we are stuck here trying to figure out the cause based on the results... Remove the cause and wait for the results to change... and wait and wait and wait some more.
<<Understanding Data Refreshes
What do you know about them?>>
Good question 300M.
Googleguy wrote this week on this forum about the Data refreshes:
<<There was a data refresh on June 27th that lots of people ask about, but there was also a data refresh in the last 1-2 days that refreshes the same data. Going forward, I'd expect that the cycle time would go down even more, possibly down to once a week for that particular algorithm. But people also asked about data refreshes back in September of last year.>>
What is important in GG's post is:
- The 22 Sept. filter was also a data refresh.
- Data refreshes are directed towards a particular algorithm.
- There are also data refreshes to other algorithms.
- We will get these refreshes every week in future, so what is lost this week might be back next week?
One of my sites suffered the data refresh in September (we called it a filter in those days) and it came back after the data refresh of 27 June.
There were a few hypotheses about what triggered the filter (algorithm) of 22 Sept. It seemed to be dupl. content (could be a small text on several pages) and too much anchor text.
Characteristics of sites in trouble after the September data refresh (but I think also after the June refresh):
- suppl. above homepage in the SERPs. My site had this till it recovered in June. Now suppl. are where they belong, down.
- the particular filter seemed to be directed to money terms (hotels, real estate etc.).
- a few pages seemed to survive the filter but nobody knows why. Those pages often do not have too much text on them, as if the filter needs a certain amount of text to work properly.
Is it possible the algorithm of June is the same as the one of September? If you think it is the same and your site was hurt by the June data refresh, a lot might be learned by reading the posts about the September one.
My few cents.
Update - about 8 hours after installing the "noodp" I suddenly popped back into the #1 position and top positions across the board.
I also removed two useless CJ affiliate links and did some minor cleanup on an already pretty clean site. Googlebot stops by my site almost twice daily and sucks up about 200-300Mb each time.
The ODP thing really makes sense now.
I think this worked for me because my DMOZ listing is so radically different from my current title. Hopefully this will settle out, but I expect more. You know, there's always a reason, and it's not that Google is picking on us individually. There are so many factors that can come into play.
Hope this helps someone else in a similar boat.
None of my sites really changed on the 27th, they all hung in there on their ranking.
However, I have been looking at a lot of sites that fell, and all of the ones I have looked at either have a lot of duplication in the meta tags or no meta tags.
Google has been playing with the description and title meta tags right before the 27th hit.
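A quick way to check your own site for the duplicated or missing meta tags mentioned above is to group pages by their meta description. The sketch below assumes you already have each page's HTML in a dict keyed by URL; the names audit_descriptions and DescriptionParser are made up for this example, and it looks only at the description tag, not the title.

```python
from collections import defaultdict
from html.parser import HTMLParser


class DescriptionParser(HTMLParser):
    """Capture the content of <meta name="description">, if present."""

    def __init__(self):
        super().__init__()
        self.description = ""

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "description":
            self.description = attr.get("content", "").strip()


def audit_descriptions(pages):
    """pages maps URL -> HTML. Returns (duplicates, missing).

    duplicates: lowercased description -> list of URLs sharing it (2 or more).
    missing: list of URLs with no description, or an empty one.
    """
    by_description = defaultdict(list)
    missing = []
    for url, html in pages.items():
        parser = DescriptionParser()
        parser.feed(html)
        if parser.description:
            by_description[parser.description.lower()].append(url)
        else:
            missing.append(url)
    duplicates = {d: urls for d, urls in by_description.items() if len(urls) > 1}
    return duplicates, missing
```

Whether duplicated descriptions had anything to do with the June 27th drops is this poster's observation, not something the code can confirm. It only shows where the duplication is.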
One other note as well: the new sites we have been building have not hit any type of sandbox and get indexed a lot quicker now.
Google claims they can index faster with Big Daddy. With that, along with mozillabot, their index could be growing faster than we think.
Example, before big daddy they had 8 billion pages indexed, after big daddy they could have 10 billion pages indexed. Has anyone been monitoring that?
May I suggest a rewrite for your files? Not sure how hard it is with your cart, but I think it would help a great deal with indexing.
Sorry if you considered it already :)
"Except a certain amount of pages /kw's never budge in the serps. They remain and hold their positions. They don't even nudge up and down a few places."
I am having that with 4 keywords. That's the weird thing about it. The 4 keywords are the ones that show the URLs from before I did the 301. I can only see that in the site operator, but it shows the right URL when doing a simple Google keyword search.
I partially agree with you about the old data. I see this too, but I also see new stuff. It makes me speculate that when they do a data refresh they are testing NEW algos with new and old data to see how it works with that given new algo. Kind of like panning for gold through a screen. However, a lot of times it's fool's gold from what I see in my area.
"the 22 sept. filter was also a Data refresh"
I agree 100%. (That's when this started, if I remember correctly. Then Jagger came out, right?)
"Data refreshes are directed towards a particular algorithm"
I agree, but I have to wonder why they would do a data refresh, and then GG says they do them on a regular basis, yet it never changes the results for my keywords until the next one, i.e. 30 days or so out from the first noticeable one.
"There are also data refreshes to other algoritms."
I think this somehow may tie in to the above.
I think supp results may not be associated with data refreshes because I have only ever had 4 supp results, but then again those 4 supp results also happen to be the 4 keywords that never move in rank. So maybe it is because of the data refresh. I wonder if microlinx is right, because if old data and new data are mixed into a new data refresh, and the old data for a given page is different from what it is now (for example, a 301), would that cause it to go supp?
With regard to meta tags, I don't think I am having that, but with dupe content, it's possible, but only if Google has been tightening up their dupe content filter, if they actually have one. For the couple of hundred pages I am talking about there might be some dupe content, but not a lot, and that dupe content is minimal in comparison to the content for any given page (kind of like a product description that is necessary for any of those pages). Maybe there is something I can look at there.
I am almost beginning to think that there is certainly room for me to improve my pages, but at the same time, a data refresh with old and new data could cause a problem, at least for me. I am beginning to wonder more and more about 301s and old data incorporated into the refreshes. Anyone have suggestions, thoughts, more info?
By the way, thanks for posting; this has been something I have been thinking about ever since Jagger.
Also, would anyone be willing to agree that the June 27th refresh may have more old data than new?
I am seeing newer data. My theory is that Google's index is growing. Let's face it, the net is growing at an exponential rate.
So if Google's index grows 10% a year, for example, that is a lot of pages. That alone will affect SERPs.
We did look into doing this, but it would be difficult. We are using the zencart template.