5 minutes for 5 hours of reading
The Big Mac index is an informal way to measure purchasing power parity. It obviously has many limitations, but it is a useful figure because it makes exchange rate theory a bit more digestible. It is also a fine example of a derived metric: a number computed from other numbers that aims to capture the essence of a complex reality.
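The arithmetic behind the raw index is short. The implied PPP exchange rate is the local Big Mac price divided by the US price. The currency's over- or undervaluation is that implied rate divided by the actual market rate, minus one. Below is a minimal Python sketch of this basic calculation. The function names are illustrative, and the prices and exchange rate in the example are hypothetical, not actual index data.

```python
def implied_ppp_rate(local_price: float, us_price: float) -> float:
    """Exchange rate (local currency units per USD) at which a Big Mac
    would cost the same in both countries."""
    return local_price / us_price


def valuation_vs_usd(local_price: float, us_price: float, market_rate: float) -> float:
    """Relative over- (+) or undervaluation (-) of the local currency against
    the USD. market_rate is expressed in local currency units per USD."""
    return implied_ppp_rate(local_price, us_price) / market_rate - 1


# Hypothetical figures: 100 local units vs 5.00 USD, market rate of 25 units per USD.
print(implied_ppp_rate(100, 5.00))               # 20.0
print(round(valuation_vs_usd(100, 5.00, 25), 4))  # -0.2, i.e. about 20% undervalued
```

At the market rate, the local burger costs 4 USD instead of 5 USD, so the local currency looks undervalued by about 20%.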
Performance testing, red flags and data governance. Scary topics in this week's reading list.
- Fix performance regressions before they happen: A blog post on software performance testing, which I read as a real-life example of data-driven solutions going beyond expert-designed rules. Netflix software engineers run memory and responsiveness tests for every pull request. Initially, they used static thresholds. Setting thresholds manually is not only laborious but also has other limitations, such as ignoring context. So they turned to anomaly detection and change point detection. While some may hesitate to call these approaches advanced analytics, they have provided incredible value. Because the process is automated, the number of tests has increased significantly. Importantly, 45% of alerts were true regressions (compared to
- Red flags to watch out for when joining a data team: Although it may seem that companies make all the decisions in the recruitment process, candidates are in the driver's seat too. And while it's tough when we're desperate to land a data science job, we need to stay alert to red flags. I love the questions Eugene shares, even though many of them aren't red flags for me. And that's okay. It just goes to show that data science is broad and that each data scientist has different preferences and expectations of the job. Some of us want to do ML once everything else is in place; others want to build the foundations. Some want a clear roadmap; others want to create it. Some don't want to work under an incompetent manager; others see it as an opportunity. But whoever you are, know your red flags. (Eugene Yan)
- Some thoughts on data governance: Data governance is indeed an important topic. Who wouldn't want high data quality and observability, right? But I agree with Mohammad that the current approach is odd at best. In my experience, data governance is often treated as if its subjects were computers (following coded instructions) rather than humans. And then we get massive data governance projects aimed at cleaning up the mess created by complex IT systems that were never designed with data describing activities in mind (and, even worse, these projects are often carried out by the same companies that designed those systems in such a complicated way). I believe that when data is treated as a mirror of (business) reality, it provides guidance both on how to create data and on how to use it for business purposes. If the data is merely a by-product of operational software, no amount of data mapping, documentation or policy will save you. I am convinced that data governance is more about people than technology. And people should be persuaded, not ruled. (Mohammed Syed)
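The Netflix item contrasts hand-set static thresholds with anomaly and change point detection. Here is a minimal Python sketch of the three ideas applied to a single performance metric. It is not Netflix's implementation. The function names, the 4-sigma cut-off, the 10% minimum shift and the memory readings are all hypothetical choices made for illustration.

```python
import statistics
from typing import Optional, Sequence


def static_threshold_alert(value: float, threshold: float) -> bool:
    """Hand-set rule: alert only when the metric exceeds a fixed limit."""
    return value > threshold


def zscore_anomaly(history: Sequence[float], value: float, z: float = 4.0) -> bool:
    """Anomaly check (hypothetical 4-sigma default): flag `value` if it lies more
    than `z` standard deviations above the mean of previous runs (higher = worse)."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)  # needs at least two data points
    if stdev == 0:
        return value > mean
    return (value - mean) / stdev > z


def _sse(xs: Sequence[float]) -> float:
    """Sum of squared deviations from the segment mean."""
    m = statistics.fmean(xs)
    return sum((x - m) ** 2 for x in xs)


def best_change_point(series: Sequence[float], min_size: int = 3) -> Optional[int]:
    """Single mean-shift change point: the split index that most reduces total
    squared error when the series is modelled as two constant segments."""
    total = _sse(series)
    best_idx, best_gain = None, 0.0
    for i in range(min_size, len(series) - min_size + 1):
        gain = total - _sse(series[:i]) - _sse(series[i:])
        if gain > best_gain:
            best_idx, best_gain = i, gain
    return best_idx


def detect_regression(series: Sequence[float], min_shift: float = 0.1) -> Optional[int]:
    """Return the index where the metric level rose by more than `min_shift`
    (relative, hypothetical 10% default), or None if no such shift is found."""
    idx = best_change_point(series)
    if idx is None:
        return None
    before = statistics.fmean(series[:idx])
    after = statistics.fmean(series[idx:])
    return idx if after > before * (1 + min_shift) else None


if __name__ == "__main__":
    # Hypothetical memory readings (MB) from consecutive pull requests.
    baseline = [100, 101, 99, 100, 102, 100]
    series = baseline + [120, 121, 119, 120, 122, 120]
    print(static_threshold_alert(120, threshold=150))  # False: the fixed limit misses the jump
    print(zscore_anomaly(baseline, 120))               # True
    print(detect_regression(series))                   # 6
```

The single-split, least-squares search is the simplest form of change point detection. Dedicated tools typically look for several change points and use penalties or statistical tests to decide which shifts are real rather than noise.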
I'm driving to Prague later today. We finally got our youngest son's passport, so it's time to introduce him to the family. It only took 5 months, two official translations, countless official forms and four trips to the embassy, 1,200 km by car. Czech e-government in practice. Yay!