We Need PPE, but for Data
PPE = Personal Protective Equipment. Mandatory in the field and, sometimes, at our desks.
As Hurricane Harvey loomed and Team Rubicon started to mobilize, a truly miraculous thing happened. I’m not talking about the hundreds of volunteers who were dispatched or the thousands of training completions that were knocked out, nor am I talking about the amazing coordination and planning efforts our Greyshirts executed. No. I have data that shows we are able to spontaneously spawn new members. How else do you explain all the Greyshirts born this year who are apparently mature enough to help harass Harvey into submission?
All kidding aside, this story raises an important point – our ability to give you the lowdown on all things TR relies on the quality of our data and the context in which we look at it. In this case, the explanation is likely something simple and innocent. When new volunteers signed up to serve, they didn’t select their birth year, and the system defaults to the current date. That’s why there’s a spread of birthdates in 2017: some people changed their birth month and day but not their birth year, while others didn’t change the date at all. This is an issue of data input and collection, and it forces us to think about the sign-up process and the user experience. To catch this at the source, we should modify our system so that any birthdate indicating an age under 18 is flagged as invalid (volunteers must be at least 18 years old to deploy with TR). It’s finds like this, both small and grand, that will help us evolve and get better as an organization.
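As a rough illustration of the check suggested above, here is a minimal Python sketch. It assumes birthdates arrive as Python date objects and uses the 18-year minimum from this post; the function names (age_on, validate_birthdate) and the messages are hypothetical and are not part of TR's actual sign-up system.

```python
from datetime import date

MIN_AGE = 18  # minimum age to deploy with TR, per this post


def age_on(birthdate: date, today: date) -> int:
    """Return age in whole years on the given day."""
    years = today.year - birthdate.year
    # Subtract a year if this year's birthday hasn't happened yet.
    if (today.month, today.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


def validate_birthdate(birthdate: date, today: date) -> str:
    """Hypothetical sign-up check: reject future dates and under-18 ages."""
    if birthdate > today:
        return "invalid: birthdate is in the future"
    if age_on(birthdate, today) < MIN_AGE:
        return "invalid: under 18"
    return "ok"


# A sign-up left at the default year would be caught immediately:
print(validate_birthdate(date(2017, 8, 25), today=date(2017, 9, 1)))
```

With a check like this in place, a volunteer who skips the birth year field sees an error at sign-up instead of quietly becoming a newborn Greyshirt in our records.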
Data quality is a combination of different factors, such as whether the data is current, accurate, correct, complete, and relevant (unsurprisingly, there’s a healthy debate out there about which factors are needed for quality data, but these are pretty standard examples). The better our data scores on those characteristics, the higher its quality. The birthdate story above is a useful example of low-quality data, since those dates of birth are neither accurate nor correct. But how do I know, just by looking at the results, that this data isn’t high quality?
Data by itself isn’t actually all that useful. Data is just a set of facts, and it requires context to be meaningful. In this case, the context I can place this data in is “human births are a fair bit more complicated than this.” I’m pretty confident in this conclusion.
Think about it. If I told a random person off the street, “Team Rubicon completed 232 work orders last week,” they would start backing away slowly. What is a work order? It’s our term for a home serviced by our team. Even if I told another Greyshirt we completed that many work orders, they’d likely want to know other facts, like what types of work orders they were and how much debris we removed. They would also want to know where I got that data and how it was collected. Bringing in additional data is what lets us understand the whole picture. A painting is more impressive when you step back from the individual brushstrokes.
So data is important – no one is questioning that. But we should always question the data itself. To keep getting better, we need to critically evaluate the quality of our data and question how we interpret it. We want our data to tell the true story of our work, and to do that, we need to make sure we are actually telling you the truth.
If you think we can do a better job or our data needs some TLC, let us know. Just remember Rule #1, please.
Check out Team Rubicon’s Open Initiative to view key data points from our operations.