I remember my first data extraction. As a clinician I enjoyed creating good clinical notes, and was adept at digesting fat files of written scrawl, laboratory records, and referral letters. For complex or very ill patients, these often came in falling-apart volumes stacked inches high. How then was I meant to scan them for a clinical case notes review we were conducting? I simply did not know how.
I spent hours on my first few, pondering the story of that person, seeing gaps, tut-tutting at missed sections, trying to decipher the scrawl of Doctor Somebody who will be forever nameless. It took my epidemiologist boss, exasperated with my slow pace, to show me how to scan a file thoroughly in 20 minutes, no matter the size. “See,” she said, “just take what you need and leave the rest.” I soon got good at it, and after a time did not try to fill in the gaps, get into the head of the clinician who wrote the notes, or worry about what had happened. I replaced my clinician hat with a data analyst cap.
I am fascinated by data: I love the patterns they make, I love turning them around in my head, laying them out like a deck of cards. A nicely laid out Excel workbook with linked cells and graphs gives me the same weird pleasure as ironing. I adore David McCandless’s work on Information is Beautiful; I give his book more often than any other gift to colleagues. Now I am the one who can help a junior clinician look for the salient features in a group of patients. I can roam the heights and depths of a data pile, grit my teeth, and not get frustrated when a clinician complains about the data being useless.
When I close my eyes, this is what the data look like: millions of streaming points of light converging from different people, leaving trails that you can track back to them. I also see the effort and pain each little point takes to make that journey: the hands, brains, computers, cloud-based apps, and statistical manipulations it passes through.
Change is inevitable and it comes so fast now. Patient records are online, coding and diagnosis related groups are a business, and 90% of all data have been generated in the past two years. And here is the risk: we may get lost in the scale of it all. When we lose sight of the fact that data are actually about people, we will lose sight of the main purpose. Data without humans always in mind can be a harsh taskmaster. Data are never right, they are just part of the truth. Data never lie, but they can be distorted. In the end, no human being can ever be completely abstracted into numbers or words.
At my core there will always be a deep understanding that every data point, every pixel on a data visualisation, every tickbox on an electronic patient record ultimately tracks back to a single person, one with a real name, a family, a story. Even if we reduce some elements of them to a code or an online outlier, even while we spend hours looking for assurance that our data are accurate, that fact will always remain. And as I track that data point back to its source I will marvel at the many hands, human and electronic, that it passed through, and ultimately at what we cannot see after deductive reason: the hint of a story of a person’s life, revealed in a number on a screen. For without that understanding and respect, human data are shallow and cheap.
Competing interests: I have no relevant interests to declare.