Tuesday, August 2, 2011

Zombie Statistician

I met with a physician at work today. He had just purchased a copy of the Nationwide Inpatient Sample (NIS) and wanted to do some research with the data. His question was whether having a certain condition put a person at higher risk of premature death. He wanted to know if I would be able to extract and analyze the data.

He purchased 12 years of data; each year contains a sample of just under 8 million subjects. He suggested we start with the 2009 data and see what we could learn from just that subsample. The only concern he had was that we wouldn't have enough follow-up data to know if the patients had died.

To me, it was a simple solution: extract the data, pull the social security numbers, and run them through the Social Security Death Index. As I said this, I realized that it wasn't likely that this purchased data set would contain social security numbers.

"Well, the website said it had social security numbers."

I furrowed my brow and looked at the set of 30 or so CDs sitting in a binder on the table in front of me.

"You mean to tell me these discs have social security numbers for some 90 million subjects?"

"I think so."

I tried to suppress a panic attack and said, "I think I'm going to have you take these discs back to your office, and I'll check with my supervisors to make sure I'm legally allowed to work with these data."

Turns out I am. Another panic attack. I'm allowed to work with these data pending my completion of a number of training modules on the internet. These will undoubtedly remind me many times over of the stiff civil and criminal penalties to which I will be subjected should I even accidentally leak any identifiable information. Forget the panic attack--my heart just stopped.

So there it is. I died at work today, but will continue sifting through this mountain of data in my undead state. And the worst part of it all--it wouldn't even make a decent straight-to-DVD movie.
