How do we leverage big data responsibly?

Hazel Tang
May 12, 2020

The first AIMed webinar of the year, “Data rights in the age of machine learning”, sponsored by Cerner, took place on 30 January 2020.

Dr. Tanuj K. Gupta, Vice-President at Cerner Intelligence, moderated the session. The panelists were Dr. Mark Hoffman, Chief Research Information Officer at Children’s Mercy Hospital; Sara Gerke, Research Fellow in Medicine, Artificial Intelligence (AI) and Law at the Petrie-Flom Center for Health Law Policy, Biotechnology, and Bioethics at Harvard Law School; and Dr. Anthony Chang, AIMed Founder and Chief Intelligence and Innovation Officer of Children’s Hospital of Orange County. They took turns sharing their views on how to create a responsible data environment that not only facilitates current technological development but also ensures that healthcare data and patient privacy are duly protected.

Here are some of the takeaways.

Ethics by design

Gerke believes it is necessary to make sure that both datasets and algorithms are reliable and valid. When it comes to data quality, there is always the “garbage in, garbage out” concern: AI will not reach its full potential if the data used to train it are sub-optimal. Gerke also thinks AI manufacturers should be aware of the types of bias that exist in data.

If an algorithm is trained using data from the US only, there is an additional risk of contextual bias when it is deployed elsewhere. Because AI developers bring in their own values and beliefs and set their own parameters, some of these biases may be unconscious. For this reason, Gerke supports “ethics by design”: as soon as the development process begins, developers should start thinking about the ethical consequences that come with the algorithm.

Dr. Hoffman agreed, saying this is a significant area of effort for him and his research team at Children’s Mercy Hospital. The heterogeneity of Electronic Health Record (EHR) systems can have a profound effect on the data itself, especially if researchers or developers do not understand the end-to-end workflow of how these data are captured and how they get into these sources.

“I am always a little bit skeptical of projects where people just want the data handed to them and they will figure the rest out. That always worries me,” Dr. Hoffman said. This is because, when large-scale EHR information is aggregated, minute variations such as missing data can lead to critical misinterpretations. These can directly affect the effectiveness of an algorithm if such nuances are not highlighted and understood.

An IRB approval for everything we do?

Gerke questioned whether Institutional Review Board (IRB) approval is necessary for everything related to data (an IRB is an independent ethics committee that reviews studies and protects the rights of the human subjects involved in them), given that formal regulations such as the Health Insurance Portability and Accountability Act of 1996 (HIPAA) are already in place.

She said the problem is that some of the big tech companies collecting healthcare-related information currently fall outside the HIPAA regime. HIPAA only covers individually identifiable health information handled by HIPAA-covered entities or their business associates.

Some are trying to fill the gap by strengthening individuals’ data privacy at the state level, for example through the California Consumer Privacy Act (CCPA), which has been in effect since 1 January 2020. However, it becomes complicated for companies to comply with all these different regulatory regimes. Hence, Gerke called for a federal law that supersedes all the state laws, so that companies only have to comply with a single, centralized set of data privacy rules.

Dr. Hoffman, on the other hand, thought that whether IRB approval is needed depends on whether the data are used purely for research or for operational purposes. In general, tech companies should be sensitized to how to handle questions concerning data privacy, and they should not regard IRBs or regulations as something that will delay their work, but rather as something that strengthens their work and credibility.

FDA regulations

Dr. Chang remarked that the US Food and Drug Administration (FDA) deserves a lot of credit for trying to lay out a trajectory for AI regulation. However, he is not comfortable with the term “software as a medical device” (SaMD), because software development is agile and dynamic, whereas for medical devices people tend to work to a timeline and carry certain expectations at different stages.

Gerke added that the concern is that the FDA evaluates only locked algorithms: if a manufacturer needs to improve or change one, it has to go through another round of review, which can be costly and time-consuming. The FDA has published a discussion paper exploring ways to unlock the potential of AI while enabling manufacturers to update their algorithms even after approval.

Even so, Gerke said the focus should not be too heavily on extensive up-front planning, because manufacturers often do not know in advance what kind of updates they will need to make. It would be better to focus explicitly on post-market monitoring. Dr. Hoffman, meanwhile, believes there is probably a need for more clarity around how consumer-facing health applications are governed, so that they do not aggravate or stratify any existing risks.

Looking at the entire picture

Dr. Hoffman also said that, beyond the challenge the FDA faces in validating AI-driven decision support tools, it is important to think about the organizations that implement these tools. For example, most hospitals have an independent committee that reviews and authorizes new decision support models before they are released into practice.

These committees are accustomed to dealing with the standardized behavior of existing systems. There will therefore be a major need to educate these individuals as we move away from predictable inputs and outputs towards the much more fluid and variable logic that comes out of AI.

At the end of the day, as Gerke suggested, it is important to see the entire picture: not only the AI product but also the context in which it will be applied and the healthcare professionals who deploy it. Employing AI does not mean that its value will automatically become apparent in the process or within that particular setting.

You may revisit the webinar here.

*This article was originally published on the AIMed Blog on 3 February 2020.

