Preparing for the AI future


I want to give a shout-out to C&E News for a very nuanced and balanced article on the subject of AI, from which I’m going to crib the “quoted” headings below.

“Here everything is pretty manual”

This article points out that “AI in its current state falls well short of emulating human intelligence” and seems to imply that most researchers are not using AI. I would contend that any researcher using Google to look up papers is already using AI. And if they are eyeballing assay data to make a go/no-go decision about each compound, they should consider adopting an AI model trained on their past decisions. Of course, some companies’ assays are so low-throughput that this would be ineffective, since there would be too few past decisions to learn from. But I would encourage those companies to think about operating more efficiently: making assays higher-throughput generates more ideas, and more ideas ultimately help the patients they serve.

If you believe that “AI” or similar technologies are going to drive more pieces of the drug discovery process in the future, there are a few things you should do to prepare.

To support AI you will need “a platform approach to informatics”

Any machine learning, advanced analytics, or “AI” infrastructure will need to consume well-organized, well-annotated data. Every compound you make, every cell you test, every assay you run… if you’re running it more than a few times, keep a structured record of it. An AI can’t make sense of your random PowerPoint presentations!
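What “keep a record” might look like in practice: one row per assay run, always with the same columns, appended to a shared log. This is a minimal sketch assuming a CSV file and a hypothetical `AssayRun` schema; the field names are placeholders for whatever your assays actually record.

```python
import csv
import os
from dataclasses import asdict, dataclass, fields


@dataclass
class AssayRun:
    # Hypothetical fields: adapt these to what your assays actually capture.
    run_date: str      # ISO 8601 date string
    assay_name: str
    compound_id: str
    cell_line: str
    readout: float
    operator: str


def append_run(path: str, run: AssayRun) -> None:
    """Append one run to a CSV log, writing the header only if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=[f.name for f in fields(AssayRun)])
        if is_new:
            writer.writeheader()
        writer.writerow(asdict(run))
```

Because the columns are fixed by the dataclass, every row in the log has the same shape, which is what downstream analysis (human or machine) needs.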

Store metadata

Imagine a table of assay data… in 5 years, when an AI is consuming this dataset, it might say, “Guess what! I found a very strong relationship between the columns ‘assay 2’ and ‘disease 2’!” A human needs to be able to look up what these names mean. This seems so obvious, and yet when files get dumped into an “archive” folder, this type of information is often lost. The fix is simple: store a separate table or tab with two columns, “field” and “metadata” (call them whatever you want, but there should be only two). Each entry in the “field” column is one of the column headers of your data spreadsheet, like “assay 2” or “disease 2”, and the “metadata” column holds any note that a future human (or AI) would need to understand that column.
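The two-column metadata table is easy to generate and, more usefully, easy to check. This sketch assumes the notes live in a plain dict keyed by column header and written out as CSV; `write_metadata` and `undocumented_fields` are hypothetical helper names, and the example notes are made up.

```python
import csv


def write_metadata(path, descriptions):
    """Write a two-column data dictionary ("field", "metadata"), one row per column header."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["field", "metadata"])
        for field, note in descriptions.items():
            writer.writerow([field, note])


def undocumented_fields(data_headers, descriptions):
    """Return the data table's column headers that have no (or a blank) metadata note."""
    return [h for h in data_headers if not descriptions.get(h, "").strip()]


# Hypothetical notes for the columns in the example above.
notes = {
    "assay 2": "Percent inhibition at 1 uM; see the assay protocol for details",
    "disease 2": "Disease model used for the in vivo arm of the study",
}
print(undocumented_fields(["compound", "assay 2", "disease 2"], notes))  # ['compound']
```

Running a check like `undocumented_fields` before a file is archived catches the missing notes while someone still remembers what the columns mean.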

Use common language

Keep a shared spreadsheet that tracks the standard names for reagents, names/numbers of compounds, names/numbers of specific cell lines, their batches/lots, etc. Inevitably your company will develop data silos, despite your best efforts to prevent them. (You ARE making an effort to prevent them, right!?!) If you can at least enforce consistent nomenclature, you have a hope of later “joining” datasets, which lets you ask questions that require comparing flow data to sequencing data, assay data, etc.
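Consistent nomenclature is what makes the join possible. This sketch assumes a hypothetical shared alias table, where every name a team has used for a compound maps to one canonical ID, and inner-joins two lists of row dicts (say, flow results and sequencing results) on that ID. Unregistered names raise an error instead of silently failing to match.

```python
# Hypothetical shared nomenclature: every alias in use maps to one canonical ID.
# Keys are stored lower-case so lookups are case-insensitive.
CANONICAL_COMPOUND = {
    "cpd-0001": "CPD-0001",
    "compound 1": "CPD-0001",
    "c1": "CPD-0001",
    "cpd-0002": "CPD-0002",
    "compound 2": "CPD-0002",
}


def canonical(name):
    """Map any registered alias to its canonical ID; fail loudly on unknown names."""
    key = name.strip().lower()
    if key not in CANONICAL_COMPOUND:
        raise KeyError(f"Unregistered compound name: {name!r}")
    return CANONICAL_COMPOUND[key]


def join_on_compound(left, right):
    """Inner-join two lists of row dicts on the canonical compound ID.
    If both rows share another column name, the value from `right` wins."""
    index = {}
    for row in right:
        index.setdefault(canonical(row["compound"]), []).append(row)
    joined = []
    for row in left:
        cid = canonical(row["compound"])
        for match in index.get(cid, []):
            joined.append({**row, **match, "compound": cid})
    return joined


flow = [{"compound": "Compound 1", "pct_positive": 42.0},
        {"compound": "cpd-0002", "pct_positive": 7.5}]
seq = [{"compound": "C1", "top_gene": "GENE_A"}]
print(join_on_compound(flow, seq))
```

The two teams used different names for the same compound, yet the rows still line up, because both names were registered in the shared table.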

That’s it!

These steps are not difficult, but enforcing them can be, especially when researchers are used to working autonomously. The only thing I’ve found that helps is talking about the company mission. You are all in this together, working toward a common goal. In that context, it makes sense for team A to spend a few extra minutes annotating their Excel files so they make sense to team B, because not doing so will hurt both teams in the long run.