Debating the Applications of AI/ML in Early-Stage Drug Discovery

On-Demand | April 9, 2025 | 08:00 UK Time | Event lasts 1h
Peter Henstock

Former Machine Learning & Artificial Intelligence Lead

Pfizer

Format: 20 minute presentation followed by 40 minute panel discussion

1:15 

Hi, good morning, good afternoon, wherever you are. 

 
1:18 
Very happy to be here and looking forward to this discussion. 

 
1:21 
I thought the first thing we could do is just turn our cameras on and introduce yourselves just so we can know who's here and know what kind of topics we would like to discuss. 

 
1:30 
So perhaps say who you are, what company you're with, what your main function is and anything you want to get out of this session. I can either call on people or just have you go ahead.

 
1:46 
Good, I'm Fausto, Global Head of Innovation Data Science at GSK. 

 
1:51 
I already met Peter before. 

 
1:53 
Peter, nice that you are leading the session. 

 
1:55 
And I'm here today to understand better, and share with you some views about, what to do with AI/ML for drug discovery: pharma challenges and potential solutions.

 
2:05 
So I'm based in London and nice to meet you all. 

 
2:10 
Thanks, Fausto. 

 
2:13 
Next, I'm Richard Davenport. 

 
2:15 
I'm Director of Drug Discovery at Healx, an AI company in Cambridge.

 
2:21 
I come from a pharma background, not a machine learning or AI background, and I'm here for general interest.

Very good.

 
2:33 
My name is Dimitar Dimitrov. 

 
2:35 
I'm the CEO & Co-founder of Micar 21. 

 
2:37 
This is a drug discovery company built from scratch, still preclinical.

 
2:41 
We do prediction for the preclinical stage; this is a Bulgarian biotech company.

 
2:46 
We have molecules in our pipeline, and we now develop and use an AI platform for digital pharma, digital twin pharma, with one of the top 20 pharma companies in the world.

 
3:03 
Thank you. 

 
3:04 
Great. 

 
3:05 
Thanks. 

 
3:11 
Hi, I'm Sammy Sambu calling in from Brussels, Associate Director for AI Solutions at UCB and my focus has been on early stage research primarily focusing on really developing a workbench for scientists. 

 
3:30 
Hi, my name is Sung-Hun and I'm working at Cambridge Research site at ASI in Boston, US and my background is structural biology, but I'm interested in learning about the recent trends in AI/ML. 

 
3:53 
Hi everyone, my name is Jason, I'm a computational chemist at Amgen. 

 
3:57 
We're based in Thousand Oaks, CA. 

 
4:00 
And yeah, I'm also just here for general interest and to hear what other practitioners have to say about the field. 

 
4:12 
All right, then I think that's everyone. 

 
4:20 
But anyway, are we here for chemistry or what is our scope of discovery? 

 
4:26 
So I've heard AI in the titles. 

 
4:28 
I heard some in silico computational chemistry. 

 
4:32 
What are you expecting to hear today? 

 
4:33 
Let's figure out the range of topics that we'd like to discuss. 

 
4:37 
Is it drug discovery in general for biologics? 

 
4:39 
Is it more chemistry focused? 

 
4:42 
What would you like to hear the most about?

 
4:50 
Sounds like we don't have a fixed agenda today. 

 
4:53 
But I'm more interested in the chemistry side of the AI/ML applications.

 
5:00 
Chemistry side, very good.

 
5:02 
Is that what everyone else is here for or something else? 

 
5:07 
General problems with AI applicability to early stages of discovery, no matter if it's chem, bio, gen, whatever. 

 
5:16 
All right, sounds great. 

 
5:18 
I was thinking it was more of the chem side, but I just wanted to make sure that everyone else was in the same boat. 

 
5:23 
So I'm based in Boston, I am not focused in chemistry. 

 
5:26 
I used to focus more on early discovery, but I've now shifted over to the dark side of clinical a little bit, working with statistics groups and other things. I'm still trying to find my way over there, but I still have a remit across the company, in discovery and in other areas.

 
5:41 
My main focus is to get AI everywhere across pharma, but that's pushing a rock uphill, as you all know. So let's talk about chemistry.

 
5:53 
When people talk about AI, they ask: is it a successful thing?

 
6:00 
Is it not a successful thing? 

 
6:02 
People say the ROI is terrible. 

 
6:04 
They say, well, show me an example of where it works, and people usually point at imaging and say, well, look, we have red-eye reduction in our cameras.

 
6:12 
We can find our friends in Facebook and things like that. 

 
6:17 
I'm wondering is chemistry actually the best use case that we have in pharma? 

 
6:21 
I mean, models have been around there for 25 years at least.

 
6:27 
It's been advancing continuously. 

 
6:29 
We have trends that keep up with the machine learning aspects. 

 
6:33 
We have new approaches with deep learning that came around even 10 years ago with that Merck competition that proved neural networks and deep learning could actually work in the space. 

 
6:44 
We have descriptors that have gone from terrible things, physical bonds and atoms, to really advanced things right now, with graph and 3D technology and autoencoder-type representations.

 
6:59 
What do you think? 

 
7:00 
Am I right, or is this the wrong place?

 
7:06 
I think it is a good start, Peter, right? 

 
7:09 
If many of them have an interest in chem, we can speak about chem. 

 
7:12 
Like you, I'm noticing that there is a lot of data available for chem compared to other things, right? 

 
7:18 
So just using AI/ML to screen more of the things that worked before and are similar to the new drugs you need to develop allows us to increase the probability of success, or at least accelerate the more promising compounds, compared with what was possible before, even if it is more of the same.

 
7:39 
So even if later down the pipeline you fail at the same rate, you at least have the ability in the early stages to have better leads, right?

 
7:51 
So I agree with you from that point of view. 

 
7:54 
I did not see much working on graphs, even if there are experiments, or in other kinds of domains.

 
8:00 
It seems that chem is an easy pick, at least for data availability, even if only in the early stages.

 
8:06 
Many times they annotate things manually and therefore the kind of algorithms that you need to use are not really based on real explanations, but about finding correlations right? 

 
8:18 
Like it is happening in many other fields. 

 
8:20 
So yeah, I agree with you. 

 
8:22 
But hey, maybe somebody else has some other views like Richard or Dimitar. 

 
8:27 
Maybe they're using Bayesian learning right to discover new compounds. 

 
8:30 
So it will be interesting to know from them as start-ups what they are doing in the space. 

 
8:38 
On one side we use standard science to build our pipeline of novel molecules, 18 novel molecules in our pipeline. On the other side is AI: our platform uses three main communities of AI: ML, deep ML and semantic.

 
9:02 
Unlike other solutions in the market, at the centre of the solution sits a computational knowledge graph with our organisation's data strategies, white papers and other data, but also data that is computed to fulfil the specific needs of the SME.

 
9:20 
Also the data is deployed. 

 
9:21 
There is a visual tool that enables users to explore data and establish a road map for the solution.

 
9:29 
Every solution is use-case driven; therefore the platform bridges the gap between business needs and data to deliver value.

 
9:38 
This value of the data is delivered by the platform by employing a digital twin approach.

 
9:44 
Why do we use it for the digital transformation across industries? 

 
9:48 
Actually we get all the data from pharma; they give us all the data for the drugs in development and the drugs already on the market.

 
10:00 
And another part comes from 10 different hospitals, with patient data.

 
10:06 
And we take all the data, visualise these solutions, and find use cases for new applications of any drug, for any drug-drug interaction and many others.

 
10:25 
A few questions: when you're using graphs, are you using them to find new patterns of gene expression, or are you using them even to organise clinical trials?

 
10:34 
Because you were talking about hospitals and nurses and so on, right? 

 
10:38 
And there is another thing that may be of interest, even if it's not pure chem.

 
10:43 
Peter is saying that he was moving to clinical. 

 
10:45 
I try to do that too. 

 
10:46 
There is a big challenge in organising clinical trials. 

 
10:49 
Are you doing something in that space with AI/ML, and therefore not only early-stage discovery?

Actually, both of these, but I'm not a scientist and I'm not an AI expert in this case. I'm the CEO of the company, and I have an AI team and different science teams on the drug discovery platform.

 
11:12 
And maybe it would be better to arrange an additional meeting to talk about this in more depth.

 
11:23 
Thank you. 

 
11:24 
Daniel, I don't think you introduced yourself. 

 
11:27 
Welcome. 

 
11:28 
Would you like to say where you are, what you do and what you're hoping to get out of this? 

 
11:38 
All right, maybe not, maybe later. 

 
11:41 
And I think, Fausto, 

 
11:42 
you called out Richard, and I was curious as to what Richard was going to do. 

 
11:48 
So I guess Healx doesn't do AI in chemistry yet, other than predictive models, which are pretty much off the shelf when we do use them.

 
12:00 
Our AI is focused at the biology end around knowledge graphs for rare diseases. 

 
12:07 
So some of that is genetic information, some of it is evidence chains, that sort of thing.

 
12:17 
So that's where our sweet spot is historically. 

 
12:22 
I'm not the best person to comment on that. 

 
12:24 
I'm more interested in where we move over the next few years in the chemistry space as we build those capabilities. 

 
12:33 
OK, fair enough. 

 
12:35 
So maybe we should figure out what exactly discovery, especially the chemistry side, is today.

 
12:41 
What are the main aspects of chemistry that you're excited about in your roles just at the present day? 

 
12:49 
We'll talk about the future at the end. 

 
12:54 
I think traditionally we divide computational drug discovery into two brackets.

 
13:00 
One is the ligand-based approach and the other one is the structure based approach. 

 
13:06 
And over maybe the past few years, a lot of prediction models based on public data or company data have been predicting activities and other things for the ligand-based approach.

 
13:24 
And I'm also excited about the potential of AI in the structure-based approach. Even without AlphaFold 2 we have a lot of structural information, but AI is less competitive or less developed in that respect.

 
13:44 
For example, if I have some fragment hits or some starting seed, what do you do with those structures, those complex structures, to develop better analogues based on AI/ML technologies?

 
14:03 
So I'm thinking of de novo design: not the generative models that generate complete molecules at inference, but de novo design where part of the molecule is already fixed.

 
14:19 
And then, starting from there, how do we utilise AI/ML technology to develop the molecule in the binding pocket, or get around side effects and off-target effects, more rapidly and efficiently?

 
14:35 
So I wanted to hear about your opinions or experience on that structure based AI/ML approach. 

 
14:47 
Great, thank you. 

 
14:49 
Who would like to talk about that? Jason, I bet you're an expert in that area.

 
14:59 
Oh, I wouldn't say that I've applied any machine learning to the structure-based aspect. 

 
15:03 
I think, as Sung-Hun was saying, most of my experience with applying and developing these models has been on the ligand-based side, mostly following the literature and what the current state of the art is in that area, or at least trying it out.

 
15:21 
So, you know, we were on the graph based convolutional neural network type of thing for a while and then now we're kind of focused on the generative design aspect. 

 
15:35 
But yeah, I mean, I'm also really curious about the structure based approach as well because as it's been alluded to when you have a lot of structure based data, you want to leverage it the best you can. 

 
15:47 
So if anyone else can contribute to that, I would also be grateful. 

 
15:51 
I'd love to learn more about it. 

 
15:58 
Anybody. 

 
16:05 
Well, my experience with that is different.

 
16:07 
So I'm not able to answer that question because I don't have direct experience. 

 
16:11 
What we were doing in time was kind of different. 

 
16:15 
So unfortunately you need to count me out on that.

 
16:23 
Dimitar, does your platform dive into that area at all?

 
16:28 
Sorry. 

 
16:29 
Does your platform focus on the ligand-based side or the structure-based side?

16:42 
Ligand-based is in one of the platforms, and the other AI platform is more complex. Actually we provide drug discovery under a risk-sharing model. 

 
16:50 
If a pharma company has validated targets, we take the target and we find a novel molecule.

 
16:59 
When the science data is good, then we receive the money, and we already have two different contracts of this kind with pharma.

 
17:09 
Exactly in the business model we are very complex. 

 
17:18 
Sammy, what about UCB? What do they use?

 
17:22 
Yeah. 

 
17:23 
And so really I think the UCB approach is just to take a step back.

 
17:28 
You know, part of the challenge has been to say what is the right target, right. 

 
17:32 
And when we think about it, you know, there's often kind of like a community focus on starting targets. 

 
17:42 
If you take a therapeutic area, there'll often be targets.

 
17:45 
The community will say, you know, those are really great targets. 

 
17:48 
But then the question has been, you know, could we come up with novel targets? 

 
17:52 
And so it's taking that step back. 

 
17:54 
We've had to actually say, well, what is the biomedical corpus that we can assign to the therapeutic area of interest, collect that data and then find a way to, you know, cleverly represent that existing data and perhaps try to predict new relationships that might not be at the top of the pile. 

 
18:16 
And so in this regard, for us, we've had to use natural language processing in order to really automate this entire process. 

 
18:24 
And then from an ML point of view, we've had to look at development of knowledge graphs and using link prediction and techniques of that nature to try and draw out the hidden relationships within this biomedical corpus. 

 
18:40 
It's still very much an early stage work, but I think so far it looks like really this is something that the community is asking for. 

 
18:48 
So for us, we, you know, specifically where I'm seated, you know, we're looking at our medicinal chemist as our client and for us what we do is to try and leverage advances in AI/ML to get this newer novel target. 
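
As a rough illustration of the link prediction idea mentioned above, here is a minimal sketch that scores missing edges in a toy knowledge graph with the Adamic-Adar heuristic. The entity names and edges are hypothetical, and this simple neighbourhood heuristic stands in for whatever models are actually used; it is not a description of UCB's pipeline.

```python
from itertools import combinations
from math import log

# Hypothetical toy knowledge graph: undirected edges between biomedical entities.
EDGES = [
    ("GeneA", "DiseaseX"),
    ("GeneA", "PathwayP"),
    ("GeneB", "PathwayP"),
    ("GeneB", "CompoundC"),
    ("GeneA", "CompoundC"),
    ("GeneD", "DiseaseY"),
]

def build_adjacency(edges):
    """Map each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def adamic_adar(adj, u, v):
    """Sum 1/log(degree) over the neighbours shared by u and v.

    A shared neighbour is linked to both u and v, so its degree is at
    least 2 and the logarithm is always positive.
    """
    return sum(1.0 / log(len(adj[w])) for w in adj[u] & adj[v])

def rank_missing_links(edges):
    """Return (score, u, v) for every unlinked pair with a positive score, best first."""
    adj = build_adjacency(edges)
    existing = {frozenset(e) for e in edges}
    scored = []
    for u, v in combinations(sorted(adj), 2):
        if frozenset((u, v)) not in existing:
            score = adamic_adar(adj, u, v)
            if score > 0:
                scored.append((score, u, v))
    return sorted(scored, reverse=True)

ranked = rank_missing_links(EDGES)
for score, u, v in ranked:
    print(f"{u} - {v}: {score:.3f}")
```

Pairs that share many low-degree neighbours rank highest; in practice the candidates would still need expert review before being treated as hypotheses.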

 
19:05 
All right. 

 
19:05 
So I see two distinct threads emerging. 

 
19:07 
One is the knowledge graph area in target selection. 

 
19:10 
The other one seems to be we don't have much in the structural AI for chemistry, but we do have lots of ligand based. 

 
19:18 
Where do docking and those kinds of approaches fit?

 
19:21 
Is that more of a structural approach, or is that ligand-based?

 
19:27 
Can I just go back a bit on the structural side because I'm struggling to understand what the question is. 

 
19:33 
Is it that people are after AI giving you structural information when obtaining a structure isn't possible? 

 
19:45 
Or is it that, when you have a structure, they're after using AI to elaborate your chemistry in a more efficient way than a medicinal chemist would? I guess I'm interested to know which of those is where people would like to see AI and structure added together.

 
20:08 
As far as I know from the recent literature in that direction, I think AI is now used as a very fast docking alternative.

 
20:23 
So instead of using docking with a given structure, people are trying to use the AI/ML approach to very quickly sample the broader chemical space.

 
20:39 
So I think that's the current way of using the structural information, right?

 
20:43 
Yeah. 

 
20:44 
But I think there might be more than that we can do with AI/ML.

Yeah, because I possibly would have thought there's more value the other way, from a game-changing point of view, in the sense that, yes, getting a structure for a target is now much more doable than it was 10 years ago.

 
21:06 
But it's still a considerable effort in cost and time to do it. 

 
21:11 
Whereas I think I'm seeing that the predictability of structures is becoming much better.

 
21:20 
And actually there are cases now in the literature where people have predicted a structure, then solved it, and they're effectively the same.

 
21:26 
So you'd have a jump of nine to 12 months on your drug discovery programme, which for me is a game changer. 

 
21:34 
Perfectly honest. 

 
21:35 
Maybe that would please me more than sampling via AI of docking. 

 
21:44 
Yeah, I think it might do it. 

 
21:45 
I'm thinking exactly the same thing. 

 
21:47 
If I, for example, have one thousand or two thousand compounds and activities, but we don't have the target structure or binding pocket information, how can we derive it from the conformations of the ligands and the orthogonal properties?

 
22:07 
Because maybe we know the target protein, but we don't have the binding pocket structure.

 
22:16 
So how could we derive that binding pocket information from ligand structures and expertise?

 
22:23 
And maybe the AI/ML technology can give us some advantage. 

 
22:29 
Maybe it's not available now, but wouldn't you use AlphaFold or something for that?

 
22:37 
So if it's a target that could be a structural target, so it fits into a family of targets that have been structurally solved by many other people in the PDB or something, then you could use AlphaFold quite nicely for that, I would have thought.

 
22:55 
The other example if you're using data and ligands, is not structural, is it? 

 
23:01 
It's ligand based, am I right? 

 
23:03 
Sorry, not am I right? 

 
23:04 
Am I? 

 
23:05 
That's my thinking, yeah. 

 
23:07 
Unfortunately, I think that a lot of the targets of interest are transmembrane proteins; they're not well represented in the PDB, and AlphaFold 2 is not very reliable for them.

 
23:20 
So is AlphaFold 2 accurate enough for computational chemistry?

 
23:28 
I thought it was still lacking. 

 
23:32 
I don't think you know until you try, but I think there are a number of cases where people have been working for six months or so while they solved the structure, and they've had a good enough model. They've obviously managed to confirm it and publish it, because why wouldn't you?

 
23:48 
But whether they had the confidence to believe their model for six months or so, I don't know. That's a different question, isn't it?

 
24:04 
What about the deep reinforcement learning? 

 
24:06 
Is anyone involved in that? 

 
24:08 
I just, you mentioned a few different technologies, but that was not one of them. 

 
24:15 
No real experience with that either. 

 
24:19 
I have to admit, I'm kind of excited about that overall area. 

 
24:22 
The ability to learn and figure out what data you need to make next, so that you can improve your models, has been an interest of mine for about 20 years.

 
24:32 
But now all of a sudden it's taken off with the more formal and deep learning and reinforcement learning aspects, which weren't available until a few years ago. 

 
24:43 
But I think that's something to watch out for, I guess. 
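
To make the idea of "figuring out what data you need to make next" concrete, here is a minimal active-learning sketch. It uses query-by-committee with a bootstrap ensemble of straight-line fits, which is a much simpler stand-in for the deep or reinforcement learning methods mentioned; the descriptor values, activities and candidate list are all made up for illustration.

```python
import random
from statistics import mean, pvariance

def fit_line(points):
    """Ordinary least squares for y = a + b*x on a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    # A bootstrap sample may contain a single distinct x; fall back to a flat line.
    slope = 0.0 if sxx == 0 else sum((x - mx) * (y - my) for x, y in points) / sxx
    return my - slope * mx, slope

def committee(labelled, n_models=25, seed=0):
    """Fit one line per bootstrap resample of the labelled data."""
    rng = random.Random(seed)
    return [fit_line([rng.choice(labelled) for _ in labelled]) for _ in range(n_models)]

def pick_next(labelled, candidates, n_models=25, seed=0):
    """Choose the candidate on which the committee's predictions disagree most."""
    models = committee(labelled, n_models, seed)

    def disagreement(x):
        return pvariance([a + b * x for a, b in models])

    return max(candidates, key=disagreement)

# Hypothetical (descriptor, measured activity) pairs and unmeasured candidates.
labelled = [(0.0, 0.1), (1.0, 1.2), (2.0, 1.9), (3.0, 3.2)]
candidates = [1.5, 2.5, 10.0]
next_x = pick_next(labelled, candidates)
print("measure next:", next_x)
```

The committee disagrees most where it has to extrapolate, so the candidate far from the measured range is selected; a real campaign would weigh this against cost and synthetic feasibility.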

 
24:49 
Eduardo, welcome. 

 
24:52 
Maybe I have a slightly more technical question.

 
24:55 
So what's your opinion about the minimal number of data points that enable the AI/ML approach in your system? 

 
25:08 
I know that you can transfer the learning from other systems to our system, but generally, how many data points make you comfortable to initiate the AI or ML approach, like a reinforcement learning or active learning or other approaches? 

 
25:34 
I'm sure there's opinions in this group. 

 
25:49 
You know, maybe I'll take a, you know, a slightly different view. 

 
25:53 
You know, part of my observations of late have been more concerned with the particular target you're trying to predict. 

 
26:01 
How good is the measurement science behind it? 

 
26:06 
So for instance, in some cases we've been looking at categorical variables staying in the space of safety and toxicology. 

 
26:17 
And in a number of measurements, what I really see is that if there has been a careful development of the measurement science, so that you can truly distinguish the classes that are sensible to the toxicology, then, to be honest, I can have a very small sample set.

 
26:35 
On the other hand, if the measurement science is really poor, so that the categories tend to have an extensive overlap, then no matter how much data is provided to the model, you always end up having very little applicability and so much of a grey area.

 
26:56 
So, all in all, I have to go back to the data provider and just say, look, you have to sharpen the data, you have to get a better predictor.

 
27:06 
Otherwise we are never going to have a model that's better than what we have right now. 
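
The point about overlapping categories can be illustrated with a small sketch. It assumes two equally likely classes whose measurements are Gaussian with a shared standard deviation (an idealisation, not the actual assay), and compares the theoretical accuracy ceiling with the accuracy of the ideal threshold on simulated readings of different sizes. All numbers are hypothetical.

```python
import random
from math import erf, sqrt

def bayes_accuracy(separation, sigma=1.0):
    """Best achievable accuracy for two equally likely Gaussian classes
    whose means differ by `separation` and share standard deviation `sigma`."""
    z = separation / (2.0 * sigma)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def simulated_accuracy(separation, n, sigma=1.0, seed=0):
    """Accuracy of the ideal midpoint threshold on n simulated measurements."""
    rng = random.Random(seed)
    threshold = separation / 2.0
    correct = 0
    for _ in range(n):
        label = rng.random() < 0.5           # True means the "toxic" class
        centre = separation if label else 0.0
        reading = rng.gauss(centre, sigma)   # noisy assay reading
        correct += (reading > threshold) == label
    return correct / n

for separation in (4.0, 0.5):
    ceiling = bayes_accuracy(separation)
    observed = [simulated_accuracy(separation, n) for n in (100, 10_000)]
    print(f"separation={separation}: ceiling={ceiling:.3f}, observed={observed}")
```

With well-separated classes even a small sample sits near a high ceiling; with heavy overlap the ceiling itself is close to chance, and adding rows cannot lift it.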

 
27:20 
Jason, how many compounds do you typically need for your models in order to use AI?

 
27:25 
I mean, I guess from a naive view, it depends on the model that you want to use and just what you're doing.

 
27:34 
But I think for I guess a given drug discovery campaign, we try to model something relatively safe, like maybe we have some assay data or things like that. 

 
27:48 
For these things we have a lot of information about what we're trying to do and what the range of values is, I guess, for some of the things that we measure.

 
28:02 
And then if we have a lot of follow up data from the chemists as well. 

 
28:08 
I think that, I mean, it's hard to give a number, but I would attempt some more simple models with something like 1000 data points. 

 
28:17 
And I wouldn't touch any of the more deep learning stuff with that amount. 

 
28:23 
But we can build a pretty decent starting point with that amount of data.

 
28:28 
We're using a more basic model that's been around for, you know, many years. 

 
28:35 
But then, as we get more, and if the target changes, for example if the goal isn't one single target or a specific project but instead is a more general type of thing, then we'd have more data to work with.

 
28:51 
And then perhaps then we'd consider a more deep learning neural network approach. 

 
28:57 
But yeah, I don't think that would be within reason for something less than 1000 data points.
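
As a sketch of the kind of simple, long-established model that can be tried on roughly a thousand data points, here is a nearest-neighbour predictor using Tanimoto similarity between fingerprints. This is one common baseline, not necessarily the one used at Amgen; the fingerprints (sets of on-bit indices) and pIC50 values are invented for illustration.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def knn_predict(query, training, k=3):
    """Similarity-weighted mean activity of the k most similar training compounds."""
    neighbours = sorted(
        training, key=lambda item: tanimoto(query, item[0]), reverse=True
    )[:k]
    weights = [tanimoto(query, fp) for fp, _ in neighbours]
    total = sum(weights)
    if total == 0:
        # Nothing similar at all: fall back to the overall mean activity.
        return sum(y for _, y in training) / len(training)
    return sum(w * y for w, (_, y) in zip(weights, neighbours)) / total

# Hypothetical fingerprints with made-up pIC50 values.
TRAINING = [
    (frozenset({1, 2, 3, 4}), 7.0),
    (frozenset({1, 2, 3, 5}), 6.0),
    (frozenset({7, 8, 9}), 4.0),
    (frozenset({8, 9, 10}), 5.0),
]

prediction = knn_predict(frozenset({1, 2, 3, 4}), TRAINING, k=2)
print(f"predicted pIC50: {prediction:.3f}")
```

A baseline like this has almost nothing to tune, which is part of why it remains usable when the dataset is too small for deep learning.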

 
29:03 
For instance, I know I've seen some companies advertise that they start off with at least 4 compounds and go from there, with a mostly structure-based approach.

 
29:16 
It's wishful, I think, but they seem to be succeeding in some cases. 

 
29:20 
So it's a little bit surprising. But I guess, Sung-Hun Bae, your question is really about activity, I'm assuming, and how many you need for activity.

 
29:32 
Do most of your approaches focus on just activity, or are you combining a multivariate approach to make it also soluble, safe and everything else at the same time?

 
29:41 
And how do you incorporate the other factors? 
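
One common way to fold several properties into a single ranking is a desirability score. The sketch below is an assumption about how such a score could look, not something described by the panel: each property is mapped to a 0 to 1 desirability with a linear ramp, and the geometric mean combines them. The property names and thresholds are hypothetical.

```python
from math import prod

def ramp(value, low, high):
    """Linear desirability: 0 at or below `low`, 1 at or above `high`."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)

def mpo_score(compound):
    """Geometric mean of per-property desirabilities; a single zero vetoes the compound."""
    desirabilities = [
        ramp(compound["pIC50"], 5.0, 8.0),                   # potency: higher is better
        ramp(compound["log_solubility"], -6.0, -3.0),        # solubility: higher is better
        1.0 - ramp(compound["tox_probability"], 0.2, 0.8),   # predicted toxicity: lower is better
    ]
    return prod(desirabilities) ** (1.0 / len(desirabilities))

# Hypothetical compounds: one potent but insoluble, one balanced.
potent_insoluble = {"pIC50": 8.5, "log_solubility": -6.5, "tox_probability": 0.1}
balanced = {"pIC50": 6.5, "log_solubility": -4.5, "tox_probability": 0.5}

print(mpo_score(potent_insoluble), mpo_score(balanced))
```

The geometric mean is chosen here so that excellent potency cannot compensate for a property that fails outright, which is the behaviour the question about "soluble, safe and everything else at the same time" implies.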

 
29:46 
I would say that most of my previous experience with AI/ML is about off-target effects and those things, though.

 
29:56 
So it's not because we don't have enough data points I guess. 

 
30:00 
Because for the off-targets we have public data; we can get the public data and also other sources.

 
30:09 
So that makes the available data points much larger. 

 
30:14 
So then we can apply those things. 

 
30:17 
So I think I would say even for the ligand-based approach, the activity based model is not very readily applicable in active projects. 

 
30:32 
So we can maybe try the concept, testing out the concept with the closed project or a target project retrospectively, but not in the active projects. 

 
30:45 
Yeah, that's the caveat. 

 
30:51 
Yeah. 

 
30:53 
I will give you one example from my company. 

 
30:56 
Actually, I invest a lot of money in this pipeline, and this is my own pocket money, invested in eight novel molecules.

 
31:04 
And we used a lot of prediction before we generated any experimental in vivo or in vitro data.

 
31:14 
And this is the reason that, for example, now we have a novel molecule for cancer metastasis, but one mouse study costs around €20,000.

 
31:30 
And before I must invest, I do with my team a lot of prediction for the preclinical phase, for PK/PD and many other things.

 
31:42 
And then we invest, and 100% of the time we have made all of these predictions beforehand.

 
31:54 
And when we invest the money, the studies are already very good.

 
31:59 
So with good science data and many more. Actually, for this cancer metastasis molecule, we completely prevented cancer metastasis in the mouse.

 
32:09 
And now we do additional studies in this way. 

 
32:14 
This is one of the molecules connected with the immune system.

 
32:18 
Just to say that we invest a lot of money for the prediction because this is connected with the investment actually. 

 
32:30 
I'll ask a follow up for you. 

 
32:34 
Are you investing in data sources or using public data sources to drive these different models that you're having success with? 

 
32:42 
Actually we use a lot of different tools, and one of our chief scientists has always done traditional optimisation, for example with FEP+ metrics; we wrote one scientific paper with Schrödinger.

 
33:06 
Then our chief scientist optimised his model by 50%.

 
33:11 
And this is the same situation in the different software and different groups and teams in this area. 

 
33:20 
And when we make any of these predictions, we don't use only one software. 

 
33:29 
And all of this is because he already has very good experience in this area, but it would be better to talk with him, because I'm not a scientist.

 
33:42 
Sure. 

 
33:42 
Would anyone else like to chime in about the data?

 
33:45 
I think Sammy, you also mentioned data sources, if I'm not mistaken. 

 
33:49 
Yes. 

 
33:50 
And you know, this is actually a really hot area. 

 
33:54 
Often times, you know, the scientists may come in with a problem but may not have all the data or may have, you know, a limited set of data. 

 
34:04 
And, you know, speaking organizationally, really from the point of view of an enterprise, there might be a lot of investment in artificial intelligence and machine learning, but there ought to be also a very strategic view on how we can gather existing data and then how we can generate newer data as needed. 

 
34:25 
On the generation of the data, it's really to plug the gaps where there are some. 

 
34:30 
On gathering existing data, we have to think about it externally and internally. 

 
34:33 
So in fact, a lot of work is now going into carrying out this kind of broad, enterprise-wide survey, looking at what data has been gathered since the days before the electronic lab notebook, and really making sure that it is digitised and made available across the enterprise.

 
34:52 
And then on the other hand, we're looking at external data. 

 
34:55 
You know, there are lots of collaborations the company has and not every department is aware of those collaborations and yet they're all working at some stage in the pipeline. 

 
35:05 
So it's really important that the early-stage researcher knows what the chemist or the chemical engineer might be looking at in terms of solubility, in terms of processability, because some of those parameters are going to be really useful and informative for them.

 
35:19 
So this is, you know, a huge amount of effort, and a huge amount of investment also has to go into that.

 
35:25 
And even thinking about how you staff for this kind of an effort, I think it's just super vital for us, because without the data we really cannot build any models.

 
35:36 
Yeah, thank you for that. 

 
35:38 
Daniel or Eduardo, anything to add? 

 
35:47 
Not on my side. 

 
35:49 
I think one of the interesting aspects of data, at least one that interests me, is really how it can be used to understand the robustness of models, or of what you get as an outcome of these predictions.

 
36:06 
Like how much you can trust the model, how much you can understand how sensitive it is to the different errors in the measurements that you have from the physical setup.

 
36:21 
So I think this is kind of generally the aspect that is interesting. 
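
One simple way to probe the sensitivity described here is to re-draw the training labels with assumed assay noise, refit, and look at how much a prediction moves. The sketch below does this for a straight-line model; the data, the noise levels and the choice of model are all hypothetical and only meant to show the procedure.

```python
import random
from statistics import mean, stdev

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (xs must not all be equal)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def prediction_spread(xs, ys, query, assay_sd, n_repeats=200, seed=0):
    """Standard deviation of the prediction at `query` when every label is
    perturbed with Gaussian noise of size `assay_sd` and the model is refitted."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_repeats):
        noisy = [y + rng.gauss(0.0, assay_sd) for y in ys]
        a, b = fit_line(xs, noisy)
        predictions.append(a + b * query)
    return stdev(predictions)

# Hypothetical descriptor values and measured responses.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [5.1, 5.9, 7.2, 7.8, 9.1]

tight = prediction_spread(xs, ys, query=2.0, assay_sd=0.1)
loose = prediction_spread(xs, ys, query=2.0, assay_sd=1.0)
print(f"spread with precise assay: {tight:.3f}, with noisy assay: {loose:.3f}")
```

Reporting a spread like this next to each prediction gives the scientist a rough sense of how far measurement error alone could move the answer.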

 
36:27 
Thanks. 

 
36:28 
Thanks. 

 
36:28 
While we got you here, where are you from? 

 
36:30 
We missed the intro from you.

 
36:32 
Oh, great. 

 
36:34 
Yeah. 

 
36:35 
I'm generally working with Dimitar, and we're looking at different aspects of the work, alongside this view of robustness and understanding confidence in those AI models in general.

 
36:52 
Great. 

 
36:54 
Thank you. 

 
36:54 
Thank you. 

 
36:57 
He is part of our team, but in another part, for the clinical trials.

 
37:06 
There is another AI team connected with the digital twin for the pharma company that I explained, with the connection to the drugs and to the hospitals' patient data and so on.

 
37:18 
All right, very good. 

 
37:20 
Thank you. 

 
37:21 
So Daniel, you mentioned the uncertainty of the model, and we also mentioned different data sources.

 
37:32 
I was wondering if anyone's using the kind of multitask learning where, instead of making one model for one purpose, you make many models and train them all at the same time with the same data.

 
37:44 
So it's descriptors plus multiple outcomes that all leverage each other to make better predictions. 

 
37:50 
I know Novartis had a pQSAR model that they came up with a few years ago and published.

 
37:55 
Is anyone else involved in that kind of work? 

 
38:05 
Guess not. 

 
38:07 
Well, actually, in terms of really looking at the opportunity there, we've been working very hard at trying to make sure that we develop foundational models.

 
38:22 
And it really builds on that multitask principle: if you can have a model that, at the core of it, has an internal representation of the real world that can be applied to tox safety, to binding activity, to solubility, then you have a much higher level of confidence behind those models.

 
38:51 
And this is where, you know, I think it's been mentioned before, you know, finding better ways of embedding molecules really seems to play a critical role in getting to a good foundational model. 

 
39:04 
So, you know, when you look at SMILES, as it were, they do encode certain information, but they also don't encode certain other information.

 
39:13 
So there's often a need to really say, can we re-engineer our input and make sure that, you know, we adequately represent the chemical information we know is going to be critical in distinguishing one class of molecules from another, distinguishing, you know, high levels of activity from low levels of activity, even in a graded fashion, as would be in a regression. 

 
39:33 
We need that kind of information. 

 
39:34 
Then there's been another aspect, which has been to say, you know, can we look at a different way, kind of like a non-Euclidean representation, of the spatial information encoded by that molecule? 

 
39:46 
And this is where the graphs come in and they do a very good job. 

 
39:48 
Now, when you think about graphs as a concept, we're using it on multiple levels. 

 
39:54 
We're using graphs to encode information in our biomedical knowledge, you know, the corpus that I described earlier. 

 
40:00 
And we're also using that same mathematical conceptualization of the world to encode the topology of the molecule in a non-Euclidean space. 

 
40:07 
And then we blend these different sources of information into one better representation of the molecule in silico. 
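
To make the graph view of a molecule concrete, here is a minimal sketch under stated assumptions: ethanol is written by hand as three heavy atoms and two bonds (hydrogens left implicit), atom features are a plain one-hot over a made-up element list, and one round of neighbour aggregation stands in for what a graph neural network layer would do with learned weights. The helper names `adjacency` and `message_pass` are hypothetical.

```python
# Ethanol heavy atoms (hydrogens implicit): C-C-O
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

def adjacency(n_atoms, bonds):
    """Build an undirected neighbour list from bond index pairs."""
    nbrs = {i: [] for i in range(n_atoms)}
    for i, j in bonds:
        nbrs[i].append(j)
        nbrs[j].append(i)
    return nbrs

def message_pass(features, nbrs):
    """One round: each atom's new feature = its own + the sum of its neighbours'."""
    return [
        [f + sum(features[j][k] for j in nbrs[i]) for k, f in enumerate(features[i])]
        for i in range(len(features))
    ]

ELEMENTS = ["C", "N", "O"]  # illustrative element vocabulary
one_hot = [[1 if a == e else 0 for e in ELEMENTS] for a in atoms]
nbrs = adjacency(len(atoms), bonds)

h1 = message_pass(one_hot, nbrs)               # per-atom features after one round
mol_vector = [sum(col) for col in zip(*h1)]    # sum-pool atoms into one vector

print(h1)          # [[2, 0, 0], [2, 0, 1], [1, 0, 1]]
print(mol_vector)  # [5, 0, 2]
```

After one round the two carbons are already distinguishable (only the middle one has seen the oxygen), which is the kind of topological information a flat atom count would lose.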

 
40:16 
And once you're in that kind of a space, you can really say that, yeah, you know, for the same kind of task. 

 
40:22 
And at the core of it, you have to think of. 

 
40:24 
You know, what's coming in from the SMILES side is really a string representation of the molecule. 

 
40:28 
So you're using natural language processing again, another tool and technology. 

 
40:32 
You have been using that in your architecture for your NLP learning, for the generation of your knowledge corpus, so you're really using these foundational tools that are broadly applied towards achieving multiple tasks, and doing so at a relatively high confidence, basically meeting the metrics for your modelling work. 
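
Since SMILES is treated here as a string fed to NLP-style models, a short sketch of the first step, tokenization, may help. This is an illustrative regex tokenizer, not the one any panellist uses: it covers bracket atoms, the common organic-subset atoms, ring closures and bond symbols, and it raises an error on anything it cannot account for. The name `tokenize` and the pattern itself are assumptions.

```python
import re

SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"          # bracket atoms, e.g. [C@@H], [NH4+]
    r"|Br|Cl"              # two-letter organic-subset atoms
    r"|[BCNOPSFI]"         # one-letter aliphatic atoms
    r"|[bcnops]"           # aromatic atoms
    r"|%\d{2}|\d"          # ring-closure labels
    r"|[=#$:/\\\-+().~*]"  # bonds, branches, dot, wildcard
)

def tokenize(smiles):
    """Split a SMILES string into tokens; fail loudly on unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize: {smiles!r}")
    return tokens

# A chiral centre survives as a single token, so a sequence model can see it.
tokens = tokenize("C[C@@H](N)C(=O)O")
print(tokens)

# Map tokens to integer ids, as an NLP pipeline would before embedding.
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = [vocab[t] for t in tokens]
print(ids)
```

Keeping `[C@@H]` as one token is a design choice: the stereo marker stays attached to its atom, which matters for the chirality point raised later in the discussion.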

 
41:00 
So I think, you know, I would say, you know, this is really even from the point of view of like validating your model. 

 
41:05 
I think if you can, if one can say, OK, you know, our computer science and our statistics are this good at representing, you know, the world or the realm of molecules and doing so on all these different tasks. 

 
41:20 
I think we are well on our way towards really finding success in a way that is kind of methodical and not by chance. 

 
41:29 
Wow, there's a lot of interesting points you raised just then Sammy.  

 
41:34 
You raise the point of different representational schemes for knowledge, but also chemistry. 

 
41:40 
Then you simultaneously seem to use SMILES and graph structures for these things rather than use the best, whatever that one is. 

 
41:48 
What is the standard? 

 
41:50 
Which ones are you finding the best representation overall for your chemistry structures for these models? 

 
41:55 
Anyone. 

 
41:57 
I think really that fused approach is working really well. 

 
42:00 
And then I'll just add to that, you know, finding better ways of representing your SMILES. 

 
42:07 
So bringing in information about, you know, hybridization degrees, bringing in information about chirality, you know, that can be really important. 

 
42:16 
And in some cases, you know, if you encounter moments where some of the properties of your compounds are actually driven by, you know, some particular ratio, you know, of, you know, differently directed chiral molecules, you want to represent that information there too. 

 
42:30 
So having all of that fused into one system that says, OK, this is now a molecular embedding code. 

 
42:38 
And you say that now anyone in the company can use this. 

 
42:41 
It becomes a kind of like a tool in your library, then you're really well on your way towards that. 

 
42:48 
I don't think one of those alone would work as well as the fusion of all of them. 

 
42:53 
And then I have to say here, you know, if someone is just using SMILES and getting what they need, of course we wouldn't stop them. 

 
42:57 
You know, if it works for you, OK, that's fantastic. 

 
43:00 
But make sure you just mark out, you know, what is the scope of applicability, and you know, make sure you don't exceed it. 
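
One common way to mark out a scope of applicability is a similarity check against the training set. The sketch below is an assumption-laden illustration rather than the speaker's method: fingerprints are represented as plain sets of "on" bit indices, similarity is the Tanimoto coefficient, and the 0.5 cut-off is an arbitrary placeholder that would need tuning on real data. The names `tanimoto` and `in_domain` are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def in_domain(query, training, threshold=0.5):
    """Return (inside_domain, best_similarity) for a query fingerprint.

    A compound counts as inside the domain if at least one training
    compound is at least `threshold` similar to it.
    """
    best = max(tanimoto(query, fp) for fp in training)
    return best >= threshold, best

# Toy fingerprints: sets of "on" bit positions (made-up values).
training_fps = [{1, 4, 7, 9}, {1, 4, 8}, {2, 4, 7, 9, 12}]

ok, sim = in_domain({1, 4, 7}, training_fps)
print(ok, sim)          # True 0.75

far_ok, far_sim = in_domain({20, 21, 22}, training_fps)
print(far_ok, far_sim)  # False 0.0
```

A prediction for the second query would be flagged as an extrapolation, which is the "don't exceed it" part of the advice.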

 
43:07 
Jason, does Amgen have a similar kind of strategy for the embedded features? 

 
43:12 
Yes. 

 
43:12 
I mean, it also kind of depends. 

 
43:14 
I think that the embedding features, or having as much molecular information as possible, is certainly one strategy. 

 
43:24 
And then there's also things like keeping it simple. I can't speak to Amgen's practices, but at least in my own personal view, you know, I've had models perform well when we focus more on general substructures, right? 

 
43:40 
Like MACCS keys, for instance, is just an encoding of some of these typical substructures of organic molecules, you know? 

 
43:47 
And in that sense, if we focus more on the macro chemical information, that sometimes is more useful. 

 
43:53 
But then in other cases, you really want to, as Sammy was saying, have all of the information possible. 

 
43:59 
And that's of course a more complete representation of your molecule. 

 
44:02 
And then you can leverage that for, you know, more, I guess, rigorous models or things that can pick up on these lower-level correlations. 

 
44:12 
I think also, though, that's very context dependent: a molecule is a molecule, and it is useful in certain tasks and not useful in others. 

 
44:24 
And so, in isolation, a molecular representation might not contain the information to tell you if a thing is good for this or that. 

 
44:38 
And so we also keep this in the back of our minds: when you consider whether your representation is really useful, you always have to be asking, is the information really there, and can I really learn it from this representation? 

 
44:58 
So that's kind of my view on it at least. 

 
45:02 
Very interesting. 

 
45:02 
OK, thank you. 

 
45:04 
Anything to add Sung-Hun? 

 
45:06 
Yeah, my background is again the structural biology. 

 
45:10 
So I feel some gap between the physics of the compounds. 

 
45:16 
The compounds have a 3-dimensional shape with multiple conformations at different energy levels, they are protonated depending on the environment, and there is also the stereochemistry. 

 
45:32 
All those things are important for activities and explain the activities in different targets. 

 
45:38 
So I'm wondering how people are trying to bridge the gap between the physics and the representation as SMILES or graphs, which I don't think is fully enough to capture those physical realities. 

 
46:05 
So anyone respond? 

 
46:12 
Yeah, go on, Jason. 

 
46:16 
Oh, OK. 

 
46:16 
Well, I was going to give more of an out of left field type of answer. 

 
46:25 
But I think there's a lot of work in the physics-based community on developing descriptions based on the quantum mechanics, like the Hamiltonian, and how you describe the distribution of electrons, right, and the interactions that your molecule might form. 

 
46:42 
So you can do machine learning for these sorts of quantum-mechanics-based descriptors. 

 
46:47 
And then you can essentially use those matrices that you generate in order to solve the quantum mechanical equations. 

 
46:56 
And that's sort of like a bridge between machine learning and QM to get a highly accurate, you would hope, description of, for example, your binding situation or some molecular properties. 

 
47:09 
So if we're really talking about physics and those sorts of descriptions and mathematical formulations, then there is effort in that area. 

 
47:21 
Although I would say that it's still not within the realm of, you know, being applied to industry today quite yet. 

 
47:31 
Yeah, my reaction is also going to be in that direction. 

 
47:33 
I was going to talk about quantum chemistry and really thinking about, you know, so for instance, there are some models that look at representing each molecule as an interaction between the medium of interest, right, which allows you to represent the formulation and then the surface charge on the molecule. 

 
47:56 
So you basically break up your molecular surface charge, or the potential, into a kind of tiling, where you're now trying to represent the interaction between each unit, each quantum, 

 
48:15 
and the medium that it is in, as well as the interaction between that same unit and the other quanta on the molecule itself. 

 
48:28 
And this kind of approach can actually be very helpful. I've used it in the past on formulation chemistry, even in cases where I needed to represent highly structured fluids, you know, where there might be some self-assembly going on. 

 
48:42 
And there are kind of like pockets within that formulation that you need to represent. 

 
48:46 
And then you need to be able to also move very rapidly from those very, you know, sub-nanoscale structures. 

 
48:52 
And you want to move very quickly to the macro-scale effects, and you want to model those effects effectively as well. 

 
48:58 
And here, you know, it might sound like the computational physics might be a toll on the devices that you're using, but actually it was relatively computationally efficient. 

 
49:11 
It got up there. 

 
49:11 
And then in the end, of course, you can build surrogate models where you are really building end to end models. 

 
49:16 
Moving from a description of your formulation, you know, or at least critical parts of the formulation, as well as your conformations of interest, and then moving it all the way to, you know, the target variable that you want to use for your decision making. 

 
49:31 
And you know, I would say, you know, the success was good enough to drive decision making and do so with that. 

 
49:39 
Thanks, Sammy. 

 
49:41 
I can tell we have a very interdisciplinary and very talented team here that they've assembled. 

 
49:47 
But what kind of skill sets do you actually need in a chemistry team today? 

 
49:52 
Because it's quite different than it was, say, five years ago, I think. 

 
49:56 
Dimitar, I'm sure you have a perspective on this. 

 
49:58 
How would you build a team that does optimization? 

 
50:02 
Actually, the best solution for the team is to have a crossover between AI and life science. 

 
50:11 
This is the reason that, when I started my company over five years ago, it was the only company in Bulgaria in life science, Micar21, and we decided to build ecosystems. 

 
50:26 
Four years ago we built the Health and Life Sciences Cluster. This united all the life science companies in Bulgaria, and biotech. 

 
50:33 
We are a full member of EuropaBio and the Council of European BioRegions. 

 
50:38 
But on the other side, I wanted to get more innovation in my company, and this is the reason that over four years ago we started to build AI Cluster Bulgaria. 

 
50:52 
This organisation united all the AI specialists in Bulgaria, and we are crossing between life science and AI. 

 
51:01 
And this provided me a lot of opportunities for the new innovation and access for the talent and many other things. 

 
51:10 
And now we are member of the European AI Forum. 

 
51:13 
This united all the AI clusters in Europe, and this provides me with opportunities to take a multi-disciplinary approach to this field. 

 
51:26 
Yes, absolutely. 

 
51:29 
Thank you. 

 
51:30 
We've found the same thing. 

 
51:31 
Where we have computer science and chemistry groups, they seem to be merging together for different things. 

 
51:35 
Is that the same as your experience in other places? 

 
51:43 
Sung-Hun, how about your company? 

 
51:48 
We have a small team; for example, I'm working closely with the medicinal chemists. 

 
51:56 
So then in that group, we have a dedicated person who does the analysis on the data. 

 
52:03 
So she has some background in chemistry, but also cheminformatics. 

 
52:11 
And so she puts the available data into a readily understandable format, and then they take that information into the next design, or it brings up the discussion for another type of bioassay and those things, and also computational chemistry. 

 
52:36 
As Sammy and Jason mentioned, the AI/ML approach to quantum mechanical calculations, used routinely for torsion angle scans and those things, is now easily available. 

 
52:58 
So those used to be very long calculations using DFT, but now it's a more rapid calculation. 

 
53:07 
So that information will be provided for the chemist. 

 
53:11 
So computational chemistry, structural biology and cheminformatics were traditionally in separate groups, but now they're integrated into medicinal chemistry, so they work together. 

 
53:26 
So now it's a more integrated environment. 

 
53:34 
Fair enough. 

 
53:35 
One thing I've heard about a lot in the literature for AI is federated learning, the idea that you can share not your data, but models trained on your data, with other people and collaborate. 

 
53:46 
I haven't heard anything like this for chemistry. 

 
53:49 
Is anyone else familiar with that? 

 
53:50 
For the chemistry space? 
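
The "share models, not data" idea can be sketched as federated averaging. This is a toy illustration under loud assumptions: two hypothetical sites each hold private points from the same noise-free line y = 3x + 1, each site runs a few gradient steps locally, and a coordinator only ever sees the returned weights and sample counts. Real deployments add secure aggregation, privacy safeguards and far richer models; the names `local_update`, `federated_average` and `make_site` are made up for this example.

```python
def local_update(w, b, xs, ys, lr=0.05, steps=5):
    """Run a few gradient-descent steps on one site's private data."""
    n = len(xs)
    for _ in range(steps):
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        gw = 2 * sum(e * x for e, x in zip(errs, xs)) / n
        gb = 2 * sum(errs) / n
        w, b = w - lr * gw, b - lr * gb
    return w, b, n  # only parameters and a sample count leave the site

def federated_average(sites, rounds=200):
    """Coordinator loop: broadcast the model, collect updates, average them."""
    w, b = 0.0, 0.0
    for _ in range(rounds):
        updates = [local_update(w, b, xs, ys) for xs, ys in sites]
        total = sum(n for _, _, n in updates)
        # Weight each site's parameters by how much data it trained on.
        w = sum(wi * n for wi, _, n in updates) / total
        b = sum(bi * n for _, bi, n in updates) / total
    return w, b

def make_site(xs):
    """Hypothetical private dataset following y = 3x + 1."""
    return xs, [3.0 * x + 1.0 for x in xs]

# Each site covers a different region of the input space.
sites = [make_site([0.0, 0.25, 0.5, 0.75, 1.0]), make_site([1.0, 1.5, 2.0])]
w, b = federated_average(sites)
print(round(w, 3), round(b, 3))
```

Neither site's raw measurements are exchanged, yet the shared model is fitted across both regions, which is the coverage argument made in the answer that follows.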

 
53:58 
In one of the projects I'm working on, it's not exactly chemistry, it's more chemical engineering, but even then, you know, the chemistry is actually the subject of discussion there, and there's actually a pull for this kind of thing. 

 
54:14 
With federated learning, what we see in our experience is that the chemical space you're trying to cover is really large. 

 
54:23 
And yet the questions we have are very specific, but they keep shifting in terms of scope. 

 
54:29 
So we might start with molecule A, there might be some further redevelopment and then it changes or maybe the formulation we had before changes for one reason or the other. 

 
54:37 
So you cannot then handle that entire problem on your own. 

 
54:42 
And the resources we have are very good, but if we have to cover the entire chemical space, they're insufficient. 

 
54:50 
So that's where, you know, a federated learning approach is really essential. 

 
54:53 
Now, practically speaking, you know, you have to talk to your lawyers and get the legal team to recognise the distinction between this kind of learning approach and, say, a data transfer agreement. 

 
55:05 
You know, that's going to be really critical. 

 
55:06 
And then having IT join us in the entire efforts, you know, you're talking about people who work on IT, infrastructure, security, all of that. 

 
55:15 
You have to be, you have to have all of these people with you. 

 
55:19 
So the administrative aspects of taking this approach are huge, but the payoff we expect will be also huge. 

 
55:28 
There's no other way to cover the huge chemical spaces that we want to. 

 
55:33 
And we know that without the data, we can't have the models. 

 
55:36 
And then on the other hand, once you have the model, you'd like to have some evidence that it is indeed applicable, you know, not universally, but you know, broadly. 

 
55:45 
Because if the model is too specific, then you know, you have to be worried all the time about the risk of making the wrong decision if you step moderately outside of a boundary, even if you step too close to the boundary of the model. 

 
55:57 
So I think, for sure, you know, federated learning is a wonderful approach. 

 
56:02 
I think there might be some, you know, specifically within the chemistry space. 

 
56:07 
I feel like, you know, I've probably encountered in the past, I think at least there was an EU wide effort to try and develop a workbench for chemists. 

 
56:18 
And here, from what is published on that site, you know, you can upload data; if your data is publicly funded, this would be something that would even earn you an accolade if you share it. 

 
56:29 
And then there was also a library of models that were used. 

 
56:33 
And so you'd have the typical, you know, well-worn paths in computational chemistry, even things like docking models, and then you'd have these more modern ones. 

 
56:44 
So if your model ends up getting the five-star kind of rating, then, you know, that gives you some more confidence in the work you're doing and in the applicability of the model. 

 
56:56 
Of course, in the private sector, you'd have a bit more of a challenge. 

 
57:00 
But still, I think if we hash it out with our legal colleagues and really communicate the value of it versus the risk and show that the value is much greater than the risk, then I think we'd be well on our way to maximising the utility of this kind of learning approach. 

 
57:14 
Great, thank you Sammy. 

 
57:16 
So I think that's a way going forward. 

 
57:18 
But as we finish up here, what should we be excited about next? 

 
57:23 
What's the next opportunity that we should be thinking about in terms of chemistry and AI that has you excited? 

 
57:30 
Jason, what do you think about that? 

 
57:34 
Well, I mean, just from this discussion, I think what Sung-Hun touched on was the generalisation from ligand-based drug discovery to structure-based as well. 

 
57:47 
So, I mean, just what he was saying got me excited already about some of the ways in which we can leverage that additional structural information in order to, I guess, make more meaningful predictions, or even just suggestions to chemists about things such as this. 

 
58:03 
Because, you know, one thing that I've noticed is that chemists can do really well at coming up with lots of easily synthesizable or easily tested ideas. 

 
58:15 
That's in, you know, 2D space, when you're looking at structures and you're drawing them and trying to figure out the reactions necessary to create them. 

 
58:22 
But when you're looking at 3D space, and if you have these, if you're looking at these pockets, sometimes a lot of the suggestions are like, well, you know, it's not going to fit in here. 

 
58:31 
And a lot of the work that I do is working with them on how best to filter the ideas and things like that. 

 
58:39 
And so I don't know, I think I can definitely see more applicability for AI in that regard. 

 
58:45 
Where we have models in order to take that 3D information and assess whether or not some of these ideas will work, or even just generate them ourselves and perhaps have a feedback loop where it's then passed to chemists for review and vice versa. 

 
59:04 
So I don't know I think just yeah, based on the conversation we had today that would be a very interesting thing that I'd like to follow up on myself. 

 
59:12 
Thanks, Jason. 

 
59:13 
That ties us nicely back into where we were before at the beginning. 

 
59:15 
So any other ideas? 

 
59:19 
Sung-Hun I think you're about to say something. 

 
59:21 
No, I'm also very excited about the recent progress in bringing machine learning to quantum mechanical calculations, so that we can achieve similar accuracies much faster. 

 
59:38 
So I think that bridges the gap between the MM and QM. 

 
59:45 
Also for the QM, there are the AI/ML approaches. 

 
59:48 
So the AI/ML models around the chemistry might be much more accurate and much more representative than before, rather than just using descriptors and substructures and those things. 

 
1:00:05 
So I think there is a huge potential to improve the accuracy with the speed. 

 
1:00:11 
So that's what I'm very excited about and also that progress will be applied to the structure based de novo design as well. 

 
1:00:21 
So I think that it is not very specific to the certain target or certain domain, but generally applicable from the broad data set of the quantum mechanical data set. 

 
1:00:35 
So I think it's really feasible. 

 
1:00:40 
Great. 

 
1:00:41 
Well, thank you very much. 

 
1:00:42 
Those are great things to look forward to. 

 
1:00:44 
And I think we've, we squandered the entire hour, but I learned a lot in the process. 

 
1:00:48 
So thank you very much for your contributions and ideas and thank Oxford Global for hosting this. 

 
1:00:54 
I think I'll pass it back to them. 

 
1:00:56 
Thank you very much.