0:00 

So I want to talk today about how to maximise the potential of AI in biologics discovery. 

 
0:07 
But first, if you'll allow me, I'll briefly tell you about our company.

 
0:11 
ENPICOM was founded 10 years ago in the Netherlands. 

 
0:15 
And we have basically two main pillars: our software for biologics discovery, and our team, with diverse expertise in immunology, computer science, ML science and software development, to serve our clients.

 
0:30 
And our partners and clients benefit from our platform in several ways.

 
0:39 
First, the capacity to make faster and smarter data-driven decisions by giving scientists the autonomy to visualise their own data and select the best leads in an intuitive environment. 

 
0:56 
Second, the capacity to maximise the return on investment on AI by being the first platform that caters to the needs of ML scientists in biologics discovery. 

 
1:06 
And last but not least, by freeing up time for computational biologists and data analysts, teams that are usually stretched thin, especially in large organisations, with redundant tasks, helping their peers, or maintaining legacy software that takes up most of their time.

 
1:26 
For today, I want to start by talking about the growth of AI in biologics discovery. 

 
1:32 
And I'm going to borrow a bit from a report from the Boston Consulting Group that looks at some of the trends, and then talk about the challenges that still stand as barriers to that usage.

 
1:45 
And then I'll finish with the work that we've been doing at ENPICOM to remove those barriers and maximise that synergy.

 
1:56 
The report looks at basically what's been happening in the field. 

 
2:00 
And I think you'll all agree that even at a conference like this one, you can see that there's been a change over the last five years.

 
2:06 
With more and more talks about AI, some of them maybe a bit too enthusiastic at times.

 
2:13 
But it's definitely not going away, and it's changing a lot of what we do day to day.

 
2:19 
And it shows that over the last five years there's been a 17% yearly increase in patents and 34% in publications. 

 
2:28 
And this interest is being backed with a lot of investments. 

 
2:32 
And you can see giants like Google or Amazon partnering up with Big Pharma trying to capitalise on this potential for transformation. 

 
2:43 
And ChatGPT also agrees.

 
2:45 
So I'm sure it's going to be huge. 

 
2:50 
But in all seriousness, this is not just big investments and patents.

 
2:55 
If you actually ask people working in the field how much they think it is going to change their day-to-day work?

 
3:02 
Both current AI users and non-users agreed that it's going to transform the way they do their research day to day.

 
3:09 
But then if you ask them how often are you using AI, what we see is that despite that interest, the level of adoption is not that high. 

 
3:18 
And this difference between high interest and low adoption points towards a major shift in the coming years.

 
3:28 
And these models are being generated in a cycle where you produce some biologics, you characterise them, and then you learn from that characterisation to produce new biologics.

 
3:39 
And the companies that we're in conversations with tell us that they expect this cycle to get faster and faster: 25% faster over the next five years, while cutting by 50% the number of biologics that need to be expressed to learn from them.

 
3:58 
But there are some barriers to that adoption. 

 
4:01 
First of all, as we are all aware, AI is data hungry, but it's not sufficient to have large amounts of data. 

 
4:08 
The data has to be good quality and accessible. 

 
4:11 
Those are essential factors. 

 
4:13 
If your organisation produces a lot of data but it's not homogenised and it's scattered all over the place, you won't be able to use it to train models, and you won't be able to apply the models to that data either.

 
4:26 
Then we have expertise and capabilities: it's quite tricky to have the right expertise in house to develop these models.

 
4:33 
We heard a bit about it yesterday, but competing with, again, big tech on salaries or benefits can be tricky.

 
4:41 
So being able to attract that talent from the market is hard.

 
4:44 
And then within the organisation, as I mentioned before, sometimes they're spread thin with redundant tasks or other duties that don't necessarily align with training new models or algorithms.

 
4:59 
And then finally, there's tooling, because this cycle that we just saw requires a specific tool set to track, visualise and compare models, and to deploy them back into production.

 
5:15 
And at ENPICOM, we've been trying to tackle each one of these barriers to reduce the friction for adoption of AI. 

 
5:22 
When it comes to data, we have created a highly scalable data infrastructure that facilitates ingestion, storage and access. 

 
5:30 
When it comes to expertise, we have a team that we've been growing for over 10 years that is specialised in biologics, because making a machine learning model for biologics is not the same as making one for a ChatGPT or for fintech.

 
5:44 
And we have immunologists like myself, but also computational biologists, data scientists, software engineers and ML scientists.

 
5:54 
And then for tools, last year, we released our newest platform that includes an environment specifically designed for ML scientists to track their models and be able to quickly deploy them back for usage. 

 
6:11 
And by discussing with different organisations and helping them on the route to AI adoption, we've also identified a three-step blueprint for getting from your current non-AI discovery to a successful integration of AI into your discovery.

 
6:32 
Step number one is data foundation. 

 
6:34 
And this is because working on a solid data foundation enables you to have all the data available for training models and for using them. Step number two is pipeline automation and consolidation.

 
6:47 
And this is important because non-consolidated data will not be homogenised, it will have disparities, and non-automated pipelines will create silos.

 
6:59 
Meaning your data will end up in a different database, or in a folder, or in an Excel file, and not be accessible.

 
7:07 
And last but not least is AI adoption. 

 
7:09 
And this includes not only creating new models, but also managing the model life cycle and being able to deploy those models back in discovery because it's a two-way street. 

 
7:21 
We hear a lot about how to make the best environment for training models, how to create the best models, which models have been published. 

 
7:29 
But we don't hear much about how to make sure that you test that model as soon as possible in a real campaign. 

 
7:34 
In the panel our CEO was on yesterday, they specifically talked about this: how some of these models that look perfect on paper fall apart when you test them in a real campaign.

 
7:47 
So you want to be able to go back into a real campaign, into real everyday use, as quickly as possible.

 
7:57 
And we've been building our platform to cater to each one of these steps.

 
8:05 
For the data foundation, we have an incredibly scalable data infrastructure that can house all your data and serve as a solid foundation.

 
8:15 
This database can of course be integrated with other systems in your organisation, which means you will be able to consolidate and automate certain processes, and you will avoid silos or data getting stuck somewhere.

 
8:32 
All of this is accessible through a very user-friendly UI, which means lab scientists will be happy to analyse their data without needing to code and without requiring support to do any analysis.

 
8:47 
And on top of that, we have added the capacity to train models and track their properties and deploy them quickly into production, into usage for everyday scientists. 

 
9:08 
Now, if we look at each one of these steps a bit more in depth: for the data foundation, we have a database that is highly scalable.

 
9:16 
It can house billions of clones. 

 
9:18 
It scales to petabyte level, meaning that you can have all your historical data as well as all sequencing data from large sequencers like NovaSeq, NextSeq, all in one environment. 

 
9:31 
Query performance is fantastic as well.

 
9:35 
We have not seen a drop in performance at hundreds of millions of clones.

 
9:38 
Queries literally take seconds, and it can be integrated with other systems like data lakes or other federated architectures.

 
9:49 
So basically it can be an entry point and a connection point to all your internal systems. 

 
9:58 
When it comes to pipeline automation and consolidation, we have years of experience of working with pharma and biotech to create specific consolidations for their methods. 

 
10:09 
And we can use our own tools to do certain steps of the analysis.

 
10:15 
But we can also combine them with a client's internal tools, either in a mix and match, where we use some of our tools and some of their tools, or in a complete automation of their tools, using our database or not.

 
10:30 
We basically recognise that trying to say, oh, we have the software that you need, that's it, you'll need nothing else, is futile.

 
10:36 
Because every organisation will have different specific tools, different quirks, different requirements that we need to adjust to.

 
10:46 
But when it comes to our own tools, we do have quite a few. Over the years we have created many of the most common analyses that you need for selecting clones and for identifying some of the problems that your clones or leads might have.

 
11:00 
That includes, of course, gene annotation, quality control and repertoire analysis.

 
11:05 
Also more specific analyses: enrichment if you work in vitro, or clustering and phylogeny if you want to look at in vivo models.

 
11:16 
And then we also have our own machine learning models to predict humanness or to predict exposed liabilities on the surface.

 
11:24 
So you can minimise these potential challenges early on.

 
11:30 
This is an example of our sequence annotation tool, which we created from scratch, which is fast and accurate. 

 
11:37 
It can process any data size, and in fact it runs 10 times faster than some of the most used tools in the industry.

 
11:48 
And this is not speed for speed's sake.

 
11:49 
It's not about who can process the data fastest; faster processing also carries lower computational costs.

 
11:58 
It means you need to engage the servers for less time to be able to annotate your clones. 

 
12:02 
What you see on the left is an example of a service campaign that we did with a partner, where we were processing at 1.15 million reads per minute.

 
12:14 
So it scales really well and parallelises to any data size.

 
12:20 
And all of this with a visual interface, where as I was saying before, you don't need any coding knowledge to be able to analyse your data. 

 
12:28 
You see on the right an example of our 3D structure visualizer, but also on the left how you can easily add information, in this case predicted liabilities, to be able to select potential clones in a cluster of interest. 

 
12:45 
This is a bit redundant, because now the left is showing the same thing as the right, but you can navigate easily across multiple environments and never have to see any code.

 
13:00 
But we have, of course, an API and SDK as a complement to this, which means that if you're analysing your data in our platform, any data scientist is able to access that data, query it, and integrate it into, as I said before, any ecosystem that you have internally.
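
As a minimal sketch of what that kind of programmatic access could look like, assuming a hypothetical REST endpoint (the URL, routes and field names below are illustrative, not ENPICOM's actual API):

```python
import requests

# Hypothetical base URL and token; the real endpoint, routes and
# field names would come from the platform's API documentation.
BASE_URL = "https://platform.example.com/api/v1"
HEADERS = {"Authorization": "Bearer <YOUR_API_TOKEN>"}

# Query clones from a campaign, filtering on an annotated property.
resp = requests.get(
    f"{BASE_URL}/collections/my-campaign/clones",
    headers=HEADERS,
    params={"filter": "humanness>0.9", "limit": 100},
)
resp.raise_for_status()

for clone in resp.json()["clones"]:
    print(clone["id"], clone["cdr3_aa"])
```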

 
13:20 
And now if we go to step number three: the cycle that I mentioned before on creating models does require new tools for tracking, deployment and evaluation of the models, and a new stakeholder, the machine learning scientist.

 
13:34 
And this is exactly the environment that we developed last year to serve the needs of ML scientists: to track and evaluate their models, register them, and decide which models are ready for prime time, ready to be used in a campaign.

 
13:53 
This environment also comes with the capacity to deploy. 

 
13:57 
This can be with us or not, but we have a quick deployment method that will allow for these models to be accessible either through an API or in our platform. 

 
14:08 
And it includes model abstraction. 

 
14:11 
And the model abstraction takes care of specifying what the intent of the model is, what kind of data it generates, how it is to be used, and what the input is.

 
14:23 
And that's important because once you've created that model abstraction, anyone can run it with a click in the environment, or you can make it part of a pipeline.

 
14:32 
By default, all clones will then pass through this model to get the information.
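
To make that concrete, here is a minimal sketch of what such a model abstraction could capture, written as a plain Python dataclass; the field names and values are hypothetical, not the platform's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ModelAbstraction:
    """Hypothetical description of a deployed model's contract."""
    name: str          # human-readable model name
    input_region: str  # which part of the clone is fed to the model
    output_type: str   # what kind of data the model generates
    intent: str        # what the output is attached to downstream

# Example: a binding classifier that reads the full VDJ amino acid
# sequence and writes a 0/1 label back as clone metadata.
binding_model = ModelAbstraction(
    name="hiv-binding-classifier",
    input_region="full_vdj_aa",
    output_type="binary_label",
    intent="clone_metadata",
)
```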

 
14:39 
And this is nice, again, not just because you can use it, but because it accelerates that loop of learning. 

 
14:45 
You go from a model that works, to a model that is tested in a real scenario, to seeing how we can improve that model or add to it.

 
14:57 
But I thought it would be interesting to show a specific example of how we do this. 

 
15:01 
So with the team we took a paper that was published late last year, and this paper describes AbMAP, which is basically an embedding, a transformation of your sequence that is focused on the CDRs.

 
15:14 
And then that embedding can be used to train, for example, on binding, or on repertoire properties, or on properties of the antibody itself.

 
15:23 
Now we took that model, and in less than a week we trained it against new data, we explored all the different modalities and parameters, we selected the one that ran best, and we deployed it so it could be used.

 
15:38 
The data that we used actually came from a study that was part of my PhD.

 
15:44 
But basically the only thing that is important to know is that we have known binders based on a FACS sort, some 700 of them, and we have pre-immunisation samples that can serve as our negatives.

 
15:55 
And then what we have to do is embed all the antibody sequences and feed these two types of data, negative and positive, into training.

 
16:04 
Can we find something in the embeddings that will allow us to recognise binders versus non-binders?
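
As a rough sketch of that setup, with a toy stand-in for the AbMAP-style embedding and scikit-learn for the classifier (the sequences and the embedding here are placeholders, just to show the shape of the workflow):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def embed(seq: str) -> np.ndarray:
    """Toy stand-in for an AbMAP-style CDR-focused embedding:
    a simple amino-acid frequency vector, just so the sketch runs."""
    counts = np.array([seq.count(a) for a in ALPHABET], dtype=float)
    return counts / max(len(seq), 1)

# Toy stand-ins for the ~700 FACS-sorted binders (positives) and
# the pre-immunisation sequences (negatives) from the study.
rng = np.random.default_rng(0)
binders = ["".join(rng.choice(list(ALPHABET), 110)) for _ in range(700)]
non_binders = ["".join(rng.choice(list(ALPHABET), 110)) for _ in range(700)]

# Embed every antibody sequence; label positives 1 and negatives 0.
X = np.array([embed(s) for s in binders + non_binders])
y = np.array([1] * len(binders) + [0] * len(non_binders))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Train a simple classifier on the embeddings and check whether it
# separates binders from non-binders on held-out data.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"held-out ROC AUC: {auc:.3f}")
```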

 
16:13 
The training itself is quite easy, no matter which model training framework you use.

 
16:18 
With a small snippet of code, without changing much of your workflow, you can organise the runs and track all the metrics, which get stored in our MLflow environment, and then you'll be able to visualise all the runs.
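
A minimal sketch of what such a snippet could look like with standard MLflow tracking; the tracking URI, experiment name, parameters and metric values below are illustrative:

```python
import mlflow

# Point at the tracking server; inside the platform this would be
# preconfigured, the URI here is just a placeholder.
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("hiv-binding-abmap")

with mlflow.start_run(run_name="cdr3-logreg"):
    # Log what went into this run...
    mlflow.log_param("input_region", "CDR3")
    mlflow.log_param("model_type", "logistic_regression")
    # ...and how it performed, so runs can be sorted and compared.
    mlflow.log_metric("roc_auc", 0.87)   # placeholder value
    mlflow.log_metric("accuracy", 0.81)  # placeholder value
```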

 
16:33 
These runs are different parameters or different potential frameworks used for model training, and you can easily sort them and visualise them, for example, in a graph.

 
16:48 
Each one of the little squares, and this had to be explained to me, I'm an immunologist.

 
16:52 
Each square is a different metric of the performance of the model.

 
16:59 
And then you can group as well. 

 
17:00 
For example, because the input might be different: some of them might have only CDR3, some of them CDR1, 2 and 3, or the whole sequence.

 
17:08 
And you can group them by that input to see whether it was better to use the full VDJ or just a section of the sequence.
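
Assuming the input region was logged as a run parameter, as in the earlier sketch, that kind of grouping can also be done programmatically with MLflow's search API (the experiment and column names are illustrative):

```python
import mlflow

# Pull all runs from the experiment into a pandas DataFrame and
# compare performance grouped by which sequence region was the input.
runs = mlflow.search_runs(experiment_names=["hiv-binding-abmap"])
summary = runs.groupby("params.input_region")["metrics.roc_auc"].max()
print(summary)  # best ROC AUC for e.g. CDR3 vs CDR123 vs full VDJ
```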

 
17:17 
If you select a run, you can see the metrics of that run which allow you to determine how successful the model is. 

 
17:27 
But also any visualisations generated, and any packages that are needed to deploy this model.

 
17:36 
And then the last step, if the model is successful enough, if it seems to have predictive value, is that you can easily register it for deployment.

 
17:47 
Once it's registered, we have designed an environment to be able to deploy it very easily. 

 
17:53 
You give the model a name, you choose from the registered models that you have, and then you map the input.

 
18:00 
The input is the data in the platform: the clones it has information on, the different regions, and which one of those is going to be the input.

 
18:08 
And the intent, as I was mentioning before, is what the output is going to be attached to.

 
18:14 
Is it going to feed a clustering algorithm, or is it going to be metadata to use in the selection?

 
18:20 
And then you can deploy that model. 

 
18:22 
And once it's deployed, it becomes available not only through pipelines, as I was mentioning before, but also throughout our UI, meaning that you can select some clones of interest, go to run models, and see the models that are available, in this case the HIV binding model that we were talking about.

 
18:45 
It will show you the input and the intent. 

 
18:47 
What is it going to generate? 

 
18:50 
And then you'll be able to send that model to a server. 

 
18:53 
Basically, you send your clones to be annotated by the model on a server.

 
18:59 
And basically it will create a classification. 

 
19:02 
In this case, either it doesn't bind and it gets a zero, or it binds and it gets a one.
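
As a sketch of what that round trip could look like through an API, again with a hypothetical endpoint and payload shape:

```python
import requests

BASE_URL = "https://platform.example.com/api/v1"  # illustrative
HEADERS = {"Authorization": "Bearer <YOUR_API_TOKEN>"}

# Send selected clones to the deployed binding model for annotation.
resp = requests.post(
    f"{BASE_URL}/models/hiv-binding-classifier/predict",
    headers=HEADERS,
    json={"clone_ids": ["clone-001", "clone-002", "clone-003"]},
)
resp.raise_for_status()

# Each clone comes back with a 0 (doesn't bind) or 1 (binds); the
# platform then writes this back to the database as clone metadata.
for pred in resp.json()["predictions"]:
    label = "binds" if pred["label"] == 1 else "does not bind"
    print(pred["clone_id"], label)
```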

 
19:07 
And when that information comes back, it's in the database, which means it's accessible throughout the analysis.

 
19:13 
I'm showing you one environment, but it would be available everywhere. 

 
19:18 
And then in this case, coincidentally enough, we found that one of them binds while the others don't.

 
19:24 
And you can use that information to select the clones of interest. 

 
19:28 
So the first clone is probably something you would want to express to test how it binds in the lab.

 
19:33 
And then if it doesn't work, at least you know you've tested that model, you've seen it in real life.

 
19:38 
Did it help me in my discovery campaign? 

 
19:43 
And with that, I'd like to summarise.

 
19:46 
Basically, I hope that I've shown you some good evidence that AI is here to stay and that there's growth, but also that there are barriers that need to be surmounted.

 
19:57 
And at ENPICOM, we've been working hard to try to facilitate this adoption. Thank you for your attention.