0:02
Good afternoon. My name is Purvi Gupta.
0:06
I'm an application scientist at Chemical Computing Group, and I'm going to be doing the workshop today on antibody modelling and protein engineering.
0:14
So I'm going to walk you through the software and just the most basic tools that we have specific to antibodies, but also ones that you can use for other protein molecules.
0:24
So like the basic biologics tools.
0:27
Can I just quickly, so I'm Barbara.
0:29
I'm yeah, I'm just going to be sitting around.
0:35
But I do have a little made-up sign-in list for you, if you don't mind.
0:40
If you don't mind, would you put your name down for me please?
0:43
That you have attended this wonderful workshop.
0:46
That would be great.
0:46
Thank you so much.
0:47
I'm going to pass it around, and Barbara is also here to answer any of the questions you might have.
0:55
So in between, if you have any questions, just feel free to interrupt.
0:58
We can make it as interactive as possible.
1:06
So before I start with the actual workshop, I'm just gonna give a little introduction about the software tool that I'm using.
1:14
Our company offers an integrated platform called MOE. You can call it "Moe" or M-O-E, it's up to you.
1:20
It's short for Molecular Operating Environment, and it's a common platform for all research groups.
1:27
So from chemists to biologists, crystallographers and also medicinal chemists.
1:32
And we cover a broad range of applications.
1:34
So you have everything from small molecules and peptides to the larger molecules, biologics, which I'll be showing today.
1:40
We have an open architecture, so you can run it on any operating system, be it Linux, Windows or even Mac.
1:48
And then we also have a platform for running high throughput calculations.
1:52
Let's say you want to run on your cluster or a cloud environment.
1:56
Now the interesting part that comes with MOE is the scientific support and training services, which are completely free of charge with the licence.
2:03
So we do on-site as well as online training sessions.
2:09
And we can also do like one-on-one consultations if you have any questions.
2:14
So that's the e-mail address.
2:16
If you don't remember anything else by the end of the workshop, you can definitely remember the e-mail address, which is support@chemcomp.com, and you can e-mail us at any time with any questions.
2:25
Now just at the bottom are some of the research areas, as in the workflows that we go through.
2:30
So you have the structure based drug design which is more for the small molecules.
2:34
There's also ligand-based design, virtual screening, cheminformatics, QSAR studies, and we have the biologics applications, which also include the peptides, and also molecular simulations.
2:46
Apart from that, we also provide customization.
2:49
So if you have any kind of request, we can integrate that into the software, so we have all of the customizations available, and we also do the deployment for industries.
2:59
Today, I'm going to be focusing on just the biologics.
3:02
So let me just move ahead. I've divided the session into three parts.
3:08
So the first part here is just engineering and affinity modelling where the basic idea is we do start with the crystal structure.
3:16
So you do have the crystal structure present, and then we go through the analysis, with whatever tools we can use in MOE to analyse the structures, and then also do mutagenesis.
3:27
So virtual mutagenesis using a protein design application. The B and the C sections here are for when you do not have the structure information present and you just start with a sequence.
3:38
So it's the homology modelling.
3:39
I'll just go through simple homology modelling for a single sequence, and then also, at the end, using multiple sequences.
3:46
So like batch homology modelling. Let me give you a little background on the system that I'm going to show today.
3:56
So this is an interleukin receptor, which is a well-known target for asthma and allergic diseases.
4:02
And there is a monoclonal antibody known which is an inhibitor for this particular interleukin receptor.
4:09
Now the problem with this antibody is that it has poor solubility.
4:14
So our idea behind the whole analysis for the first section that I'm gonna show is to identify why this antibody has a poor solubility and how possibly can we improve the developability of this antibody.
4:29
So I'm just gonna switch to our software platform.
4:34
So that's the basic window that opens when you launch MOE.
4:37
I'll just give a brief introduction.
4:41
So on the top we have the menu bar with, under Protein, all the biologics applications, and under Compute
4:47
you'll find other advanced applications, also for small molecules: conformational analysis, generation, docking, pharmacophores and also simulation.
4:57
Then on the bottom is just some information about the force field, and some rendering options for atoms and contacts.
5:03
And on the right hand side, we have this button bar with the quick access buttons for the most used applications.
5:11
So like let's say you are doing a structure based drug design.
5:14
So you kind of go from top to bottom in this right hand side button bar.
5:20
Now I'm just going to start with the first step.
5:22
So for analysis, as I said, we are starting with the crystal structure.
5:26
So the very first step here is just loading in the crystal structure.
5:29
I'll just go ahead and navigate to a MOE sample file.
5:34
So it's the PDB ID 3G60 and I'm just gonna open it.
5:40
Now, when you open a PDB file in MOE, it gives us this Load PDB File panel where we can choose the data or the symmetry that we would like to load.
5:49
For example, it might contain multiple copies or you just want to see the biologically significant assembly.
5:54
Let's say a dimer for some of the protein molecules. We can choose what we want to load in MOE.
6:00
And there are also some extra options more specific to crystallographers.
6:04
Let's say if you want to show some alternate positions for the side chains, ignore waters depending on the starting structure that you're using.
6:12
In this case, there is only one copy of the complex in the PDB file, which I already know, so I'm just going to go with the defaults and load it in.
6:21
So now here if I rotate, you can see there's the antibody on the left side and then you have the antigen molecule.
6:27
By default we colour the structure by the secondary structure elements.
6:33
So you will see the beta sheets in yellow, the helices in red, and the turns and the loops in blue and white.
6:39
Now this is just a raw structure file, so raw PDB which does not contain any partial charges or hydrogens.
6:46
So before we start any analysis, we need to make sure that our structure is fully prepared, in case there are any missing atoms, and we also add the charges.
6:55
So I am going to go ahead and prepare it. Just from the right hand side button bar,
6:59
I go to QuickPrep and you get this panel.
7:01
So we have 3 sections in this panel.
7:03
On the top is just for preparation and cleanup of the structure.
7:08
In the middle section, we apply tethers and restraints.
7:11
And finally, at the end is refinement or energy minimization.
7:16
Now the defaults here are set for small molecules.
7:18
So I'll just toggle this option off because, as you can see, it says fix atoms further than 8 angstroms from ligands.
7:25
In this case, we do not have any ligand molecule present.
7:28
It's more valid when you have a protein ligand complex.
7:30
So you just want to focus on the binding site. The rest of the protein, further away from the binding site, is not having any direct impact.
7:38
So we just keep it fixed, in the interest of saving computational time.
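As an illustration of the idea behind that option (not of how MOE implements it), here is a minimal Python sketch that marks receptor atoms as fixed when they lie further than a cutoff from every ligand atom. The function name `atoms_to_fix`, the coordinate layout and the choice to fix nothing when no ligand is present are assumptions made for this example only.

```python
import math

def atoms_to_fix(receptor_xyz, ligand_xyz, cutoff=8.0):
    """Return one boolean per receptor atom: True if the atom is further
    than `cutoff` angstroms from every ligand atom (so it could be held fixed).

    Hypothetical helper for illustration; coordinates are (x, y, z) tuples.
    """
    if not ligand_xyz:
        # Assumption: with no ligand there is nothing to measure against,
        # so no atom is fixed and the whole molecule stays free to move.
        return [False] * len(receptor_xyz)
    return [
        min(math.dist(atom, lig) for lig in ligand_xyz) > cutoff
        for atom in receptor_xyz
    ]

# Made-up coordinates: one ligand atom at the origin, three receptor atoms.
receptor = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
ligand = [(0.0, 0.0, 0.0)]
print(atoms_to_fix(receptor, ligand))
```

With no ligand in the system, as in the antibody-antigen complex here, the sketch leaves every atom free, which mirrors the reason for switching the option off.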
7:42
But for biologics, we just work on the whole molecule.
7:45
So I turn it off and just click OK.
7:50
On the left side, you can see it has started, and it gives the progress of each step
7:53
as it's performed for the preparation. I'll just go back to the panel quickly and go through the other options.
8:00
So as I said in the first section is structure preparation and clean up.
8:04
So we use two functions.
8:05
One is the structure preparation function, which ensures that you have a complete model at the end. It will add any missing atoms, residues or even loops.
8:14
Loops of up to 10 residues will be modelled in using the structure preparation.
8:18
If you have loops missing more than 10 residues, then we have separate applications for modelling in those loops.
8:25
Then you have the clean-up of the structure, so you delete the water molecules. And also, I forgot to mention the second function, which is Protonate3D.
8:33
So that will take care of the ionisation states and it will also optimise the hydrogen bonding network in the system.
8:40
The second section is tethers and restraints.
8:42
So we tether the receptor atoms so that we stay as close to the crystal structure geometry as possible, because during minimization we do not want to end up with some absurd conformations of our protein.
8:55
And the last is refinement, which is energy minimization.
8:58
And this will ensure all the potential clashes are removed from the system.
9:05
So now the message has disappeared.
9:06
That means my structure is prepared.
9:08
I'll just close and go ahead.
9:11
So we started by loading in the structure and we prepared it.
9:14
So now we are ready for the analysis.
9:17
Just one more thing before I do that since this is an antibody antigen complex.
9:21
So I'll just make my life easier by annotating it.
9:24
So I have all the different domains.
9:26
If I just go to Protein, Annotate, and Antibody,
9:30
you can see the numbering schemes that we have available just for annotating antibodies.
9:35
I'm going to go with IMGT today, the most commonly used, and you can see instantly you have the colouring on the antibody, but not on the antigen, because the annotation only works for the antibody.
9:46
We have the light blue and the light green colours for the light chain, the darker colours for the heavy chain, and you will also see some different colouring, as in purple, red and orange, and that's specific to the CDRs.
10:00
I'll also open the sequence editor.
10:02
So that's the second major window in MOE, after the 3D view that we were just seeing.
10:07
The only difference is here we see the one-dimensional amino acid sequences.
10:11
And in this case, if I make it like this, you will see there are three sections.
10:15
So when I annotated the structure, it created these three sections, or subunits as we call them in MOE, for all the chains.
10:23
So with antibodies, the first subunit is always the light chain, then in the second subunit you have the heavy chain, and the third one contains everything else in the structure.
10:32
So in this case, it's just the antigen molecule.
10:35
So that's the chain in the third subunit.
10:39
Now I'll give a brief introduction about the sequence editor as well.
10:42
Now you're seeing the three letter codes, but I could also just condense the view.
10:47
So I have the one letter codes now and I'm also going to wrap the view.
10:51
So it just makes it easy for me to scroll.
10:53
That's a personal preference.
10:54
You can work as you like.
10:56
You will also see the colouring is extended in the sequence editor.
10:59
So for the CDRs, on the light chain you have the purple and on the heavy chain it's in orange and red, as we were seeing in the 3D view.
11:08
So also when I annotated, there's another thing: in the System Manager, we create these sets.
11:14
So a little about the System Manager: it's just a tool which helps us control everything that we have loaded in the MOE window.
11:23
So if I look at the system manager from left to right, I can just make selections by using this radio button.
11:30
Then you have the A in green.
11:32
The A here refers to active.
11:35
Let's say you have many complexes or many PDBs open at the same time.
11:39
So you could choose where you want to run your calculations.
11:42
I can just set it as inactive.
11:44
So now the energy minimization or any of the calculations that you run will only run on the active groups in the main MOE window.
11:52
In this case, I'm just gonna toggle that on.
11:54
Then you have the receptor line.
11:56
I can just click on it to hide or show the atom groups and on the right side with the square boxes are some rendering options for the colour, the ribbons and also some quick access buttons for creating surfaces for the proteins.
12:11
Now here with the annotation, we create these sets, so it just makes it easy to use in MOE.
12:17
Let's say I can just look at the CDR H3 directly from here
12:21
if I want to. And another benefit of sets is that they appear in every panel in MOE.
12:27
So if I'm doing any contacts or property calculations, I could just choose which set I would like to work on rather than going and selecting all the residues again and again.
12:39
So we have the annotation done now.
12:41
I'm just going to go ahead and start with the analysis of this crystal structure.
12:46
The first thing that we'll look at is the interactions between these two proteins, so how the antibody and the antigen are interacting at the interface.
12:56
I'm also gonna colour this antigen so we have a uniform colour over the whole system.
13:03
And for the contacts, I can just go to Protein, Contacts.
13:08
So that's the Protein Contacts panel. On the top you just set the atom groups between which we want to see the contacts.
13:14
Since I annotated, MOE now knows this is an antibody-antigen complex.
13:18
So by default you will see on the top it's antibody and antigen.
13:22
In the middle we get the whole contact list, and at the bottom are some of the filtering options.
13:29
Now I have the antibody antigen set.
13:31
All I need to do is hit search and it will give me the whole contact list.
13:35
I'll just go from left to right through the information that we see in the contact list.
13:40
The first field gives us the type.
13:42
So IH here refers to ionic and hydrogen bond interactions.
13:47
It's the first letter of all the interactions that we are seeing on the bottom here.
13:51
So it can be distance, van der Waals, covalent, arene, ionic, metal interactions if there are any, and hydrogen bonds. Then you have the residues of the chains interacting.
14:01
So it's the set A.
14:02
So the glutamine residue on chain A, which is our antibody, interacting with the asparagine on the antigen.
14:11
Then you see the energy of each of these interactions in kcal per mole and the distance between the two interacting residues in angstroms. There is also some more information about the more specific type of interactions.
14:23
So here for example, in the BB field, the small B refers to the backbone.
14:28
That means it is the backbone of set A interacting with the backbone of set B.
14:34
If you see a dash, that means it's a side chain.
14:37
And if there's a *, that means both the backbone and the side chain atoms of a particular residue are involved in the interactions.
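To make the two code columns concrete, here is a small Python sketch that decodes a type string such as IH and a two-character backbone/side-chain field such as bb. The first-letter mapping follows what is said above; the exact letter set (for instance V for van der Waals) and the function name `decode_contact` are assumptions for this example, not the documented MOE format.

```python
# Assumed first-letter codes for the interaction types named in the panel.
CONTACT_TYPES = {
    "D": "distance",
    "V": "van der Waals",
    "C": "covalent",
    "A": "arene",
    "I": "ionic",
    "M": "metal",
    "H": "hydrogen bond",
}

# One character per interacting residue: set A first, then set B.
ATOM_FLAGS = {
    "b": "backbone",
    "-": "side chain",
    "*": "backbone and side chain",
}

def decode_contact(type_code, atom_code):
    """Translate e.g. ("IH", "bb") into readable labels (illustrative only)."""
    types = [CONTACT_TYPES[letter] for letter in type_code]
    part_a, part_b = (ATOM_FLAGS[flag] for flag in atom_code)
    return types, part_a, part_b

print(decode_contact("IH", "bb"))
print(decode_contact("H", "-*"))
```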
14:45
Now one more thing.
14:46
All of these interactions in the contact list that we are seeing now are aggregated by residue.
14:51
So at the bottom you can see.
14:53
So it aggregates all the interactions that a particular residue forms with the other set.
14:58
Then you also have the frequency which tells us how many interactions they are forming and the area of these interactions.
15:06
Now I can sort by any of them.
15:07
So let's say for this particular analysis, we are looking to improve the developability of our antibody.
15:14
Now here, with the antigen, there can be two cases.
15:18
You might also be looking for affinity maturation.
15:21
Then you would just go ahead and look at some of the weaker interactions and try to pick those residues that we can possibly mutate.
15:27
Or, as in this case, where I'm looking at developability, I can also focus on the strong interactions and try to preserve those interactions, so that at least the affinity is maintained and we also get a more developable antibody.
15:42
So I'll just sort it by energy here.
15:44
And the more negative, the better it is.
15:47
So we have the strongest interactions on the top.
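The aggregation and sorting described here can be sketched in a few lines of Python. This is only an illustration of the bookkeeping: the contact records, residue labels and energies below are made up, and keying the aggregation on the set A residue is an assumption about how the list is grouped.

```python
from collections import defaultdict

def aggregate_by_residue(contacts):
    """Sum energies and count contacts per set A residue.

    `contacts` is a list of (residue_a, residue_b, energy_kcal_per_mol).
    Returns a list of (residue_a, total_energy, frequency) rows.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for residue_a, _residue_b, energy in contacts:
        totals[residue_a][0] += energy
        totals[residue_a][1] += 1
    return [(res, energy, count) for res, (energy, count) in totals.items()]

def strongest_first(rows):
    # More negative energy means a stronger interaction, so sort ascending.
    return sorted(rows, key=lambda row: row[1])

def most_frequent_first(rows):
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Hypothetical contact list, not taken from the structure in the workshop.
contacts = [
    ("ASP-1", "ARG-9", -5.0),
    ("ASP-1", "LYS-8", -2.5),
    ("LEU-2", "LEU-7", -1.0),
    ("LEU-2", "ARG-9", -0.5),
    ("LEU-2", "PHE-6", -0.5),
]
rows = aggregate_by_residue(contacts)
print(strongest_first(rows)[0])
print(most_frequent_first(rows)[0])
```

Sorting by energy and sorting by frequency can put different residues on top, which is why both views are worth a look.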
15:51
Now the distance interactions are toggled off by default because there are quite a lot of those.
15:56
But when it comes to protein-protein interfaces, they can be quite important, because there are a lot of hydrophobic interactions that occur at the protein-protein interface.
16:04
So I just toggle them on and you can see the list suddenly grows a lot.
16:09
So now I have already sorted by energy, I can just pick a few interactions.
16:12
So let's say I just look at the one from the top, they're both forming an interaction with arginine.
16:17
So we have some aspartate residues.
16:20
And if I just isolate and centre them, I can see them in the MOE window.
16:24
So we have the residues interacting.
16:28
For now, you might not notice it, but there's a very faded dotted line.
16:35
But if I just toggle on the receptor, receptor interactions as well.
16:38
Now you will see them in more detail.
16:40
So let me just zoom in.
16:42
So you have these blue dotted lines with cylinders, which indicate the hydrogen bonds.
16:47
Now the cylinder here is quite significant because it tells us about the strength of the interaction.
16:52
The longer the cylinder, the stronger the hydrogen bond is.
16:56
In this case, I can see both of these hydrogen bonds on the ends are quite strong as opposed to the one that you see in the middle.
17:05
Also the ones on the other side, with the other aspartate.
17:07
These are quite strong hydrogen bonds.
17:10
If we want more information, if I just go to the contacts and click on this button.
17:15
So now it gives me the energy in kcal per mole for all of these interactions individually.
17:21
We can also look at the distances between the atoms in angstroms or just for making images.
17:26
Sometimes people like the plain dotted line representation.
17:30
I'll switch back to the one which is most informative so we can see how strong an interaction is.
17:36
Now that's just the two strong interactions from the list.
17:39
We'll keep a note of that when we are doing the mutations later so that we don't disturb the strong interactions and still maintain the affinity with the antigen.
17:50
Just a general overview of the list.
17:51
If you look at the contact list here, you can see there are quite a few strong ionic bonds and salt bridges being formed between the antibody and the antigen, with the aspartate.
18:01
Then you also have arginine and lysine on the antigen side.
18:05
But you'll also notice there are quite a few strong hydrophobic interactions between these two proteins.
18:13
For example, there are the methionine and the phenylalanine interacting with the leucine residues on the antigen side.
18:22
So, let me just clear the selection. We looked at the strong interactions.
18:28
I am going to sort it by frequency this time just to see which of the residues are forming the most interactions.
18:34
If I see on the top you have this leucine 100, so I can just click on it, I select and centre this as well.
18:40
So this is the leucine interacting with the arginine, and on the top there's also some phenylalanine.
18:46
Just look at these as well.
18:48
This is the phenylalanine, with some arene interactions also being formed.
18:53
So this is just an overview of how these two proteins are interacting.
18:57
At the end, it depends on your aim, what you are looking for.
19:00
So I will keep a note of the stronger interactions since in this case I am just looking to improve the developability of this protein.
19:08
At the bottom there are some filtering options.
19:10
So if, for example, I just want to look at the interactions with a particular arginine, I can just filter down the whole list to see, OK, these are the interactions with my arginine.
19:20
I'll just cancel that. If I want to show the surface between the two proteins,
19:27
it shows me the whole interaction surface between this antibody and the antigen. I'll hide the surface for now.
19:35
And let's move ahead.
19:36
So we looked at the contacts.
19:38
We'll continue the analysis.
19:39
So we know we have an idea of how these two proteins interact.
19:43
Now I'm only interested in the antibody.
19:46
So for now, you see only one receptor line.
19:49
You might be thinking, we do have three chains.
19:52
So I'll switch to an extended view here.
19:54
So now you'll see three chains, the light chain, heavy chain and the antigen.
19:58
And I'll just go ahead and hide the antigen because I'm more interested in looking at the properties of the antibody.
20:05
We'll continue the analysis, and let's look at what the protein surface of this antibody looks like on the outside.
20:13
I just go to the surface and protein.
20:16
So it will create a constant coloured surface, which gives me an idea of the shape of the protein.
20:22
But I could also look at other parameters.
20:25
Let's say for example, I would like to look at the electrostatic distribution over the surface.
20:30
So if I go to electrostatics, now you see these blue and red patches on the surface.
20:37
So the blue is for the positively charged regions and the red for the negatively charged.
20:42
Now in this case what we are interested in is more at the interface.
20:46
So if I bring the antigen back, you have the interface right here.
20:52
So we have the interface right here.
20:53
I'm going to hide it again.
20:55
So just how the surface looks at the interface of these two proteins.
20:59
So here I can see there's quite some negative charge here.
21:02
Now in this case, I'm more focused on the solubility.
21:05
But if you are looking at developability in general, this might be a region that you want to focus on, because more negatively charged regions lead to viscosity in the proteins.
21:15
So that's just one area that you can focus on.
21:18
If I look more around the interface, I can see quite some white regions around here.
21:25
But this still doesn't give me a complete idea of whether it's polar, positively charged, or neutral.
21:33
So instead of electrostatics, I will go ahead and look at lipophilicity of the surface.
21:39
Here the colouring goes from purple to green.
21:41
So purple is for polar and green is for greasy or hydrophobic residues.
21:46
Now here it gives me more information about the region where I was seeing white with some blue spots.
21:52
So you can see it's quite green around here.
21:55
If I bring up the antigen, you can see at the interface where antigen is binding with the antibody, it's quite green at the interface.
22:04
So that's quite hydrophobic.
22:06
Now this already gives me some idea why this antibody might have poor solubility because of the aggregation propensity when there's a lot of hydrophobic residues present at the surface of the protein.
22:19
But, yeah, so you're showing this on a static structure. How would the charges, how would this change?
22:30
How would it change? So we can calculate that.
22:34
I think Purvi is showing it later. When she does the protein properties, she will mention an option to also sample conformations of the antibody.
22:50
So, wiggle the structure around while calculating properties and incorporating flexibility.
22:55
And then this method would calculate an average, a weighted average, of these properties, and also of things like the patches that you will show in a second.
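The answer only says that a weighted average is taken over the sampled structures; it does not say which weights are used. As a sketch under the assumption of Boltzmann weighting by conformer energy (a common choice, but an assumption here), the averaging could look like this in Python. The function name, the temperature and the numbers are all hypothetical.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def boltzmann_average(values, energies, temperature=300.0):
    """Average a per-conformation property, weighting each conformation
    by exp(-(E - E_min) / kT). Energies are in kcal/mol.

    Illustrative only: the weighting scheme is an assumption.
    """
    kt = KB_KCAL * temperature
    e_min = min(energies)  # shift energies so the exponentials stay well scaled
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

# Made-up hydrophobic patch areas for three conformations and their energies.
areas = [500.0, 540.0, 610.0]
energies = [0.0, 0.3, 2.0]
print(round(boltzmann_average(areas, energies), 1))
```

Low-energy conformations dominate the result, so a property that only shows up in a rare, high-energy conformation contributes little to the average.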
23:09
So for surface properties like this, as she will show, you could not only go over conformations to incorporate flexibility, but also go over a range of pH values.
23:20
You'll see that in a moment.
23:22
Yeah, I'm gonna go over that.
23:25
Maybe you can use the next time.
23:27
Yeah.
23:31
So for now, this was just a general idea of the whole surface of the protein.
23:36
Now we do know some of the specific regions just from looking at the electrostatics and lipophilicity, but we can get a better idea.
23:43
Let's say I want to look at the significant regions on the protein surface which are actually having an impact on the properties and the binding.
23:50
For that, I can use protein patches.
23:52
I'm going to hide the antigen and I'm just going to go ahead and create protein patches.
23:57
It will pick out the more significant regions, the continuous regions of excess hydrophobicity or excess charge.
24:06
So the red here is for the excess negative charge, green for excess hydrophobicity and blue for the excess positive charge in this case.
24:14
Now if you see the green region that we were seeing on the surface, it's actually quite significant and it's a continuous hydrophobic region on the surface of the protein.
24:23
So now this supports my hypothesis that this might be the reason why the antibody has poor solubility. Also, the red region that we were seeing with the electrostatics is quite significant as well and might affect the developability of the antibody.
24:42
I am going to delete the protein surface so we can just see the patches more clearly.
24:48
And to get more information about the patches, I could also just load the Patch Analyzer. Now it gives me the detailed information for each of these patches, and they're all sorted by the area.
24:58
So just by the area.
24:59
Also on the top, you can see the biggest patch on this protein surface is actually that hydrophobic patch.
25:05
If I just toggle on the sync, I can just select it here and I see this is the big hydrophobic patch.
25:12
Now we know that.
25:13
Yeah, sorry, just thinking about.
25:25
Would you also be able to panels of hints whether it's, you know, all of them are having similar kind of interactions and whether that's actually an important patch?
25:43
Yes, definitely.
25:44
So if you have like a bunch of structures, you could also calculate the protein patches on them and also do like an average of all the structures, like an average patch and also individually for each of them.
25:55
So you will start seeing if this is a significant patch in a particular conformation that you have.
26:00
It's the same for the pH-dependent calculations that I'm going to show later: there, too, we calculate an ensemble of conformations for the protein.
26:10
It depends on how much you want to sample and still see how it changes.
26:16
The major difference that comes is because of the charge or the protonation of the residues.
26:21
So we do protonate the residues at different pH.
26:24
So you can see the differences, how it interacts more at a lower pH or at a higher pH. I'm not going to show it for the patches, but I'll show it for the properties.
26:35
And it's the same that goes over all.
26:40
Now in this case, let's get back.
26:41
So we know that this is the biggest hydrophobic patch, the one we are focusing on, and that it might be the one causing the problem.
26:49
So if I want to pick some of the residues that I can possibly mutate and improve the developability, I would focus on this big hydrophobic patch.
26:58
So I'll just isolate it and hide the rest of them.
27:01
I can also select these residues and centre them.
27:05
I can also isolate them.
27:07
And then at the end you will also see an option colour SE.
27:10
So I'm just gonna click on this and what this will do is colour the residues in the sequence editor.
27:16
So now if I just toggle this off, you can see some of the residues are coloured in green, the dark green the same as the colour of the patch.
27:23
And you will also notice that all of these residues are in fact present on the CDRs of the antibody, which we also saw in the 3D structure.
27:31
But just from the sequence editor also, we can get the information about the location of these residues.
27:37
Now I'm gonna select them once again.
27:40
And let me just increase the transparency so we can see the residues behind.
27:47
And I'm also gonna go ahead and label these residues.
27:50
So now I can see all the residues which are involved in forming this big hydrophobic patch.
27:55
Now I have narrowed down my search, or the identification of what I can mutate, from the whole protein or the whole CDRs to just these specific residues forming the biggest patch on the protein.
28:10
This is just an idea so we can continue our analysis.
28:14
We could also look at the properties of this antibody.
28:17
So I'm gonna remove the selection and let me go ahead and calculate the properties.
28:23
So that is the Protein Properties panel.
28:24
You will see two tabs on the top.
28:26
One is for the calculation and one is for viewing the property values when we have calculated them.
28:32
Right below it are the settings for the environment and the input.
28:36
So you can set the target pH like you were talking about if you want to calculate the protonation.
28:41
So you could also do it for a static structure.
28:43
But here you would also see this sample option.
28:46
So with sample, what it will do is generate an ensemble of conformations for a range of pH.
28:52
So if I toggle on I can see the range of pH.
28:55
By default it's just plus or minus 1, but you can set the range as you want and titrate over a pH range to see how the properties differ.
29:03
For each pH value there's a number of conformations.
29:07
So by default it's 50 conformations per unit of pH range.
29:11
You could increase that, although with 50,
29:12
we found it to be quite ideal and optimised.
29:15
It gives enough sampling within one unit of pH range to see the difference or the variance in the properties.
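As a quick piece of arithmetic on those defaults, the Python sketch below works out the pH window and the total number of conformations it implies. It assumes the total is simply the per-unit count multiplied by the width of the pH range; how the software actually distributes conformations across the range is not stated here, and the function and parameter names are hypothetical.

```python
def sampling_plan(target_ph, half_width=1.0, conformations_per_ph_unit=50):
    """Return (low_ph, high_ph, total_conformations) for a pH titration.

    Assumption: total = conformations per pH unit * width of the range.
    """
    low = target_ph - half_width
    high = target_ph + half_width
    total = round(conformations_per_ph_unit * (high - low))
    return low, high, total

# Defaults described above: target pH plus or minus 1, 50 conformations per unit.
print(sampling_plan(7.0))
```

Under that assumption, widening the window to plus or minus 2 at the same density would double the number of structures to generate, so the range is worth choosing with the run time in mind.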
29:22
There are also some other options if you want to tether things, like if I just want to fix a particular part of the protein and only sample the loops around the CDRs.
29:32
So you could also set those settings separately.
29:35
Now, just for this example, I'm not going to do the sampling and I will calculate the properties on a static structure, which will also give me some idea.
29:45
In the list you will see all the protein properties we have by default.
29:48
We also have a lot of other properties in our custom packages.
29:52
So these are just the 112 available in the release version we have in MOE.
29:57
I'll just scroll down the properties to give an overall idea.
30:03
So we have some of the patch properties, which is more for all the proteins.
30:07
If I just scroll down,
30:08
there are also some specific to antibodies, as in the CDR patches.
30:14
I just go more.
30:15
You also have the residue specific properties, so per residue charge or per residue hydrophobic or negative area contribution.
30:24
Then there are some molecular properties of the protein like mobility, volume, pI predictions and surface area, which are also quite useful sometimes.
30:34
So I'm just going to calculate all the properties which are selected by default, there's 28 of them.
30:40
These are just the most commonly used ones which determine the biophysical properties of any protein.
30:46
So I just hit calculate and once it's done, it will shift to the viewer tab automatically and give me the values.
30:53
Now in this case you see on the top it's receptor residues, so it works only on the ones which are active.
30:58
So for now I only calculated it for the antibody.
31:03
You can see the values. For example, the area of hydrophobic patches is quite large.
31:07
Just around the CDRs, it's 520.
31:11
Now this is quite large, we already know that, but I will also show you later why I say it's quite large.
31:19
If you look at the distribution of the hydrophobic area over the publicly available antibodies, then we see what distribution of hydrophobic area is normally found in antibodies, including the naturally occurring ones.
31:33
So 520 here is quite large.
31:35
That's a red warning sign that we need to work on this to maybe improve the developability.
31:41
Then you have the mass, the pI predictions, and there are some properties marked with a +.
31:46
So these are the residue specific properties.
31:49
Let's say I want to look at the percentage exposure, so I get it for each of the residues in the antibody.
31:55
Easier way to look at it is I just toggle on the residue table.
31:59
So now it's only showing me the residue properties and in this case, I can look at multiple things.
32:07
I'm just gonna focus on a few analyses that we can get out of the protein properties.
32:11
Let's say apart from the developability issues, there might also be some liabilities in the structure, for example, deamidation or oxidation of the methionine. In this case, let's sort this by residue exposure.
32:27
So that will just give us some idea of how many of them are exposed.
32:30
So the exposed ones are more liable in general.
32:34
And I'll also sort the residues in alphabetical order just to find the one I'm looking for.
32:41
So in this structure, I already happen to know there is a methionine that might be causing the problem, but you could just go over the list and see which residues might be liable.
32:49
So let's say I'm looking at methionine: there are three methionines on this antibody, and since I sorted by exposure, you can see one of them is completely exposed whereas the other two are completely buried.
33:01
So just with the exposure, I can say this might be a possible liable residue for the antibody.
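The sort-and-filter step described here can be sketched in a few lines of Python. This is only an illustration of the idea, not how the residue table works internally: the table contents, the residue numbers other than the methionine 92 mentioned later in the session, and the 50% exposure threshold are all assumptions made for the example.

```python
# Hypothetical per-residue table. Every value here is illustrative only and
# is NOT taken from the structure shown in the workshop.
residues = [
    {"name": "MET", "number": 34, "exposure_pct": 0.0},
    {"name": "MET", "number": 83, "exposure_pct": 2.5},
    {"name": "MET", "number": 92, "exposure_pct": 88.0},
    {"name": "ASN", "number": 55, "exposure_pct": 61.0},
    {"name": "TRP", "number": 47, "exposure_pct": 12.0},
]


def exposed_candidates(table, residue_name, min_exposure_pct=50.0):
    """Return residues of one type, most exposed first, keeping only those
    at or above the (assumed) exposure threshold."""
    hits = [
        r for r in table
        if r["name"] == residue_name and r["exposure_pct"] >= min_exposure_pct
    ]
    return sorted(hits, key=lambda r: r["exposure_pct"], reverse=True)


candidates = exposed_candidates(residues, "MET")
for r in candidates:
    print(f'{r["name"]}{r["number"]}: {r["exposure_pct"]:.1f}% exposed')
```

With the made-up table above, only the exposed methionine survives the filter, while the two buried ones are dropped.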
33:08
Also if you look at the residue hydrophobic, you can see it is quite large.
33:11
So that is the contribution of this particular residue to the hydrophobic patch.
33:18
Deselect these two.
33:19
Let me just select this residue.
33:21
So when I select it and go to the main window, you can see this is actually a part of that huge hydrophobic patch.
33:27
So that's what could be one of the residue positions that we could possibly mutate.
33:33
That way we remove the liability and are also able to reduce the hydrophobic area.
33:39
So that gives us a better antibody with improved solubility as well.
33:45
So I just selected this residue and it's also selected right here.
33:51
Let me just close that.
33:52
So that was just the protein properties.
33:55
That's our analysis until now.
33:57
We started by looking at the interactions at the interface.
34:02
Then we analysed the surface of the antibody and tried to find, using the electrostatic distribution as well as the lipophilic distribution, what might be causing poor solubility, where we landed on one of the biggest hydrophobic patches.
34:17
And from the properties, we also found one of the residues that might be a possible liability for this antibody.
34:24
So we have the residue position.
34:27
Now the next question, the very natural question that comes up is, OK, we know where to mutate, but what do we mutate it to?
34:36
So to answer that question first, I will go ahead and bring this antigen up and see the environment around this particular residue.
34:44
Also on the antigen side so that we can maintain the compatibility with the antigen.
34:49
I will also create the surface for this antigen.
34:52
Let us say first I will just do it by atom colour.
34:55
So it is giving me the atom colour.
34:57
So we have the gold for the carbon and the other colouring for the heteroatoms as we have everywhere.
35:03
So blue for nitrogen, red for oxygen, and you will also notice some cyan patches on the surface.
35:11
So these indicate the presence of lone pairs.
35:14
There would be two cyan patches next to each red patch.
35:16
So that is the lone pairs from the oxygen.
35:21
Now in this case it gives me some idea of what residues are present.
35:25
So here I can already see there is a nitrogen, there is also an oxygen, and there are some lone pairs from the oxygen around the methionine.
35:35
I can also look at the lipophilicity.
35:37
So in this case I will go for the lipophilicity, same colouring so purple to green which is polar to hydrophobic.
35:45
And if I just look at the interface, you can see there is actually quite some hydrophobic, like a hydrophobic pocket on the antigen surface where this antibody is binding.
35:56
So that's the hydrophobic pocket on the bottom side.
35:59
It looks like the antibody is sitting quite well inside the hydrophobic region on the antigen side.
36:08
But if you look on the top side where you have this methionine, it is actually quite pink or more towards the white region.
36:14
That means it is polar surface on the antibody around the methionine.
36:18
So now this gives me the answer of what do I mutate these residues to?
36:24
So ideally you would mutate it to some polar residue, which might also improve the affinity with the antigen.
36:29
I am going to remove this surface, not just this one for the antigen.
36:37
And I have the methionine selected.
36:39
So I'll just go ahead and hit site view just to see the environment in more detail around the methionine residue that we are going to mutate.
36:47
And if you look around, you can also label these residues.
36:52
So any labelling only works on the selection.
36:57
Now since I had it selected, so it only labelled the methionine for me.
37:01
But around it I can see there is an arginine residue on the antigen side.
37:05
So they're all polar.
37:07
Now if I mutate this methionine to one of the polar residues, it might also end up forming some novel hydrogen bonds or some new interactions, and maybe improve the affinity.
37:20
So we are done with the analysis.
37:22
Now I'm gonna move to the second part, which is the virtual mutagenesis.
37:26
There are two ways that we can do that.
37:28
The first is just manual mutagenesis, where I pick one residue and I can try different positions.
37:34
And the second way is more of an automated way using protein design, where you could pick multiple residues at the same time and also mutate to a set of residues.
37:46
I will start with the manual mutagenesis.
37:48
For that I will just load in the protein builder.
37:53
Now before I do that, I'll also show you another application we have.
37:57
So if I want to come back to this, I could also just save this session right here, or we have this new application in our 2024 release, which is Capture.
38:07
So it lets us save the scenes or the screenshots as you want to call it, for each of the modifications that you are doing.
38:14
So you can just track all the changes that you are making in the main MOE window and also go back to the original structure.
38:21
So at this point, I'm just gonna save this original crystal structure.
38:25
I'll just add a capture right here.
38:28
Let me name it just three, G60 and Oklahoma.
38:32
Yeah, I can overwrite that so I can give it a title.
38:34
So that's my original structure, and it's methionine 92 that I'm focusing on.
38:39
So I just name it 92.
38:41
You could also add some extra comments.
38:43
So with this capture, it will actually create a database that you could share among your colleagues.
38:48
So if you have done some design changes or the modifications and workflow, you could also add comments for your colleagues.
38:54
Let's say, OK, yeah, this is my original crystal structure, but just add it as a comment.
39:02
I could also save other things that you have made in MOE.
39:05
For example, surfaces.
39:06
If I created any surface, that would also be saved in the scenes.
39:10
And I can also save the theme in the main window.
39:12
So that's just for the aesthetics.
39:14
So I'm gonna go ahead and save the theme.
39:17
OK, so it has added the capture and you get the list so I can add it to my footer.
39:22
So that's just for convenience when browsing through different states.
39:27
Now this is my original crystal structure, the protein builder on the right.
39:30
So on the protein builder you have these three tabs prepend, mutate and append.
39:35
Since I've selected the residue by default, it brings me to the mutate.
39:39
Then you have the list of naturally occurring amino acids.
39:43
Then we do have some of the non-natural amino acids by default, but you could add any custom amino acid that you would like to the library and use it from all the applications across MOE.
39:55
In the middle, here are some options for repacking and minimization, just to optimise the fit of the mutated residue that we are putting in at that position.
40:05
And the bottom is for exploring different rotamers of the side chain conformations.
40:11
In this case it's methionine.
40:13
So now from our analysis we know it is a good idea to mutate this methionine to some polar residue.
40:20
Now anyone want to suggest what do I mutate it to?
40:22
It's just for trying.
40:23
Does anyone have any suggestions?
40:25
What could we mutate it to?
40:27
We know on the antigen side it's quite polar.
40:30
Lysine, Sorry, lysine, Lysine.
40:33
OK, I'm gonna go ahead with lysine.
40:35
Let's just put it there.
40:37
Lysine is quite big as compared to the methionine.
40:41
But let's go ahead and repack and once it's done repacking, I will also go ahead and minimise it.
40:49
So this is just for trying out.
40:51
Now in this case with lysine, when I have minimised this, I don't see any more interactions being formed.
40:57
Also, it's much bigger in size as compared to the methionine, so we might want to go for a residue which is a bit smaller.
41:06
So let's say, does anyone have any other suggestions now?
41:13
Should I just go and try what I think might be the best?
41:18
OK.
41:19
OK.
41:19
I'll go with the glutamate and I'll again just repack it.
41:25
And then you also have the minimise, so you can minimise with different parameters.
41:29
So just the side chains, or also the environment around them, and also the rigid bodies.
41:33
That's more for when you are mutating with the whole protein.
41:38
In this case, I will just go with the selected side chains and with glutamate.
41:41
Yeah, you can see there is a hydrogen bond being formed when I minimise the side chain.
41:46
So that might be a good mutation to go for.
41:48
Also glutamine might be a good option since it's almost the same size. For now I just mutated it to glutamate.
41:56
I will just add this as well.
42:00
So that's my first mutation and I can also go back to my original structure.
42:06
So that's the methionine and in this case glutamate.
42:10
Did I not save the?
42:12
I forgot.
42:13
So that's just one of the mutations.
42:15
I could also try more, let's say also aspartate, if that can fit.
42:19
I can repack this.
42:22
And let's say if I do aspartate, even with aspartate I can see that there's a hydrogen bond with the.
42:32
So with the asparagine on the other side, if I click on any one atom, I can see at the bottom which residue and which atom I have selected.
42:40
So in this case with aspartate and also with glutamate, we see a hydrogen bond being formed with the asparagine on the antigen.
42:49
I could also save this just in case.
42:57
So now this is just manual mutagenesis, where I tried some of the options of what we can put at this position, and we did find two possible residues that we could use to form some novel interactions with the antigen.
43:13
Now I just got back to the original crystal structure.
43:19
I'm gonna make it white again.
43:21
That was just the first method for manual mutagenesis.
43:24
Now I'll show you the next one, which is a more automated way.
43:27
And you could pick multiple positions.
43:30
In this case, for this antibody, you could just select all the positions that were forming the hydrophobic patch.
43:36
But we saw from our analysis, we'll narrow down our search.
43:39
We don't want to pick the residues which are present at the bottom and fitting in that hydrophobic pocket.
43:44
But we would instead just use our analysis and pick the residues which were not compatible with the antigen.
43:51
So for the automated design, I'm just gonna select one more, which is the valine right next to it, which is also present in the same environment.
44:00
And we know on the antigen side it's quite polar, whereas this is a non polar residue.
44:05
So just selected those.
44:07
I'm gonna close the protein design, protein builder and let's go to the design.
44:12
So under protein design, we have a couple of methodologies available under the same umbrella.
44:18
For example, you have the alanine scan, the disulfide scan, the resistance scan, and then we have the residue scan.
44:23
So I'll just open the residue scan.
44:25
That's what we'll be using today.
44:28
We just define the system on the top.
44:30
Then we have our mutation expression list.
44:32
And at the bottom are some more options for ensemble generation, affinity calculations and also some of the properties.
44:42
I'll go through each of these.
44:44
So let's say you have the alanine scan where all the residues that you selected will be mutated to alanine.
44:51
It's useful, let's say, if you wanna look at how relevant a particular residue position is, how it's contributing to the affinity or also to the properties.
45:01
Then you have the disulfide scan, where it will pick a pair of residues which could be mutated to cysteine and form a disulfide bridge at the same time.
45:10
So it's used majorly.
45:12
Let's say you want to improve the stability of the protein.
45:17
There's the resistance scan, which is based more on single nucleotide polymorphisms, so just the SNP mutations which can naturally occur in a protein.
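To make the idea of SNP-accessible mutations concrete, here is a minimal Python sketch that lists which amino acids can be reached from a codon by changing a single nucleotide, using the standard genetic code. It is a general illustration of the concept only; it makes no claim about how the resistance scan itself selects or scores mutations, and the function name is made up for this example.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def snp_neighbours(codon):
    """Amino acids reachable from `codon` by changing exactly one base,
    excluding stop codons and the original amino acid."""
    original = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[mutant]
            if aa != "*" and aa != original:
                reachable.add(aa)
    return reachable


# Methionine has a single codon, ATG.
print(sorted(snp_neighbours("ATG")))  # ['I', 'K', 'L', 'R', 'T', 'V']
```

For methionine this gives isoleucine, lysine, leucine, arginine, threonine and valine, so a substitution such as methionine to glutamate would need more than one nucleotide change.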
45:25
Then the residue scan, which is completely customizable.
45:28
So I can customise my mutation list or the mutation expression for each of the positions. And then there is the sample sequence.
45:37
So let's say you have a really large sequence space.
45:40
In that case sample sequence can be useful because it optimises the search and works with also other applications, let's say mutation analysis, just to pick the best sequences.
45:51
So for example you can see in the settings it's set to 40 now.
45:54
So depending on how many you want to sample just the top most optimal ones.
45:59
Then there is sequence design for multi-point mutations, but you could also just use the site limit.
46:04
I will show you how.
46:05
Now in this case I selected 2 residues so I have the methionine and the valine.
46:10
By default it will mutate to all the 20 natural amino acids.
46:14
But we did some analysis so we will save our time here and use the knowledge that we have.
46:19
Instead of all the 20 natural amino acids, I will remove the hydrophobic ones from the list.
46:26
So you have the categories for each of the, I mean, sorry, I will open it again, for each of the amino acids.
46:32
I really like this option because I could just, I don't have to go ahead and think, OK, which are the polar ones I could just choose from right here and I click on apply for the second position again.
46:44
It's the same thing from our analysis.
46:46
We would like to mutate it to some of the polar residues.
46:49
In this case, I'm just going to set it to one just in interest of time, just to show you as an example.
46:57
So I click apply.
46:57
So valine to threonine, it's the same size, but it's a polar residue, so it's going to fit nicely at that position.
47:05
Then the next is the site limit.
47:07
So site limit 1 refers to single point mutations.
47:11
So it's just going to perform one single point mutations for the whole, which gives us a mutation space of 13.
47:19
But I could also increase it.
47:20
In this case, I have only two positions, so I am going to go with two.
47:23
So I also want to perform the double point mutations.
47:26
Now you can see the mutation space increased to 22.
47:31
Let's leave it at 2.
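The effect of the site limit on the size of the mutation space can be illustrated with a short Python sketch. Everything in it is an assumption for the sake of the example: the site labels, the candidate residues, and the counting rule (wild type not counted) are hypothetical, so the numbers it prints are not meant to reproduce the 13 and 22 shown in the panel.

```python
from itertools import combinations, product


def enumerate_mutants(site_options, site_limit):
    """Yield every mutant that changes between 1 and `site_limit` sites.

    `site_options` maps a site label to the residues allowed at that site.
    """
    sites = sorted(site_options)
    for n_sites in range(1, site_limit + 1):
        for chosen in combinations(sites, n_sites):
            for residues in product(*(site_options[s] for s in chosen)):
                yield dict(zip(chosen, residues))


# Hypothetical mutation expression: four polar options at one site and a
# single option at the neighbouring site (labels are illustrative).
site_options = {"M92": ["D", "E", "N", "Q"], "V93": ["T"]}

singles = list(enumerate_mutants(site_options, 1))
up_to_double = list(enumerate_mutants(site_options, 2))
print(len(singles), len(up_to_double))  # 5 9
```

Raising the limit from 1 to 2 adds every pairwise combination on top of the single-point mutants, which is why the space grows quickly once more sites or more residues per site are selected.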
47:32
Then next you also have the ensemble.
47:35
So you could also calculate an ensemble of conformations for each of the mutated structures.
47:41
In this case, I am going to skip that in interest of time again.
47:44
But we can also calculate the affinity.
47:46
So that can be quite useful.
47:48
In this case, I already have the antigen molecule in my system.
47:51
So for each of the mutants, we will calculate the change in affinity with the antigen.
47:57
Then we also calculate some of the protein properties, and just to make sure, it's going to calculate the properties for the antibody, which is also the default setting.
48:20
With protein design, I don't think so.
48:25
It's just going to perform the mutations and actually just, yeah, Barbara, I want to say something.
48:32
Maybe the resistance scan is something here where it does the SNPs, maybe that's the one you're looking for here.
48:40
Yeah, for the naturally occurring ones, it's more of the resistance scan.
48:43
This will just look at the mutation list that you have mentioned and definitely it's gonna minimise the environment around the residue position so it will try to find the optimal mutations.
48:57
Now in this case, I've started the run.
48:59
Let me just cancel this.
49:02
And we do have a pre calculated database, so not to take much of your time.
49:07
I'm just gonna open the pre-calculated one.
49:14
Sorry, it's here.
49:21
Set it as current working directory and in the pre calculated I just have this protein design.
49:28
So this database was calculated with the same settings as I just showed you, and there are 22 mutations. Yes?
49:37
Just out of interest, how long would that take to calculate?
49:41
That would take like maybe 10 minutes, less than 10 minutes.
49:45
Yeah.
49:46
Yeah, just for this one.
49:47
It's still quite quick, but I just want to go ahead with the results now.
49:54
So that's the output of the protein design.
49:57
And the first entry will always be the wild type, just to set a reference.
50:02
So we take the residue that you're mutating and the environment around it and just minimise it to set a reference level for the affinity, stability and all the property calculations.
50:15
Sorry, one interruption, one addition to your question about how long that would take.
50:19
Protein design is one of the applications in MOE that can take advantage of GPUs.
50:24
So you can make that pretty quick even if you do the sampling of the side chains, where you wiggle around.
50:31
So protein design and protein-protein docking can take advantage of the GPUs.
50:40
In almost all the panels, you will also see this batch button at the bottom, where you could just generate a run file that you can run on the command line.
50:47
So you can also run it in parallel using multiple cores.
50:50
And as Barbara mentioned, for protein design, you could definitely use the GPUs just by creating this batch file.
51:00
Now I'll get back to my protein design output.
51:03
So as I was saying, the first entry is always the wild type, just to set a reference.
51:08
So you'll see the values.
51:09
The delta affinity is set to 0.
51:11
Now in this case we do give the absolute values, but the more relevant to look at is the delta affinity.
51:17
So the D here refers to delta and also gives the colouring.
51:21
So going from blue to red, blue means good mutations, where the delta affinity is negative.
51:27
So that means the affinity is improving since it is an energy function.
51:32
So the more negative the better it is.
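The sign convention can be shown with a small Python sketch: take the wild-type entry as the reference, subtract it from every entry, and sort so that the most negative delta comes first. The entry names and the absolute values are invented purely for illustration and are not results from the workshop database.

```python
# Hypothetical (name, absolute affinity) entries; the first entry is the
# wild-type reference. All numbers are made up for illustration.
entries = [
    ("WT", -50.0),
    ("M92R", -54.0),
    ("V93T", -49.5),
    ("M92R+V93T", -52.25),
]


def deltas_vs_wild_type(rows):
    """Return (name, delta) pairs, where delta = value - wild-type value."""
    _, wt_value = rows[0]
    return [(name, value - wt_value) for name, value in rows]


# Most negative delta first: lower energy means improved affinity.
ranked = sorted(deltas_vs_wild_type(entries), key=lambda row: row[1])
for name, delta in ranked:
    print(f"{name:10s} {delta:+.2f}")
```

By construction the wild type always has a delta of exactly 0, which is why it serves as the reference row.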
51:34
Then you have the stability and all the protein properties also in reference to the wild type.
51:40
Now in this case, I will just go ahead and look at the patch CDR hydrophobic.
51:49
So since we are focused on improving the solubility of this antibody, if I look at the patch CDR hydrophobic, you can see all of the entries here are negative.
51:58
Well, that makes sense because we only mutated it to the polar residues.
52:02
So we did want to break that big hydrophobic patch and that is what we are getting at the end.
52:07
I could sort by any of these fields if I just want to look at which one is the best in terms of hydrophobic or also affinity.
52:14
So in this case I will just go ahead and sort by affinity.
52:19
So I'm just gonna OK, yeah, I have to save this database.
52:23
Now this is a read-only database because we have it pre-calculated.
52:26
So I cannot make changes to it.
52:28
That's because in a MOE database, if you make any change, you cannot revert it.
52:32
Whereas in the MOE window you could just press Ctrl+Z and go back if you have made any mistake, that doesn't work in the database.
52:39
So we make them read-only databases.
52:42
I could just browse from right here.
52:44
So I'm just gonna go ahead and browse the results.
52:47
Let's overlay the wild type and I am going to hide the original one.
52:53
So you have the mutant in green and the wild type in green.
52:57
But the first entry is the wild type, so it looks the same.
52:59
I will just go ahead.
53:01
So that's the first mutation, just valine to threonine.
53:04
And it also gives me the delta stability and affinity values in the browse panels.
53:08
I can keep a track of my mutations and also look in the main window.
53:12
In this case, the affinity remains more or less the same.
53:15
The stability decreases a little bit, although you have to be careful with these values.
53:21
So plus or minus 1 I would consider to be pretty much the same, because we are calculating for the whole protein and just mutating at a small position.
53:30
So with the values, let's say if you see plus or minus 2, then I would still say, OK, yeah, there is some difference.
53:37
In this case I can say stability remains more or less the same.
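The rule of thumb just described can be written down as a tiny Python helper. The tolerance of 1 follows what was said in the session; how to treat values between 1 and 2 was not spelled out, so making the tolerance an adjustable parameter, and the labels themselves, are assumptions of this sketch.

```python
def classify_delta(delta, tolerance=1.0):
    """Label a delta value relative to the wild type.

    Deltas within +/- `tolerance` are treated as unchanged; more negative
    values count as an improvement because the score is an energy.
    """
    if delta < -tolerance:
        return "improved"
    if delta > tolerance:
        return "worse"
    return "about the same"


for value in (-4.0, -0.8, 0.6, 2.0):
    print(f"{value:+.1f}: {classify_delta(value)}")
```

A stricter reading of the results is obtained by calling the helper with a larger tolerance, for example `classify_delta(delta, tolerance=2.0)`.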
53:41
I just go ahead for this one.
53:42
The delta affinity is -4 so that seems like a good mutation.
53:46
And this is methionine to arginine.
53:49
So someone suggested lysine was not working, but arginine seems to be improving the affinity and also the stability remains more or less the same.
53:59
Now if I just go to the main window, I can also see how this arginine is placed at this position and if it is forming any interactions with my antigen, just move ahead.
54:10
So a double point mutation of methionine to arginine and valine to threonine also gives us some improvement in the affinity, and the stability remains more or less the same.
54:21
So that's just, this was a very small mutation space, just 22 mutations.
54:27
So I could just pick manually, but if you have a large mutation space, you could just sort by the delta affinity and pick the ones which have a significant change.
54:40
You can go ahead and browse a couple more.
54:42
Although we did find some mutations which were good ones. Let me bring that back up.
54:47
So that was the second way of how we can perform the virtual mutagenesis using the protein design application.
54:53
And you can have multiple residue positions at the same time with the customised mutation list.
54:59
Are there any questions about this before I move on?
55:03
Yes.
55:20
A bit.
55:21
There's this repacking and the minimization part that also takes a little bit of the backbone into account.
55:31
Yes, neighbouring in terms of neighbouring, but also in terms of the 3D surroundings, not necessarily in the sequence neighbouring, but in terms of the surroundings that would be depending on.
55:45
Yeah, the minimization, taking that into account.
55:48
Yeah, that was another question.
55:50
Yeah, same question.
55:56
Great.
55:56
Yeah.
55:56
What do you mean?
56:07
Oh, here the sample sequence.
56:09
OK, Sample sequence, that's when you have a really large mutation space.
56:13
So you have one or both of these:
56:15
you have many mutation sites, or many mutants that you want to try out. What sample sequence does is this:
56:22
it randomly selects a subset.
56:26
You have to specify the subset based on some.
56:30
I can't give you a rule of thumb now based on the mutation space, how many, how big your subset should be.
56:36
You can't do that right now, but you know, you can do that.
56:39
And then it would do that randomly.
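The idea of drawing a random subset from a large mutation space can be sketched in Python as follows. This only illustrates random subsampling in general; the site labels, the residue lists and the seeding are hypothetical, and nothing here is claimed to match how the sample sequence option chooses or weights its subset. The subset size of 40 mirrors the setting shown earlier in the panel.

```python
import random


def sample_sequences(site_options, n_samples, seed=0):
    """Draw up to `n_samples` distinct random combinations, one residue per
    site, without enumerating the full mutation space first.

    Assumes each site's residue list contains no duplicates.
    """
    rng = random.Random(seed)
    sites = sorted(site_options)
    total = 1
    for site in sites:
        total *= len(site_options[site])
    target = min(n_samples, total)
    seen = set()
    while len(seen) < target:
        seen.add(tuple(rng.choice(site_options[s]) for s in sites))
    return [dict(zip(sites, combo)) for combo in sorted(seen)]


# Hypothetical large space: 5 sites with 20 residues each, i.e. 20**5 options.
all_residues = list("ACDEFGHIKLMNPQRSTVWY")
site_options = {f"site{i}": all_residues for i in range(1, 6)}

subset = sample_sequences(site_options, 40)
print(len(subset))  # 40
```

Fixing the seed makes the subset reproducible, which helps when the same sample is to be analysed again later.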
56:41
And then there is the application Purvi mentioned, mutation analysis.
56:44
You can do some statistics, you know, and then it would tell you, OK, this site is more promiscuous.
56:49
It doesn't really matter what you put there.
56:51
On this site, you should really put something hydrophobic.
56:54
On this site, you should put only that amino acid.
56:57
That's what some particles would do, right?
57:01
While I'm acid, I think.
57:04
What?
57:04
Yeah.
57:05
Just one more thing to your question.
57:06
So I will just show you the repacking of the environment.
57:09
So the cutoff is by default set to 4.5 angstroms around the residue position that you're mutating.
57:15
You could also adjust that, but in general, that's quite optimal.
57:19
So just to repack the environment around 4.5 angstroms.
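A distance cutoff like this can be pictured with a short Python sketch that collects every residue having at least one atom within 4.5 angstroms of the mutated residue. The coordinates and residue labels are invented, and the choice of an any-atom-to-any-atom distance criterion is an assumption; the repacking step may define its environment differently.

```python
from math import dist


def residues_within(atoms, centre_residue, cutoff=4.5):
    """Return residues with at least one atom within `cutoff` angstroms of
    any atom of `centre_residue`.

    `atoms` is a list of (residue_label, (x, y, z)) tuples.
    """
    centre = [xyz for res, xyz in atoms if res == centre_residue]
    nearby = set()
    for res, xyz in atoms:
        if res == centre_residue:
            continue
        if any(dist(xyz, c) <= cutoff for c in centre):
            nearby.add(res)
    return nearby


# Hypothetical coordinates, for illustration only.
atoms = [
    ("M92", (0.0, 0.0, 0.0)),
    ("M92", (1.5, 0.0, 0.0)),
    ("V93", (3.0, 3.0, 0.0)),
    ("N30", (5.0, 0.0, 0.0)),
    ("L10", (0.0, 9.0, 0.0)),
]

print(sorted(residues_within(atoms, "M92")))  # ['N30', 'V93']
```

Note that the neighbourhood is defined in 3D space, so a residue far away in the sequence can still fall inside the cutoff.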
57:23
Good point.
57:23
And also the sample, the ensemble option here, if you turn that on, then you will have more flexibility as well.
57:31
Yeah, maybe, possibly more robust results, but it will definitely take longer.
57:38
Yeah, it just increases the computation time a little bit, but LowModeMD is actually quite fast compared to molecular dynamics simulations for sampling the conformations.
57:51
I think you're done with this section, probably, right?
57:52
Yes, I'm done.
57:53
OK, while you're at it and changing to the next section, I'm going to give probably a little bit of a break and I'm going to give you the opportunity to leave with dignity.
58:02
So the next section is going to be about antibody modelling.
58:07
Oh, you've got a water break.
58:08
Fantastic.
58:09
So it's going to be antibody modelling, having sequences of an antibody, modelling it and doing some profiling on it and then doing multiple antibody modelling.
58:18
So if that's not something that you're interested in, it's OK if you leave, that's that's fine, perfectly fine.
58:25
OK, go ahead.
59:03
Yeah, I'm sorry, I can't.
59:22
I can't hear.
59:23
You're looking at a different complex and it's not an antigen, but it's a different receptor ligand for protein A, protein B interacting.
59:35
How many of the tools would be applicable?
59:38
Oh, all of them, Literally all of them.
59:40
Just we have antibody specific tools as well.
59:43
Like for homology modelling, you have a general homology modeller as well as a specific antibody modeller.
59:49
And for the properties, you saw there are some specific properties for antibodies, but the rest of them are general for all proteins. You could use the contacts; surfaces work generally for any protein, patches as well, and properties as I showed.
1:00:09
I just realised he has food.
1:00:10
If anybody needs food, it's way above, way past lunchtime.
1:00:15
So there's a bit of food left here.
1:00:17
Not the lunchtime, I think after, right.
1:00:26
Yeah, right.
1:00:27
Yeah.
1:01:09
Yes.
1:01:45
I.
1:02:21
Correct.
1:02:51
Looks.
1:03:57
Settle then might be a lot to take in.
1:04:01
So I'll leave up to you guys whenever you want me to start.
1:04:11
Relax.
1:04:14
OK?
1:04:15
Oh, I will just show this one thing for your question and also the one.
1:04:19
OK, Sheila.
1:04:20
Yeah.
1:04:20
So we did some studies with the ensemble protein properties that we have and specific for antibodies.
1:04:26
We correlated with these experiments, for example, with the HIG and the SIG experiments.
1:04:32
And we found this patch CDR hydrophobic to be very well correlated when we do this ensemble.
1:04:37
It's correlated with the static structure, but of course it's more robust when you have done the sampling.
1:04:43
And we also did some studies for pH dependent behaviour as you were talking about.
1:04:47
And you can see for this particular antibody set up the map for pH 6.5 and 7.4, you can see the difference in patches at 6.5.
1:04:57
Suddenly this patch appears and there's a hydrogen bond or salt bridge being formed between the aspartate and the glutamate.
1:05:04
So you can see the variations over the different pH ranges using the ensemble calculations.
1:05:12
Sorry, most of the questions I have, let's say I have 100 minutes that I want to try this on and you know, scan changes, yeah, scan mutations.
1:05:36
How easy would it be to do that?
1:05:39
I mean, doing is quite easy.
1:05:42
I mean, as I showed for the one structure, it's the same for multiple structures.
1:05:46
Instead of running it on the structure in the MOE window, you just run it on a database.
1:05:51
So that's not so much different, but it's about the time.
1:05:54
So it's definitely going to take longer.
1:05:57
But as also we mentioned for these design and the mutations and the sampling, you do have the GPU support.
1:06:04
Yeah.
1:06:05
So it would be quicker depending on the system that we are using.
1:06:12
OK.
1:06:13
So I'll just give a quick overview of what we did in the first section.
1:06:17
Starting from a crystal structure of an antibody antigen complex, we analysed it by looking at the interactions at the interface, the surface properties of the antibody and figured out the reasons for the poor solubility by identifying this hydrophobic patch on the surface and also which is present at the interface.
1:06:35
And we finally used that information, also from the protein properties, found one possible liable residue which could be oxidised, and mutated that using two methods.
1:06:44
So virtual mutagenesis, just manually, and also using protein design in a more automated fashion.
1:06:51
Now we are moving completely away from it.
1:06:53
So we are starting where we do not have the structure information present.
1:06:58
We just start with the sequence.
1:06:59
I'm just gonna switch to MOE.
1:07:03
Let me just clear the window.
1:07:04
So I'll close this, close everything.
1:07:09
I will also close this capture database and just go ahead and load it.
1:07:15
So we have one of the sample sequences, which is an Fv of a mouse.
1:07:19
So I'm just gonna load it in.
1:07:21
And by default, MOE knows this is just a sequence file, so it will pop open the sequence editor.
1:07:26
We have two chains here because it's just the Fv region.
1:07:31
You'll notice the difference.
1:07:32
Now you'll see the residues are like light coloured, not really bold.
1:07:36
This is an indication in MOE of whether you have the atomic coordinates for the residues present or not; for example, when we build the structure, you'll see the difference.
1:07:46
I already know this is an antibody, so I will take advantage of the annotations and before I start the modelling, I'll just go ahead and annotate it and instantly it will still separate.
1:07:57
So it will look for the light chain and the heavy chain.
1:07:59
The first subunit is light chain.
1:08:01
Then we have the heavy chain and the CDRs according to the IMGT numbering scheme.
1:08:06
Now for building the homology model, if I just go to protein here, you'll see there's a homology modeller, which is more common for proteins, but we do have something specific for antibodies, which is the antibody modeller.
1:08:18
So I'll just open it and you'll see it's loading a project database.
1:08:23
So we have a project database of all the publicly available antibody structures and we use only that database in the antibody modeller.
1:08:31
You don't need to have a template.
1:08:33
It will directly pick the template from the antibody database.
1:08:37
And also by default as soon as it's loaded, you can see it automatically detected.
1:08:41
It's the VL and VH chains, and both the chains are in here.
1:08:46
So on the top we just specify our input.
1:08:48
This is a sequence.
1:08:50
In the middle section.
1:08:51
You also have other options.
1:08:53
So I will just show, for example, it's just the VL, VH.
1:08:55
In this case.
1:08:56
You can also have just a VHH domain or a VL, and even a bispecific.
1:09:01
So if I go to bispecific, then it will automatically give me four chains for the model generation.
1:09:08
I'll just go back to VL VH. In the second section you can specify the model type.
1:09:12
Let's say you just want to build the variable domain or the whole IgG as well.
1:09:17
In this case, I will just go with the input sequence.
1:09:19
I just want to see the Fv region. At the bottom you have some settings for the models: how many models I want to generate and how many CDR models, which can be very important, especially when we talk about antibodies, because the CDRs are the most variable and the most flexible regions.
1:09:38
So you might want to explore more conformations and build more models.
1:09:43
Now I will just go to the slide once to show how the antibody modeller works in MOE.
1:09:48
So we take the query sequence and it works in 2 steps.
1:09:52
First, it will look for the framework templates from the antibody database and just copy the backbone as it is, and we have the framework.
1:09:59
Then it will look for the CDR templates and start grafting the CDRs on top of the framework.
1:10:04
Finally, when you have the whole model, we also place the side chains and then perform the minimization and the refinement.
1:10:11
So we have an optimal system at the end and that's also reflected in the panel.
1:10:16
So the first tab on the top is just for the model settings, just overall settings.
1:10:22
Then you have the framework, so you can see all the frameworks being identified from the antibody database.
1:10:29
Based on the sequence, it gives a score.
1:10:32
Yeah.
1:10:33
So here you have the score based on the sequence identity against the input sequence and it will automatically pick the template with the highest sequence identity.
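As a rough illustration of ranking templates by sequence identity, here is a minimal Python sketch. It is not MOE's scoring function: the function names, the template names and the short pre-aligned sequences are all hypothetical, and it assumes the query and templates are already aligned to the same length.

```python
def percent_identity(query: str, template: str) -> float:
    """Percent of aligned, non-gap positions where both residues match."""
    if len(query) != len(template):
        raise ValueError("sequences must be pre-aligned to the same length")
    # Ignore columns where either sequence has a gap character.
    pairs = [(q, t) for q, t in zip(query, template) if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    matches = sum(q == t for q, t in pairs)
    return 100.0 * matches / len(pairs)


def pick_framework(query: str, templates: dict) -> tuple:
    """Score every template and return the name with the highest identity."""
    scores = {name: percent_identity(query, seq) for name, seq in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy data: a query fragment and three candidate templates.
query = "DIQMTQSPSS"
templates = {
    "tmpl_A": "DIQMTQSPAS",
    "tmpl_B": "EIVLTQSPGT",
    "tmpl_C": "DVQMTQSPSL",
}
best, scores = pick_framework(query, templates)
print(best, scores)
```

The default pick is simply the top-scoring entry; choosing a different framework by hand corresponds to overriding `best`.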
1:10:42
You also get some more information, for example related to the B-factors.
1:10:46
Sometimes that can also be useful to look at.
1:10:49
In this case MOE has already selected one.
1:10:51
You could ideally change the framework that you would want to choose.
1:10:54
I would just go with what MOE suggested.
1:10:57
Now after the framework we come on to the CDRs.
1:11:00
So now for the CDRs, it will pick different templates based on the similarity and the identity with the input sequence for each of the CDRs.
1:11:10
So for L1, L2, L3 and for H1 to H3.
1:11:14
Now I can use multiple templates for the CDRs since we know they can be quite variable.
1:11:18
It's just to explore more conformations.
1:11:22
Let's say CDR H3, which is the most variable loop.
1:11:25
Again I get this score which is based on similarity and then you also have based on identity.
1:11:31
The + next to it marks the one selected, but I can see there are quite a few of them, so I can also mark this one if I want to build more than one CDR model.
1:11:42
Now when I do select more than one template for the CDRs, one thing that we need to ensure is also going and changing this to two.
1:11:51
Because if you set it to 1, then it will still pick the best CDR based on the energy out of the two that you have selected.
1:11:59
So if you want to build multiple models, just change the number of models as well.
1:12:03
In this case, I'm gonna remove it and let's just go with what MOE has suggested.
1:12:09
So I'll just go with 1 and I can just go ahead and build it because it shows me the progress and tells me at each step.
1:12:17
OK, now it's building the homology model for the first one, and then it's packing the side chains.
1:12:27
Are there any questions about the homology modeller?
1:12:30
Is the same procedure for.
1:12:31
Yeah.
1:12:32
Sorry.
1:12:35
I'm sorry, I can't.
1:12:39
I can't hear you.
1:12:40
Yeah.
1:12:43
Thank you.
1:12:43
Barbara, Does it work?
1:12:46
Yeah.
1:12:47
You're showing us this homology modelling approach to generate antibody structure.
1:12:52
I have a feeling that the field, in the case of antibody modelling, is gradually moving to AI models to generate the structures, and I wonder how the performance of the method that you are showing us compares to the state-of-the-art approaches based on AI.
1:13:14
Well, I've personally not tried the ML approach, but from what I'm aware of, our antibody modeller works quite well compared with the AI systems as well, because it's using the antibody templates anyway, and the same goes for the AI/ML structures.
1:13:29
It also uses the publicly available data.
1:13:32
But the only difference is the more you keep feeding into the AI models, the better they get.
1:13:36
Whereas this comes from just the antibody database.
1:13:40
So it's still I would say quite close at the moment.
1:13:45
But as I said, I've personally not tried the applications.
1:13:51
So what we did to (what's the word when you try to prove something works?) is we took the antibody database at some point and tried to model sequences.
1:14:09
Of course, we took the structures out of the antibody database so that they would not be available as templates for the sequences we wanted to model.
1:14:18
And I don't have the numbers in my head for what the RMSDs were when recreating the available experimental structures, but they are within the range of the flexibility of the structures.
1:14:37
A framework is not a problem at all anyway.
1:14:40
But also within the CDR loops.
1:14:41
And also the CDR H3 loop is within nought point something angstrom RMSD, or maybe one.
1:14:50
As I said, close to 1.
1:14:52
Maybe I don't have the numbers in my head, but it's very close to the actual experimental structure and still within the flexibility of what it is.
1:15:07
That's our validation.
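For reference, the RMSD used in this kind of validation can be sketched in a few lines of Python. This is only an illustration: the coordinates are made up, the function name is hypothetical, and it assumes the model and the experimental structure are already superposed and list the same atoms in the same order.

```python
import math


def rmsd(model_xyz, reference_xyz):
    """Root-mean-square deviation between two matched coordinate lists (in angstrom)."""
    if len(model_xyz) != len(reference_xyz) or not model_xyz:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model_xyz, reference_xyz):
        # Squared distance between the matched atoms.
        squared += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared / len(model_xyz))


# Made-up example: three backbone atoms, each displaced by 1 angstrom along z.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
crystal = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
value = rmsd(model, crystal)
print(value)
```

In a leave-one-out test like the one described, the structure being modelled would be removed from the template pool before building, and the RMSD would then be computed per region (framework, each CDR loop).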
1:15:09
Validation.
1:15:10
Yeah.
1:15:11
So that's what we did to do that.
1:15:13
And yeah, I agree with you, the AI methods are more and more, but they also need to be fed with something.
1:15:20
And yeah, the structure-based approach still works, I think, and also with the CDRs, like when you're working.
1:15:28
So why do we build the model?
1:15:30
Like you want to do further analysis on it also calculate the properties.
1:15:34
So what I have normally seen people doing with the CDRs is they just generate multiple CDR models.
1:15:40
So you cover more of the conformational space and use a bunch of structures for docking or for property calculations, for example calculating an average property over them.
1:15:50
So that still gives an idea if you use an ensemble of the structures to begin with as a starting point.
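The ensemble idea can be sketched as averaging one property over several CDR models. The model names and the property values below are invented for illustration, and the function is a hypothetical helper, not part of MOE.

```python
from statistics import mean, pstdev


def ensemble_summary(values_by_model: dict) -> tuple:
    """Return the mean and population standard deviation of a property over an ensemble."""
    values = list(values_by_model.values())
    return mean(values), pstdev(values)


# Invented hydrophobic patch areas for four CDR models of the same antibody.
cdr_models = {
    "cdr_model_1": 312.0,
    "cdr_model_2": 298.0,
    "cdr_model_3": 305.0,
    "cdr_model_4": 325.0,
}
average, spread = ensemble_summary(cdr_models)
print(average, spread)
```

Reporting the spread next to the average gives a feel for how sensitive the property is to the CDR conformation.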
1:15:57
So again, you get close to the AI models.
1:16:01
If it's giving one structure, you can use like 4 structures instead of that.
1:16:04
OK, OK, so I'm just gonna close this model.
1:16:10
I have the model already built in.
1:16:12
Here is the output from the antibody modeller.
1:16:16
The first field just contains my model and then I have some information about the templates and the query sequence.
1:16:21
In this case, I only care about the model, so I'm going to load it in MOE.
1:16:26
Just let me clear the system.
1:16:27
Now you see the amino acids and the residue, those are more bold.
1:16:31
So that means we have the atomic coordinates for each of these residues.
1:16:36
And it's also already annotated by the antibody modeller, since it's very specific: you have the CDRs and the light chain and the heavy chain.
1:16:45
So now I have the model.
1:16:46
I can do a few more.
1:16:47
I will just go over a few more tools.
1:16:49
In this case, it's a mouse Fv that I started with.
1:16:53
Let's say I want to move it more towards the human side.
1:16:57
So I'm not saying I'm humanising it, but that's just one of the aspects, because that's more of a broader range.
1:17:04
So just one of the things that we can look at is let's say I want to identify the potential glycosylation sites on this antibody.
1:17:11
So I have built the structure to identify the glycosylation sites.
1:17:15
I'll just go to the selector.
1:17:17
So you can use the selection language in MOE.
1:17:20
It's quite diverse, to be honest.
1:17:22
Like I also don't know the full extent of it, but there are some of the things that we can do very easily.
1:17:28
For example, this glycosylation site, we have that as an example by default in here.
1:17:33
So I can just use this expression.
1:17:35
Now what this expression indicates is it's a prosite, which means a site on a protein, and the sequence pattern as it's shown here.
1:17:43
So the first residue is asparagine.
1:17:45
Then you have in curly braces that means anything except proline.
1:17:50
In square brackets, it's either serine or threonine, and finally at the end, anything except proline.
1:17:56
Now the last part is quite important, which is and exposed.
1:18:00
That's why we built the model here.
1:18:01
I could just search for this particular sequence pattern on the sequence, but that might just be a buried site.
1:18:09
So that's not gonna be glycosylated.
1:18:11
So the and exposed here is quite important.
1:18:14
And if I just press enter so it will identify one of the glycosylation sites.
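The pattern described above (asparagine, anything except proline, serine or threonine, anything except proline) can be mimicked outside MOE with a regular expression. The sketch below is an approximation, not MOE's selection language: the example sequence, the per-residue exposure values and the 0.2 cutoff are all made up, and "exposed" is reduced to a simple threshold on the asparagine.

```python
import re

# N, then not P, then S or T, then not P. The lookahead lets overlapping sites be found.
SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")


def find_sequons(sequence: str) -> list:
    """Return (0-based start index, matched motif) for every N-glycosylation motif."""
    return [(m.start(), m.group(1)) for m in SEQUON.finditer(sequence)]


def exposed_sequons(sequence: str, exposure: list, cutoff: float = 0.2) -> list:
    """Keep only motifs whose asparagine has relative exposure at or above the cutoff."""
    return [(pos, motif) for pos, motif in find_sequons(sequence) if exposure[pos] >= cutoff]


# Made-up sequence: NKSH and NGTL match, NPTA does not (proline in second position).
seq = "AANKSHGGNPTAQNGTL"
# Made-up relative exposure per residue; the asparagine at index 13 is treated as buried.
exposure = [0.5] * len(seq)
exposure[13] = 0.05

all_sites = find_sequons(seq)
surface_sites = exposed_sequons(seq, exposure)
print(all_sites)
print(surface_sites)
```

The second call shows why the 3D model matters: a sequence-only search reports both motifs, while the exposure filter drops the buried one.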
1:18:20
So in this case, you can see the sequence pattern, and it's present on one of the CDRs of the light chain.
1:18:26
So we have identified the site.
1:18:28
Now if I wanna get rid of this site, so remove the potential glycosylation site, how can I do that?
1:18:34
So I have this site selected.
1:18:36
I'm just gonna render the items a little bit.
1:18:40
So I'll colour them, sign separately.
1:18:43
Let's show these items in the main window and also show the residues and also go ahead and label them.
1:18:52
So all the selected will be labelled.
1:18:54
I centre on these, and I see all four residues forming the glycosylation site.
1:18:59
So in this case, if I just rotate it around like this, I can already see asparagine and lysine are exposed.
1:19:06
So if any antigen has to bind on the top, so these are the residues which will be exposed towards the antigen surface and possibly form the interactions.
1:19:15
Whereas the other two are more on the side.
1:19:17
So if I have to remove the site, you can see asparagine, lysine, they can form quite good interactions.
1:19:23
So I would probably skip them and maybe look at the other two.
1:19:29
Just in the main window already gives me an idea.
1:19:31
But I could also just look at the ligand interaction diagram.
1:19:34
It's more for the interactions of ligand with the protein molecule.
1:19:37
But in this case also it just gives me some idea.
1:19:40
In this case serine is less exposed than the asparagine and the lysine on the top, whereas the histidine we have here is quite buried.
1:19:48
It's very little exposure.
1:19:50
The blue halos around the atoms indicate the solvent exposure.
1:19:54
So I could also look at in 2 dimensions just to be more clear.
1:19:59
I already know in this case these two are exposed.
1:20:02
The histidine is quite buried, whereas the serine looks like something possibly we could mutate to remove this glycosylation site.
1:20:11
So I'm going to select serine here since I selected that residue.
1:20:17
Now the natural next question again, what do we mutate it to?
1:20:20
So in this case.
1:20:23
I'm going to use the antibody database to look for the residues which are normally or naturally present here in the publicly available antibodies to see which residue at this position could be good.
1:20:36
So I'm going to open the antibody database.
1:20:38
Sorry, it's here.
1:20:41
Now we have these family databases, which we call project databases.
1:20:45
The good thing about them is all the structures in this database, they have all the publicly available structures.
1:20:51
They are aligned, superposed as well as annotated on the basis of a reference structure.
1:20:56
So it makes it easy for comparison.
1:20:58
And also for these kinds of applications, let's say I'm looking for a residue which is commonly present at that position.
1:21:05
So I can see the distribution very easily because they're already superposed on each other.
1:21:12
I'll just give an idea of what this looks like.
1:21:14
Let me hide this, the project database and I can browse through them.
1:21:21
So that's the first structure.
1:21:22
So you can see all are in the same reference frame and they're also annotated.
1:21:27
And these project databases, you could also create your own with your own in house structures, new family database, or you can also upload the existing project database.
1:21:37
These are all stored locally.
1:21:38
So you can use just your in house structures as well.
1:21:42
I'm just gonna close this. Now, what we are going to use it for is to figure out what we can put at this serine position.
1:21:49
So I selected the serine again, and at the bottom you will see a button called Stats.
1:21:55
I just open this now.
1:21:56
It will give me the statistics.
1:21:58
Also when I mentioned, if you remember in the first section of how do I know that this is a large hydrophobic area for an antibody?
1:22:05
So I know that from the statistic distribution, this is just by species.
1:22:10
If I go to the patch CDR hydrophobic, you will see that the area of the hydrophobic region around the CDRs is, for most of the structures, between 200 and 300.
1:22:22
Then you have some up to 400, and above 400 there are very few structures.
1:22:26
So that's coming from the statistics that it's quite a large hydrophobic area.
1:22:33
There are multiple parameters you can look at, for example the CDR length for the H3.
1:22:38
The most commonly found length is like here, 11, 12, 13.
1:22:42
That's just the number of residues in the CDRs.
1:22:46
But also look at the others.
1:22:47
In this case what I'm interested in is the residues at that position, so I just go to UID.
1:22:54
By default it gives me the residue distribution at the first position, but I have the residue selected.
1:23:00
So I can use this magic button where I get to the selected residue position and at this position in the publicly available antibodies which is the residue which is most commonly found.
1:23:12
Might sound very surprising to mutate some residue to proline, but in this case it looks quite reasonable and logical if I put proline at that position because it's commonly present and it gets rid of the glycosylation site.
1:23:28
So I use the statistics here to find out my residue to mutate with.
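The statistics step amounts to counting which residues occur at one aligned position across many antibodies. A minimal Python sketch follows, assuming the sequences are already aligned to a common numbering; the five short sequences are invented and the function name is hypothetical.

```python
from collections import Counter


def residue_distribution(aligned_sequences: list, position: int) -> dict:
    """Fraction of each residue type at one aligned column, most common first."""
    counts = Counter(
        seq[position] for seq in aligned_sequences if seq[position] != "-"
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {residue: n / total for residue, n in counts.most_common()}


# Invented aligned fragments; index 3 is the position being considered for mutation.
aligned = ["QNKPH", "QSKPH", "QNRPY", "QNKSH", "QTKPH"]
distribution = residue_distribution(aligned, 3)
most_common = next(iter(distribution))
print(distribution, most_common)
```

In this toy data proline dominates the column, which mirrors the reasoning in the talk: the most frequent residue at that position is a natural candidate for the mutation.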
1:23:34
I'm just gonna close this for now.
1:23:35
I'll also close the database.
1:23:37
And now it's the same procedure for mutations as we did for a manual mutagenesis.
1:23:43
Instead of serine, I could just go ahead and put proline here.
1:23:47
Just place the Proline.
1:23:49
I will also repack it and for minimization this time I will go with the selected residues because it's Proline.
1:23:55
So Proline is integrated into the backbone.
1:23:57
So I will just go ahead and minimise the whole residue.
1:24:02
So once it's done, the antibody that I have, starting from a mouse Fv sequence, is now closer to the human antibodies.
1:24:12
So we have also removed the glycosylation site.
1:24:17
This is just one of the analyses that you can do on a single structure.
1:24:22
So this was just homology modelling starting from one sequence.
1:24:26
Are there any questions before I move on to the next one?
1:24:33
No? OK, so now we'll move on to the homology modelling where you have a bunch of sequences; let's say you have 500, 600 or maybe 10,000 sequences and you want to pick some of the good sequences.
1:24:50
So again, how the workflow goes is you could do a batch homology modelling again using the antibody modeller.
1:24:55
I will show now.
1:24:56
So once we have that, it already comes down to the third step here in the workflow: you have the 3D structures, then we can calculate protein properties on them, or you could also calculate the patches over a bunch of structures.
1:25:09
That really depends on what kind of analysis you are doing, and then you sort the sequences on the basis of good properties and, possibly, developability when we are talking about antibodies. I'm just going to show it to you in MOE as well.
1:25:24
Let me just close all of this.
1:25:27
Close this as well, and I can also load the antibody modeller from here.
1:25:32
It's loading the project database.
1:25:34
The only difference is in the input.
1:25:36
Instead of chains, I'll use files now, and we do have a sample file with 14 sequences, just the Fv sequences, so I can select that.
1:25:46
In this case also, it automatically detected it's VL VH.
1:25:50
Just one thing to make note of, if you have VL VH, it should all be in the same order.
1:25:55
You could have VH VL in your FASTA file, but all of them should be uniform so that MOE knows; if it's VH VL, it should be VH VL all the way across.
1:26:08
Again, for the model type, I'm going to go with the defaults and just
1:26:11
go ahead and build.
1:26:13
Let's change the name to name it Model 14.
1:26:19
So we'll start building the 3D models for each of them now.
1:26:23
OK, I have only 15 minutes, so this is actually quite fast.
1:26:27
I thought I would let it run, but I'm just gonna cancel it now.
1:26:31
And I have these models already generated.
1:26:36
It's the same settings.
1:26:37
It started from the same IG EFV FASTA file that I used.
1:26:41
So you have the models same output as any other database.
1:26:44
Now we'll move on to the next step.
1:26:46
So we have the 3D structures and the models for each of these sequences.
1:26:50
Now we'll analyse them for which ones could be the possibly good sequences.
1:26:56
So first I will save this database so I can actually do the modifications to it.
1:27:00
This one is read-only. Then I just go ahead and calculate the properties.
1:27:04
So under compute descriptors, you have the protein properties, which is the same protein properties that we saw earlier.
1:27:14
So again, all the 28 are selected, just the basic ones.
1:27:18
I will calculate for all of these, just I need to make sure it's the correct field in the settings.
1:27:24
So the model or the 3D structure is in the model field.
1:27:28
So it's set to model and I can just go ahead and calculate.
1:27:32
Once it's done, it's shipped to the viewer tab instantly and I can look at the properties for each of these models in the panel or I'm just gonna close it.
1:27:41
I have the properties in the database as well.
1:27:43
Now there's sorry is no.
1:27:45
Are you confused about something?
1:27:47
OK, Did not calculate that quickly.
1:27:49
It was pre calculated.
1:27:52
It was pre calculated.
1:27:54
I mean, it still takes like maybe 30 seconds or so.
1:27:57
Yeah, but it's still like pretty fast for these 14.
1:28:00
I should control my face.
1:28:04
Yeah, it's like 30 seconds for these 14 sequences.
1:28:07
Pretty quick.
1:28:07
Yeah, Yeah, pretty quick.
1:28:09
So with the properties in here now, I could look at any of the properties depending on what I am targeting.
1:28:16
Again, in this case, let's go with the solubility since we have been going with the solubility workflow for so long.
1:28:22
So I'm just gonna go to the patch CDR hydrophobic directly and I'm going to sort it in ascending order to see.
1:28:31
Yes.
1:28:35
Is that correct?
1:28:36
Yeah, it was the patch CDR hydrophobic. Instead of ascending, I'm going to sort it in descending order, let's say.
1:28:42
So I have the bad ones on the top and I have the good ones on the bottom.
1:28:47
And you can see all the sequences are actually better than the wild-type sequence that they started with.
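The sort-and-compare step can be pictured with a short Python sketch. The model names, the field name `patch_cdr_hyd` and the area values are invented for illustration and are not the values from the workshop database.

```python
# Invented hydrophobic CDR patch areas for a wild type and three variants.
models = [
    {"name": "variant_03", "patch_cdr_hyd": 240.0},
    {"name": "wild_type", "patch_cdr_hyd": 410.0},
    {"name": "variant_07", "patch_cdr_hyd": 185.0},
    {"name": "variant_11", "patch_cdr_hyd": 330.0},
]

# Descending order: largest hydrophobic patch (worst for solubility) at the top.
ranked = sorted(models, key=lambda m: m["patch_cdr_hyd"], reverse=True)

# Variants with a smaller hydrophobic CDR patch than the wild type.
wild_type_area = next(m["patch_cdr_hyd"] for m in models if m["name"] == "wild_type")
better = [m["name"] for m in ranked if m["patch_cdr_hyd"] < wild_type_area]

print([m["name"] for m in ranked])
print(better)
```

The first and last entries of `ranked` correspond to the two structures that would be sent to the main window for a closer 3D comparison.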
1:28:55
We can also do further analysis just from the values.
1:28:58
I do have an idea that these are the ones which would be more soluble, when we are developing these antibodies, than the original wild type.
1:29:08
But we can analyse this more and have a physical explanation in 3D since we do have the models present.
1:29:15
So I'll just select the first and the last one and send the selected ones to MOE.
1:29:20
So one of the best ways and the quickest ways to identify is by using the protein patches which we calculated earlier.
1:29:27
So just the significant and the continuous regions on the surface of protein.
1:29:32
The green is for excess hydrophobicity, then you have the red and blue for excess negative and the positive charges respectively.
1:29:39
Now in this case, let me show 1 by 1.
1:29:41
So that is the first one, which is the wild type.
1:29:44
You can see there is a big hydrophobic patch around the CDRs.
1:29:48
If I bring the second one up, which was the last one in the list with the lower hydrophobic area, you can see that big hydrophobic patch is missing in this structure.
1:29:59
I can also see where the differences are coming from.
1:30:02
So in the sequence editor, let's say I just colour the residues by identity.
1:30:07
So now I can see which are the residues which are different in these two sequences.
1:30:12
We have one residue here, valine to arginine, and then there are also two residues on the heavy chain.
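Colouring by identity is, in effect, a position-by-position comparison of two aligned sequences. A minimal Python sketch follows, with made-up sequence fragments and a hypothetical function name; it assumes both sequences have the same length and numbering.

```python
def differing_positions(reference: str, variant: str) -> list:
    """Return (1-based position, reference residue, variant residue) for each mismatch."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index + 1, ref, var)
        for index, (ref, var) in enumerate(zip(reference, variant))
        if ref != var
    ]


# Made-up fragments differing by a single valine-to-arginine substitution.
wild_type = "GFTVSSY"
variant = "GFTRSSY"
changes = differing_positions(wild_type, variant)
print(changes)
```

Each reported position can then be checked against the residues that make up the hydrophobic patch.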
1:30:20
In this case, I can just select these as well or I could also look at the patch analyzer.
1:30:25
If I look at the patch analyzer on the top one, if I select these residues, you can see it's actually one of the residues which is the part of the hydrophobic patch.
1:30:35
So the mutation from valine to arginine eliminated or reduced the hydrophobic patch for the second antibody that we have in here.
1:30:47
So this is the cause of better solubility and a better developability.
1:30:51
So we also pinpointed where the difference is coming from.
1:30:54
I could also show these residues in the main MOE window.
1:30:58
That's just the same drill as we have for just the rendering.
1:31:01
Now here, I'm just gonna show you another part where you can also compare just the two structures.
1:31:07
So if I click on 2D maps, let me make it bigger.
1:31:12
So this just gives me the maps or the world of the protein on 2D.
1:31:17
So you have the same colouring, the green for hydrophobic and the red for negative, blue for positive.
1:31:23
Here we can see for this structure.
1:31:25
For now, I have the second one in the view.
1:31:27
So I have to bring the other one.
1:31:29
There's too many open.
1:31:30
Let me close this close this.
1:31:35
I didn't have the other one in the view, so it would not give me the patches for that, but now it will.
1:31:41
So at this point.
1:31:43
So this is the wild type with the big hydrophobic patch in the centre that we also saw.
1:31:48
And the axis here is just as we have oriented it in the main MOE window.
1:31:52
So the hydrophobic patch on the CDR that I was seeing is patch #1. The numbering here is on the basis of the area, as we see in the patch analyzer.
1:32:02
I can go to the next one and you can quickly see that the big patch is gone, but I could also see them side by side.
1:32:10
So just makes it easy for comparison.
1:32:14
It's the same 1.
1:32:15
So let me, yeah, this is not that one. Now this is just giving a comparison, but we could also calculate the differences.
1:32:25
So I just toggled on the map difference and now it's 21.
1:32:29
That means the patches which are in darker colours are the added patches and the ones in lighter colours are the removed ones.
1:32:36
So from the mutant to the wild type you can see in wild type this big patch has been added.
1:32:42
Or you could say that big patch has been removed in the mutant structure.
1:32:46
So this is showing the difference between the patches and this protein patch calculation can also be done on a series of structures and you could also calculate average patch when you have them in the database.
1:32:57
This was just in the main MOE window.
1:33:02
So what we did is we just developed the 3D models, calculated the properties, and just on the basis of one property,
1:33:09
at this point, I did the solubility analysis for all these mutant sequences and picked out which ones would be the good sequences with a better solubility.
1:33:20
Now at this point you might have realised why we calculated the models: most of the protein properties, you know, are based on the 3D model.
1:33:28
So if we do want to do the analysis, the first step definitely goes through doing the homology modelling and getting these 3D structures that we can analyse at the end.
1:33:38
So that was the last section.
1:33:40
Let me just give a little summary.
1:33:44
So we started with the crystal structure and its analysis, identifying why we have bad properties and trying to fix those properties using virtual mutagenesis.
1:33:58
Then we looked at the other scenario where we have a sequence.
1:34:02
So we built the homology model for one sequence, identified another liability, which is the glycosylation site, and removed it using the statistics from the antibody database.
1:34:13
Finally, we did the batch homology modelling for a bunch of sequences just to figure out which might be the good mutant sequences as compared to the wild type.
1:34:24
Thank you very much.
1:34:24
Thank you for listening and your patience.
1:34:27
If you have any questions, feel free to ask now.