Extra Precision Glide: Docking and Scoring Incorporating a Model of Hydrophobic Enclosure for Protein-Ligand Complexes

Abstract
A novel scoring function to estimate protein-ligand binding affinities has been developed and implemented as the Glide 4.0 XP scoring function and docking protocol. In addition to unique water desolvation energy terms, protein-ligand structural motifs leading to enhanced binding affinity are included: (1) hydrophobic enclosure where groups of lipophilic ligand atoms are enclosed on opposite faces by lipophilic protein atoms, (2) neutral-neutral single or correlated hydrogen bonds in a hydrophobically enclosed environment, and (3) five categories of charged-charged hydrogen bonds. The XP scoring function and docking protocol have been developed to reproduce experimental binding affinities for a set of 198 complexes (RMSDs of 2.26 and 1.73 kcal/mol over all and well-docked ligands, respectively) and to yield quality enrichments for a set of fifteen screens of pharmaceutical importance. Enrichment results demonstrate the importance of the novel XP molecular recognition and water scoring in separating active and inactive ligands and avoiding false positives.


Richard A. Friesner,* Robert B. Murphy, Matthew P. Repasky, Leah L. Frye, Jeremy R. Greenwood, Thomas A. Halgren, Paul C. Sanschagrin, and Daniel T. Mainz
Department of Chemistry, Columbia University, New York, New York 10027; Schrödinger, Limited Liability Company, 120 West 45th Street, New York, New York 10036; Schrödinger, Limited Liability Company, 101 SW Main Street, Portland, Oregon 97204
Received December 16, 2005
1. Introduction
In two previous papers (refs 1, 2) we have described the Glide high
throughput docking program and provided performance benchmarks
for docking and scoring capabilities. These results have
established Glide as a competitive methodology in both areas (refs 2-5).
However, it is clear from enrichment results (ref 2) that there
remains substantial room for improvement in separating “active”
from “inactive” compounds. In this paper we outline and present
results obtained from significantly enhanced sampling methods
and scoring functions, hereafter collectively referred to as “extra-
precision” (XP) Glide. The key novel features characterizing
XP Glide scoring are (1) the application of large desolvation
penalties to both ligand and protein polar and charged groups
in appropriate cases and (2) the identification of specific
structural motifs that provide exceptionally large contributions
to enhanced binding affinity. Accurate assignment of these
desolvation penalties and molecular recognition motifs requires
an expanded sampling methodology for optimal performance.
Thus, XP Glide represents a single, coherent approach in which
the sampling algorithms and the scoring function have been
optimized simultaneously.
The goal of the XP Glide methodology is to semiquantita-
tively rank the ability of candidate ligands to bind to a specified
conformation of the protein receptor. Because of the rigid
receptor approximation utilized in Glide and other high through-
put docking programs, ligands that exhibit significant steric
clashes with the specified receptor conformation cannot be
expected to achieve good scores, even if they in reality bind
effectively to an alternative conformation of the same receptor.
Such ligands may be thought of as unable to “fit” into that
specified conformation of the protein. For docking protocols to
function effectively within the rigid-receptor approximation,
some ability to deviate from the restrictions of the hard wall
van der Waals potential of the receptor conformation used in
docking must be built into the potential energy function
employed to predict the ligand binding mode. In XP and SP
Glide, this is accomplished by scaling the van der Waals radii
of nonpolar protein and/or ligand atoms; scaling the vdW radii
effectively introduces a modest “induced fit” effect. However,
it is clear that there are many cases in which a reasonable degree
of scaling will not enable the ligand to be docked correctly.
For example, a side chain in a rotamer state that is very different
from that of the native protein-ligand complex may block the
ligand atoms from occupying their preferred location in the
binding pocket. There will always be borderline situations, but
in practice we have found it possible to classify the great
majority of cases in cross-docking experiments as either “fitting”
or “not fitting”. The former are expected to be properly ranked
by XP Glide (within the limitation of noise in the scoring
function), while the latter require an induced-fit protocol (refs 6, 7)
to correctly assess their binding affinity (ref 4). In the present paper, we
focus on complexes where the ligand fits appropriately into the
receptor, as judged by two factors: (1) the ability to make key
hydrogen bonding and hydrophobic contacts and (2) the ability
to achieve a reasonable root-mean-square deviation (RMSD),
as compared to the native complex or as obtained by analogy
with the native complex of a related ligand. Comparison by
analogy is often necessary when dealing with a large dataset of
active ligands, only a few of which may have available crystal
structures.
Our discussion of XP Glide is divided into four different
sections. First, in section 2, we describe the novel terms leading
to enhanced binding affinity that have been introduced to
account for our observations with regard to protein-ligand
binding in a wide range of systems. The origin of these terms
lies in the theoretical physical chemistry of protein-ligand
interactions; however, developing heuristic mathematical representations that can be used effectively in an empirical scoring function, taking into account imperfections in structures due to the rigid receptor approximation and/or limitations of the docking algorithm, requires extensive analysis of, and fitting to, experimental data. Key aspects of this analysis, along with illustrative examples, are provided in section 2 in an effort to provide physical insight as well as formal justification for the model. In developing XP Glide, we have attempted to identify the principal driving forces and structural motifs for achieving significant binding affinity contributions with specific protein-ligand interactions, above and beyond the generic terms that have appeared repeatedly in prior scoring functions. We have found that a relatively small number of such motifs are dominant over a wide range of test cases; the ability to automatically recognize these motifs, and assign binding affinity contributions, potentially represents an advance in the modeling of protein-ligand interactions based on an empirical scheme.
* To whom correspondence should be addressed. Phone: 212-854-7606. Fax: 212-854-7454. E-mail: rich@chem.columbia.edu.
Author affiliation footnotes: Schrödinger, L.L.C., NY; Schrödinger, L.L.C., OR.
J. Med. Chem. 2006, 49, 6177-6196. DOI: 10.1021/jm051256o. CCC: $33.50. © 2006 American Chemical Society. Published on Web 09/23/2006.
In section 3, we evaluate the performance of our methodology
in self-docking, with regard to both the ability to generate the
correct binding mode of the complex and the prediction of
binding affinity, using docked XP structures for the complexes.
In section 4, the performance of the scoring function in
enrichment studies (ability to rank known active compounds
ahead of random database ligands) for a substantial number of
targets, containing qualitatively different types of active sites,
is investigated. Our treatment of the data differs significantly
from what has generally prevailed in previous papers in the
literature; in evaluating scoring accuracy, we distinguish cases
where there are significant errors in structural prediction, as
opposed to systems where the structural prediction is reasonably
good, but the scoring function fails to assign the appropriate
binding affinity. By using only well-docked structures to
parametrize and assess scoring functions, a way forward toward
a globally accurate method, in which multiple structures are
employed in docking and/or induced fit methods are utilized to
directly incorporate protein flexibility, is facilitated.
The parameterization of XP Glide is carried out using a large
and diverse training set comprising 15 different receptors and
between 4 and 106 well-docked ligands per receptor. A
separately developed test set incorporating four new receptors,
and additional ligands for two receptors already in the training
set, is also defined. All of the receptor and ligand data is publicly
available (as is our decoy set, which has been posted on the
Schrödinger Web site and is freely available for downloading)
and we provide extensive references documenting the origin of
each ligand. The results reported below have been obtained with
the Glide 4.0 release.
The development of data sets suitable for the analysis
described above is highly labor intensive; consequently, our
current test set is too small to draw robust conclusions, and the
results reported herein must be regarded as preliminary. While
the test set results are encouraging with regard to demonstration
of a respectable degree of transferability, a rigorous assessment
of the performance to be expected on a novel receptor will have
to be performed in future publications. Nevertheless, qualitative
and consistent improvement in the results for both training and
test set, at least as compared to the alternative scoring functions
available in Glide, is demonstrated. Finally, in the conclusion,
we summarize our results and discuss future directions.
2. Glide XP Scoring Function
The major potential contributors to protein-ligand binding
affinity can readily be enumerated as follows:
(1) Displacement of Waters by the Ligand from “Hydrophobic
Regions” of the Protein Active Site. Displacement of
these waters into the bulk by a suitably designed ligand group
will lower the overall free energy of the system. Waters in such
regions may not be able to make the full complement of
hydrogen bonds that would be available in solution. There are
also entropic considerations; if a water molecule is restricted
in mobility in the protein cavity, release into solvent via ligand-
induced displacement will result in an entropy gain. As one
ligand releases many water molecules, this term will contribute
favorably to the free energy. Replacement of a water molecule
by a hydrophobic group of the ligand retains favorable van der
Waals interactions, while eliminating issues concerning the
availability of hydrogen bonds. Transfer of a hydrophobic
moiety on the ligand from solvent exposure to a hydrophobic
pocket can also contribute favorably to binding by withdrawing
said hydrophobic group from the bulk solution.
(2) Protein-Ligand Hydrogen-Bonding Interactions, as
well as Other Strong Electrostatic Interactions Such as Salt
Bridges. In making these interactions, the ligand displaces
waters in the protein cavity, which can lead to favorable entropic
terms of the type discussed above in (1). Contributions to
binding affinity (favorable or unfavorable) will also depend on
the quality and type of hydrogen bonds formed, net electrostatic
interaction energies (possibly including long range effects,
although these generally are considered small and typically are
neglected in empirical scoring functions), and specialized
features of the hydrogen-bonding geometry, such as bidentate
salt bridge formation by groups such as carboxylates or
guanidinium ions. Finally, differences in the interactions of the
displaced waters, as compared to the ligand groups replacing
them, with the protein environment proximate to the hydrogen
bond, can have a major effect on binding affinity, as is discussed
in greater detail below.
(3) Desolvation Effects. Polar or charged groups of either
the ligand or protein that formerly were exposed to solvent may
become desolvated by being placed in contact with groups to
which they cannot hydrogen bond effectively. In contrast to the
two terms described above, such effects can only reduce binding
affinity.
(4) Entropic Effects Due to the Restriction on Binding of
the Motion of Flexible Protein or Ligand Groups. The largest
contributions are due to restriction of ligand translational/
orientational motion and protein and ligand torsions, but
modification of vibrational entropies can also contribute. As in
the case of desolvation terms, such effects will serve exclusively
to reduce binding affinity.
(5) Metal-Ligand Interactions. Specialized terms are
needed to describe the interaction of the ligand with metal ions.
We shall defer the discussion of metal-specific parameteriza-
tion to another publication, as this is a complex subject in its own
right, requiring considerable effort to treat in a robust fashion.
A large number of empirical scoring functions for predicting
protein-ligand binding affinities have been developed (refs 8-19).
While differing somewhat in detail, these scoring functions are
broadly similar. A representative example, the ChemScore (ref 8)
scoring function, is discussed in our comments below, though
similar comments would apply to many of the other scoring
functions cited in refs 8-19. We briefly summarize how
ChemScore treats the first four potential contributors to the
binding affinity presented above:
(1) ChemScore (ref 8) contains a hydrophobic atom-atom pair
energy term of the form

$$E_{\mathrm{phobicpair}} = \sum_{ij} f(r_{ij}) \qquad (1)$$

Here, i and j refer to lipophilic atoms, generally carbon, and
$f(r_{ij})$ is a linear function of the interatomic distance, $r_{ij}$. For $r_{ij}$
less than the sum of the atomic vdW radii plus 0.5 Å, f is 1.0.
6178 Journal of Medicinal Chemistry, 2006, Vol. 49, No. 21 Friesner et al.

Between this value and the sum of atomic vdW radii plus 3.0
Å, f ramps linearly from 1.0 to zero. Beyond the sum of atomic
vdW radii plus 3.0 Å, f is assigned a value of zero.
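The piecewise-linear form of f described above can be sketched in a few lines of Python. This is a minimal illustration of eq 1 as stated in the text, not the ChemScore or Glide implementation; the function names `ramp` and `phobic_pair_sum`, and the representation of each pair as a (distance, vdW-radius-sum) tuple, are assumptions made for the example.

```python
def ramp(r_ij: float, vdw_sum: float) -> float:
    """Piecewise-linear f(r_ij): 1.0 at short range, 0.0 at long range.

    r_ij    -- interatomic distance in angstroms
    vdw_sum -- sum of the two atomic vdW radii in angstroms
    """
    lower = vdw_sum + 0.5  # f is 1.0 up to this distance
    upper = vdw_sum + 3.0  # f is 0.0 beyond this distance
    if r_ij <= lower:
        return 1.0
    if r_ij >= upper:
        return 0.0
    # Linear ramp from 1.0 at `lower` down to 0.0 at `upper`.
    return (upper - r_ij) / (upper - lower)


def phobic_pair_sum(pairs):
    """Sum f(r_ij) over an iterable of (r_ij, vdw_sum) lipophilic pairs."""
    return sum(ramp(r_ij, vdw_sum) for r_ij, vdw_sum in pairs)
```

For an illustrative radius sum of 3.4 Å, a pair at 3.0 Å scores 1.0, a pair at 5.15 Å (the midpoint of the ramp) scores 0.5, and a pair at 7.0 Å scores zero.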
This term heuristically represents the displacement of waters
from hydrophobic regions by lipophilic ligand atoms. Numerous
close contacts between the lipophilic ligand and protein atoms
indicate that poorly solvated waters have been displaced by
lipophilic atoms of the ligand that themselves were previously
exposed to water. The resulting segregation of lipophilic atoms,
and concomitant release of waters from the active site, lowers
the free energy via the hydrophobic effect, which is approximately
captured by the pair scoring function above. Terms
based on contact of the hydrophobic surface area of the protein
and ligand, while differing in details, essentially measure the
same free energy change and have a similar physical and
mathematical basis.
Various parameterizations of the atom-atom pair term have
been attempted, including efforts such as PLP (ref 9), in which every
pair of atom types is assigned a different empirical pair potential.
However, it is unclear whether this more detailed parameterization
yields increased accuracy in predicting binding affinities.
A key issue is whether a correct description of the hydrophobic
effect can be achieved in all cases by using a linearly additive,
pairwise decomposable functional form.
(2) ChemScore evaluates protein-ligand hydrogen-bond
quality based on geometric criteria, but otherwise does not
distinguish between different types of hydrogen bonds or among
the differing protein environments in which those hydrogen
bonds are embedded.
(3) ChemScore does not treat desolvation effects.
(4) ChemScore uses a simple rotatable-bond term to treat
conformation entropy effects arising from restricted motion of
the ligand.
The new XP Glide scoring function starts from the “standard”
terms discussed above, though the functional forms of the first
three terms have been significantly revised and the parameterization
of all terms is specific to our scoring function. In the remainder
of this section, the functional form and physical rationale
for the novel scoring terms we have developed are described.
Examples from pharmaceutically relevant test cases are provided
to illustrate how the various terms arise from consideration
of the underlying physical theory and experimental data.
Form of the XP Glide Scoring Function. The XP Glide
scoring function is presented in eq 2. The principal terms that
favor binding are presented in eq 3, while those that hinder
binding are presented in eq 4. A description of each of the
following terms besides $E_{\mathrm{hbpair}}$ and $E_{\mathrm{phobicpair}}$, which are
standard ChemScore-like hydrogen bond and lipophilic pair
terms, respectively, follows.
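The additive structure of eqs 2-4 can be made concrete with a small container type. The sketch below only mirrors how the terms are summed; it does not compute any of them. The class name `XPTerms`, the underscore-separated field names, and the sign convention (favorable terms negative, penalties positive, all in kcal/mol) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class XPTerms:
    """Hypothetical container for the individual terms of eqs 2-4."""
    coul: float
    vdw: float
    hyd_enclosure: float
    hb_nn_motif: float
    hb_cc_motif: float
    pi: float
    hb_pair: float
    phobic_pair: float
    desolv: float
    ligand_strain: float

    def bind(self) -> float:
        # Eq 3: terms that favor binding.
        return (self.hyd_enclosure + self.hb_nn_motif + self.hb_cc_motif
                + self.pi + self.hb_pair + self.phobic_pair)

    def penalty(self) -> float:
        # Eq 4: terms that hinder binding.
        return self.desolv + self.ligand_strain

    def total(self) -> float:
        # Eq 2: the overall score.
        return self.coul + self.vdw + self.bind() + self.penalty()
```

Keeping the favorable and unfavorable groups as separate methods makes it straightforward to inspect which group dominates the score for a given pose.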
Improved Model of Hydrophobic Interactions: Hydrophobic
Enclosure ($E_{\mathrm{hydenclosure}}$). The ChemScore atom-atom
pair function, $E_{\mathrm{phobicpair}}$ described above, assigns scores to
lipophilic ligand atoms based on summation over a pair function,
each term of which depends on the interatomic distance between
a ligand atom and a neighboring lipophilic protein atom. This
clearly captures a significant component of the physics of the
hydrophobic component of ligand binding. It is assumed that
the displacement of water molecules from areas with many
proximal lipophilic protein atoms will result in lower free energy
than displacement from areas with fewer such atoms. As a crude
example, it is clear that if the ligand is placed in an active-site
cavity, as opposed to on the surface of the protein, the lipophilic
atoms of the ligand are likely to receive better scores. If they
are located in a “hydrophobic pocket” of the protein, scores
should be better than in a location surrounded primarily by polar
or charged groups. Furthermore, these improved scores are likely
to be correlated with improvements in ligand binding affinity.
However, a function dependent only on the sum of interatomic
pair functions is potentially inadequately sensitive to details of
the local geometry of the lipophilic protein atoms relative to
the ligand lipophilic atom in question. As an example, consider
the two model distributions shown in Figure 1. In one case (A),
a lipophilic ligand group is placed at a hydrophobic “wall” with
lipophilic protein atoms on only a single face of the hydrophobic
group. In the second case (B), the lipophilic ligand group is
placed into a tight pocket, with lipophilic protein atoms
contacting the two faces of the ligand group. As suggested
above, one would normally expect a larger contribution to
binding in the second case than in the first. However, this does
not fully settle the question, which at root is whether the atom-
atom pair contribution for a given ligand-atom/protein-atom
distance should be identical when the ligand atom is enclosed
by protein hydrophobic atoms, as opposed to when it is not, or
whether there can be expected to be nonadditive effects.
From a rigorous point of view, the answer depends principally
upon the free energy to be gained by displacing a water molecule
at a given location. This in turn depends on how successfully
that water molecule is able to satisfy its hydrogen-bonding
requirements at that location, while retaining orientational
flexibility. In the extreme case in which a single water molecule
is placed in a protein cavity that can accommodate only one
water molecule and is surrounded on all sides by lipophilic
atoms that cannot make hydrogen bonds, the enthalpy gain of
transferring the water to bulk solution is enormously favorable.
In such a case it is not clear that a water molecule would occupy
such a cavity in preference to leaving a vacuum, despite the
statistical terms favoring occupancy. However, this is a rare
situation not particularly relevant to the binding of a large ligand,
whereas structural motifs similar to the examples in Figure 1
are quite common.
$$\text{XP GlideScore} = E_{\mathrm{coul}} + E_{\mathrm{vdW}} + E_{\mathrm{bind}} + E_{\mathrm{penalty}} \qquad (2)$$

$$E_{\mathrm{bind}} = E_{\mathrm{hydenclosure}} + E_{\mathrm{hbnnmotif}} + E_{\mathrm{hbccmotif}} + E_{\mathrm{PI}} + E_{\mathrm{hbpair}} + E_{\mathrm{phobicpair}} \qquad (3)$$

$$E_{\mathrm{penalty}} = E_{\mathrm{desolv}} + E_{\mathrm{ligandstrain}} \qquad (4)$$

Figure 1. Schematic of a ligand group interacting with two distinct hydrophobic environments: above a hydrophobic “plane” (A) and enclosed in a hydrophobic cavity (B).

There have been a large number of papers in the literature studying, via molecular dynamics simulations, the behavior of water in contact with various types of hydrophobic structures, including flat and curved surfaces (refs 20-22), parallel plates (refs 23, 24), nanotubes (ref 25), and recently more realistic systems such as the hydrophobic surfaces of a protein or the interface between two protein domains (refs 26-28). There have also been attempts to develop general theories as to how the hydrophobic effect depends on the size and shape of the hydrophobic structure presented to the water molecules (refs 25, 29).
A number of concepts that are clearly related to the proposals
in the present paper have emerged from this work: evacuation
of water (dewetting), under the appropriate conditions, from
regions between two predominantly hydrophobic surfaces (refs 1, 2, 9)
and a model for the curvature dependence of the hydrophobic
energy in which concave regions are argued to have greater
hydrophobicity than convex ones (ref 29). However, while this work
provides useful ideas and general background, development of
a scoring function that can be used to quantitatively predict
protein-ligand binding in the highly heterogeneous and complex
environment of a protein active site requires direct engagement
with a critical mass of experimental data as well as extensive
parameterization and investigation of a variety of specific
functional forms. In what follows, we describe the results of
our investigations along these lines.
A large number of computational experiments involving
modifications of the hydrophobic scoring term designed to
discriminate between different geometrical protein environments
have been performed. The criterion for success in these
experiments is the ability of any proposed new term to fit a
wide range of experimental binding free energy data and yield
good predictions in enrichment studies. Key findings are
summarized as follows:
(1) Ligand hydrophobic atoms must be considered in groups,
as opposed to individually. The free energy of water molecules
in the protein cavity is adversely affected beyond the norm
primarily when placed in an enclosed hydrophobic microenvi-
ronment that extends over the dimension of several atoms. If
there are individual isolated hydrophobic contacts, the water
will typically be able to make its complement of hydrogen bonds
anyway by partnering with neighboring waters as in clathrate
structures surrounding small hydrocarbons in water (ref 28). After
empirical experimentation, the minimum group size of connected
ligand lipophilic atoms has been set at three.
(2) When a group of lipophilic ligand atoms is enclosed on
two sides (at a 180 degree angle) by lipophilic protein atoms,
this type of structure contributes to the binding free energy
beyond what is encoded in the atom-atom pair term. We refer
to this situation as hydrophobic enclosure of the ligand. There
is some analogy here to the parallel plate, nanotube (with some
sets of parameters), and protein systems in which dewetting has
been observed, although the length scale of the region under
consideration is smaller and (likely) more heterogeneous. The
pair hydrophobic term in eq 1 is generally fit to data from a
wide range of experimental protein-ligand complexes. As such,
it represents the behavior of individual lipophilic ligand atoms
in an “average” environment. Our new terms utilize specific
molecular recognition motifs and are designed to capture
deviations from this average that lead to substantial increases
in potency for lipophilic ligand groups of types that are typically
targeted in medicinal chemistry optimization programs. That
is, placing an appropriate hydrophobic ligand group within the
specified protein region leads to substantial increases in potency.
Indeed, the data enabling development of this term was primarily
obtained from a wide range of published medicinal chemistry
efforts that provided examples of lipophilic groups that yielded
exceptional increases in potency, as well as those yielding
minimal increases. Our objective has been to explain these
results on the basis of physical chemical principles and to
develop empirical scoring terms that captured the essential
physics while rejecting false positives, even with imperfect
docking and the neglect of induced fit effects.
Calculation of the hydrophobic enclosure score, $E_{\mathrm{hydenclosure}}$,
is summarized below with a more detailed description of the
algorithm provided in Supporting Information:
(1) Lipophilic protein atoms near the surface of the active
site and lipophilic ligand atoms are divided into connected
groups. There is a set of rules specifying which atoms count
as lipophilic and what delimits a group.
(2) For each atom in a group on the ligand, lipophilic protein
atoms are enumerated at various distances.
(3) For each lipophilic ligand atom, the closest lipophilic
protein atom is selected and a vector is drawn between it and
the ligand atom. This is the protein “anchor” atom for that ligand
atom. Vectors for all other suitably close lipophilic protein atoms
are drawn to the ligand atom and their angles with the anchor-
atom vector are determined. To be considered on the “opposite
side” of the anchor atom, the angle between vectors must exceed
a cutoff value that depends on the pair distance, with shorter
distances requiring that the angle be closer to 180°. If the angle
is close to 0°, the atom is on the “same side” as the anchor; if the
angle is close to 90°, the atom is at right angles to the anchor. When
the angle between lipophilic protein atoms is close to 180°, we
have argued that this leads to an especially poor environment for
waters.
(4) Each lipophilic ligand atom is assigned a score based on
the number of total lipophilic contacts with protein atoms,
weighted by the angle term. If no protein atom is more than
90° from the anchor atom, the angle term is zero and
the atom contributes zero to the group’s $E_{\mathrm{hydenclosure}}$ term. The
overall score for a group is the sum over all atoms in that group
of the product of the angular factor and a distance dependent
factor.
(5) If the score for any ligand group is greater than 4.5 kcal/mol,
the score is capped at 4.5 kcal/mol. This was an empirical
determination based on investigating many test cases and
comparing the results with experimental data. The capping is
rationalized by arguing that if a very large region of this type
leads to a score greater than 4.5 kcal/mol, there is probably
some ability of the water molecules to compensate by interacting
with each other.
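Steps 2 through 5 above can be illustrated with a simplified Python sketch. It is not the Glide algorithm, whose details are in the Supporting Information. The following are assumptions made for the example: the fixed 90° threshold (the text describes a distance-dependent cutoff), the linear angular and distance factors, the 6.0 Å neighbor cutoff `max_dist`, the arbitrary score scale, and all function names. Only the anchor-atom construction, the minimum group size of three, and the 4.5 cap come from the text.

```python
import math

CAP = 4.5  # cap on a group's enclosure score, as stated in step 5


def angle_deg(ligand, anchor, other):
    """Angle at the ligand atom between the anchor and another protein atom."""
    v1 = [a - l for a, l in zip(anchor, ligand)]
    v2 = [o - l for o, l in zip(other, ligand)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.dist(anchor, ligand) * math.dist(other, ligand)
    # Clamp to [-1, 1] to guard against rounding error before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def atom_score(ligand, protein_atoms, max_dist=6.0):
    """Hypothetical per-atom score: angular factor times distance factor."""
    near = [p for p in protein_atoms if math.dist(p, ligand) <= max_dist]
    if not near:
        return 0.0
    # Step 3: the closest lipophilic protein atom is the anchor.
    anchor = min(near, key=lambda p: math.dist(p, ligand))
    score = 0.0
    for p in near:
        if p is anchor:
            continue
        theta = angle_deg(ligand, anchor, p)
        if theta <= 90.0:
            continue  # same side or at right angles: no enclosure credit
        angular = (theta - 90.0) / 90.0  # 0 at 90 degrees, 1 at 180 degrees
        distance = 1.0 - math.dist(p, ligand) / max_dist
        score += angular * distance
    return score


def group_score(ligand_atoms, protein_atoms):
    """Sum per-atom scores over a ligand group and apply the cap."""
    if len(ligand_atoms) < 3:  # minimum group size from finding (1)
        return 0.0
    total = sum(atom_score(a, protein_atoms) for a in ligand_atoms)
    return min(total, CAP)
```

With a ligand atom at the origin, protein atoms at (3, 0, 0) and (-4, 0, 0) sit on opposite faces and earn enclosure credit, whereas protein atoms at (3, 0, 0) and (3.5, 1, 0) sit on the same face and earn none, which is the distinction between cases B and A in Figure 1.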
An experimentally validated example of the gain in binding
affinity from placing a large hydrophobic group in a pocket in
which lipophilic protein atoms are present on both sides of the
pocket (rings in both cases) is shown in Figure 2. Here, replacing
a phenyl substituent with a naphthyl group was shown (ref 30) to result
in a 21-fold improvement in experimentally measured affinity
($K_d$). The naphthyl is required to fully occupy the hydrophobic
pocket depicted in Figure 2.
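To put the 21-fold figure on the same scale as the scoring terms, a fold change in the dissociation constant can be converted to a free energy difference through the standard relation ΔΔG = -RT ln(fold). The sketch below assumes a temperature of 298.15 K, which the text does not state; the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_to_ddg(fold: float, temperature_k: float = 298.15) -> float:
    """Free energy change (kcal/mol) for a given fold improvement in K_d."""
    return -R_KCAL * temperature_k * math.log(fold)


print(round(fold_to_ddg(21.0), 2))  # about -1.8 kcal/mol
```

Under this assumption, the phenyl-to-naphthyl substitution corresponds to roughly 1.8 kcal/mol of additional binding free energy, which is within the 4.5 kcal/mol cap on the enclosure term.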
As indicated above, the surrounding of ligand lipophilic atoms
or groups by lipophilic protein atoms is referred to as hydro-
phobic enclosure. Our contention, here and in much of the
following discussion of hydrogen bonding, is that proper
treatment of hydrophobic enclosure is the key to discrimination
of highly and weakly potent binding motifs and compounds.
The underlying mathematical framework for describing enclo-
sure, discussed above, could be cast in other forms, but the
essential idea would remain unchanged. Detailed optimization
of the numerical criteria for recognizing enclosure, and assigning
a specific contribution to the binding affinity for each motif is
vital to developing methods with predictive capability.
Improved Model of Protein-Ligand Hydrogen Bonding.
In developing a refined model of hydrogen bonding, we divide
hydrogen bonds into three types, neutral-neutral, neutral-
charged, and charged-charged. The analysis of each type of
hydrogen bonding is different due to issues associated with the
long-range solvation energy (Born energy) of charged groups.
An initial step is to assign different default values (assuming
optimal geometric features) to each of the three types of
hydrogen bonds. The default values assigned are neutral-neutral,
1.0 kcal/mol; neutral-charged, 0.5 kcal/mol; and
charged-charged, 0.0 kcal/mol. These assignments are based
on a combination of physical reasoning and empirical observa-
tion from fitting to reported binding affinities of a wide range
of PDB complexes.
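The three default values amount to a lookup keyed on the charge state of the two partners. The following sketch encodes only that lookup. The `quality` argument is a hypothetical placeholder for a geometry-dependent factor between 0 and 1 (1.0 meaning optimal geometry); its functional form is not specified here, and the function names are illustrative.

```python
# Default reward magnitudes in kcal/mol, as listed in the text.
DEFAULT_HBOND_REWARD = {
    "neutral-neutral": 1.0,
    "neutral-charged": 0.5,
    "charged-charged": 0.0,
}


def hbond_type(donor_charged: bool, acceptor_charged: bool) -> str:
    """Classify a hydrogen bond by how many of its partners are charged."""
    n_charged = int(donor_charged) + int(acceptor_charged)
    return ("neutral-neutral", "neutral-charged", "charged-charged")[n_charged]


def hbond_reward(donor_charged: bool, acceptor_charged: bool,
                 quality: float = 1.0) -> float:
    """Default reward scaled by a hypothetical geometry quality in [0, 1]."""
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality must lie in [0, 1]")
    return DEFAULT_HBOND_REWARD[hbond_type(donor_charged, acceptor_charged)] * quality
```

A charged-charged pair returns zero regardless of geometry, so any contribution from a salt bridge has to come from a separate term.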
The rationale for rewarding protein-ligand hydrogen bonds
at all is subtle, because any such hydrogen bonds are replacing
hydrogen bonds that the protein and ligand make with water.
At best the net number of total hydrogen bonds on average will
remain the same in the bound complex as compared to solution.
However, the liberation of waters to the bulk can be argued to
result in an increase in entropy, and liberation of waters around
a polar protein group requires that a protein-ligand hydrogen
bond with similar strength be made for a desolvation penalty
to be avoided. This analysis is most plausible when both groups
are neutral. The formation of a salt bridge between protein and
ligand involves very different types of hydrogen bonding from
what is found in solution. The thermodynamics of salt bridge
formation in proteins has been studied extensively, both
theoretically and experimentally (refs 31-34), and depends on many
factors such as the degree of solvent exposure of the groups
involved in the salt bridge. The default value of zero that we
assign is based on the presence of many protein-ligand
complexes in the PDB with very low binding affinities in which
solvent-exposed protein-ligand salt bridges are formed. Assigning
contributions from these salt bridges to the binding
affinity would lead to systematically worse agreement with
experimental enrichment data. In XP scoring, certain features
of a salt bridge are required for this type of structure to
contribute to binding affinity. Finally, the
charged-neutral default value represents an interpolation
between the neutral-neutral and charged-charged value that
appears to be consistent with the empirical data.
Hydrogen-bond scores are diminished from their default
values as the geometry deviates from an ideal hydrogen-bonding
geometry, based on both the angles between the donor and
acceptor atoms and the distance. The function that we use to
evaluate quality is similar to that used in ChemScore.
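
The default values and the geometric attenuation described above can be sketched as follows. This is a minimal illustration, not the Glide XP implementation: the text says only that the quality function is "similar to" ChemScore, so the linear ramp form, the ideal distance and angle, and every threshold below are assumed placeholder values, and the names `default_reward`, `ramp`, and `hbond_score` are hypothetical. Only the three default rewards (1.0, 0.5, and 0.0 kcal/mol) come from the text.

```python
# Hypothetical sketch of a ChemScore-like pairwise hydrogen-bond reward.
# All geometric parameters are illustrative assumptions, not values from the paper.
IDEAL_DIST = 1.85      # assumed ideal H...acceptor distance, in angstroms
IDEAL_ANGLE = 180.0    # assumed ideal donor-H...acceptor angle, in degrees
DIST_FULL, DIST_ZERO = 0.25, 0.65    # assumed distance-deviation window
ANGLE_FULL, ANGLE_ZERO = 30.0, 80.0  # assumed angle-deviation window


def default_reward(donor_charged: bool, acceptor_charged: bool) -> float:
    """Default reward magnitude (kcal/mol) by charge class, as stated in the text."""
    n_charged = int(donor_charged) + int(acceptor_charged)
    # 0 charged groups: neutral-neutral; 1: charged-neutral; 2: charged-charged
    return {0: 1.0, 1: 0.5, 2: 0.0}[n_charged]


def ramp(deviation: float, full: float, zero: float) -> float:
    """Return 1 up to `full`, 0 from `zero` onward, and a linear falloff between."""
    if deviation <= full:
        return 1.0
    if deviation >= zero:
        return 0.0
    return (zero - deviation) / (zero - full)


def hbond_score(distance: float, angle_deg: float,
                donor_charged: bool, acceptor_charged: bool) -> float:
    """Default reward scaled down as the geometry departs from the assumed ideal."""
    quality = (ramp(abs(distance - IDEAL_DIST), DIST_FULL, DIST_ZERO)
               * ramp(abs(angle_deg - IDEAL_ANGLE), ANGLE_FULL, ANGLE_ZERO))
    return default_reward(donor_charged, acceptor_charged) * quality
```

Under these assumptions, an ideal neutral-neutral contact, `hbond_score(1.85, 180.0, False, False)`, receives the full 1.0 kcal/mol, while a charged-charged pair receives nothing from this term regardless of geometry. The function returns the magnitude of the reward; the sign convention of the overall scoring function is not modeled here.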
In what follows, specialized hydrogen-bonding motifs are
described in which further increments of binding affinity are
assigned beyond those from the ChemScore-like pairwise
hydrogen-bond term. Our investigations indicate that these
situations can arise for neutral-neutral or charged-charged
hydrogen bonds, but not for charged-neutral hydrogen bonds.
The exclusion of charged-neutral hydrogen-bond special
rewards has principally been driven by our failure to date to
identify motifs of this type that help to improve the agreement
with experimental data. One can speculate that the lack of charge
complementarity in charged-neutral hydrogen bonding pre-
cludes such structures from being major molecular recognition
motifs, though further investigations with larger data sets will
be needed to resolve this issue.
Special Neutral-Neutral Hydrogen-Bond Motifs (E_hbnnmotif).
In this section, neutral-neutral hydrogen-bonding
motifs are described that were identified, based on both
theoretical and empirical considerations, as making exceptional
contributions to binding affinity. Such “special” hydrogen bonds
represent key molecular recognition motifs that are found in
many if not most pharmaceutical targets. Targeting such motifs
is a central strategy in increasing the potency and specificity of
medicinal compounds. Identifying such motifs through their
incorporation in the scoring function should enable a dramatic
improvement in both qualitative and quantitative predictions.
The critical idea in our recognition of special hydrogen bonds
is to locate positions in the active-site cavity at which a water
molecule forming a hydrogen bond to the protein would have
particular difficulty in making its complement of additional
hydrogen bonds. Forming such a hydrogen bond imposes
nontrivial geometrical constraints on the water molecule. This
is the basis for the default hydrogen-bond score, but such
constraints become more problematic when the environment of
the water molecule is challenging with respect to making
additional hydrogen bonds such as those found in the bulk
environment.
Our previous analysis of hydrophobic interactions suggests
that the environment will be significantly more challenging if
the water molecule has hydrophobic protein atoms on two faces,
as opposed to a single face, and if few neighboring waters are
available to readjust themselves to the constrained geometry of
the protein-water hydrogen bond. Geometries of this type are
identified using a modified version of the hydrophobic enclosure
detection algorithm described previously. Replacement of such
water molecules by the ligand will be particularly favorable if
the donor or acceptor atom of the ligand achieves its full
complement of hydrogen bonds by making the single targeted
hydrogen bond with the protein group in question so that
satisfaction of additional hydrogen bonds is not an issue. An
example of a suitable group would be a planar nitrogen in an
aromatic ring binding for example to a protein N-H backbone
group. This has been observed to be essential to achieving high
potency experimentally in the 1bl7 ligand binding to p38 MAP
kinase, as shown in Figure 3. Here, the Met 109 hydrogen bond
is known to be important for potency. Analogous hydrogen
bonds have been found to be important in other kinases. In the
absence of rigorous physical chemical simulations, we have used
the experimental data from a significant number of diverse
protein-ligand complexes to guide the development of a set of
empirical rules, outlined below, for the types of ligand and

Figure 2. Boehringer active for 1kv2 bound to human p38 MAP kinase.
The naphthyl group receives a -4.5 kcal/mol hydrophobic enclosure
packing reward.

XP Glide Methodology and Application Journal of Medicinal Chemistry, 2006, Vol. 49, No. 21 6181
