.. _api: Python API ========== .. currentmodule:: hydride In summary, *Hydrides* provides two functionalities: - Adding hydrogen atoms to a structure with missing hydrogen atoms - Relaxing hydrogen positions in a structure with existing hydrogen atoms Typically both functionalities are combined: In a first step hydrogen atoms are added to a molecular model and then the hydrogen atoms are relaxed in the resulting model. Note that there are legit cases for using only one of these functionalities: The initial placement of the hydrogen atoms might be sufficient for your use case, so no relaxation of the rotatable hydrogen atoms is required. Or you might have an existing structure containing hydrogen atoms and you want to reduce atom clashes and reconstruct hydrogen bonds. To illustrate the effects of each step, this tutorial will add hydrogen atoms to a molecular model of *2-nitrophenol*, where hydrogen atoms are missing. *Hydride* expands on data types from `Biotite `_, so an :class:`AtomArray` is used to represent a molecular model, including it atom descriptions, coordinates and bonds. If you are new to *Biotite*, have a look at the `official documentation `_. For this example the structure of *2-nitrophenol* is loaded from a *MOL* file. .. code-block:: python import biotite.structure.io.mol as mol mol_file = mol.MOLFile.read("path/to/nitrophenol.mol") molecule = mol_file.get_structure() print(type(molecule)) print() print(molecule) print(molecule.bonds.as_array()) .. code-block:: none 0 O 2.244 0.777 0.012 0 N 1.635 -0.278 0.006 0 O 2.245 -1.333 -0.004 0 C 0.156 -0.278 0.006 0 C -0.539 0.923 0.017 0 O 0.138 2.100 0.029 0 C -0.535 -1.474 -0.012 0 C -1.919 -1.475 -0.013 0 C -2.613 -0.280 0.003 0 C -1.926 0.919 0.016 .. image:: /images/api_01.png :align: center As you can see, this molecule does not contain any hydrogen atoms, yet. This circumstance is changed by :func:`add_hydrogen()`. Note that the input :class:`AtomArray` must not already contain hydrogen atoms. If at some places hydrogen atoms already exist, these must be removed. Furthermore, the :class:`AtomArray` must have an associated ``charge`` attribute, containing the formal charges, and an associated :class:`BondList`. .. code-block:: python import hydride # Remove already present hydrogen atoms (only necessary in rare cases) molecule = molecule[molecule.element != "H"] # Add hydrogen atoms molecule, mask = hydride.add_hydrogen(molecule) print(molecule) print() print(mask) .. code-block:: none 0 O 2.244 0.777 0.012 0 N 1.635 -0.278 0.006 0 O 2.245 -1.333 -0.004 0 C 0.156 -0.278 0.006 0 C -0.539 0.923 0.017 0 O 0.138 2.100 0.029 0 C -0.535 -1.474 -0.012 0 C -1.919 -1.475 -0.013 0 C -2.613 -0.280 0.003 0 C -1.926 0.919 0.016 0 H -0.180 2.650 0.766 0 H -0.034 -2.435 -0.026 0 H -2.419 -2.437 -0.026 0 H -3.696 -0.236 0.006 0 H -2.511 1.832 0.025 [ True True True True True True True True True True False False False False False] .. image:: /images/api_02.png :align: center :func:`add_hydrogen()` returns two objects: The :class:`AtomArray`, including the input atoms plus the added hydrogen atoms, and a boolean mask (a :class:`numpy.ndarray`), that selects the original input atoms. This means, if the mask is applied to the returned :class:`AtomArray`, the hydrogen atoms are removed again. .. code-block:: python print(molecule[mask]) .. code-block:: none 0 O 2.244 0.777 0.012 0 N 1.635 -0.278 0.006 0 O 2.245 -1.333 -0.004 0 C 0.156 -0.278 0.006 0 C -0.539 0.923 0.017 0 O 0.138 2.100 0.029 0 C -0.535 -1.474 -0.012 0 C -1.919 -1.475 -0.013 0 C -2.613 -0.280 0.003 0 C -1.926 0.919 0.016 Regarding the order of atoms in the returned :class:`AtomArray`, the hydrogen atoms are placed behind the heavy atoms for each residue/molecule separately. Most hydrogen atoms are already placed correctly with respect to bond angles and lengths. However, the hydrogen atom in the hydroxy group is not in a energy-minimized state, as an intramolecular hydrogen bond to the nitro group would be expected. The energy minimization is performed with :func:`relax_hydrogen()`. .. code-block:: python molecule.coord = hydride.relax_hydrogen(molecule) .. image:: /images/api_03.png :align: center :func:`relax_hydrogen()` is able to optimize the position of hydrogen atoms at terminal groups. In this case it was able to orient the hydrogen atom at the hydroxy group towards the nitro group to form a hydrogen bond. Custom fragment and atom name libraries --------------------------------------- :func:`add_hydrogen()` uses a :class:`FragmentLibrary` to find the correct number and positions of hydrogen atoms for each heavy atom in the input :class:`AtomArray` and a :class:`AtomNameLibrary` to find the correct atom name (e.g. ``'HA'``). The default fragment library (:meth:`FragmentLibrary.standard_library()`) contains all fragments from the entire `Chemical Component Dictionary `_ and the default atom name library (:meth:`AtomNameLibrary.standard_library()`) contains atom names for all amino acids and standard nucleotides. Furthermore, the atom name library provides a reasonable naming scheme for unknown residues/molecules. Therefore, it is seldom required to provide a custom library. However, in rare cases the fragment library might not contain a required fragment or the automatic atom naming is not sufficient. A custom :class:`FragmentLibrary` can be created either via its constructor or by copying the default library. .. code-block:: python library = copy.deepcopy(hydride.FragmentLibrary.standard_library()) Then molecular models containing hydrogen atoms are added to the library. Internally, a model is split into fragments and these fragments are stored in the library's internal dictionary. Hence, such a molecular model act as *template* for the later hydrogen addition. .. code-block:: python library.add_molecule(template_molecule) Finally the fragment library can be provided to :func:`add_hydrogen()`. .. code-block:: python hydride.add_hydrogen(molecule, fragment_library=library) | In a similar way, a custom :class:`AtomNameLibrary` can be created, filled with template molecules and used in :func:`add_hydrogen()`. .. code-block:: python library = copy.deepcopy(hydride.AtomNameLibrary.standard_library()) library.add_molecule(template_molecule) hydride.add_hydrogen(molecule, name_library=library) Handling periodic boundary conditions ------------------------------------- By default, :func:`add_hydrogen()` and :func:`relax_hydrogen()` do not take periodic boundary conditions into account, as they appear e.g. in MD simulations. Consequently, interactions over the periodic boundary are not taken into account and, more importantly, hydrogen atoms are not placed correctly, if the molecule is divided by the boundary. To tell *Hydride* to consider periodic boundary conditions the `box` parameter needs to be provided. The value can be either a an array of the three box vectors or ``True``, in which case the box is taken from the input structure. .. code-block:: python molecule, _ = hydride.add_hydrogen(molecule, box=True) molecule.coord = hydride.relax_hydrogen(molecule, box=True) Note that this slows down the addition and relaxation procedure. Tweaking relaxation speed and accuracy -------------------------------------- Usually, the relaxation is fast and accurate enough for most applications. However, the user is able to adjust some parameters to shift the speed-accuracy-balance to either side. By default, :func:`relax_hydrogen()` runs until the energy of the conformation does not improve anymore. However, the maximum number of iterations can also be given with the ``iterations`` parameter. If the number of relaxation steps reaches this value, the relaxation terminates regardless of whether an energy minimum is attained. The ``angle_increment`` parameter controls the angle by which each rotatable bond is changed in each relaxation iteration. Lowering this value leads to a higher *resolution* of the returned conformation, but more iterations are required to find the optimum. Observing the relaxation ------------------------ In order to get insights into the course of the relaxation, the user can optionally obtain the coordinates and energy values for each iteration via the ``return_trajectory`` and ``return_energies`` parameters, respectively. .. code-block:: python import matplotlib.pyplot as plt molecule_with_h, mask = hydride.add_hydrogen(molecule) coord, energies = hydride.relax_hydrogen( molecule_with_h, return_trajectory=True, return_energies=True ) print(coord.shape) print(energies.shape) fig, ax = plt.subplots(figsize=(4.0, 2.0)) ax.plot(energies) ax.set_xlabel("Iteration") ax.set_ylabel("Energy") fig.tight_layout() .. image:: /images/api_04.png :align: center Handling missing formal charges ------------------------------- Both :func:`add_hydrogen()` and :func:`relax_hydrogen()` require associated formal charges for correct fragment identification or the electrostatic potential, respectively. However, for many entries in the *RCSB PDB* they are not properly set. At least for amino acids this issue can be remedied with :func:`estimate_amino_acid_charges()`. This function calculates the formal charges of atoms in amino acids based on a given pH value. .. code-block:: python charges = hydride.estimate_amino_acid_charges(molecule, ph=7.0) molecule.set_annotation("charge", charges) Handling missing bonds ---------------------- The input :class:`AtomArray` is also required to have an associated :class:`BondList`. Unfortunately, this information cannot be retrieved from *PDB* or *PDBX/mmCIF* files. If the residues in the input structure are part of the *Chemical Component Dictionary*, the :class:`BondList` can be created with :func:`biotite.structure.connect_via_residue_names()`. .. code-block:: python import biotite.structure as struc molecule.bonds = struc.connect_via_residue_names(molecule) | Classes and functions --------------------- .. autoclass:: FragmentLibrary :members: :undoc-members: :inherited-members: | .. autoclass:: AtomNameLibrary :members: :undoc-members: :inherited-members: | .. autofunction:: add_hydrogen | .. autofunction:: relax_hydrogen | .. autofunction:: estimate_amino_acid_charges