Commit ca78cde2 authored by Jacob Durrant

Updating to 1.2.4.

parent a8839505
*.pyc
NOTES.txt
.DS_Store
.ipynb_checkpoints
Changes
=======
WIP
---
1.2.4
-----
site_substructures.smarts can now incldue comments (lines that start with # ignored.)
Black formatting
* Dimorphite-DL now better protonates compounds with polyphosphate chains
(e.g., ATP). See `site_substructures.smarts` for the rationale behind the
added pKa values.
* Added test cases for ATP and NAD.
* `site_substructures.smarts` now allows comments (lines that start with `#`).
* Fixed a bug that affected how Dimorphite-DL deals with new protonation
  states that yield invalid SMILES strings.
  * Previously, it simply returned the original input SMILES in these rare
    cases (better than nothing). Now, it instead returns the last valid SMILES
    produced, not necessarily the original SMILES.
  * Consider `O=C(O)N1C=CC=C1` at pH 3.5 as an example.
    * Dimorphite-DL first deprotonates the carboxyl group, producing
      `O=C([O-])n1cccc1` (a valid SMILES).
    * It then attempts to protonate the aromatic nitrogen, producing
      `O=C([O-])[n+]1cccc1`, an invalid SMILES.
    * Previously, it would output the original SMILES, `O=C(O)N1C=CC=C1`. Now
      it outputs the last valid SMILES, `O=C([O-])n1cccc1`.
* Improved support for the `--silent` option.
* Reformatted code per the [*Black* Python code
formatter](https://github.com/psf/black).
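
The invalid-SMILES fallback described in the 1.2.4 notes can be illustrated with a minimal sketch. This is not Dimorphite-DL's actual code: `last_valid_smiles` and `toy_is_valid` are hypothetical names, and the validity check is a stand-in lookup rather than a real SMILES parser. The sketch also assumes that processing stops at the first invalid step, which may differ from the real control flow.

```python
from typing import Callable, Iterable


def last_valid_smiles(
    original: str,
    candidates: Iterable[str],
    is_valid: Callable[[str], bool],
) -> str:
    """Return the last valid SMILES produced, or the original if none was valid.

    `candidates` holds the SMILES produced by successive (de)protonation
    steps, in order. Processing stops at the first invalid candidate.
    """
    last_valid = original
    for smiles in candidates:
        if not is_valid(smiles):
            break  # keep whatever was valid before this step
        last_valid = smiles
    return last_valid


# Hypothetical stand-in for a real SMILES validity check.
KNOWN_INVALID = {"O=C([O-])[n+]1cccc1"}


def toy_is_valid(smiles: str) -> bool:
    return smiles not in KNOWN_INVALID


# The changelog example: deprotonate the carboxyl group (valid), then
# attempt to protonate the aromatic nitrogen (invalid).
steps = ["O=C([O-])n1cccc1", "O=C([O-])[n+]1cccc1"]
result = last_valid_smiles("O=C(O)N1C=CC=C1", steps, toy_is_valid)
print(result)  # O=C([O-])n1cccc1
```

With an empty list of steps the function returns the original input, which matches the pre-1.2.4 behavior only in the case where no valid intermediate exists.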
1.2.3
-----
......
Dimorphite-DL 1.2.3
Dimorphite-DL 1.2.4
===================
What is it?
......@@ -34,7 +34,7 @@ usage: dimorphite_dl.py [-h] [--min_ph MIN] [--max_ph MAX]
[--smiles_file FILE] [--output_file FILE]
[--label_states] [--test]
Dimorphite 1.2.3: Creates models of appropriately protonated small moleucles.
Dimorphite 1.2.4: Creates models of appropriately protonated small moleucles.
Apache 2.0 License. Copyright 2020 Jacob D. Durrant.
optional arguments:
......
This diff is collapsed.
# Polyphosphates present a particularly difficult case for several reasons:
# 1) Dimorphite-DL does not allow distinct moieties to overlap, but in a
# polyphosphate chain the bridging oxygen atoms must be considered in
# each repeat to match correctly.
# 2) Dimorphite-DL currently accommodates up to two protonation states per
# moiety, but polyphosphates can have many more.
# Recursive, only first atom matches.
# chrome-extension://oemmndcbldboiebfnladdacbdfmadadm/https://link.springer.com/content/pdf/10.1007%2FBF00974032.pdf
# https://github.com/rdkit/rdkit/issues/1404
# HELPFUL! https://smartsview.zbh.uni-hamburg.de/
*Azide [N+0:1]=[N+:2]=[N+0:3]-[H] 2 4.65 0.07071067811865513
Nitro [C,c,N,n,O,o:1]-[NX3:2](=[O:3])-[O:4]-[H] 3 -1000.0 0
AmidineGuanidine1 [N:1]-[C:2](-[N:3])=[NX2:4]-[H:5] 3 12.025333333333334 1.5941046150769165
AmidineGuanidine2 [C:1](-[N:2])=[NX2+0:3] 2 10.035538461538462 2.1312826469414716
Sulfate [SX4:1](=[O:2])(=[O:3])([O:4]-[C,c,N,n:5])-[OX2:6]-[H] 5 -2.36 1.3048043093561141
Sulfonate [SX4:1](=[O:2])(=[O:3])(-[C,c,N,n:4])-[OX2:5]-[H] 4 -1.8184615384615386 1.4086213481855594
Sulfinic_acid [SX3:1](=[O:2])-[O:3]-[H] 2 1.7933333333333332 0.4372070447739835
Phenyl_carboxyl [c,n,o:1]-[C:2](=[O:3])-[O:4]-[H] 3 3.463441968255319 1.2518054407928614
Carboxyl [C:1](=[O:2])-[O:3]-[H] 2 3.456652971502591 1.2871420886834017
Thioic_acid [C,c,N,n:1](=[O,S:2])-[SX2,OX2:3]-[H] 2 0.678267 1.497048763660801
Phenyl_Thiol [c,n:1]-[SX2:2]-[H] 1 4.978235294117647 2.6137000480499806
Thiol [C,N:1]-[SX2:2]-[H] 1 9.12448275862069 1.3317968158171463
# [*]OP(=O)(O[H])O[H]. Note that this matches terminal phosphate of ATP, ADP, AMP.
Phosphate [PX4:1](=[O:2])(-[OX2:3]-[H])(-[O+0:4])-[OX2:5]-[H] 2 2.4182608695652172 1.1091177991945305 5 6.5055 0.9512787792174668
# Note that Internal_phosphate_polyphos_chain and
......@@ -21,11 +22,48 @@ Phosphate [PX4:1](=[O:2])(-[OX2:3]-[H])(-[O+0:4])-[OX2:5]-[H] 2 2.41826086956521
# For Internal_phosphate_polyphos_chain, we use a mean pKa value of 0.9, per
# DOI: 10.7554/eLife.38821. For the precision value we use 1.0, which is roughly
# the precision of the two ionizable hydroxyls from Phosphate (see above).
Internal_phosphate_polyphos_chain [$([PX4:1](=O)([OX2][PX4](=O)([OX2])(O[H]))([OX2][PX4](=O)(O[H])([OX2])))][O:2]-[H] 1 2.4182608695652172 1.1091177991945305
# the precision of the two ionizable hydroxyls from Phosphate (see above). Note
# that when using recursive SMARTS strings, RDKit considers only the first atom
# to be a match. Subsequent atoms define the environment.
Internal_phosphate_polyphos_chain [$([PX4:1](=O)([OX2][PX4](=O)([OX2])(O[H]))([OX2][PX4](=O)(O[H])([OX2])))][O:2]-[H] 1 0.9 1.0
# For Initial_phosphate_like_in_ATP_ADP, we use the same values found for the
# lower-pKa hydroxul of Phosphate (above).
# lower-pKa hydroxyl of Phosphate (above).
Initial_phosphate_like_in_ATP_ADP [$([PX4:1]([OX2][C,c,N,n])(=O)([OX2][PX4](=O)([OX2])(O[H])))]O-[H] 1 2.4182608695652172 1.1091177991945305
# [*]P(=O)(O[H])O[H]. Cannot match terminal phosphate of ATP because O not among [C,c,N,n]
Phosphonate [PX4:1](=[O:2])(-[OX2:3]-[H])(-[C,c,N,n:4])-[OX2:5]-[H] 2 1.8835714285714287 0.5925999820080644 5 7.247254901960784 0.8511476450801531
Phenol [c,n,o:1]-[O:2]-[H] 1 7.065359866910526 3.277356122295936
Peroxide1 [O:1]([$(C=O),$(C[Cl]),$(CF),$(C[Br]),$(CC#N):2])-[O:3]-[H] 2 8.738888888888889 0.7562592839596507
Peroxide2 [C:1]-[O:2]-[O:3]-[H] 2 11.978235294117647 0.8697645895163075
O=C-C=C-OH [O:1]=[C;R:2]-[C;R:3]=[C;R:4]-[O:5]-[H] 4 3.554 0.803339458581667
Vinyl_alcohol [C:1]=[C:2]-[O:3]-[H] 2 8.871850714285713 1.660200255394124
Alcohol [C:1]-[O:2]-[H] 1 14.780384615384616 2.546464970533435
N-hydroxyamide [C:1](=[O:2])-[N:3]-[O:4]-[H] 3 9.301904761904762 1.2181897185891002
*Ringed_imide1 [O,S:1]=[C;R:2]([$([#8]),$([#7]),$([#16]),$([#6][Cl]),$([#6]F),$([#6][Br]):3])-[N;R:4]([C;R:5]=[O,S:6])-[H] 3 6.4525 0.5555627777308341
*Ringed_imide2 [O,S:1]=[C;R:2]-[N;R:3]([C;R:4]=[O,S:5])-[H] 2 8.681666666666667 1.8657779975741713
*Imide [F,Cl,Br,S,s,P,p:1][#6:2][CX3:3](=[O,S:4])-[NX3+0:5]([CX3:6]=[O,S:7])-[H] 4 2.466666666666667 1.4843629385474877
*Imide2 [O,S:1]=[CX3:2]-[NX3+0:3]([CX3:4]=[O,S:5])-[H] 2 10.23 1.1198214143335534
*Amide_electronegative [C:1](=[O:2])-[N:3](-[Br,Cl,I,F,S,O,N,P:4])-[H] 2 3.4896 2.688124315081677
*Amide [C:1](=[O:2])-[N:3]-[H] 2 12.00611111111111 4.512491341218857
*Sulfonamide [SX4:1](=[O:2])(=[O:3])-[NX3+0:4]-[H] 3 7.9160326086956525 1.9842121316708763
Anilines_primary [c:1]-[NX3+0:2]([H:3])[H:4] 1 3.899298673194805 2.068768503987161
Anilines_secondary [c:1]-[NX3+0:2]([H:3])[!H:4] 1 4.335408163265306 2.1768842022330843
Anilines_tertiary [c:1]-[NX3+0:2]([!H:3])[!H:4] 1 4.16690685045614 2.005865735782679
Aromatic_nitrogen_unprotonated [n+0&H0:1] 0 4.3535441240733945 2.0714072661859584
Amines_primary_secondary_tertiary [C:1]-[NX3+0:2] 1 8.159107682388349 2.5183597445318147
# e.g., [*]P(=O)(O[H])[*]. Note that this cannot match the internal phosphates
# of ATP, because oxygen is not among [C,c,N,n,F,Cl,Br,I].
Phosphinic_acid [PX4:1](=[O:2])(-[C,c,N,n,F,Cl,Br,I:3])(-[C,c,N,n,F,Cl,Br,I:4])-[OX2:5]-[H] 4 2.9745 0.6867886750744557
# e.g., [*]OP(=O)(O[H])O[*]. Cannot match ATP because P not among [C,c,N,n,F,Cl,Br,I]
Phosphate_diester [PX4:1](=[O:2])(-[OX2:3]-[C,c,N,n,F,Cl,Br,I:4])(-[O+0:5]-[C,c,N,n,F,Cl,Br,I:4])-[OX2:6]-[H] 6 2.7280434782608696 2.5437448856908316
# e.g., [*]P(=O)(O[H])O[*]. Cannot match ATP because O not among [C,c,N,n,F,Cl,Br,I].
Phosphonate_ester [PX4:1](=[O:2])(-[OX2:3]-[C,c,N,n,F,Cl,Br,I:4])(-[C,c,N,n,F,Cl,Br,I:5])-[OX2:6]-[H] 5 2.0868 0.4503028610465036
Primary_hydroxyl_amine [C,c:1]-[O:2]-[NH2:3] 2 4.035714285714286 0.8463816543155368
*Indole_pyrrole [c;R:1]1[c;R:2][c;R:3][c;R:4][n;R:5]1[H] 4 14.52875 4.06702491591416
*Aromatic_nitrogen_protonated [n:1]-[H] 0 7.17 2.94602395490212
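
As a rough illustration of the file layout above, including the new comment support, here is a minimal parsing sketch. It is not Dimorphite-DL's own loader: `parse_substructure_lines` and the dictionary keys are hypothetical. The column interpretation is inferred from the layout and the comments above (a name, a SMARTS pattern, then one or more triples of site index, mean pKa, and precision) and should be treated as an assumption.

```python
from typing import Dict, Iterable, List


def parse_substructure_lines(lines: Iterable[str]) -> List[Dict]:
    """Parse whitespace-separated substructure lines, skipping '#' comments."""
    entries = []
    for raw in lines:
        line = raw.strip()
        # Blank lines and lines that start with '#' are ignored.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        name, smarts, rest = fields[0], fields[1], fields[2:]
        # Assumed layout: repeating (site index, mean pKa, precision) triples.
        sites = [
            {
                "site": int(rest[i]),
                "pka_mean": float(rest[i + 1]),
                "pka_precision": float(rest[i + 2]),
            }
            for i in range(0, len(rest) - 2, 3)
        ]
        entries.append({"name": name, "smarts": smarts, "sites": sites})
    return entries


sample = [
    "# [*]OP(=O)(O[H])O[H]. Note that this matches terminal phosphate of ATP, ADP, AMP.",
    "Phosphate [PX4:1](=[O:2])(-[OX2:3]-[H])(-[O+0:4])-[OX2:5]-[H] "
    "2 2.4182608695652172 1.1091177991945305 5 6.5055 0.9512787792174668",
    "",
    "Carboxyl [C:1](=[O:2])-[O:3]-[H] 2 3.456652971502591 1.2871420886834017",
]
entries = parse_substructure_lines(sample)
for entry in entries:
    print(entry["name"], len(entry["sites"]))
```

Splitting on whitespace works here because none of the SMARTS patterns shown contain spaces. Names are kept verbatim, including any leading `*`, since this sketch makes no claim about what that prefix means.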