Commit a7f7b52f authored by jdurrant

Minor updates (version number, citation, etc.)

parent 06353987
......@@ -10,3 +10,4 @@ data/**
.store/
dist/**
build/**
.vscode
Changes
=======
1.0.2
-----
* Added a CLI implementation of the program. See `README.md` for details.
* Added a version number and citation to the program output.
1.0.1
-----
......
# DeepFrag
DeepFrag is a machine learning model for fragment-based lead optimization. In this repository, you will find code to train the model and code to run inference using a pre-trained model.
DeepFrag is a machine learning model for fragment-based lead optimization. In
this repository, you will find code to train the model and code to run
inference using a pre-trained model.
# Citation
## Citation
If you use DeepFrag in your research, please cite as:
Green, H., Koes, D. R., & Durrant, J. D. (2021). DeepFrag: a deep convolutional neural network for fragment-based lead optimization. Chemical Science.
Green, H., Koes, D. R., & Durrant, J. D. (2021). DeepFrag: a deep
convolutional neural network for fragment-based lead optimization. Chemical
Science.
```tex
@article{green2021deepfrag,
......@@ -19,68 +22,106 @@ Green, H., Koes, D. R., & Durrant, J. D. (2021). DeepFrag: a deep convolutional
}
```
# Usage
## Usage
There are three ways to use DeepFrag:
1. **DeepFrag Browser App**: We have released a free, open-source browser app for DeepFrag that requires no setup and does not transmit any structures to a remote server.
- View the online version at [durrantlab.pitt.edu/deepfrag](https://durrantlab.pitt.edu/deepfrag/)
- See the code at [git.durrantlab.pitt.edu/jdurrant/deepfrag-app](https://git.durrantlab.pitt.edu/jdurrant/deepfrag-app)
2. **DeepFrag CLI**: In this repository we have included a `deepfrag.py` script that can perform common prediction tasks using the API.
1. **DeepFrag Browser App**: We have released a free, open-source browser app
for DeepFrag that requires no setup and does not transmit any structures to
a remote server.
- View the online version at
[durrantlab.pitt.edu/deepfrag](https://durrantlab.pitt.edu/deepfrag/)
- See the code at
[git.durrantlab.pitt.edu/jdurrant/deepfrag-app](https://git.durrantlab.pitt.edu/jdurrant/deepfrag-app)
2. **DeepFrag CLI**: In this repository we have included a `deepfrag.py`
script that can perform common prediction tasks using the API.
- See the `DeepFrag CLI` section below
3. **DeepFrag API**: For custom tasks or fine-grained control over predictions, you can invoke the DeepFrag API directly and interface with the raw data structures and the PyTorch model. We have created an example Google Colab (Jupyter notebook) that demonstrates how to perform manual predictions.
- See the interactive [Colab](https://colab.research.google.com/drive/1XWin26iDXqZ2ioGtwDRuO4iRomGVpdte)
3. **DeepFrag API**: For custom tasks or fine-grained control over
predictions, you can invoke the DeepFrag API directly and interface with
the raw data structures and the PyTorch model. We have created an example
Google Colab (Jupyter notebook) that demonstrates how to perform manual
predictions.
- See the interactive
[Colab](https://colab.research.google.com/drive/1XWin26iDXqZ2ioGtwDRuO4iRomGVpdte).
## DeepFrag CLI
# DeepFrag CLI
The DeepFrag CLI is invoked by running `python3 deepfrag.py` in this
repository. The CLI requires a pre-trained model and the fragment library to
run. You will be prompted to download both when you first run the CLI and
these will be saved in the `./.store` directory.
The DeepFrag CLI is invoked by running `python3 deepfrag.py` in this repository. The CLI requires a pre-trained model and the fragment library to run. You will be prompted to download both when you first run the CLI and these will be saved in the `./.store` directory.
### Structure (specify exactly one)
The input structures are specified either by providing receptor and ligand PDB
files manually or by specifying a PDB ID and the ligand residue number.
## Structure (specify exactly one)
The input structures are specified using either a manual receptor and ligand pdb or by specifying a pdb id and the ligand residue number.
- `--receptor <rec.pdb> --ligand <lig.pdb>`
- `--pdb <pdbid> --resnum <resnum>`
## Connection Point (specify exactly one)
### Connection Point (specify exactly one)
DeepFrag will predict new fragments that connect to the _connection point_ via a single bond. You must specify the connection point atom using one of the following:
- `--cname <name>`: Specify the connection point by atom name (e.g. `C3`, `N5`, `O2`, ...).
- `--cx <x> --cy <y> --cz <z>`: Specify the connection point by atomic coordinate. DeepFrag will find the closest atom to this point.
DeepFrag will predict new fragments that connect to the _connection point_ via
a single bond. You must specify the connection point atom using one of the
following:
## Fragment Removal (optional) (specify exactly one)
- `--cname <name>`: Specify the connection point by atom name (e.g. `C3`,
`N5`, `O2`, ...).
- `--cx <x> --cy <y> --cz <z>`: Specify the connection point by atomic
coordinate. DeepFrag will find the closest atom to this point.
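
The coordinate-based lookup described above amounts to a nearest-neighbor search over the ligand atoms. The following is a minimal sketch of that idea, not DeepFrag's own code: the atom records and the `closest_atom` helper are hypothetical, and a real script would read atom names and coordinates from the ligand PDB file.

```python
import math

# Hypothetical (name, x, y, z) atom records, invented for illustration.
atoms = [
    ("C1", 0.0, 0.0, 0.0),
    ("N5", 1.5, 0.0, 0.0),
    ("O2", 0.0, 2.0, 1.0),
]


def closest_atom(atoms, point):
    """Return the atom record with the smallest Euclidean distance to `point`."""
    return min(atoms, key=lambda atom: math.dist(atom[1:], point))


# A query point that is not exactly on any atom still resolves to one atom.
print(closest_atom(atoms, (1.2, 0.1, 0.0))[0])
```

Because the nearest atom is always returned, a slightly imprecise coordinate still selects an atom, but a badly wrong coordinate will silently select the wrong one, so it may be worth checking the chosen atom name.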
If you are using DeepFrag for fragment _replacement_, you must first remove the original fragment from the ligand structure. You can either do this by hand, e.g. editing the PDB, or DeepFrag can do this for you by specifying _which_ fragment should be removed.
### Fragment Removal (optional) (specify exactly one)
_Note: predicting fragments in place of hydrogen atoms (e.g. protons) does not require any fragment removal since hydrogen atoms are ignored by the model._
If you are using DeepFrag for fragment _replacement_, you must first remove
the original fragment from the ligand structure. You can either do this by
hand, e.g. editing the PDB, or DeepFrag can do this for you by specifying
_which_ fragment should be removed.
To remove a fragment, you specify a second atom that is contained in the fragment. Like the connection point, you can either use the atom name or the atom coordinate.
_Note: predicting fragments in place of hydrogen atoms (e.g. protons) does not
require any fragment removal since hydrogen atoms are ignored by the model._
- `--rname <name>`: Specify the connection point by atom name (e.g. `C3`, `N5`, `O2`, ...).
- `--rx <x> --ry <y> --rz <z>`: Specify the connection point by atomic coordinate. DeepFrag will find the closest atom to this point.
To remove a fragment, you specify a second atom that is contained in the
fragment. Like the connection point, you can either use the atom name or the
atom coordinate.
- `--rname <name>`: Specify an atom in the fragment to remove by atom name
(e.g. `C3`, `N5`, `O2`, ...).
- `--rx <x> --ry <y> --rz <z>`: Specify an atom in the fragment to remove by
atomic coordinate. DeepFrag will find the closest atom to this point.
## Output (optional)
### Output (optional)
By default, DeepFrag will print a list of fragment predictions to stdout similar to the [Browser App](https://durrantlab.pitt.edu/deepfrag/).
By default, DeepFrag will print a list of fragment predictions to stdout
similar to the [Browser App](https://durrantlab.pitt.edu/deepfrag/).
- `--out <out.csv>`: Save predictions in CSV format to `out.csv`. Each line contains the fragment rank, score and SMILES string.
- `--out <out.csv>`: Save predictions in CSV format to `out.csv`. Each line
contains the fragment rank, score and SMILES string.
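
A CSV file saved with `--out` can be post-processed with the standard library. The sketch below assumes only what is stated above, namely that each line holds a rank, a score, and a SMILES string in that order. Whether the file starts with a header row is not stated, so the hypothetical `read_predictions` helper skips any row whose first field is not an integer. The sample rows are made up for illustration.

```python
import csv
import io


def read_predictions(handle):
    """Parse (rank, score, smiles) rows, skipping a header row if present."""
    rows = []
    for row in csv.reader(handle):
        if len(row) < 3:
            continue  # ignore blank or malformed lines
        try:
            rank = int(row[0])
        except ValueError:
            continue  # assumed to be a header row
        rows.append((rank, float(row[1]), row[2]))
    return rows


# Invented sample data; real scores and fragments will differ.
sample = io.StringIO("1,0.91,CC(=O)O\n2,0.85,CO\n")
for rank, score, smiles in read_predictions(sample):
    print(rank, score, smiles)
```

To read a real file, pass an open file handle instead, e.g. `with open('out.csv', newline='') as f: read_predictions(f)`.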
## Miscellaneous (optional)
### Miscellaneous (optional)
- `--full`: Generate SMILES strings with the full ligand structure instead of just the fragment.
- `--cpu/--gpu`: DeepFrag will attempt to infer if a Cuda GPU is available and fallback to the CPU if it is not. You can set either the `--cpu` or `--gpu` flag to explicitly specify the target device.
- `--num_grids <num>`: Number of grid rotations to use. Using more will take longer but produce a more stable prediction. (Default: 4)
- `--top_k <k>`: Number of predictions to print in stdout. Use -1 to display all. (Default: 25)
- `--full`: Generate SMILES strings with the full ligand structure instead of
just the fragment.
- `--cpu/--gpu`: DeepFrag will attempt to detect whether a CUDA GPU is
available and fall back to the CPU if it is not. You can set either the
`--cpu` or `--gpu` flag to explicitly specify the target device.
- `--num_grids <num>`: Number of grid rotations to use. Using more will take
longer but produce a more stable prediction. (Default: 4)
- `--top_k <k>`: Number of predictions to print to stdout. Use -1 to display
all. (Default: 25)
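
The device selection described for `--cpu/--gpu` can be sketched as a small pure function. This is an illustration under stated assumptions rather than the code in `deepfrag.py`: `pick_device` and its return strings are hypothetical, and in a real script the `cuda_available` value would typically come from `torch.cuda.is_available()`.

```python
def pick_device(cpu=False, gpu=False, cuda_available=False):
    """Choose a device string from explicit flags, else from GPU availability.

    Hypothetical helper; `cuda_available` would normally be the result of
    torch.cuda.is_available().
    """
    if cpu and gpu:
        raise ValueError("Specify at most one of --cpu and --gpu")
    if cpu:
        return "cpu"
    if gpu:
        return "cuda"
    # No explicit flag: prefer the GPU when one is available.
    return "cuda" if cuda_available else "cpu"


print(pick_device(cuda_available=True))            # no flags, GPU present
print(pick_device(cpu=True, cuda_available=True))  # explicit CPU override
```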
# Reproduce Results
## Reproduce Results
You can use the DeepFrag CLI to reproduce the highlighted results from the main manuscript:
You can use the DeepFrag CLI to reproduce the highlighted results from the
main manuscript:
## 1. Fragment replacement
### 1. Fragment replacement
To replace fragments, specify the connection point (`cname` or `cx/cy/cz`) and specify a second atom that is contained in the fragment (`rname` or `rx/ry/rz`).
To replace fragments, specify the connection point (`cname` or `cx/cy/cz`) and
specify a second atom that is contained in the fragment (`rname` or
`rx/ry/rz`).
```bash
# Fig. 3: (2XP9) H. sapiens peptidyl-prolyl cistrans isomerase NIMA-interacting 1 (HsPin1p)
# Fig. 3: (2XP9) H. sapiens peptidyl-prolyl cis-trans isomerase NIMA-interacting 1 (HsPin1p)
# Carboxylate A
$ python3 deepfrag.py --pdb 2xp9 --resnum 1165 --cname C10 --rname C12
......@@ -125,9 +166,11 @@ $ python3 deepfrag.py --pdb 1x38 --resnum 1001 --cname C7B --rname C1
$ python3 deepfrag.py --pdb 4fow --resnum 701 --cname CAE --rname NAA
```
## 2. Fragment addition
### 2. Fragment addition
For fragment addition, you only need to specify the atom connection point (`cname` or `cx/cy/cz`). In this case, DeepFrag will implicily replace a valent hydrogen.
For fragment addition, you only need to specify the atom connection point
(`cname` or `cx/cy/cz`). In this case, DeepFrag will implicitly replace a
valent hydrogen.
```bash
# Fig. 5: Ligands targeting the SARS-CoV-2 main protease (MPro)
......@@ -139,22 +182,25 @@ $ python3 deepfrag.py --pdb 5rgh --resnum 404 --cname C09
$ python3 deepfrag.py --pdb 5r81 --resnum 1001 --cname C07
```
# Overview
## Overview
- `config`: fixed configuration information (eg. TRAIN/VAL/TEST partitions)
- `configurations`: benchmark model configurations (see [`configurations/README.md`](configurations/README.md))
- `config`: fixed configuration information (e.g., TRAIN/VAL/TEST partitions)
- `configurations`: benchmark model configurations (see
[`configurations/README.md`](configurations/README.md))
- `data`: training/inference data (see [`data/README.md`](data/README.md))
- `leadopt`: main module code
- `models`: pytorch architecture definitions
- `data_util.py`: utility code for reading packed fragment/fingerprint data files
- `grid_util.py`: GPU-accelerated grid generation code
- `metrics.py`: pytorch implementations of several metrics
- `model_conf.py`: contains code to configure and train models
- `util.py`: utility code for rdkit/openbabel processing
- `scripts`: data processing scripts (see [`scripts/README.md`](scripts/README.md))
- `models`: pytorch architecture definitions
- `data_util.py`: utility code for reading packed fragment/fingerprint data
files
- `grid_util.py`: GPU-accelerated grid generation code
- `metrics.py`: pytorch implementations of several metrics
- `model_conf.py`: contains code to configure and train models
- `util.py`: utility code for rdkit/openbabel processing
- `scripts`: data processing scripts (see
[`scripts/README.md`](scripts/README.md))
- `train.py`: CLI interface to launch training runs
# Dependencies
## Dependencies
You can build a virtualenv with the requirements:
......@@ -166,9 +212,11 @@ $ pip install -r requirements.txt
Note: `Cuda 10.1` is required during training
# Training
## Training
To train a model, you can use the `train.py` utility script. You can specify model parameters as command line arguments or load parameters from a configuration args.json file.
To train a model, you can use the `train.py` utility script. You can specify
model parameters as command line arguments or load parameters from a
configuration args.json file.
```bash
python train.py \
......@@ -189,15 +237,21 @@ python train.py \
--configuration=./configurations/args.json
```
`save_path` is a directory to save the best model. The directory will be created if it doesn't exist. If this is not provided, the model will not be saved.
`save_path` is a directory to save the best model. The directory will be
created if it doesn't exist. If this is not provided, the model will not be
saved.
`wandb_project` is an optional wandb project name. If provided, the run will be logged to wandb.
`wandb_project` is an optional wandb project name. If provided, the run will
be logged to wandb.
See below for available models and model-specific parameters:
# Leadopt Models
## Leadopt Models
In this repository, trainable models are subclasses of `model_conf.LeadoptModel`. This class encapsulates model configuration arguments and pytorch models and enables saving and loading multi-component models.
In this repository, trainable models are subclasses of
`model_conf.LeadoptModel`. This class encapsulates model configuration
arguments and pytorch models and enables saving and loading multi-component
models.
```py
from leadopt.model_conf import LeadoptModel, MODELS
......@@ -210,11 +264,12 @@ model.train(save_path='./mymodel')
model2 = LeadoptModel.load('./mymodel')
```
Internally, model arguments are configured by setting up an `argparse` parser and passing around a `dict` of configuration parameters in `self._args`.
Internally, model arguments are configured by setting up an `argparse` parser
and passing around a `dict` of configuration parameters in `self._args`.
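
The general pattern of turning `argparse` results into a plain `dict` can be sketched as follows. `ToyModel`, its `setup_parser` method, and the `--learning_rate` and `--batch_size` flags are hypothetical stand-ins chosen for illustration; they are not taken from `model_conf.LeadoptModel`.

```python
import argparse


class ToyModel:
    """Hypothetical stand-in that keeps its configuration in `self._args`."""

    @staticmethod
    def setup_parser(parser):
        # Illustrative arguments only; the real models define their own.
        parser.add_argument("--learning_rate", type=float, default=0.001)
        parser.add_argument("--batch_size", type=int, default=8)

    def __init__(self, args):
        self._args = dict(args)  # plain dict of configuration parameters


parser = argparse.ArgumentParser()
ToyModel.setup_parser(parser)

# vars() converts the argparse Namespace into a dict.
args = vars(parser.parse_args(["--batch_size", "16"]))
model = ToyModel(args)
print(model._args)
```

Keeping the configuration as a `dict` makes it straightforward to serialize alongside the model weights, for example as JSON.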
## VoxelNet
### VoxelNet
```
```text
--no_partitions If set, disable the use of TRAIN/VAL partitions during
training.
-f FRAGMENTS, --fragments FRAGMENTS
......
# Data for Training and Inference
This folder contains data used during training and inference.
Model configuration files in `/configurations` expect the data files to be in this directory. You can either copy them directly here or use symlinks.
Model configuration files in `/configurations` expect the data files to be in
this directory. You can either copy them directly here or use symlinks.
You can download the data here: http://durrantlab.com/apps/deepfrag/files/
<!-- https://pitt.box.com/s/ubohnl10idnarpam40hq6chggtaojqv7 -->
Overview:
- `moad.h5` (7 GB): processed MOAD data loaded by `data_util.FragmentDataset`
- `rdk10_moad` (384 MB): RDK-10 fingerprints for MOAD data loaded by `data_util.FingerprintDataset` (generated with `scripts/make_fingerprints.py`)
- `rdk10_moad` (384 MB): RDK-10 fingerprints for MOAD data loaded by
`data_util.FingerprintDataset` (generated with
`scripts/make_fingerprints.py`)
import argparse
import functools
import os
......@@ -28,6 +27,7 @@ FINGERPRINTS_DOWNLOAD = 'https://durrantlab.pitt.edu/apps/deepfrag/files/fingerp
RCSB_DOWNLOAD = 'https://files.rcsb.org/download/%s.pdb1'
VERSION = "1.0.2"
def download_remote(url, path, compression=None):
r = requests.get(url, stream=True, allow_redirects=True)
......@@ -70,16 +70,16 @@ def ensure_cli_data():
if r.lower() == 'n':
print('Exiting...')
exit(-1)
print(f'Saving to {model_path}...')
download_remote(MODEL_DOWNLOAD, model_path, compression='zip')
if not os.path.exists(str(fingerprints_path)):
r = input('Fingerprint library not found, download it now? (11 MB) [Y/n]: ')
if r.lower() == 'n':
print('Exiting...')
exit(-1)
print(f'Saving to {fingerprints_path}...')
download_remote(FINGERPRINTS_DOWNLOAD, fingerprints_path, compression=None)
......@@ -161,7 +161,7 @@ def preprocess_ligand(lig, conn, rvec):
def lookup_atom_name(lig_path, name):
"""Try to look up an atom by name. Returns the coordinate of the atom if
"""Try to look up an atom by name. Returns the coordinate of the atom if
found."""
p = prody.parsePDB(lig_path)
p = p.select('name %s' % name)
......@@ -254,7 +254,7 @@ def get_target_device(args) -> str:
def generate_grids(args, model_args, rec_coords, rec_types, parent_coords, parent_types, conn, device):
start = time.time()
print('[*] Generating grids ... ', end='', flush=True)
batch = grid_util.get_raw_batch(
rec_coords, rec_types, parent_coords, parent_types,
......@@ -373,6 +373,14 @@ def run(args):
def main():
global VERSION
print("\nDeepFrag " + VERSION)
print("\nIf you use DeepFrag in your research, please cite:\n")
print("Green, H., Koes, D. R., & Durrant, J. D. (2021). DeepFrag: a deep convolutional")
print("neural network for fragment-based lead optimization. Chemical Science.\n")
ensure_cli_data()
parser = argparse.ArgumentParser()
......@@ -398,11 +406,11 @@ def main():
# Misc
parser.add_argument('--full', action='store_true', default=False,
help='Print the full (fused) ligand structure.')
parser.add_argument('--num_grids', type=int, default=4,
parser.add_argument('--num_grids', type=int, default=4,
help='Number of grid rotations.')
parser.add_argument('--top_k', type=int, default=25,
parser.add_argument('--top_k', type=int, default=25,
help='Number of results to show. Set to -1 to show all.')
parser.add_argument('--out', type=str,
parser.add_argument('--out', type=str,
help='Path to output CSV file.')
parser.add_argument('--cpu', action='store_true', default=False,
help='Use the CPU for grid generation and predictions.')
......@@ -417,7 +425,7 @@ def main():
([('rx', 'ry', 'rz'), ('rname',)], False),
([('cpu',), ('gpu',)], False)
]
for grp, req in groupings:
partial = []
complete = 0
......