# DeepFrag

This repository contains code for machine-learning-based lead optimization.

# Overview

- `config`: fixed configuration information (e.g., TRAIN/VAL/TEST partitions)
- `configurations`: benchmark model configurations (see [`configurations/README.md`](configurations/README.md))
- `data`: training/inference data (see [`data/README.md`](data/README.md))
- `leadopt`: main module code
    - `models`: pytorch architecture definitions
    - `data_util.py`: utility code for reading packed fragment/fingerprint data files
    - `grid_util.py`: GPU-accelerated grid generation code
    - `metrics.py`: pytorch implementations of several metrics
    - `model_conf.py`: contains code to configure and train models
    - `util.py`: utility code for rdkit/openbabel processing
- `scripts`: data processing scripts (see [`scripts/README.md`](scripts/README.md))
- `train.py`: CLI interface to launch training runs

# Dependencies

You can build a virtualenv with the requirements:

```sh
$ python3 -m venv leadopt_env
$ source ./leadopt_env/bin/activate
$ pip install -r requirements.txt
```

Note: `CUDA 10.1` is required during training.
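
Before launching a run, it can help to confirm that PyTorch sees the GPU and the expected CUDA version. A minimal check, assuming `torch` from `requirements.txt` is installed:

```py
import torch

# Report the CUDA toolkit version this PyTorch build was compiled
# against, and whether a GPU is currently visible.
print(torch.version.cuda)         # expect '10.1'
print(torch.cuda.is_available())  # expect True for training
```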

# Training

To train a model, use the `train.py` utility script. You can specify model parameters as command-line arguments or load them from an `args.json` configuration file.

```bash
python train.py \
    --save_path=/path/to/model \
    --wandb_project=my_project \
    {model_type} \
    --model_arg1=x \
    --model_arg2=y \
    ...
```

or

```bash
python train.py \
    --save_path=/path/to/model \
    --wandb_project=my_project \
    --configuration=./configurations/args.json
```
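
For reference, a configuration file is presumably a JSON dictionary of argument values. A hypothetical sketch of writing one (the keys here are illustrative and mirror the model flags listed below; see [`configurations/README.md`](configurations/README.md) for the actual benchmark configurations):

```py
import json

# Hypothetical configuration; keys correspond to the model's
# command-line flags (see the VoxelNet section below).
config = {
    "learning_rate": 1e-4,
    "batch_size": 16,
    "num_epochs": 50,
    "fragments": "/path/to/fragments",
    "fingerprints": "/path/to/fingerprints",
}

with open("./configurations/args.json", "w") as f:
    json.dump(config, f, indent=2)
```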

`save_path` is the directory where the best model will be saved. The directory will be created if it doesn't exist. If this argument is not provided, the model will not be saved.

`wandb_project` is an optional wandb project name. If provided, the run will be logged to wandb.

See below for available models and model-specific parameters:

# Leadopt Models

In this repository, trainable models are subclasses of `model_conf.LeadoptModel`. This class encapsulates model configuration arguments and pytorch models, and it enables saving and loading multi-component models.

```py
from leadopt.model_conf import LeadoptModel, MODELS

# Instantiate a model by name with a dict of configuration arguments.
model = MODELS['voxel']({args...})
model.train(save_path='./mymodel')

...

# Reload the trained model (and its configuration) from disk.
model2 = LeadoptModel.load('./mymodel')
```

Internally, model arguments are configured by setting up an `argparse` parser and passing around a `dict` of configuration parameters in `self._args`.
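
As a rough sketch of that pattern (illustrative only; apart from `self._args`, the class and method names here are hypothetical, not the actual `model_conf` API):

```py
import argparse

class ExampleModel:
    """Sketch of a model following the configuration pattern above."""

    @staticmethod
    def setup_parser(parser: argparse.ArgumentParser):
        # Each model registers its own arguments with argparse.
        parser.add_argument('--learning_rate', type=float, default=1e-4)
        parser.add_argument('--batch_size', type=int, default=16)

    def __init__(self, args: dict):
        # Configuration travels through the model as a plain dict.
        self._args = args
```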

## VoxelNet

```
--no_partitions     If set, disable the use of TRAIN/VAL partitions during
                    training.
-f FRAGMENTS, --fragments FRAGMENTS
                    Path to fragments file.
-fp FINGERPRINTS, --fingerprints FINGERPRINTS
                    Path to fingerprints file.
-lr LEARNING_RATE, --learning_rate LEARNING_RATE
--num_epochs NUM_EPOCHS
                    Number of epochs to train for.
--test_steps TEST_STEPS
                    Number of evaluation steps per epoch.
-b BATCH_SIZE, --batch_size BATCH_SIZE
--grid_width GRID_WIDTH
--grid_res GRID_RES
--fdist_min FDIST_MIN
                    Ignore fragments closer to the receptor than this
                    distance (Angstroms).
--fdist_max FDIST_MAX
                    Ignore fragments further from the receptor than this
                    distance (Angstroms).
--fmass_min FMASS_MIN
                    Ignore fragments smaller than this mass (Daltons).
--fmass_max FMASS_MAX
                    Ignore fragments larger than this mass (Daltons).
--ignore_receptor
--ignore_parent
-rec_typer {single,single_h,simple,simple_h,desc,desc_h}
-lig_typer {single,single_h,simple,simple_h,desc,desc_h}
-rec_channels REC_CHANNELS
-lig_channels LIG_CHANNELS
--in_channels IN_CHANNELS
--output_size OUTPUT_SIZE
--pad
--blocks BLOCKS [BLOCKS ...]
--fc FC [FC ...]
--use_all_labels
--dist_fn {mse,bce,cos,tanimoto}
--loss {direct,support_v1}
```
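
Putting it together, a hypothetical VoxelNet run from Python, following the `MODELS` usage shown earlier (all argument values here are illustrative; each key corresponds to a flag above, e.g. `--fdist_max` becomes `fdist_max`):

```py
from leadopt.model_conf import MODELS

# Hypothetical argument values for demonstration only.
args = {
    "fragments": "/path/to/fragments",
    "fingerprints": "/path/to/fingerprints",
    "learning_rate": 1e-4,
    "batch_size": 16,
    "num_epochs": 50,
    "fdist_min": 0.5,
    "fdist_max": 4.0,
    "fmass_min": 12.0,
    "fmass_max": 150.0,
}

model = MODELS['voxel'](args)
model.train(save_path='./voxel_model')
```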