Database Schema¶
This page describes the internal SQLite database structure used by EasyDock. It is intended for developers extending the pipeline or reading results programmatically.
Overview¶
EasyDock stores all molecule data, docking results, and session configuration in a single SQLite file (the -o / --output argument). The database is opened in WAL (Write-Ahead Logging) mode to allow concurrent reads from result-retrieval tools while docking is in progress.
The schema consists of six tables:
| Table | Purpose |
|---|---|
mols |
Molecule structures and docking results — the main data table |
setup |
Session configuration for resuming interrupted runs |
variables |
Module-level metadata and run-time flags |
plif_names |
Lookup table for protein-ligand interaction contact types |
plif_res |
Per-molecule, per-pose interaction fingerprint results |
bust |
PoseBusters validation results per pose |
mols¶
The central table. One row per enumerated stereoisomer of each input molecule. The primary key is (id, stereo_id).
| Column | Type | Default | Description |
|---|---|---|---|
id |
TEXT | — | Molecule identifier — taken from the input file title or derived from SMILES. |
stereo_id |
INTEGER | 0 |
Stereoisomer index. 0 for the first (or only)isomer; incremented for each additional one. |
smi_input |
TEXT | NULL | Raw SMILES exactly as read from the input file. Only set for SMILES inputs; NULL for SDF/PDBQT. |
smi |
TEXT | NULL | Canonical SMILES after salt stripping andvz stereoisomer assignment. Used as the docking input for 2D structures. |
smi_protonated |
TEXT | NULL | Protonated canonical SMILES at the target pH. Set only when --protonation is used. |
source_mol_block_input |
TEXT | NULL | Original mol block from the input SDF/PDBQT file. Set only for 3D structure inputs. |
source_mol_block |
TEXT | NULL | Cleaned mol block (salts stripped, 2D/3D structure) prepared for docking. Set for 3D inputs. |
source_mol_block_protonated |
TEXT | NULL | Mol block after protonation, inheriting 3D coordinates from source_mol_block.Set when protonation is applied to a 3D input. |
docking_score |
REAL | NULL | Best docking score in kcal/mol. NULL until docking completes for this molecule. |
raw_block |
TEXT | NULL | All docking poses in their native format (PDBQT or SDF, controlled by the raw_formatvariable). NULL until docking completes. |
mol_block |
TEXT | NULL | Best docking pose as an MDL mol block. NULL until docking completes. |
dock_time |
REAL | NULL | Wall-clock docking time for this molecule in seconds. |
time |
TEXT | NULL | ISO timestamp of the last write to this row ( datetime(current_timestamp, 'localtime')).Updated on every UPDATE. |
Primary Key¶
PRIMARY KEY (id, stereo_id)
Processing Stages¶
Each stage selects its work by checking which columns are still NULL:
| Stage | Selects rows where | Writes columns |
|---|---|---|
| Input parsing | (new rows) | id, stereo_id, smi_input, smi,source_mol_block_input,source_mol_block |
| Protonation | smi IS NOT NULLAND smi_protonated IS NULLAND docking_score IS NULL |
smi_protonated,source_mol_block_protonated |
| Docking (with protonation) | docking_score IS NULLAND (smi_protonated IS NOT NULLOR source_mol_block_protonated IS NOT NULL) |
docking_score, raw_block,mol_block, dock_time |
| Docking (without protonation) | docking_score IS NULLAND (smi IS NOT NULLOR source_mol_block IS NOT NULL) |
docking_score, raw_block,mol_block, dock_time |
Notes¶
- Rows with
smi IS NULL AND source_mol_block IS NULL AND stereo_id = 0represent molecules that failed stereoisomer generation and are silently skipped by all downstream stages. mol_blockstores only the best pose; all poses are inraw_block. Useraw_blockwhen re-ranking or re-scoring poses.- The
raw_blockformat (PDBQT vs SDF) is determined by theraw_formatvariable in thevariablestable.
setup¶
Stores the full session configuration so interrupted runs can be resumed with --input and --output only. One row per key.
| Column | Type | Description |
|---|---|---|
key |
TEXT (PRIMARY KEY) | Configuration key (see below). |
content |
TEXT | Configuration value — raw text, YAML, or JSON depending on the key. |
Keys¶
| Key | Content |
|---|---|
args |
JSON-serialised argparse namespace — all CLI arguments as passed to the original run. |
config |
Raw YAML text of the docking config file. |
file:<path> |
Content of a text file referenced inside the config YAML. <path> is the dotted YAML key(e.g., file:protein, file:init_server.protein).Stored so the session can be fully reconstructed without the original files. |
variables¶
A generic key-value store for module-level metadata. Allows any module to persist flags or settings without a schema change. Composite primary key (module, name).
| Column | Type | Description |
|---|---|---|
module |
TEXT (NOT NULL) | Module that owns this variable (e.g., run_dock, database, easydock_bust). |
name |
TEXT (NOT NULL) | Variable name. |
value_type |
TEXT | Type hint for deserialisation: 'int', 'float', 'str', or 'json'. |
value |
TEXT | Stringified value. |
updated_at |
TIMESTAMP | Automatically set to CURRENT_TIMESTAMP on insert/update. |
Well-Known Variables¶
| Module | Name | Type | Values | Meaning |
|---|---|---|---|---|
database |
raw_format |
str | 'pdbqt', 'sdf' |
Format of pose data stored in mols.raw_block.Defaults to 'pdbqt'; set to 'sdf' forserver-based docking programs that return SDF. |
run_dock |
input_structures_total |
int | positive integer | Total number of input structures counted at startup — used for progress reporting. |
easydock_bust |
bust_protein |
str | PDB text | Protein PDB block with explicit hydrogens added, used by PoseBusters. |
easydock_plif |
plif_protein |
str | PDB text | Protein PDB block with explicit hydrogens added, used by ProLIF. |
plif_names¶
Lookup table for protein-ligand interaction contact types, populated by easydock_plif. Avoids repeating long contact-name strings in plif_res.
| Column | Type | Description |
|---|---|---|
plif_id |
INTEGER (PRIMARY KEY AUTOINCREMENT) | Surrogate key for the contact type. |
contact_name |
TEXT (UNIQUE) | Human-readable contact identifier, e.g., 'GLU80.A.HBDonor'.Format: <residue>.<chain>.<interaction_type>. |
plif_res¶
One row per detected contact, per pose, per molecule. References mols by rowid and plif_names by plif_id.
| Column | Type | Description |
|---|---|---|
mols_rowid |
INTEGER | rowid of the corresponding row in mols. |
pose |
INTEGER | 1-based pose index (1 = best docking pose). |
plif_id |
INTEGER | Foreign key into plif_names. |
Unique constraint: (mols_rowid, pose, plif_id)
To retrieve contacts for a molecule:
SELECT pn.contact_name
FROM plif_res pr
JOIN plif_names pn ON pr.plif_id = pn.plif_id
JOIN mols m ON pr.mols_rowid = m.rowid
WHERE m.id = 'CHEMBL12345' AND m.stereo_id = 0 AND pr.pose = 1;
bust¶
One row per tested pose per molecule. Populated by easydock_bust.
| Column | Type | Description |
|---|---|---|
mols_rowid |
INTEGER | rowid of the corresponding row in mols. |
pose |
INTEGER | 1-based pose index. |
result |
INTEGER | 1 if the pose passes all PoseBusters checks; 0 otherwise. |
Unique constraint: (mols_rowid, pose)
Schema Version¶
EasyDock tracks schema version via SQLite's built-in PRAGMA user_version. The current version is 1.
On database open, session.py checks the version and runs any necessary migrations:
| Migration | From → To | What changes |
|---|---|---|
migrate_setup_table() |
0 → 1 | Converts the old setup table (dynamic columns, one row)to the current key-value schema. |
migrate_pdb_block_to_raw_block() |
legacy → current | Renames the pdb_block column in mols to raw_blockif the old name is still present. Run automatically when raw_format variable is absent. |
The user is prompted before any migration is applied (120-second timeout; auto-abort on no response).
Concurrency Model¶
- WAL mode is enabled on every open connection:
PRAGMA journal_mode=WAL. - During docking, the main process writes results via
update_db()while worker threads read molecules viaMolQueue. MolQueueopens a separate read connection per batch, closes it between fetches, and tracks position byrowid— allowing the writer to insert/update rows concurrently without blocking readers.- Result-retrieval tools (
get_sdf,easydock_bust,easydock_plif) may safely open the database read-only while a run is in progress.