accfg.main
Core functional group detection logic, including the AccFG class and helper utilities for loading SMARTS patterns.
AccFG Class
Constructor
AccFG(common_fgs=True, heterocycle_fgs=True, user_defined_fgs=None, print_load_info=False, lite=False)
- Parameters
  - `common_fgs` (bool): Load the bundled common functional groups from `accfg/fgs_common.csv`.
  - `heterocycle_fgs` (bool): Load heterocycle definitions from `accfg/fgs_heterocycle.csv`.
  - `user_defined_fgs` (dict[str, str] | None): Optional SMARTS overrides or additions supplied by the caller.
  - `print_load_info` (bool): Print a summary of how many groups were loaded.
  - `lite` (bool): When true, the CSV reader skips rows marked with `%` or `#`, yielding a lighter rule set.
- Behaviour
  - Populates a dictionary for each source and merges them into `self.dict_fgs`.
  - User-defined patterns are canonicalised, and `[nH]` is normalised to `[n]`, via `process_user_defined_fgs`.
  - Stores whether lite mode was requested, for later use.
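The merge into `self.dict_fgs` can be pictured as layering dictionaries. The sketch below is a hypothetical helper (`merge_fg_sources` is not part of AccFG), and it assumes that later sources win on name clashes, so caller-supplied patterns override bundled ones; the SMARTS strings are illustrative, not AccFG's bundled definitions.

```python
from __future__ import annotations


def merge_fg_sources(*sources: dict[str, str] | None) -> dict[str, str]:
    """Merge pattern dictionaries; later sources win on name clashes (assumption)."""
    merged: dict[str, str] = {}
    for source in sources:
        if source:  # skip disabled sources (None or empty)
            merged.update(source)
    return merged


# Illustrative SMARTS, not AccFG's bundled definitions
common = {"carboxylic acid": "[CX3](=O)[OX2H1]", "alcohol": "[OX2H][CX4]"}
heterocycle = {"pyridine": "c1ccncc1"}
user = {"alcohol": "[OX2H]"}  # caller-supplied override

dict_fgs = merge_fg_sources(common, heterocycle, user)
print(dict_fgs["alcohol"])  # [OX2H]
```

Accepting `None` directly mirrors the constructor's `user_defined_fgs=None` default, so a disabled source needs no special casing.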
_is_fg_in_mol(self, mol, fg)
Internal helper that returns (is_present, matches) for a SMARTS query on an RDKit molecule.
- Returns
  - `is_present` (bool): Whether any substructure match was found.
  - `matches` (tuple[tuple[int, ...], ...]): Atom-index tuples, one per match, in RDKit order.
_freq_fg_in_mol(self, mol, fg)
Count the occurrences of a functional group SMARTS in a molecule.
- Returns the integer count, or `False` when no match is found.
- Useful for tallying frequencies rather than collecting raw match coordinates.
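The two helpers above differ only in how they summarise the same match tuples. As a stdlib-only sketch, the hypothetical functions below reproduce both return conventions from plain tuples shaped like the per-match output of RDKit's `Mol.GetSubstructMatches`; they stand in for, and are not, AccFG's internals.

```python
from __future__ import annotations


def presence_from_matches(matches: tuple[tuple[int, ...], ...]) -> tuple[bool, tuple[tuple[int, ...], ...]]:
    # Mirrors the (is_present, matches) convention of _is_fg_in_mol
    return len(matches) > 0, matches


def freq_from_matches(matches: tuple[tuple[int, ...], ...]) -> int | bool:
    # Mirrors _freq_fg_in_mol: an int count, or False when nothing matched
    return len(matches) if matches else False


hits = ((1, 2, 3), (5, 6, 7))  # two made-up matches of a three-atom pattern
print(presence_from_matches(hits)[0], freq_from_matches(hits))  # True 2
print(freq_from_matches(()))  # False
```

Because `False == 0` in Python, callers that only test truthiness can treat "no match" and a zero count the same way.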
_get_bonds_from_match(self, query_mol, mol, atom_match)
Translate an atom match from the query SMARTS into the corresponding bond indices on the target molecule. Used for pruning nested functional groups.
- Parameters: RDKit molecules for the query and target, plus a list/tuple of atom indices returned by RDKit.
- Returns: `list[int]` of bond indices in `mol`.
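The translation works bond by bond: each query bond's two atoms are mapped through `atom_match` into the target, and the target bond joining those atoms is looked up. To keep the sketch runnable without RDKit, it assumes a hypothetical plain-data representation in which each molecule's bonds are `(begin_atom, end_atom)` pairs and a bond's index is its position in that list; with real RDKit objects the same steps are usually written with `Bond.GetBeginAtomIdx()`, `Bond.GetEndAtomIdx()` and `Mol.GetBondBetweenAtoms()`.

```python
from __future__ import annotations


def get_bonds_from_match(query_bonds: list[tuple[int, int]],
                         target_bonds: list[tuple[int, int]],
                         atom_match: tuple[int, ...]) -> list[int]:
    # Index target bonds by their unordered atom pair
    bond_index = {frozenset(pair): idx for idx, pair in enumerate(target_bonds)}
    bonds: list[int] = []
    for begin, end in query_bonds:
        # atom_match[i] is the target atom matched to query atom i
        key = frozenset((atom_match[begin], atom_match[end]))
        if key in bond_index:
            bonds.append(bond_index[key])
    return bonds


# Acetic acid CC(=O)O: bonds C0-C1, C1=O2, C1-O3
target = [(0, 1), (1, 2), (1, 3)]
# Carboxyl query C(=O)O: bonds C0=O1, C0-O2; matched onto target atoms (1, 2, 3)
print(get_bonds_from_match([(0, 1), (0, 2)], target, (1, 2, 3)))  # [1, 2]
```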
process_user_defined_fgs(self, user_defined_fgs)
Sanitise custom functional group definitions.
- If a pattern already contains `;` (indicating an explicit SMARTS pattern), it is preserved.
- Otherwise, the string is treated as SMILES and canonicalised, and aromatic `[nH]` is normalised to `[n]`.
- Returns a new dictionary with the processed SMARTS strings.
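The rules above can be sketched as a small loop. This is a hypothetical stand-in, not AccFG's code: `process_user_defined_fgs_sketch` takes a `canonicalise` callable that defaults to the identity so it runs without RDKit (in practice a canonicaliser such as the module's `canonical_smiles` would be passed), and it assumes the `[nH]` normalisation is a plain string replacement. In SMARTS, `[nH]` requires an aromatic nitrogen carrying exactly one hydrogen while `[n]` matches any aromatic nitrogen, so the relaxation presumably lets a pattern also hit N-substituted rings.

```python
from __future__ import annotations

from typing import Callable


def process_user_defined_fgs_sketch(
    user_defined_fgs: dict[str, str],
    canonicalise: Callable[[str], str] = lambda s: s,
) -> dict[str, str]:
    processed: dict[str, str] = {}
    for name, pattern in user_defined_fgs.items():
        if ";" in pattern:
            # Explicit SMARTS such as "[#6;X4]": keep verbatim
            processed[name] = pattern
        else:
            # Treat as SMILES: canonicalise, then relax "[nH]" to "[n]"
            processed[name] = canonicalise(pattern).replace("[nH]", "[n]")
    return processed


custom = {"pyrrole": "c1cc[nH]c1", "sp3 carbon": "[#6;X4]"}
print(process_user_defined_fgs_sketch(custom))
# {'pyrrole': 'c1cc[n]c1', 'sp3 carbon': '[#6;X4]'}
```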
run(self, smiles, show_atoms=True, show_graph=False, canonical=True)
High-level entry point that accepts a SMILES string, optionally canonicalises it, converts it into an RDKit molecule, and defers to run_mol.
- Parameters
  - `smiles` (str): Molecule SMILES.
  - `show_atoms` (bool): When true, include the matched atom indices in the output.
  - `show_graph` (bool): When true, also return the functional group hierarchy graph.
  - `canonical` (bool): Canonicalise the SMILES before processing.
- Returns
  - If `show_atoms` and not `show_graph`: `dict[str, list[tuple[int, ...]]]`.
  - If `show_atoms` and `show_graph`: a tuple `(fg_dict, networkx.DiGraph)`.
  - Otherwise: `list[str]` of detected functional group names.
- Usage

```python
from accfg.main import AccFG

afg = AccFG(print_load_info=True)
# With the default show_atoms=True, fgs maps each group name to its atom-index tuples
fgs = afg.run("CC(=O)O")
```
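The return-type rules for `run` reduce to three branches. The hypothetical `shape_run_result` below restates that documented contract in code, with a placeholder standing in for the `networkx.DiGraph`; note that `show_graph` has no effect when `show_atoms` is false. The example result dictionary is made up for illustration.

```python
from __future__ import annotations

from typing import Any


def shape_run_result(fg_dict: dict[str, list[tuple[int, ...]]], graph: Any,
                     show_atoms: bool = True, show_graph: bool = False):
    if show_atoms and show_graph:
        return fg_dict, graph  # (fg_dict, networkx.DiGraph)
    if show_atoms:
        return fg_dict  # name -> atom-index tuples
    return list(fg_dict)  # names only, whatever show_graph is


fg_dict = {"carboxylic acid": [(1, 2, 3)]}  # made-up result for CC(=O)O
print(shape_run_result(fg_dict, graph=None, show_atoms=False))  # ['carboxylic acid']
```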
run_mol(self, mol, show_atoms=True, show_graph=False, use_atom_map_num=False)
Operate directly on an RDKit molecule and locate all functional groups using a ProcessPoolExecutor.
- Parameters
  - `mol` (rdkit.Chem.Mol): Pre-built molecule.
  - `show_atoms`, `show_graph`: Same semantics as in `run`.
  - `use_atom_map_num` (bool): Remap atom indices to the `molAtomMapNumber` property when present (useful for aligned molecules).
- Behaviour
  - Launches up to four parallel worker processes to evaluate the SMARTS patterns.
  - Builds a directed graph of parent/child relationships to remove redundant nested matches (e.g., derivatives).
- Returns: Same as `run`.
- Notes
  - When `use_atom_map_num=True`, the resulting atom lists are remapped from indices to atom-map numbers.
  - With `show_graph=True`, the returned `DiGraph` stores `mapped_atoms` on each node for downstream visualisation.
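Remapping with `use_atom_map_num=True` amounts to substituting each atom index in every match tuple. The sketch below is a hypothetical helper (`remap_matches` is not AccFG's API) that takes an index-to-map-number dictionary; with RDKit such a dictionary could be built from `Atom.GetIdx()` and `Atom.GetAtomMapNum()`, which returns 0 for unmapped atoms. Keeping the original index for unmapped atoms is an assumption, not documented behaviour.

```python
from __future__ import annotations


def remap_matches(fg_dict: dict[str, list[tuple[int, ...]]],
                  map_numbers: dict[int, int]) -> dict[str, list[tuple[int, ...]]]:
    # map_numbers: atom index -> atom-map number. Unmapped atoms keep their
    # index here; that fallback is an assumption, not documented behaviour.
    return {
        name: [tuple(map_numbers.get(i, i) for i in match) for match in matches]
        for name, matches in fg_dict.items()
    }


# Atom indices 0..3 of [CH3:5][C:7](=[O:9])[OH:3] carry map numbers 5, 7, 9, 3
map_numbers = {0: 5, 1: 7, 2: 9, 3: 3}
print(remap_matches({"carboxylic acid": [(1, 2, 3)]}, map_numbers))
# {'carboxylic acid': [(7, 9, 3)]}
```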
run_freq(self, smiles)
Return per-functional-group occurrence counts for a SMILES string.
- Returns: `list[tuple[str, int]]` on success, or `None` if detection fails.
- Usage

```python
# Reuses the afg instance created in the run example above
counts = afg.run_freq("c1ccccc1O")
```
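Because `run_freq` can return `None`, downstream code typically has to handle that case before using the counts. The hypothetical helper below depends only on the documented return shape: it treats `None` as "no groups" and ranks the remaining `(name, count)` pairs with `collections.Counter`. Callers that must tell a failed detection apart from an empty result should check for `None` before calling it.

```python
from __future__ import annotations

from collections import Counter


def top_groups(freqs: list[tuple[str, int]] | None, n: int = 3) -> list[tuple[str, int]]:
    # run_freq returns None when detection fails; report that as "no groups"
    if freqs is None:
        return []
    return Counter(dict(freqs)).most_common(n)


# Made-up run_freq output, for illustration only
print(top_groups([("a", 1), ("b", 3), ("c", 2)], n=2))  # [('b', 3), ('c', 2)]
```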
csv_to_dict(self, csv_file, lite=False)
Load functional group SMARTS definitions from a CSV file.
- Uses `csv.DictReader` and filters out comment lines. In lite mode, rows prefixed with either `%` or `#` are skipped.
- Returns `dict[str, str]` mapping each functional group name to its SMARTS pattern.
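A comment-filtering loader along these lines can be written by dropping prefixed lines before handing them to `csv.DictReader`. This is a minimal sketch under several assumptions, all hypothetical: the column headers `name` and `smarts` (the bundled CSVs may use different ones), the prefix being checked at the very start of each raw line, and a leading `%` being stripped from names kept in non-lite mode.

```python
from __future__ import annotations

import csv
import io
from typing import Iterable


def csv_to_dict_sketch(lines: Iterable[str], lite: bool = False) -> dict[str, str]:
    # "#" rows are always comments; lite mode also drops "%" rows
    skip = ("#", "%") if lite else ("#",)
    kept = (line for line in lines if not line.startswith(skip))
    fgs: dict[str, str] = {}
    for row in csv.DictReader(kept):
        # Hypothetical column names; the bundled CSVs may use different headers
        name = row["name"].lstrip("%")  # assumption: "%" only flags non-lite rows
        fgs[name] = row["smarts"]
    return fgs


data = (
    "name,smarts\n"
    "# comment line\n"
    "carboxylic acid,[CX3](=O)[OX2H1]\n"
    "%ester,[#6][CX3](=O)[OX2][#6]\n"
)
print(sorted(csv_to_dict_sketch(io.StringIO(data))))             # ['carboxylic acid', 'ester']
print(sorted(csv_to_dict_sketch(io.StringIO(data), lite=True)))  # ['carboxylic acid']
```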
Internal Utilities
The following helpers support post-processing inside run_mol:
- `_get_bonds_from_match` and `_freq_fg_in_mol` are used to compare the structural context of candidate groups.
- The functional group directed graph uses `networkx` to reason about parent/child relationships before results are emitted.
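Hierarchy-based pruning can be illustrated with a deliberately simplified, stdlib-only sketch. It uses a plain dictionary as the parent/child graph instead of `networkx`, and a pure atom-subset test: a child match is dropped when all of its atoms lie inside some match of its parent. AccFG also compares bonds via `_get_bonds_from_match`, and its actual rule and edge direction may differ; `prune_nested`, the hierarchy, and the matches are all hypothetical.

```python
from __future__ import annotations


def prune_nested(fg_matches: dict[str, list[tuple[int, ...]]],
                 children: dict[str, set[str]]) -> dict[str, list[tuple[int, ...]]]:
    # children: parent group -> child groups it can subsume (hypothetical hierarchy)
    pruned = {name: list(matches) for name, matches in fg_matches.items()}
    for parent, kids in children.items():
        parent_atoms = [set(m) for m in fg_matches.get(parent, [])]
        for kid in kids:
            if kid in pruned:
                # Drop child matches lying entirely inside some parent match
                pruned[kid] = [m for m in pruned[kid]
                               if not any(set(m) <= atoms for atoms in parent_atoms)]
    # Remove groups left with no matches
    return {name: ms for name, ms in pruned.items() if ms}


# Made-up matches for glycolic acid, OCC(=O)O (atom indices 0-4)
matches = {
    "carboxylic acid": [(2, 3, 4)],
    "carbonyl": [(2, 3)],
    "hydroxyl": [(0,), (4,)],
}
hierarchy = {"carboxylic acid": {"carbonyl", "hydroxyl"}}
print(prune_nested(matches, hierarchy))
# {'carboxylic acid': [(2, 3, 4)], 'hydroxyl': [(0,)]}
```

Only the hydroxyl on atom 0 survives, because the one on atom 4 belongs to the carboxylic acid match.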
Module-Level Helpers
canonical_smiles(smi)
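Convert a SMILES string into its RDKit canonical form using `Chem.MolToSmiles(Chem.MolFromSmiles(smi))`. Useful for normalising inputs before comparison. Because `Chem.MolFromSmiles` returns `None` for unparsable input, an invalid SMILES makes this call raise rather than return a string.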