GCI-TPs · Global Chemical Inventory Transformation Products


GCI-TPs is a database of predicted transformation products (TPs) for compounds in the ZeroPM global chemical inventory, designed to help surface potentially persistent, mobile and toxic (PMT) or very persistent and very mobile (vPvM) TPs.

From inventory → network → candidates
A graph database of predicted transformations
Inventory compounds: 140,161 (ZeroPM GCI)
Starting precursors: 98,091 (after filtering)
Predicted TPs: 13,678,364 (3 steps × 2 modules)
Unique reactions: 42,797,878 (deduplicated)
Dead-end TPs: 15,048 (no further predicted steps)
PM / PT candidates: 638 (after additional flags)

Why this matters

Novel entities are being introduced faster than their risks can be assessed. Transformation products may be as concerning as their parent compounds, or more so.

Novel entities pressure

Global inventories contain hundreds of thousands of industrial chemicals and mixtures, far outstripping current risk assessment capacity.

PMT & vPvM concern

Persistent, mobile and toxic compounds can travel long distances in water systems and pose long-term ecosystem and human health risks.

Transformation gap

Many chemicals transform via biotic and abiotic processes. The resulting TPs are under-characterized, limiting monitoring and assessment.

From structures to networks

A quick interactive walkthrough of the pipeline that built GCI-TPs.

1) Inventory & structures
ZeroPM

SMILES for global inventory compounds were retrieved (via PubChem) and standardized to consistent identifiers (InChI / InChIKey) using RDKit.

140,161 compounds
13 trade markets
38 countries
2) Applicability & filtering
Domain

Salts and mixtures were split into their components, elemental composition was computed, and compounds outside the applicability domain were removed (elements other than C, H, O, N, P, S and halogens; MW > 1000 Da; first-step errors).

98,091 retained precursors
Element filter
MW filter
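The element and molecular-weight filters above can be sketched on molecular formulas. This is a minimal illustration under stated assumptions, not the pipeline's code: the names `parse_formula` and `in_domain`, the formula-based parsing (plain formulas without charges or parentheses), the rounded atomic masses, and the choice of F, Cl, Br and I as the halogen set are all hypothetical.

```python
import re

# Rounded average atomic masses (g/mol) for the allowed elements only.
# Assumption: "halogens" means F, Cl, Br and I.
ATOMIC_MASS = {
    "C": 12.011, "H": 1.008, "O": 15.999, "N": 14.007, "P": 30.974, "S": 32.06,
    "F": 18.998, "Cl": 35.45, "Br": 79.904, "I": 126.904,
}
MAX_MW = 1000.0  # Da, the molecular-weight cutoff quoted in the text

_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Parse a plain formula such as 'C8H10N4O2' into {element: count}."""
    counts = {}
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:  # unparsable characters between tokens
            raise ValueError(f"cannot parse formula: {formula!r}")
        element, number = match.groups()
        counts[element] = counts.get(element, 0) + int(number or 1)
        pos = match.end()
    if pos != len(formula) or not counts:
        raise ValueError(f"cannot parse formula: {formula!r}")
    return counts


def in_domain(formula):
    """Return True if the formula passes the element and MW filters."""
    counts = parse_formula(formula)
    if any(element not in ATOMIC_MASS for element in counts):
        return False  # contains an element outside CHONPS + halogens
    mw = sum(ATOMIC_MASS[element] * n for element, n in counts.items())
    return mw <= MAX_MW


examples = ["C8H10N4O2", "C2H6Hg", "C100H202", "CCl4"]
retained = [formula for formula in examples if in_domain(formula)]
```

Here the mercury compound fails the element filter and the long alkane fails the weight filter, so only the first and last formulas are retained.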
3) TP predictions at scale
HPC

The BioTransformer environmental microbial and abiotic modules were each run for three steps. A parallel helper pipeline distributed the workload across HPC nodes, with adaptive batching and recovery when memory limits were hit.

13,678,364 predicted TPs
42,797,878 reactions
3 steps × 2 modules
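One way to picture adaptive batching with recovery on memory limits is a loop that shrinks the batch when a worker runs out of memory and then retries. The sketch below is an assumption about the general idea only: the source does not describe the actual strategy, and `run_adaptive`, `fake_predict` and the halving rule are hypothetical stand-ins (the fake worker does not call BioTransformer).

```python
from typing import Callable, List, Sequence


def run_adaptive(
    items: Sequence[str],
    worker: Callable[[Sequence[str]], List[str]],
    batch_size: int,
) -> List[str]:
    """Process items in order, halving the batch size after a MemoryError."""
    results: List[str] = []
    pos = 0
    size = max(1, batch_size)
    while pos < len(items):
        batch = items[pos:pos + size]
        try:
            results.extend(worker(batch))
        except MemoryError:
            if size == 1:
                raise  # even a single item does not fit; give up
            size = max(1, size // 2)  # assumption: recover by halving
            continue  # retry the same position with a smaller batch
        pos += len(batch)
    return results


def fake_predict(batch: Sequence[str]) -> List[str]:
    """Hypothetical worker that 'runs out of memory' above 3 items."""
    if len(batch) > 3:
        raise MemoryError("batch too large")
    return [f"{compound}>TP" for compound in batch]


predicted = run_adaptive(list("abcdefgh"), fake_predict, batch_size=8)
```

Starting at a batch size of 8, the loop falls back to 4 and then 2 before the fake worker succeeds, and every input is still processed exactly once and in order.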
4) Graph construction & dead-ends
Paths

A directed TP network was built with InChIKey first block (IKFB) nodes and reaction edges. Dead-end products were defined as nodes that generate no further predicted TPs (and were not removed due to errors). Shortest path lengths from each starting compound were computed.

15,048 dead-end TPs
Breadth-first search
Deduplicated edges
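The graph step described above (IKFB nodes, deduplicated edges, dead-ends, breadth-first shortest paths) can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: the function names and the placeholder InChIKeys are hypothetical, skipping self-loops is an assumption, and the exclusion of nodes removed due to errors is not modelled.

```python
from collections import deque


def ikfb(inchikey):
    """Return the InChIKey first block (the part before the first hyphen)."""
    return inchikey.split("-")[0]


def build_graph(reactions):
    """Build {node: set(products)} from (precursor, product) InChIKey pairs.

    Using sets deduplicates edges that collapse to the same IKFB pair.
    Assumption: self-loops (same first block on both sides) are skipped.
    """
    graph = {}
    for precursor, product in reactions:
        src, dst = ikfb(precursor), ikfb(product)
        graph.setdefault(src, set())
        graph.setdefault(dst, set())
        if src != dst:
            graph[src].add(dst)
    return graph


def dead_ends(graph):
    """Nodes with no outgoing edges, i.e. no further predicted TPs."""
    return {node for node, products in graph.items() if not products}


def shortest_path_lengths(graph, start):
    """Breadth-first search: number of steps from start to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


# Placeholder InChIKeys, not real compounds.
A = "AAAAAAAAAAAAAA-UHFFFAOYSA-N"
B1 = "BBBBBBBBBBBBBB-UHFFFAOYSA-N"
B2 = "BBBBBBBBBBBBBB-ABCDEFGHSA-N"  # same first block as B1
C = "CCCCCCCCCCCCCC-UHFFFAOYSA-N"
D = "DDDDDDDDDDDDDD-UHFFFAOYSA-N"

graph = build_graph([(A, B1), (A, B2), (B1, C), (A, C), (C, D)])
ends = dead_ends(graph)
lengths = shortest_path_lengths(graph, ikfb(A))
```

The two edges from A to B1 and B2 collapse into one because both products share a first block, and D is the only dead-end, reached in two steps via the direct A to C edge.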
5) Mobility & toxicity screening
OPERA

For dead-end TPs, mobility/persistence/toxicity were predicted using OPERA modules (KOC, CERAPP, CoMPARA), subject to applicability domain rules (AD index thresholds). Mobility thresholds follow EU-CLP criteria: log Koc ≤ 3 for mobile and log Koc ≤ 2 for very mobile.

3,367 in Koc AD
9,716 toxic (≥1 model)
638 PM / PT list
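The screening thresholds quoted above can be written as a small classifier. This sketch assumes the applicability-domain check has already been reduced to a boolean, since the AD index thresholds are not given here. The function names and return labels are hypothetical, and the additional flags behind the final PM / PT list are not modelled.

```python
def mobility_class(log_koc, in_ad):
    """Classify mobility from a predicted log Koc value.

    Thresholds as quoted in the text: very mobile if log Koc <= 2,
    mobile if log Koc <= 3. Predictions outside the applicability
    domain are not classified.
    """
    if not in_ad:
        return "outside AD"
    if log_koc <= 2.0:
        return "very mobile"
    if log_koc <= 3.0:
        return "mobile"
    return "not mobile"


def is_toxic(model_flags):
    """True if at least one toxicity model flags the compound."""
    return any(model_flags.values())


# Hypothetical predictions for three dead-end TPs.
classes = [
    mobility_class(1.4, in_ad=True),
    mobility_class(2.7, in_ad=True),
    mobility_class(1.4, in_ad=False),
]
flagged = is_toxic({"CERAPP": False, "CoMPARA": True})
```

Keeping the out-of-domain case as its own label, rather than folding it into "not mobile", avoids treating an unreliable prediction as evidence of low mobility.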
Iris HPC cluster: parallel predictions across 16 nodes (28 cores, 128 GB per node).
Large memory processing: up to 112 cores and 3 TB RAM for aggregation & graph analytics.
Neo4j graph database: nodes = compounds (IKFB), edges = reactions + provenance metadata.

Coverage & comparison

Prediction complements experimental TP knowledge: overlap exists, but gaps remain in both reactions and chemical space.

FAIR-TPs environmental reaction overlap
Matched via precursor+product InChIKey first block
492 / 1,563 reactions (31% overlap)
Captures many known reactions
Highlights reaction knowledge gaps
Complementary to experimental evidence
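Matching reactions on the precursor and product InChIKey first blocks amounts to a set intersection over key pairs. The sketch below illustrates that idea with placeholder InChIKeys. The function names are hypothetical, and the final line assumes the figure above reads as matched reactions over reference reactions.

```python
def ikfb(inchikey):
    """Return the InChIKey first block (the part before the first hyphen)."""
    return inchikey.split("-")[0]


def reaction_key(precursor, product):
    """Key a reaction by the first blocks of its precursor and product."""
    return (ikfb(precursor), ikfb(product))


def reaction_overlap(reference, predicted):
    """Return (matched, total) over the unique reference reactions."""
    ref = {reaction_key(p, q) for p, q in reference}
    pred = {reaction_key(p, q) for p, q in predicted}
    return len(ref & pred), len(ref)


# Placeholder InChIKeys, not real compounds.
reference = [
    ("AAAAAAAAAAAAAA-UHFFFAOYSA-N", "BBBBBBBBBBBBBB-UHFFFAOYSA-N"),
    ("AAAAAAAAAAAAAA-UHFFFAOYSA-N", "CCCCCCCCCCCCCC-UHFFFAOYSA-N"),
]
predicted = [
    # Same first blocks as the first reference reaction, different suffix.
    ("AAAAAAAAAAAAAA-ABCDEFGHSA-N", "BBBBBBBBBBBBBB-ABCDEFGHSA-N"),
    ("BBBBBBBBBBBBBB-UHFFFAOYSA-N", "DDDDDDDDDDDDDD-UHFFFAOYSA-N"),
]

matched, total = reaction_overlap(reference, predicted)
percent = round(100 * matched / total)

# The reported figure, read as matched / reference reactions.
reported_percent = round(100 * 492 / 1563)
```

Matching on the first block ignores stereochemistry and charge details encoded in the later blocks, which makes the comparison more tolerant at the cost of merging some distinct structures.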
Chemical space overlap with PubChem
Many predicted structures are not in reference libraries
292,890
0%
of GCI-TPs compounds overlap
Legend: PubChem overlap · Predicted-only space
Important for non-target screening
Missed annotations if libraries are incomplete
Motivates expanding TP reference space

What you can do on this site

Fast discovery workflows built on a graph database and optimized for exploration.

See predicted transformation pathways

Visualize multi-generation pathways as a directed network from a precursor to its predicted products. This helps connect observed features in samples to plausible origins.

Graph-first navigation
Multi-step generation view
Shortest paths to dead-ends

Prioritize PMT/vPvM candidates

Explore OPERA mobility and toxicity outputs with applicability-domain guidance. PMT prioritization is designed to highlight candidates for follow-up and experimental validation.

log Koc mobility thresholds
Model AD indicators
Candidate lists for review
log Koc ≤ 3
AD index
Toxic (≥1)
Ready to explore predicted TPs?
Search by identifiers, browse dead-ends, and inspect pathways in the graph.