1: The data distribution of biosynthetic patterns (RuleClusters) by number of steps
In this research, we digitize the typical biosynthetic patterns found in our previous study, which contained nearly 50,000,000 biosynthetic pathways for producing 6,026 molecules, and construct RxnCluster, which contains 14,378 typical biosynthetic patterns (RuleClusters) covering 37,317 reaction combinations (reaction clusters) with step counts ranging from 1 to 4. Each biosynthetic pattern is related to at least two reaction combinations (reaction clusters). Specifically, 1,011 RuleClusters contain 1 step, 2,008 RuleClusters contain 2 steps, 3,820 RuleClusters contain 3 steps, and 7,539 RuleClusters contain 4 steps.
2: The data distribution of biosynthetic patterns (RuleClusters) at different frequency scales
We also perform a statistical analysis of the distribution of biosynthetic patterns at different frequency scales: 13,603 RuleClusters are related to 2-5 reaction clusters, 725 RuleClusters are related to 5-20 reaction clusters, and 50 RuleClusters are related to at least 20 reaction clusters.
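As a quick consistency check, the two breakdowns above (by number of steps and by frequency scale) should each sum to the 14,378 RuleClusters reported for RxnCluster. The following is a minimal sketch using only the counts quoted in the text; the variable names are illustrative and are not taken from the RxnCluster code.

```python
# Counts copied from the text; all names here are hypothetical.
TOTAL_RULECLUSTERS = 14_378

# RuleClusters grouped by number of steps (1 to 4).
by_steps = {1: 1_011, 2: 2_008, 3: 3_820, 4: 7_539}

# RuleClusters grouped by the number of related reaction clusters.
# Bin labels are kept exactly as written in the text.
by_frequency = {"2-5": 13_603, "5-20": 725, "at least 20": 50}

steps_total = sum(by_steps.values())
frequency_total = sum(by_frequency.values())

print("by steps:", steps_total, "matches total:", steps_total == TOTAL_RULECLUSTERS)
print("by frequency:", frequency_total, "matches total:", frequency_total == TOTAL_RULECLUSTERS)
```

Both groupings partition the same set of RuleClusters, so equal totals are what one would expect if every pattern falls into exactly one bin of each grouping.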
3: The data distribution of biosynthetic patterns by number of steps at each frequency level
We analyze the distribution of biosynthetic patterns by number of steps at each frequency level. For example, 13,603 biosynthetic patterns (RuleClusters) are related to reference reaction clusters with a frequency count between 2 and 5. Of these biosynthetic patterns, 764 (log10 = 2.88) have 1 step, 1,817 (log10 = 3.26) have 2 steps, 3,681 (log10 = 3.57) have 3 steps, and 7,341 (log10 = 3.87) have 4 steps.
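The log10 values quoted for the 2-5 frequency level can be reproduced from the raw counts. This is a minimal sketch, assuming the values are base-10 logarithms rounded to two decimals; the dictionary and variable names are illustrative only.

```python
import math

# Counts for the 2-5 frequency level, copied from the text.
# Keys are the number of steps; names are hypothetical.
counts_2_to_5 = {1: 764, 2: 1817, 3: 3681, 4: 7341}

# Base-10 logarithm of each count, rounded to two decimals.
log_counts = {steps: round(math.log10(n), 2) for steps, n in counts_2_to_5.items()}

# The per-step counts should add up to the size of the frequency level.
level_total = sum(counts_2_to_5.values())

for steps, n in counts_2_to_5.items():
    print(f"{steps} step(s): {n} RuleClusters, log10 = {log_counts[steps]:.2f}")
print("total:", level_total)
```

The log scale compresses the roughly tenfold spread between the 1-step and 4-step counts, which helps when the sparsely populated higher-frequency levels are shown alongside this one.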