CAZyme Grids - Help & Info

CAZyme Protein Count Grids — Help & Information

What are CAZymes?

Carbohydrate-Active enZymes (CAZymes) are enzymes involved in the synthesis, modification, and breakdown of complex carbohydrates and glycoconjugates. They are classified into families based on amino acid sequence similarity by the CAZy database. Major CAZyme classes include Glycoside Hydrolases (GH), Glycosyl Transferases (GT), Polysaccharide Lyases (PL), Carbohydrate Esterases (CE), Auxiliary Activities (AA), and Carbohydrate-Binding Modules (CBM).

Data Sources

dbCAN Counts Grid

The dbCAN counts grid displays the number of CAZyme family annotations per organism derived from the dbCAN automated annotation pipeline. dbCAN uses hidden Markov models (HMMs) built from the CAZy database to scan protein sequences and assign them to CAZyme families. Each cell value represents the total count of annotations for a given CAZyme family in a given organism.

Organisms are organized by domain of life: Archaea, Eukaryota, and Bacteria.
Counts are pre-aggregated per organism and CAZyme family.
Cell color reflects relative abundance, using four percentile-based bands (blue → purple → red → orange).
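
The pre-aggregation described above can be pictured as a simple tally. The following is a minimal sketch, not the site's actual pipeline: it assumes each dbCAN annotation is available as an (organism, family) pair, and the input records and the function name `aggregate_counts` are hypothetical.

```python
from collections import Counter

# Hypothetical input: one (organism, family) pair per dbCAN annotation.
annotations = [
    ("Organism A", "GH5"),
    ("Organism A", "GH5"),
    ("Organism A", "GT2"),
    ("Organism B", "GH5"),
]

def aggregate_counts(pairs):
    """Tally annotations per (organism, family) pair."""
    return Counter(pairs)

counts = aggregate_counts(annotations)
print(counts[("Organism A", "GH5")])  # 2
print(counts[("Organism B", "GT2")])  # 0 (a Counter returns 0 for missing keys)
```

Each key of `counts` corresponds to one grid cell, and a missing key reads as zero, which matches an empty cell.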

Erdody Protein Grid

The Erdody grid shows CAZyme protein counts derived from the Erdody consolidated database, which uses BLAST-based homology searches against curated CAZyme reference sequences. Each cell can display one of three metrics (toggled in the interface):

HSPs — Total number of High-Scoring Segment Pairs (BLAST hits) for that organism/family combination.
Proteins — Number of distinct query proteins with hits to that CAZyme family.
Domains / Protein — Number of HSPs divided by the number of distinct proteins, i.e. the mean number of HSPs per protein. Values above 1 suggest multi-domain architecture.
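
To make the relationship between the three metrics concrete, here is a minimal sketch that derives them from a list of HSP records. It assumes each record carries an organism, a CAZyme family, and a query protein identifier; the record layout, the sample data, and the name `cell_metrics` are illustrative and not taken from the Erdody database.

```python
from collections import defaultdict

# Hypothetical HSP records: (organism, family, query_protein_id).
hsps = [
    ("Organism A", "GH13", "prot1"),
    ("Organism A", "GH13", "prot1"),
    ("Organism A", "GH13", "prot2"),
    ("Organism A", "CBM48", "prot1"),
]

def cell_metrics(records):
    """Compute the three per-cell metrics for each (organism, family) pair."""
    hsp_count = defaultdict(int)
    proteins = defaultdict(set)
    for organism, family, protein in records:
        key = (organism, family)
        hsp_count[key] += 1          # every record is one HSP
        proteins[key].add(protein)   # sets keep distinct proteins only
    return {
        key: {
            "hsps": hsp_count[key],
            "proteins": len(proteins[key]),
            "domains_per_protein": hsp_count[key] / len(proteins[key]),
        }
        for key in hsp_count
    }

metrics = cell_metrics(hsps)
print(metrics[("Organism A", "GH13")])
# {'hsps': 3, 'proteins': 2, 'domains_per_protein': 1.5}
```

In this example, three HSPs spread over two proteins give 1.5 domains per protein, the kind of value that hints at a multi-domain protein in the family.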

Clicking on an individual count dot opens a modal with the full BLAST alignment details (bit score, E-value, coordinates, identity, gaps) grouped by query protein.

Genome Set Comparison Grid

The comparison grid lets you select up to 50 organisms from any domain and view both dbCAN and Erdody data side by side. Each cell shows a D/E value (dbCAN count / Erdody protein count), making it easy to compare the two annotation methods for the same organism and CAZyme family. You can save and reload named organism sets for repeated analysis.
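
As a rough illustration of how a D/E cell could be assembled, the sketch below pairs the two counts for every selected organism and family. It assumes the cell is rendered as the two numbers separated by a slash and that missing counts read as zero; the actual rendering may differ, and the function and variable names are hypothetical.

```python
MAX_ORGANISMS = 50  # selection limit stated in the text

def comparison_cells(organisms, families, dbcan_counts, erdody_proteins):
    """Build "D/E" strings for each (organism, family) pair.

    dbcan_counts and erdody_proteins map (organism, family) to a count;
    pairs missing from either mapping are treated as 0 (an assumption).
    """
    if len(organisms) > MAX_ORGANISMS:
        raise ValueError(f"select at most {MAX_ORGANISMS} organisms")
    grid = {}
    for organism in organisms:
        for family in families:
            key = (organism, family)
            d = dbcan_counts.get(key, 0)
            e = erdody_proteins.get(key, 0)
            grid[key] = f"{d}/{e}"
    return grid

# Hypothetical counts for two organisms and two families.
dbcan = {("Organism A", "GH5"): 4, ("Organism B", "GH5"): 1}
erdody = {("Organism A", "GH5"): 3, ("Organism A", "GT2"): 2}

cells = comparison_cells(["Organism A", "Organism B"], ["GH5", "GT2"], dbcan, erdody)
print(cells[("Organism A", "GH5")])  # 4/3
print(cells[("Organism B", "GT2")])  # 0/0
```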

Substrate and Activity Annotations

Each CAZyme family column includes a substrate sub-header row colored by primary substrate. Clicking on a substrate cell opens a detail modal showing:

All known substrates for that family with hit counts.
Associated enzymatic activities and EC (Enzyme Commission) numbers.

These annotations are sourced from the CAZy database and cross-referenced literature.

Grid Navigation

Domain tabs — Switch between Archaea, Eukaryota, and Bacteria.
Letter navigation — Filter organisms by the first letter of the genus name (used when organism counts are large).
Sort toggle — Order columns by family name or by primary substrate.
Column/row highlighting — Hover over any cell to highlight the full row and column for easier reading.
Sticky headers — Family names and substrate rows remain visible while scrolling vertically. The organism name column stays fixed while scrolling horizontally.
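
The letter navigation and sort toggle amount to a filter and a sort key. The sketch below is one possible reading, assuming organism names start with the genus and that each family has a single primary substrate label; the helper names and the placeholder substrate labels are hypothetical.

```python
def filter_by_genus_letter(organisms, letter):
    """Keep organisms whose name (genus first) starts with the given letter."""
    return [name for name in organisms if name[:1].upper() == letter.upper()]

def sort_families(families, primary_substrate, by="family"):
    """Order family columns by name, or by primary substrate then name."""
    if by == "substrate":
        return sorted(families, key=lambda f: (primary_substrate.get(f, ""), f))
    return sorted(families)

organisms = ["Bacillus subtilis", "Bacteroides thetaiotaomicron", "Escherichia coli"]
print(filter_by_genus_letter(organisms, "b"))
# ['Bacillus subtilis', 'Bacteroides thetaiotaomicron']

# Placeholder substrate labels, for illustration only.
substrates = {"GH5": "substrate_b", "GT2": "substrate_a", "CE1": "substrate_c"}
families = ["GT2", "GH5", "CE1"]
print(sort_families(families, substrates))                  # ['CE1', 'GH5', 'GT2']
print(sort_families(families, substrates, by="substrate"))  # ['GT2', 'GH5', 'CE1']
```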

Color Legend

Count values are divided into four color groups based on percentile boundaries computed from all non-zero values in the current view:

Blue: low counts (up to the 50th percentile)
Purple: moderate counts (50th–80th percentile)
Red: high counts (80th–95th percentile)
Orange: very high counts (above the 95th percentile)
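
The banding can be sketched as follows. This is an illustration under stated assumptions rather than the site's implementation: it uses the nearest-rank percentile definition (the grids may interpolate instead), takes the color order from the blue → purple → red → orange sequence mentioned earlier, and leaves zero-count cells uncolored. All function names are hypothetical.

```python
COLORS = ["blue", "purple", "red", "orange"]  # low → very high, per the text

def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile of a non-empty sorted list (pct is an int)."""
    n = len(sorted_values)
    rank = max(1, (pct * n + 99) // 100)  # integer ceiling of pct * n / 100
    return sorted_values[rank - 1]

def color_thresholds(counts):
    """Return the 50th, 80th and 95th percentile of the non-zero counts."""
    nonzero = sorted(v for v in counts if v > 0)
    if not nonzero:
        return None
    return tuple(nearest_rank(nonzero, p) for p in (50, 80, 95))

def color_for(value, thresholds):
    """Map a count to its color band; zero counts get no color."""
    if value <= 0 or thresholds is None:
        return None
    for color, bound in zip(COLORS, thresholds):
        if value <= bound:
            return color
    return COLORS[-1]

view = [0, 0] + list(range(1, 21))   # hypothetical counts in the current view
thresholds = color_thresholds(view)
print(thresholds)                    # (10, 16, 19)
print(color_for(10, thresholds))     # blue
print(color_for(17, thresholds))     # red
print(color_for(20, thresholds))     # orange
```

Because the thresholds are recomputed from the non-zero values in the current view, the same count can fall into a different band after switching domains or filtering organisms.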

Small Microbes Pathway Profiler