012 · AI · 2026

Multi-Species Soundscape Network Analysis

Graph-theoretic acoustic niche mapping: 162h of PAM audio, 206 species, 56 sites.

Year
2026
Category
AI
Stack
MATLAB · Python · BirdNET

The question

Every species in a healthy ecosystem carves out its own acoustic niche: a specific slice of time and frequency it uses to vocalize without being drowned out by its neighbours. Krause's Acoustic Niche Hypothesis predicts this partitioning is not accidental but evolved. The question driving this project: can you measure it as a network, and use that network to say something rigorous about ecosystem health?

This is an ongoing research project at IIT Delhi, advised by Prof. Shaurya Shriyam, targeting a 10-week timeline from May to August 2026.

Dataset

Four candidate sources were evaluated before settling on a primary:

| Source | Verdict | Reason |
|---|---|---|
| LILA BC | Not usable | CityNet subset missing; Island Conservation entries are camera-trap images, not audio |
| Arbimon | Backup | Crowd-validated PAM recordings but no packaged download, requires per-project assembly |
| Amazon Basin (Zenodo) | Superseded | Strong candidate: 3-day continuous PAM from 10 stations. Redundant once BirdCLEF 2025 was identified |
| BirdCLEF 2025 | Primary | Self-contained ecosystem: train_soundscapes covers 56 sites × consecutive days × full 24-hour diel cycle from El Silencio Reserve, Colombia |

BirdCLEF 2025 supplies both inputs to the analysis. The first is train_soundscapes: 162 hours of continuous PAM audio across 56 geographic recording sites. The second is a set of 28,564 labeled focal clips for classifier training, covering 206 species across Aves, Amphibia, Insecta, and Mammalia. Critically, the classifier training data and the network analysis data share the same species pool and ecosystem, which minimizes distribution mismatch between them.

The species distribution is strongly long-tailed: dominant vocalizers like Common Pauraque and Great Kiskadee each account for over 8 hours of audio, while rare niche specialists appear in only a handful of clips. This skew is precisely what Soundscape Similarity Networks are built to surface.

Three network representations

A single recording becomes three different graphs depending on what question you're asking.

Acoustic Bipartite Network: Who vocalizes where in frequency and time?

Nodes are split into two sets: species (U) and time-frequency slots (V, e.g. "5:00 AM, 2–4 kHz"). An edge exists if species i is detected in slot j. This encodes specialization: a generalist species fans out across many slots, while a niche specialist occupies a narrow band. The bipartite structure also makes the Acoustic Niche Hypothesis directly falsifiable. High modularity in the one-mode species projection means species have partitioned acoustic space; low modularity means they have not.
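Here is a minimal NumPy sketch of the bipartite construction. It assumes detections arrive as (species, slot) pairs, with slots keyed by hypothetical (hour, band) tuples. The species and slot labels below are illustrative and are not drawn from the dataset. Row sums of the incidence matrix B give each species' niche breadth. The product B·Bᵀ gives the one-mode species projection, on which modularity would then be computed.

```python
import numpy as np

def bipartite_incidence(detections, species, slots):
    """Species x slot incidence matrix: B[i, j] = 1 if species i is detected in slot j."""
    s_idx = {s: i for i, s in enumerate(species)}
    t_idx = {t: j for j, t in enumerate(slots)}
    B = np.zeros((len(species), len(slots)), dtype=int)
    for sp, slot in detections:
        B[s_idx[sp], t_idx[slot]] = 1
    return B

def niche_breadth(B):
    """Slots occupied per species: high = generalist, low = niche specialist."""
    return B.sum(axis=1)

def species_projection(B):
    """One-mode projection onto species: entry (i, k) counts slots shared by i and k."""
    P = B @ B.T
    np.fill_diagonal(P, 0)  # drop each species' overlap with itself
    return P

# Hypothetical detections; slots are (hour, band) tuples.
species = ["pauraque", "kiskadee", "glassfrog"]
slots = [("05:00", "2-4 kHz"), ("05:00", "4-7 kHz"), ("06:00", "2-4 kHz")]
detections = [
    ("pauraque", slots[0]), ("pauraque", slots[2]),
    ("kiskadee", slots[0]),
    ("glassfrog", slots[1]),
]
B = bipartite_incidence(detections, species, slots)
breadth = niche_breadth(B)   # [2, 1, 1]: the pauraque is the broadest here
P = species_projection(B)    # only pauraque and kiskadee share a slot
```

B is kept as an explicit matrix rather than a graph object, so the species projection is a single matrix product. The same matrix could equally be loaded into NetworkX for the community-detection step.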

Interspecific Interaction Graph: Do species avoid each other acoustically?

Nodes are species; edge weight is the Pearson correlation of their windowed activity time series across the recording. If Species A reliably goes quiet when Species B starts, the edge is negative. That pattern is consistent with acoustic masking or competitive exclusion. Positive edges indicate co-occurring guilds. The graph is built from overlapping calls in train_soundscapes, and the secondary_labels field of train.csv adds further co-occurrence signal.
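The signed edge weights can be sketched with NumPy's `np.corrcoef`, which treats each row as one variable. The activity matrix here is a hypothetical toy. Species B calls exactly when A is silent, and C mostly tracks A. As a result, the A–B edge comes out negative (a candidate masking pair) and the A–C edge comes out positive (a candidate guild).

```python
import numpy as np

def interaction_graph(activity, species):
    """Signed species-interaction edges from windowed activity time series.

    activity: array of shape (n_species, n_windows), e.g. detections per window.
    Returns {(a, b): r}, where r is the Pearson correlation of a and b.
    Species with constant activity produce NaN correlations and should be dropped first.
    """
    R = np.corrcoef(activity)  # rows are variables (species), columns are windows
    edges = {}
    for i in range(len(species)):
        for j in range(i + 1, len(species)):
            edges[(species[i], species[j])] = float(R[i, j])
    return edges

# Hypothetical toy series over eight windows.
species = ["A", "B", "C"]
activity = np.array([
    [1, 1, 0, 0, 1, 1, 0, 0],   # A
    [0, 0, 1, 1, 0, 0, 1, 1],   # B: active exactly when A is silent
    [1, 1, 0, 0, 1, 0, 0, 0],   # C: mostly co-active with A
])
edges = interaction_graph(activity, species)
# edges[("A", "B")] is -1.0; edges[("A", "C")] is positive
```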

Soundscape Similarity Network: How similar are different places and times?

Nodes are sensor-day pairs (e.g. site H01 on day 3). Edge weight is the Jaccard similarity of species present. This detects regime shifts: a cluster of sensor-days that suddenly diverges from its neighbours indicates a disturbance event or seasonal transition. Rare species with narrow ranges yield high Jaccard uniqueness scores, flagging the most ecologically specialised vocalizers in the dataset.
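Here is a minimal sketch of the similarity edges. It assumes each sensor-day has already been reduced to a set of detected species; the site codes and species sets are hypothetical. Jaccard similarity is |A ∩ B| / |A ∪ B|. Two sensor-days that share two of three species therefore score 2/3, and sensor-days with no species in common score 0.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two species sets."""
    union = a | b
    # Convention: two empty sets are treated as identical.
    return len(a & b) / len(union) if union else 1.0

def similarity_network(presence):
    """presence: {(site, day): set of species}. Returns {(node_u, node_v): weight}."""
    return {
        (u, v): jaccard(presence[u], presence[v])
        for u, v in combinations(sorted(presence), 2)
    }

# Hypothetical sensor-days.
presence = {
    ("H01", 3): {"pauraque", "kiskadee", "glassfrog"},
    ("H01", 4): {"pauraque", "kiskadee"},
    ("H02", 3): {"howler", "katydid"},
}
edges = similarity_network(presence)
# ("H01", 3)-("H01", 4): 2/3; any pair involving ("H02", 3): 0.0
```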

Pipeline

[ BirdCLEF 2025 .ogg ]
      │
      ▼
[ MATLAB STFT ]  ←── Hamming window, 40ms resolution, 15kHz range
      │                Custom: not toolbox defaults, for version-independence
      ▼
[ Python Bridge (pyrun) ]
      │
      ├──> [ BirdNET ]  ──────> species detection per 5-second window
      │
      └──> [ ViT embeddings ]  ──> 1-second spectrogram patches → k-NN graph
                                  (unsupervised acoustic niche clustering
                                   before species labels are applied)
      │
      ▼
[ Three Graphs ]  ──> Louvain / Infomap / Centrality / Entropy
Data flows from raw .ogg through MATLAB STFT to Python for neural embedding, then into graph construction and analysis.

The MATLAB STFT is deliberately custom rather than a toolbox call. BirdCLEF spans multiple years of recordings, and MATLAB toolbox defaults can silently shift between versions. A hand-rolled STFT with fixed parameters avoids this. It uses a Hamming window, 40 ms temporal resolution, and a 15 kHz upper frequency bound that stays below the Nyquist limit. Those fixed settings guarantee identical features across the full multi-year dataset, which is a prerequisite for any cross-year comparison.
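For illustration, the fixed-parameter STFT can be sketched in Python with NumPy; this is not the project's MATLAB implementation. The 40 ms window follows the text. The 20 ms hop and the 32 kHz sample rate are assumptions. Every parameter is explicit, so nothing depends on a library's default window or overlap.

```python
import numpy as np

def hamming_stft(x, fs, win_ms=40.0, hop_ms=20.0, fmax=15_000.0):
    """Magnitude STFT with a fixed Hamming window, cropped at fmax (Hz).

    Returns (spectrogram [freq x time], bin frequencies in Hz, frame start times in s).
    """
    n_win = int(round(fs * win_ms / 1000))   # 40 ms -> 1280 samples at 32 kHz
    hop = int(round(fs * hop_ms / 1000))     # assumed hop: 50% overlap
    if len(x) < n_win:
        raise ValueError("signal is shorter than one analysis window")
    window = np.hamming(n_win)               # explicit window, no library defaults
    n_frames = 1 + (len(x) - n_win) // hop
    frames = np.stack([x[i * hop:i * hop + n_win] * window for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))         # (n_frames, n_win // 2 + 1)
    freqs = np.arange(n_win // 2 + 1) * (fs / n_win)   # bin centre frequencies
    keep = freqs <= fmax
    return spec[:, keep].T, freqs[keep], np.arange(n_frames) * hop / fs

# Usage: a 1 s, 3 kHz test tone at an assumed 32 kHz sample rate.
fs = 32_000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 3_000 * t)
S, f, times = hamming_stft(x, fs)
peak_hz = f[S.mean(axis=1).argmax()]   # 3000.0
```

The window, hop and frequency cut are all explicit. Re-running the function on audio from a different year or software environment therefore yields features computed with identical settings.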

The Python–MATLAB bridge (pyrun) routes spectrogram outputs into BirdNET for species-level detection and a Vision Transformer for unsupervised embedding. The k-NN similarity graph built from ViT embeddings clusters acoustic patches before any species label is applied, giving an independent validation layer: if the unsupervised clusters align with known taxonomy, the pipeline is working correctly.
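The k-NN graph and the taxonomy check can be sketched as follows. The sketch assumes patch embeddings have already been extracted; the ViT itself is not shown. Synthetic Gaussian clusters stand in for real embeddings. `label_agreement` is a hypothetical validation score: the share of k-NN edges that join patches of the same known taxon. A clustering step scored with a metric such as adjusted Rand index would be a more complete alternative.

```python
import numpy as np

def knn_graph(emb, k):
    """Directed k-NN edges by cosine similarity. emb: (n_patches, dim)."""
    X = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = X @ X.T
    np.fill_diagonal(sim, -np.inf)              # a patch is never its own neighbour
    nbrs = np.argsort(-sim, axis=1)[:, :k]      # k most similar patches per row
    return [(i, int(j)) for i in range(len(X)) for j in nbrs[i]]

def label_agreement(edges, labels):
    """Hypothetical check: fraction of k-NN edges joining patches with the same label."""
    return sum(labels[i] == labels[j] for i, j in edges) / len(edges)

# Synthetic stand-in for ViT embeddings: two well-separated taxa, 20 patches each.
rng = np.random.default_rng(0)
centers = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
labels = [0] * 20 + [1] * 20
emb = np.vstack([centers[c] + rng.normal(scale=0.3, size=3) for c in labels])
edges = knn_graph(emb, k=5)
score = label_agreement(edges, labels)   # 1.0: every neighbour shares its taxon
```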

Week 1 initial finding: spectrograms of four hand-selected species, one per class, immediately showed taxon-specific frequency partitioning:

  • Insecta (Oxyprora surinamensis): >8 kHz broadband stridulation
  • Amphibia (Espadarana prosoblepon): 4–7 kHz tonal
  • Aves (American Pygmy Kingfisher): 2–8 kHz impulsive broadband
  • Mammalia (Colombian Red Howler Monkey): <1 kHz resonant roar
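The pairwise overlaps between these four bands can be checked mechanically. The open-ended bounds are assumptions: ">8 kHz" is capped at the 15 kHz analysis ceiling, and "<1 kHz" starts at 0.

```python
from itertools import combinations

# Bands from the list above, in kHz. Open-ended bounds are assumptions:
# ">8 kHz" is capped at the 15 kHz analysis ceiling, "<1 kHz" starts at 0.
bands = {
    "Insecta": (8.0, 15.0),
    "Amphibia": (4.0, 7.0),
    "Aves": (2.0, 8.0),
    "Mammalia": (0.0, 1.0),
}

def overlap_khz(a, b):
    """Width in kHz of the intersection of two closed frequency intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

overlaps = {
    (p, q): overlap_khz(bands[p], bands[q])
    for p, q in combinations(bands, 2)
}
```

Only the Aves and Amphibia bands intersect. The overlap is 3 kHz, which covers the whole 4–7 kHz frog band. The Insecta and Aves bands touch at 8 kHz without overlapping.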

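The insect, frog, and howler monkey bands do not overlap one another. The one frequency overlap, between the kingfisher's 2–8 kHz calls and the frog's 4–7 kHz band, pairs calls that differ in structure (impulsive broadband versus tonal). This partitioning was not assumed. It is the first empirical evidence from this specific dataset that the Acoustic Bipartite Network should produce meaningful structure rather than noise.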

Graph algorithms

Once the three networks are constructed, three algorithms extract ecological meaning:

Community detection → acoustic guilds. Louvain and Infomap are applied to the species-interaction graph. A community (guild) is a group of species that vocalize together without interfering: they share time or frequency slots without anti-correlated activity. High modularity Q is the expected quantitative signature of a healthy, evolved soundscape. Low Q suggests either a degraded ecosystem or a recording window too short to capture the full niche structure.
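Louvain searches for the partition that maximises Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j). The sketch below only evaluates Q for a given partition. Standard modularity assumes non-negative weights, so the sketch clips negative correlations to zero before scoring. That is one possible choice; signed-modularity variants are another. Infomap optimises a different objective, the map equation, so it is not shown. The toy matrix is hypothetical: two three-species guilds that are anti-correlated across guilds.

```python
import numpy as np

def modularity(A, communities):
    """Newman modularity Q = (1/2m) * sum_ij [A_ij - k_i k_j / 2m] * delta(c_i, c_j).

    A: symmetric, non-negative weight matrix; communities: one label per node.
    """
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                    # weighted degree of each species
    two_m = k.sum()                      # total weight, each edge counted twice
    c = np.asarray(communities)
    same = c[:, None] == c[None, :]      # delta(c_i, c_j)
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)

# Toy correlation matrix: two guilds of three species, anti-correlated across guilds.
block = np.ones((3, 3)) - np.eye(3)
R = np.block([[block, -0.4 * np.ones((3, 3))],
              [-0.4 * np.ones((3, 3)), block]])
A = np.clip(R, 0.0, None)                # keep positive associations only

q_guilds = modularity(A, [0, 0, 0, 1, 1, 1])   # about 0.5
q_single = modularity(A, [0] * 6)              # about 0: one community has no structure
```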

Eigenvector centrality → keystone vocalizers. A species with high eigenvector centrality is connected to many other highly connected species; its vocalization pattern anchors the temporal organisation of the soundscape. Removing such a species and observing network fragmentation is a robustness test: if the network falls apart, the species was genuinely keystone, not just common.
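Both halves of the keystone test can be sketched with NumPy. Eigenvector centrality is the leading eigenvector of a symmetric adjacency matrix. Fragmentation is measured as the size of the largest connected component after a species is removed. The five-species network is a hypothetical toy, and the sketch assumes a connected, unweighted, undirected graph.

```python
from collections import deque
import numpy as np

def eigenvector_centrality(A):
    """Principal eigenvector of a symmetric, non-negative adjacency matrix,
    scaled so the most central node scores 1. Assumes a connected graph."""
    _, vecs = np.linalg.eigh(np.asarray(A, dtype=float))
    v = np.abs(vecs[:, -1])          # eigh sorts eigenvalues ascending; take the largest
    return v / v.max()

def largest_component(A, removed=()):
    """Size of the largest connected component after deleting the `removed` nodes."""
    A = np.asarray(A)
    unvisited = set(range(len(A))) - set(removed)
    best = 0
    while unvisited:
        queue = deque([unvisited.pop()])
        size = 1
        while queue:
            u = queue.popleft()
            for w in np.flatnonzero(A[u]):
                w = int(w)
                if w in unvisited:   # removed nodes never enter this set
                    unvisited.remove(w)
                    queue.append(w)
                    size += 1
        best = max(best, size)
    return best

# Toy network: species 0 links everyone; species 1 and 2 also interact directly.
A = np.zeros((5, 5), dtype=int)
for i, j in [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]:
    A[i, j] = A[j, i] = 1

c = eigenvector_centrality(A)               # species 0 scores highest
print(largest_component(A))                 # 5
print(largest_component(A, removed=[0]))    # 2: removing the hub fragments the network
print(largest_component(A, removed=[3]))    # 4: removing a peripheral species does not
```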

Von Neumann entropy → biodiversity proxy. Rescaling the graph Laplacian L to unit trace gives a density matrix ρ = L / Tr(L). Its von Neumann entropy, S(ρ) = −Tr(ρ ln ρ), measures the complexity of the acoustic network. Higher entropy is expected to track higher biodiversity and a richer distribution of acoustic resources across the ecosystem. It provides a single scalar that can be tracked across sites and days: a compact health indicator derived purely from the network structure.
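Here is a direct NumPy sketch of the entropy, assuming the usual construction ρ = L / Tr(L) with L = D − A. A complete graph on four nodes reaches ln 3, the maximum possible for four nodes. A four-node chain scores lower.

```python
import numpy as np

def von_neumann_entropy(A):
    """S(rho) = -Tr(rho ln rho), with rho = L / Tr(L) and L = D - A the graph Laplacian."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    rho = L / np.trace(L)                    # unit trace, positive semidefinite
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]                   # 0 * ln 0 is taken as 0
    return float(-(lam * np.log(lam)).sum())

K4 = np.ones((4, 4)) - np.eye(4)                          # every species linked to every other
P4 = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)     # a chain of four species
s_complete = von_neumann_entropy(K4)   # ln 3, about 1.0986
s_chain = von_neumann_entropy(P4)      # lower than the complete graph
```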

Open design decisions

Three questions are still under active discussion with the advisor:

Taxon scope. Option A starts with Aves only (faster to a clean first result, then expands). Option B includes all 206 species across all four classes from Week 1 (richer bipartite network, stronger paper). Option C builds per-class networks and combines them into a meta-network (highest novelty, most complex). The initial spectrogram results make Option B look tractable: the frequency bands are clean enough that a four-class bipartite network should produce interpretable structure without additional disambiguation steps.

Analysis window length. BirdNET scores audio in 3-second segments, and BirdCLEF evaluates predictions on 5-second windows, which makes 5 seconds the default candidate. A finer window captures more temporal structure in the bipartite graph but increases noise in species detection. The decision will be driven by the BirdNET detection confidence distribution on the Week 2 windowed output.
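Comparing window lengths means re-binning the same detections at each candidate length. Here is a minimal sketch, assuming detections are available as hypothetical (start_s, species, confidence) tuples. The 0.5 confidence threshold is a placeholder, not a BirdNET default. The resulting binary series are the kind of input the interaction graph's correlations would be computed on.

```python
import math
from collections import defaultdict

def activity_series(detections, duration_s, window_s=5.0, min_conf=0.5):
    """Re-bin detections into per-species binary activity series.

    detections: iterable of (start_s, species, confidence) tuples (assumed format).
    A species is active in a window if any detection starting in that window
    reaches min_conf.
    """
    n_windows = math.ceil(duration_s / window_s)
    series = defaultdict(lambda: [0] * n_windows)
    for start_s, species, conf in detections:
        if conf >= min_conf:
            idx = min(int(start_s // window_s), n_windows - 1)
            series[species][idx] = 1
    return dict(series)

# Hypothetical detections from a 30 s recording.
dets = [(0.0, "pauraque", 0.9), (3.0, "pauraque", 0.4),
        (12.0, "kiskadee", 0.8), (27.0, "pauraque", 0.7)]
five_s = activity_series(dets, 30, window_s=5.0)
fifteen_s = activity_series(dets, 30, window_s=15.0)
# five_s["pauraque"] == [1, 0, 0, 0, 0, 1]; fifteen_s["pauraque"] == [1, 1]
```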

Target venue. Framing as an ecoacoustics paper (Ecological Informatics, Methods in Ecology and Evolution) positions the network methodology as a tool for field biologists. Framing as a network science paper (e.g. Nature Communications) positions the acoustic domain as a novel application of graph theory. The two framings require different related-work sections and different emphasis in Results.

Stack

MATLAB · Python · BirdNET · Vision Transformer (ViT) · NetworkX · Signal Processing · Graph Theory