SpatialScaper: a library to simulate and augment soundscapes for sound event localization and detection in realistic rooms.

Platform Python arXiv CC BY 4.0


SpatialScaper is still undergoing active development. We have done our due diligence to test that it works as expected; however, please open an issue describing any errors you encounter. Also, make sure to pull often, as we are actively adding more features. Note: you will need about 100GB of storage space to comfortably set up and run the DCASE Task 3 data generation pipeline.
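Because the pipeline needs roughly 100GB, it can help to check free space before starting the downloads. The snippet below is a minimal sketch using only the standard library; the helper name `has_enough_space` is hypothetical and not part of SpatialScaper, and it treats "GB" as 10^9 bytes.

```python
import shutil

# Free-space target taken from the note above (about 100 GB, using 1 GB = 10**9 bytes).
REQUIRED_GB = 100


def has_enough_space(path=".", required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 10**9


if __name__ == "__main__":
    free_gb = shutil.disk_usage(".").free / 10**9
    print(f"Free space: {free_gb:.1f} GB")
    if not has_enough_space("."):
        print(f"Warning: less than {REQUIRED_GB} GB free; the DCASE pipeline may run out of space.")
```

Run it from the directory where you plan to clone SpatialScaper, since `shutil.disk_usage` reports on the filesystem containing the given path.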



SpatialScaper is a Python library to create synthetic audio mixtures suitable for DCASE Challenge Task 3.

Requirements and Installation

To run the SpatialScaper library, set up your environment manually as follows.

Manual Environment Setup

The minimum environment requirement is Python >= 3.8. You can find the versions of the other dependencies we use in

git clone
cd SpatialScaper
pip install -e .

Conda Environment with Python==3.8

conda create -n "ssenv" python=3.8

Python Virtual Environment with Python==3.8

python3.8 -m venv "ssenv"
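Once the environment is active and `pip install -e .` has finished, a quick sanity check can confirm that the interpreter meets the Python >= 3.8 requirement and that the package is importable. This is a minimal sketch; `python_ok` and `package_installed` are hypothetical helpers, and the importable module name `spatialscaper` is taken from the example script further down.

```python
import importlib.util
import sys

# Minimum interpreter version stated in the requirements above.
MIN_PYTHON = (3, 8)


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) part of version_info meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def package_installed(name="spatialscaper"):
    """Return True if `name` can be found on the current import path."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if not python_ok():
        sys.exit(f"Python >= {MIN_PYTHON[0]}.{MIN_PYTHON[1]} required, found {sys.version.split()[0]}")
    print("Python version OK")
    if package_installed("spatialscaper"):
        print("spatialscaper is importable")
    else:
        print("spatialscaper not found: did you run `pip install -e .` inside the environment?")
```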

Preparing Sound Event Assets

First, we need to prepare sound event assets for soundscape synthesis. SpatialScaper works with any sound files that you wish to spatialize. To get started with sound events from the FSD50K and FMA (music) datasets, run:

python scripts/ --download_FSD --download_FMA --cleanup

The --cleanup argument deletes the original FSD50K and FMA zip files (to save space), keeping only the files needed to get started with SpatialScaper.

This creates a datasets/sound_event_datasets/FSD50K_FMA directory with a structure of sound event categories and files.
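To inspect what the preparation step produced, you can count the clips in each category folder. This is a sketch under assumptions: it groups audio files by their immediate parent folder name, so any intermediate folders (for example, data splits) are simply traversed and categories with the same name are merged; the set of audio extensions is also an assumption. `count_clips_per_category` is a hypothetical helper, not a SpatialScaper function.

```python
from collections import Counter
from pathlib import Path

# Audio extensions to count; adjust if your assets use other formats (assumption).
AUDIO_EXTS = {".wav", ".flac", ".mp3", ".ogg"}


def count_clips_per_category(root):
    """Count audio files under `root`, grouped by the name of their parent folder."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in AUDIO_EXTS:
            counts[path.parent.name] += 1
    return dict(counts)


if __name__ == "__main__":
    root = Path("datasets/sound_event_datasets/FSD50K_FMA")
    if root.is_dir():
        for category, n in sorted(count_clips_per_category(root).items()):
            print(f"{category:20s} {n:6d}")
    else:
        print(f"{root} not found; run the preparation script first.")
```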

Attention: the first-time setup takes some time ⏳; we recommend running it inside a screen or tmux session.

Preparing RIR Datasets

python scripts/ --cleanup

The --cleanup argument deletes the original RIR database zip files (to save space).

Attention: the first-time setup takes some time ⏳; we recommend running it inside a screen or tmux session.

Note: stay tuned, as we will soon release our A2B ambisonics encoder. In the meantime, refer to the table below to download the respective FOA SOFA files for the METU, RSoANU, and DAGA datasets. Place them alongside the other SOFA files generated under SpatialScaper/datasets/rir_datasets/spatialscaper_RIRs.
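Moving the downloaded FOA files into place can be scripted. The sketch below assumes you run it from the repository root (so the target is `datasets/rir_datasets/spatialscaper_RIRs`) and that files keep the names they were downloaded with; the expected file names are not specified here, so `install_sofa` and `list_sofa` are hypothetical helpers that only move and list `.sofa` files.

```python
import shutil
from pathlib import Path

# Target folder from the note above, relative to the repository root (assumption).
SOFA_DIR = Path("datasets/rir_datasets/spatialscaper_RIRs")


def install_sofa(src, dest_dir=SOFA_DIR):
    """Move a downloaded .sofa file into dest_dir, keeping its name; return the new path."""
    src = Path(src)
    if src.suffix.lower() != ".sofa":
        raise ValueError(f"expected a .sofa file, got {src.name}")
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(src), str(dest_dir / src.name)))


def list_sofa(dest_dir=SOFA_DIR):
    """Return the sorted names of the .sofa files currently in dest_dir."""
    return sorted(p.name for p in Path(dest_dir).glob("*.sofa"))


if __name__ == "__main__":
    if SOFA_DIR.is_dir():
        print(list_sofa(SOFA_DIR))
```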

Dataset URL
Full descriptions of available rooms

The available rooms for soundscape generation are as follows:

| Room Name | Description | Trajectory type | URL |
|---|---|---|---|
| metu | Classroom S05 at the METU Graduate School of Informatics on 23 January 2018. | Square | Link |
| arni | Arni variable acoustics room at the Acoustics Lab, Aalto University, Espoo, Finland. | Linear | Link |
| bomb_shelter | Large open space in underground bomb shelter, with plastic-coated floor and rock walls. Ventilation noise. | Circular | Link |
| gym | Large open gym space. Ambience of people using weights and gym equipment in adjacent rooms. | Circular | Link |
| pb132 | Small classroom with group work tables and carpet flooring. Ventilation noise. | Circular | Link |
| pc226 | Meeting room with hard floor and partially glass walls. Ventilation noise. | Circular | Link |
| sa203 | Lecture hall with inclined floor and rows of desks. Ventilation noise. | Linear | Link |
| sc203 | Small classroom with group work tables and carpet flooring. Ventilation noise. | Linear | Link |
| se203 | Large classroom with hard floor and rows of desks. Ventilation noise. | Linear | Link |
| tb103 | Lecture hall with inclined floor and rows of desks. Ventilation noise. | Linear | Link |
| tc352 | Meeting room with hard floor and partially glass walls. Ventilation noise. | Circular | Link |
| motus | Seminar room with configurable furniture, carpet tiles, and absorption wedges. | Sparse | Link |
| rsoanu | ANU School of Music Recording Studio with variable wall panels: wood or felt. | Rectangular | Link |
| daga | Small conference room with large wood table and carpet flooring. | Sparse | Link |
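If you want to pick rooms programmatically (for instance, only rooms with circular trajectories), the table can be transcribed into a plain dictionary. This is a hypothetical helper built only from the table above; SpatialScaper itself may expose this information differently.

```python
# Trajectory type per room, transcribed from the table above.
ROOM_TRAJECTORIES = {
    "metu": "Square",
    "arni": "Linear",
    "bomb_shelter": "Circular",
    "gym": "Circular",
    "pb132": "Circular",
    "pc226": "Circular",
    "sa203": "Linear",
    "sc203": "Linear",
    "se203": "Linear",
    "tb103": "Linear",
    "tc352": "Circular",
    "motus": "Sparse",
    "rsoanu": "Rectangular",
    "daga": "Sparse",
}


def rooms_with_trajectory(kind):
    """Return the sorted room names whose trajectory type matches `kind` (case-insensitive)."""
    return sorted(room for room, t in ROOM_TRAJECTORIES.items() if t.lower() == kind.lower())


if __name__ == "__main__":
    print(rooms_with_trajectory("Circular"))
```

Any name returned here can be assigned to `ROOM` in the example script, keeping in mind that motus, rsoanu, and daga need the FOA SOFA files mentioned above.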

Note that SRIR directions and distances differ from room to room. Azimuths span the full range $\phi\in[-180,180)$ degrees, while elevations span approximately $\theta\in[-50,50]$ degrees.
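When comparing directions against these ranges (for example, when checking target or annotated DOAs), it helps to normalize azimuths onto the same half-open interval first. The sketch below uses hypothetical helpers, and the ±50 degree elevation limit is only the approximate figure quoted above; the exact coverage varies by room.

```python
# Approximate elevation coverage quoted above; the exact range differs per room.
MAX_ABS_ELEVATION = 50.0


def wrap_azimuth(phi):
    """Map any azimuth in degrees onto the half-open interval [-180, 180)."""
    return (phi + 180.0) % 360.0 - 180.0


def elevation_in_range(theta, limit=MAX_ABS_ELEVATION):
    """Return True if elevation theta (degrees) lies within [-limit, limit]."""
    return -limit <= theta <= limit


if __name__ == "__main__":
    for phi in (190, 180, -190, 540):
        print(phi, "->", wrap_azimuth(phi))
```

Using Python's `%`, which always returns a result with the sign of the divisor, makes the wrap work for negative inputs without a special case.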

Quick Examples for New Users

Below we present an example script that generates 20 soundscapes, 1 minute long each, using audio clips from FSD50K spatialized in a single room (set by the ROOM constant; bomb_shelter in the script as written). These soundscapes are consistent with the DCASE Task 3 format.

Execute as:

import numpy as np
import spatialscaper as ss
import os

# Constants
NSCAPES = 20  # Number of soundscapes to generate
FOREGROUND_DIR = "datasets/sound_event_datasets/FSD50K_FMA"  # Directory with FSD50K foreground sound files
RIR_DIR = "datasets/rir_datasets"  # Directory containing Room Impulse Response (RIR) files
ROOM = "bomb_shelter"  # Initial room setting, change according to available rooms listed below
FORMAT = "mic"  # Output format specifier
N_EVENTS_MEAN = 15  # Mean number of foreground events in a soundscape
N_EVENTS_STD = 6  # Standard deviation of the number of foreground events
DURATION = 60.0  # Duration in seconds of each soundscape, customizable by the user
SR = 24000  # SpatialScaper default sampling rate for the audio files
OUTPUT_DIR = "output"  # Directory to store the generated soundscapes
REF_DB = -65  # Reference decibel level for the background ambient noise. Try making this random too!

# List of possible rooms to use for soundscape generation. Change 'ROOM' variable to one of these:
# "metu", "arni","bomb_shelter", "gym", "pb132", "pc226", "sa203", "sc203", "se203", "tb103", "tc352"
# Each room has a different Room Impulse Response (RIR) file associated with it, affecting the acoustic properties.

# FSD50K sound classes that will be spatialized include:
# 'femaleSpeech', 'maleSpeech', 'clapping', 'telephone', 'laughter',
# 'domesticSounds', 'footsteps', 'doorCupboard', 'music',
# 'musicInstrument', 'waterTap', 'bell', 'knock'.
# These classes are sourced from the FSD50K dataset, and
# are consistent with the DCASE SELD challenge classes.

# Function to generate a soundscape
def generate_soundscape(index):
    track_name = f"fold5_room1_mix{index+1:03d}"
    # Initialize Scaper. 'max_event_overlap' controls the maximum number of overlapping sound events.
    ssc = ss.Scaper(
        DURATION,
        FOREGROUND_DIR,
        RIR_DIR,
        FORMAT,
        ROOM,
        max_event_overlap=2,
        speed_limit=2.0,  # in meters per second
    )
    ssc.ref_db = REF_DB

    # static ambient noise
    ssc.add_background()

    # Add a random number of foreground events, based on the specified mean and standard deviation.
    n_events = int(np.random.normal(N_EVENTS_MEAN, N_EVENTS_STD))
    n_events = n_events if n_events > 0 else 1  # n_events should be greater than zero

    for _ in range(n_events):
        ssc.add_event()  # randomly choosing and spatializing an FSD50K sound event

    audiofile = os.path.join(OUTPUT_DIR, FORMAT, track_name)
    labelfile = os.path.join(OUTPUT_DIR, "labels", track_name)

    ssc.generate(audiofile, labelfile)

# Main loop for generating soundscapes
for iscape in range(NSCAPES):
    print(f"Generating soundscape: {iscape + 1}/{NSCAPES}")
    generate_soundscape(iscape)
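The event-count rule and the track naming in the script can be tried in isolation. The sketch below mirrors the script's logic (draw from a normal distribution, truncate with `int()`, floor at 1) but swaps in the standard library's `random.Random` instead of `np.random.normal` so it runs without NumPy and can be seeded per run; `sample_n_events` and `track_name` are hypothetical helpers, not SpatialScaper functions.

```python
import random


def sample_n_events(mean, std, rng=None):
    """Draw a foreground-event count like the script: int(N(mean, std)), floored at 1."""
    rng = rng if rng is not None else random.Random()
    n = int(rng.gauss(mean, std))  # int() truncates toward zero, as in the script
    return n if n > 0 else 1


def track_name(index):
    """Build the script's DCASE-style track name from a zero-based soundscape index."""
    return f"fold5_room1_mix{index + 1:03d}"


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs draw the same counts
    for i in range(3):
        print(track_name(i), sample_n_events(15, 6, rng))
```

With the script's defaults (mean 15, standard deviation 6), draws at or below zero are rare but possible, which is why the floor at 1 is there.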


