Simulating synthetic seismic data from AI-generated velocity models

Ivan Sanchez, Marcelo Guarido, Daniel O. Trad

The effectiveness of machine learning workflows depends on the availability of large, labeled, and diverse datasets. In seismic exploration, such datasets are scarce: field data are expensive to acquire and often proprietary, and labeling requires expert interpretation that is time-consuming, subjective, and inconsistent across surveys. Seismic data augmentation partially addresses this limitation by increasing the amount and diversity of training data without new field acquisitions, which makes models more robust, reduces overfitting, and improves generalization to different geological settings, noise conditions, and acquisition geometries. However, data augmentation may also introduce unrealistic or physically inconsistent patterns that cause models to learn artifacts and amplify existing biases, which degrades their performance on real field data. Modeled seismic data offer an alternative that is physically consistent and fully controlled, because the simulated wave phenomena are governed by the elastic wave equation. Unlike simple data augmentation, modeled data provide ground-truth labels and ensure that variations in the dataset reflect physical behavior rather than artificial transformations. In this work, we generate synthetic seismic data from AI-generated velocity models using numerical simulations of the elastic wave equation. We construct 100 velocity models representing diverse structural and stratigraphic patterns, including irregular topography, and simulate shot gathers for each model using elastic finite-difference modeling. The resulting wavefields and shot gathers exhibit a wide range of seismic phenomena, including surface waves, refractions, reflections, and multiples, with timing, amplitude, and complexity that vary with the velocity structure and topography of each model. By combining AI-generated velocity models with physics-based simulations, we produce large and customizable datasets that support the development of machine learning methods for seismic exploration, with potential applications in ground-roll attenuation, data reconstruction, and seismic interpretation.
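
As an illustration of the simulation step, the following is a minimal sketch of 2D elastic finite-difference modeling of a single shot gather, written in Python with NumPy. It is not the implementation used in this work. It assumes a second-order velocity-stress staggered-grid scheme, a hand-built three-layer P-wave velocity model standing in for an AI-generated one, a Poisson solid (vs = vp/sqrt(3)), constant density, an explosive Ricker source, and periodic (wrap-around) grid edges. All function names and parameter values are hypothetical. Irregular topography, a free surface, and absorbing boundaries are omitted for brevity, so surface waves are not reproduced here.

```python
import numpy as np

def ricker(t, f0, t0):
    """Ricker wavelet with peak frequency f0 (Hz), centered at time t0 (s)."""
    a = (np.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def d_fwd(a, axis, h):
    """Forward difference with periodic wrap-around (assumption, for brevity)."""
    return (np.roll(a, -1, axis=axis) - a) / h

def d_bwd(a, axis, h):
    """Backward difference with periodic wrap-around (assumption, for brevity)."""
    return (a - np.roll(a, 1, axis=axis)) / h

def simulate_shot(vp, vs, rho, dx, nt, f0, src, rec_z):
    """Velocity-stress elastic modeling; returns (gather of vz, time step dt)."""
    nz, nx = vp.shape
    dt = 0.5 * dx / vp.max()          # below the 2D limit dx / (sqrt(2) * vp_max)
    mu = rho * vs ** 2                # Lame parameters from vp, vs, rho
    lam = rho * vp ** 2 - 2.0 * mu
    vx, vz, sxx, szz, sxz = (np.zeros((nz, nx)) for _ in range(5))
    wavelet = ricker(np.arange(nt) * dt, f0, 1.2 / f0)
    gather = np.zeros((nt, nx))
    sz, sx = src
    for it in range(nt):
        # Momentum equations: update particle velocities from stresses.
        vx += dt / rho * (d_bwd(sxx, 1, dx) + d_bwd(sxz, 0, dx))
        vz += dt / rho * (d_fwd(sxz, 1, dx) + d_fwd(szz, 0, dx))
        # Hooke's law: update stresses from the new velocities.
        dvx_dx = d_fwd(vx, 1, dx)
        dvz_dz = d_bwd(vz, 0, dx)
        sxx += dt * ((lam + 2.0 * mu) * dvx_dx + lam * dvz_dz)
        szz += dt * (lam * dvx_dx + (lam + 2.0 * mu) * dvz_dz)
        sxz += dt * mu * (d_fwd(vx, 0, dx) + d_bwd(vz, 1, dx))
        # Explosive source: add the wavelet to both normal stresses.
        sxx[sz, sx] += dt * wavelet[it]
        szz[sz, sx] += dt * wavelet[it]
        # Record vertical particle velocity along one row of receivers.
        gather[it] = vz[rec_z, :]
    return gather, dt

# Hypothetical three-layer model standing in for an AI-generated velocity model.
nz, nx, dx, nt = 80, 120, 5.0, 600
vp = np.full((nz, nx), 1500.0)
vp[30:55, :] = 2500.0
vp[55:, :] = 3500.0
vs = vp / np.sqrt(3.0)                # Poisson-solid assumption
rho = np.full_like(vp, 2000.0)        # constant-density assumption (kg/m^3)

gather, dt = simulate_shot(vp, vs, rho, dx, nt, f0=10.0, src=(2, 60), rec_z=2)
print(gather.shape, dt)
```

In this sketch, looping `simulate_shot` over a collection of velocity models and source positions would yield one labeled gather per model and shot. The wrap-around edges let energy re-enter the grid from the opposite side, so a realistic setup would replace them with absorbing boundaries and a free surface that follows the topography.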