SeisMoLLM: Advancing Seismic Monitoring via Cross-modal Transfer with Pretrained Large Language Model
Introduction:
Due to the lack of research on pretraining algorithms, the high cost of training, and the heterogeneity across datasets, large-scale pretraining for seismic waveform analysis is still in its early stages. SeisMoLLM is the first work to explore cross-modal transfer for earthquake monitoring, bringing the power of large-scale pretraining to seismic monitoring by transferring a pretrained large language model (GPT-2 here).
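To make the cross-modal transfer idea concrete, here is a minimal sketch of one way to reuse pretrained GPT-2 blocks on waveforms: a 1-D convolutional embedder maps 3-channel seismic traces into GPT-2's hidden space, and the transformer is driven through `inputs_embeds`. The module names and hyperparameters are illustrative assumptions, not SeisMoLLM's exact design.

```python
# Illustrative sketch: reuse pretrained GPT-2 blocks as a waveform encoder.
import torch
import torch.nn as nn
from transformers import GPT2Model

class WaveformGPT2(nn.Module):
    def __init__(self, n_channels=3, patch=16):
        super().__init__()
        self.backbone = GPT2Model.from_pretrained("gpt2")  # pretrained LLM
        d = self.backbone.config.hidden_size               # 768 for GPT-2
        # A strided conv turns the raw waveform into a token-like sequence.
        self.embed = nn.Conv1d(n_channels, d, kernel_size=patch, stride=patch)

    def forward(self, wave):                      # wave: (batch, 3, n_samples)
        tokens = self.embed(wave).transpose(1, 2)  # (batch, seq_len, d)
        out = self.backbone(inputs_embeds=tokens)  # bypass the text embedding
        return out.last_hidden_state               # cross-modal features
```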
Highlights:
- One Fits All: With a unified network architecture, SeisMoLLM handles multiple seismic monitoring tasks separately, including back-azimuth estimation, epicentral distance estimation, magnitude estimation, phase picking, and first-motion polarity classification (a per-task head sketch follows this list).
- Surprising performance: With standard supervised training on the DiTing-light and STEAD datasets, SeisMoLLM achieves state-of-the-art results across all five tasks, taking 36 of 43 task metrics, with many relative improvements of 10% to 50%.
- Excellent generalization: Using only 10% of the data for training, SeisMoLLM consistently outperforms the train-from-scratch baselines, taking the top score on 12 of 16 metrics.
- Friendly cost: Despite incorporating a large language model, SeisMoLLM maintains training and inference speeds comparable to or better than lightweight baselines, and training requires only 4× RTX 4090 GPUs for 3-36 hours.
- Fast Inference: SeisMoLLM achieves a fast inference speed of just 7–8 ms per trace on an RTX 4090, which is 1.5–3.8× faster than most baselines across all tasks, and even 2× faster than the smallest baseline, which has about 200× fewer parameters.
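A hedged sketch of the "one fits all" design referenced above: a single shared backbone (e.g. the `WaveformGPT2` sketch in the Introduction) paired with a lightweight head per monitoring task. The head shapes below are assumptions for illustration, not the paper's exact designs.

```python
# One shared backbone, one small head per seismic monitoring task.
import torch.nn as nn

def make_head(d, task):
    # Regression tasks emit a scalar; phase picking emits per-position
    # logits for P, S, and noise; polarity is binary classification.
    if task in ("back_azimuth", "distance", "magnitude"):
        return nn.Linear(d, 1)
    if task == "phase_picking":
        return nn.Linear(d, 3)
    if task == "polarity":
        return nn.Linear(d, 2)
    raise ValueError(task)

heads = nn.ModuleDict({t: make_head(768, t)          # 768 = GPT-2 hidden size
                       for t in ("back_azimuth", "distance", "magnitude",
                                 "phase_picking", "polarity")})
```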
Application:
Seismic monitoring tasks include but are not limited to: back-azimuth estimation, epicentral distance estimation, magnitude estimation, phase picking, and first-motion polarity classification. SeisMoLLM outperforms existing methods even when applied directly to unseen data. For optimal performance, we recommend few-shot training (a minimal sketch follows).
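One common few-shot recipe, sketched here under the illustrative `WaveformGPT2` class above, is to freeze the pretrained GPT-2 blocks and adapt only the conv embedder and a small task head on a few labeled traces. `few_shot_loader` is a hypothetical DataLoader over the small training split; this is an illustration, not the repo's training script.

```python
# Few-shot adaptation sketch: freeze the LLM backbone, tune embedder + head.
import torch
import torch.nn as nn

model = WaveformGPT2()                      # defined in the sketch above
head = nn.Linear(768, 1)                    # e.g. a magnitude regression head
for p in model.backbone.parameters():       # keep pretrained weights frozen
    p.requires_grad = False

opt = torch.optim.AdamW(
    list(model.embed.parameters()) + list(head.parameters()), lr=1e-4)
loss_fn = nn.MSELoss()

for wave, mag in few_shot_loader:           # (batch, 3, n_samples), (batch,)
    feats = model(wave).mean(dim=1)         # average-pool tokens -> (batch, 768)
    loss = loss_fn(head(feats).squeeze(-1), mag)
    opt.zero_grad(); loss.backward(); opt.step()
```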
For more details, please refer to our GitHub page.