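The best-performing models for audio datasets are ResNet and EfficientNet (EffNet), both architectures originally designed for image classification. Tenenholtz justified using image models on audio datasets. The usual rationale is that an audio clip can be converted into a 2D spectrogram (frequency against time), which a convolutional network can process much like an image.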
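The sketch below shows the common audio-to-image step under stated assumptions. It is not presented as Tenenholtz's method. It computes a log-magnitude spectrogram with a plain NumPy STFT, normalizes it, and repeats it to 3 channels, the input layout that RGB-pretrained CNNs such as ResNet or EfficientNet usually expect. The helper names (`log_spectrogram`, `to_image_tensor`) and the parameters (`n_fft=256`, `hop=128`, 16 kHz sample rate, a 440 Hz test tone) are hypothetical illustrative choices.

```python
import numpy as np


def log_spectrogram(signal: np.ndarray, n_fft: int = 256, hop: int = 128) -> np.ndarray:
    """Log-magnitude STFT with shape (n_fft // 2 + 1, n_frames). Illustrative parameters."""
    if len(signal) < n_fft:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice overlapping frames and apply the window to each one.
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    # rfft along each frame yields n_fft // 2 + 1 non-negative frequency bins.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Transpose to (frequency, time) so rows act as image height and columns as width.
    return np.log1p(magnitude).T


def to_image_tensor(spec: np.ndarray) -> np.ndarray:
    """Scale to [0, 1] and repeat to 3 channels, giving shape (3, H, W)."""
    lo, hi = spec.min(), spec.max()
    norm = (spec - lo) / (hi - lo + 1e-8)
    # Assumption: the backbone was pretrained on 3-channel RGB images.
    return np.repeat(norm[np.newaxis, :, :], 3, axis=0).astype(np.float32)


sr = 16000                      # hypothetical sample rate (Hz)
t = np.arange(sr) / sr          # one second of timestamps
signal = np.sin(2 * np.pi * 440.0 * t)  # synthetic 440 Hz tone
img = to_image_tensor(log_spectrogram(signal))
print(img.shape)  # (3, 129, 124)
```

After batching, the resulting `(3, H, W)` array could be passed to an image backbone such as torchvision's `resnet18` or `efficientnet_b0`. Another common option is to keep a single channel and adapt the network's first convolution.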