DOI:10.1109/IJCNN.2017.7966232
low-level time-based and frequency-based audio descriptors
frequency-band energy features (energy/frequency)
auditory filter banks (Gammatone, Mel filters)
cepstral features (MFCC)
spatial features (ITD: interaural time difference, ILD: interaural level difference)
voicing features (f0: fundamental frequency)
i-vector
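
The spatial features in the list above (ITD, ILD) can be illustrated with a minimal sketch. It is not taken from the cited paper. The function names `ild_db` and `itd_seconds`, the sign convention (positive ITD means the right channel lags the left), and the toy noise signal are all assumptions made for illustration. ILD is taken here as the energy ratio of the two channels in dB, and ITD as the lag that maximizes the cross-correlation between them.

```python
import math
import random

def ild_db(left, right):
    """Interaural level difference in dB: 10 * log10(E_left / E_right)."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10(e_left / e_right)

def itd_seconds(left, right, sample_rate, max_lag):
    """Interaural time difference as the cross-correlation peak lag, in seconds.

    Assumed convention: a positive value means the right channel lags the left.
    """
    def xcorr(lag):
        # Sum over n of left[n] * right[n + lag], skipping out-of-range samples.
        total = 0.0
        for n, value in enumerate(left):
            m = n + lag
            if 0 <= m < len(right):
                total += value * right[m]
        return total

    best_lag = max(range(-max_lag, max_lag + 1), key=xcorr)
    return best_lag / sample_rate

# Toy stereo signal (hypothetical data): white noise on the left channel,
# and the same noise delayed by 4 samples and halved in amplitude on the right.
rng = random.Random(0)
sample_rate = 16000
delay = 4
left = [rng.uniform(-1.0, 1.0) for _ in range(512)]
right = [0.0] * delay + [0.5 * x for x in left[:-delay]]

itd = itd_seconds(left, right, sample_rate, max_lag=8)
ild = ild_db(left, right)
print(f"ITD = {itd * 1e3:.3f} ms, ILD = {ild:.2f} dB")
```

With this toy input the 4-sample delay at 16 kHz corresponds to an ITD of 0.25 ms, and halving the amplitude gives an ILD of roughly 6 dB. A practical system would compute both per frequency band and per time frame rather than over the whole signal.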