1. (College of Engineering and Technology, American University of the Middle East, Kuwait {bahaa.al-sheikh, mohammad.salman}@aum.edu.kw )
2. (Electrical & Electronics Engineering Department, Ankara Science University, Ankara alaa.eleyan@ankarabilim.edu.tr )

HRTF, Spectral notches, 3D auditory display, Wavelet, Multi-resolution analysis, Auto-detection

## 1. Introduction

A Head-Related Transfer Function (HRTF) is defined, for each ear, as the ratio of the frequency response at the eardrum to that at the sound source. HRTFs comprise all acoustical cues to a sound-source location that are available from that location [1]. Together with the interaural time difference (ITD) and interaural level difference (ILD), HRTFs are considered the main cues for sound-source localization. They are usually measured for individuals at a limited set of locations in terms of azimuth and elevation.

HRTFs are required to create 3D virtual auditory displays (VADs) using headphones. VADs have many applications, including psychoacoustic and physiological research, industry, virtual training, driving training simulation [2], virtual aviation [3], and battle environment simulation [4]. There are also many other applications for VADs in communication, multimedia, mobile products, and clinical auditory evaluations [5].

A Head-Related Impulse Response (HRIR) is defined as the time-domain representation of the HRTF. One of the most common methods to measure it is to generate a Dirac delta impulse at a sound source and record the output at a microphone located at a subject’s eardrum in an anechoic room. This should be done for each direction in 3D space because the result depends strongly on the direction. Different techniques and algorithms have been proposed and implemented to build HRTFs at locations other than the measured ones or at a finer resolution [6-8].

Scattering and reflections from the torso and shoulders of a subject shape the HRTF at frequencies below 3 kHz [9]. Accordingly, the geometry of these body parts does not affect the shape of the HRTF above 3 kHz. Previous studies on humans show that there are prominent spectral "notches" and "peaks" in HRTFs above 4-5 kHz. These are dominant cues for the elevation and azimuth angles of a sound-source location, which are essential for sound localization, especially for elevations and for determining whether a sound is in front of or behind an observer [10]. Many of these spectral features are caused by pinna reflections and diffractions, which act as a filter in the frequency domain.

The absence or presence of the peaks gives a strong indication of the sound-source elevation [10]. For example, a one-octave peak at 7-9 kHz was presented as an indication of elevations around 90$^{\circ}$ [11]. However, unlike the spectral notches, spectral peaks do not show a smooth trend with changes in elevation [10].

The spectral location of the first prominent spectral minimum is called the first notch. Human data have shown that its center frequency shifts from around 6 kHz to around 12 kHz as the angle of a sound source varies from -15$^{\circ}$ to 45$^{\circ}$ in elevation with a fixed azimuth angle [12]. The first notch is due to the ear concha and is considered one of the important features for elevation perception of a sound source [11].

HRTFs depend on the shape and geometry of the head, external ears, and body parts, which interact with received sound waves. Because of this, HRTFs can be quite different for different individuals for a given location in space [13]. In order to have a full implementation of a complete VAD for a certain individual, the HRTFs need to be measured or synthesized for all directions (i.e., all elevation and all azimuth angles). Higher directional resolution results in smoother and more effective directional hearing for VADs. The most popular far-field HRTF databases use a directional resolution of 5$^{\circ}$ to 15$^{\circ}$ for both azimuths and elevations. However, measuring HRTFs in all directions for each subject is expensive and impractical and requires extensive preparation.

One solution to this problem is to use a structural model of a subject in order to build an individualized HRTF. Such models synthesize the HRTFs from the anthropometry of the subject, especially the geometry of the pinna, head, torso, and shoulders [14]. Therefore, we hypothesize that if the notch and peak frequencies of a certain individual’s HRTF are close to those of another individual, then using the HRTF of the first individual for the second is more suitable than using other individuals’ HRTFs with significantly different notch and peak frequencies.

A few measurements taken at certain locations for an individual can be used to auto-detect the frequencies of the spectral notches and peaks and to compare their values to the notch and peak frequencies in currently available HRTF databases. The database HRTFs with the closest notch and peak frequencies at the same measured locations can then be taken as suitable HRTFs for that individual. To do this, we need to automatically detect the main notches in a measured HRTF for an individual for comparison. Wavelet multi-resolution analysis has been successfully used for auto-detection of events, including notches and peaks in non-stationary signals [15,16]. It was used in this study for the auto-detection of the main spectral notches in measured HRTFs.

The rest of the paper is organized as follows. Section 2 discusses the database used and the 3D reference coordinate system, data pre-processing, the role of the spectral notches in direction estimation, and the discrete wavelet transform. Section 3 discusses the results of applying wavelet multi-resolution analysis to the HRTFs to auto-detect the spectral notches. The paper is concluded in Section 4.

## 2. Methods

### 2.1 Database and Coordinate System

An interaural polar coordinate system was used in this study. The elevation (EL) represents the latitude, and the source azimuth (AZ) represents the longitude. The location at (AZ=0$^{\circ}$, EL=0$^{\circ}$) corresponds to the direction in front of the subject. Negative elevations are below the horizontal plane, and positive elevations are above it. EL=90$^{\circ}$ corresponds to the direction directly above the subject’s head, and (AZ=180$^{\circ}$, EL=0$^{\circ}$) corresponds to the direction directly behind it. Negative azimuth angles are to the left side, and positive ones are to the right of the subject.

In this study, we used HRIRs from the Center for Image Processing and Integrated Computing (CIPIC), University of California, database [17]. It contains HRIRs for 43 subjects along with 27 anthropometric measurements of the subjects’ heads, torsos, and pinnae. For each subject, HRIRs are measured at azimuth angles between -80$^{\circ}$ and 80$^{\circ}$ and elevation angles between -45$^{\circ}$ and 230.625$^{\circ}$. There are a total of 1250 directions for each subject, and the sampling frequency is 44.1 kHz. HRTFs were calculated in this study from the HRIRs by taking a 512-point Fourier transform, which gives a frequency resolution Δf of 86.13 Hz.
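As an illustration only, the HRIR-to-HRTF step and the resulting frequency resolution can be sketched in Python (the study itself used Matlab). The function name `hrtf_magnitude_db` and the toy impulse response below are assumptions introduced for this sketch; they are not CIPIC data. A direct DFT is written out so that the example needs no external library.

```python
import cmath
import math

FS = 44_100   # sampling frequency stated in the text (Hz)
N_FFT = 512   # number of DFT points stated in the text

def hrtf_magnitude_db(hrir, n_fft=N_FFT):
    """Return the one-sided magnitude spectrum (dB) of a zero-padded HRIR."""
    padded = list(hrir[:n_fft]) + [0.0] * max(0, n_fft - len(hrir))
    mags = []
    for k in range(n_fft // 2 + 1):
        # Direct DFT sum; zero samples are skipped to save work.
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n, x in enumerate(padded) if x != 0.0)
        mags.append(20.0 * math.log10(max(abs(acc), 1e-12)))
    return mags

delta_f = FS / N_FFT  # spacing between DFT bins, about 86.13 Hz

# Hypothetical stand-in for a measured HRIR: a direct pulse plus an inverted
# reflection 4 samples later, which creates a notch at FS / 4 = 11025 Hz.
toy_hrir = [1.0, 0.0, 0.0, 0.0, -0.9]
spectrum_db = hrtf_magnitude_db(toy_hrir)
freqs_hz = [k * delta_f for k in range(len(spectrum_db))]
```

In this toy case the single reflection places a notch at bin 128, which corresponds to 11025 Hz on the `freqs_hz` axis.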

For the purpose of this study, we used directions at elevations between -45$^{\circ}$ and 45$^{\circ}$ in the median plane (i.e., at an azimuth angle of 0$^{\circ}$) as an example, for the right ear of some randomly selected subjects from the CIPIC database. Subjects 3, 8, 9, 10, 11, and 12 were selected in this study. Matlab® 2014 was used for reading the data, pre-processing, wavelet multi-resolution analysis, and auto-detection of the HRTFs’ spectral notches.

### 2.2 Pre-processing of Data

HRIRs were windowed with a 2 ms Hanning window in order to remove echoes in the raw data, including some reflections caused by torsos, shoulders, and knees. This indirectly smooths the HRTFs. Smoothing in the frequency domain does not affect the localization capability provided that the main spectral features are kept [18]. Phase responses are ignored, and only the magnitudes of the HRTFs are considered because many studies have shown that HRTFs can be accurately represented by their minimum-phase spectra. The reason is that the auditory system is not sensitive to the absolute phase of a sound applied to a single ear [18,19].
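A minimal sketch of the windowing step is given below. The text specifies only the window type and its 2 ms duration (about 88 samples at 44.1 kHz); centring the window on the largest-magnitude sample is an assumption made here, and `window_hrir` and the toy response are hypothetical names and data.

```python
import math

FS = 44_100  # sampling frequency stated in the text (Hz)

def hann(n_points):
    """Symmetric Hann (Hanning) window with n_points samples."""
    if n_points == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n_points - 1))
            for i in range(n_points)]

def window_hrir(hrir, duration_s=0.002, fs=FS):
    """Keep a Hann-weighted segment of the HRIR centred on its main peak.

    Assumption: the window is centred on the largest-magnitude sample.
    Samples outside the window are set to zero, which removes late echoes.
    """
    n_win = round(duration_s * fs)
    peak = max(range(len(hrir)), key=lambda i: abs(hrir[i]))
    start = peak - n_win // 2
    out = [0.0] * len(hrir)
    for i, w in enumerate(hann(n_win)):
        j = start + i
        if 0 <= j < len(hrir):
            out[j] = hrir[j] * w
    return out

# Hypothetical HRIR: direct sound at sample 50, late echo at sample 150.
toy = [0.0] * 200
toy[50], toy[150] = 1.0, 0.3
windowed = window_hrir(toy)
```

With these toy values the echo, which arrives about 2.3 ms after the direct sound, falls outside the window and is removed, while the direct pulse is almost unchanged.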

### 2.3 HRTF Spectral Notches

Notches and peaks in the HRTFs are direction-dependent, so they indirectly provide information about the direction of a sound. In addition, they depend on the shape and size of the pinna, which are different among individuals. Fig. 1 presents an example of the notches and peaks in the right-ear HRTF at the location of (AZ=0$^{\circ}$, EL=-45$^{\circ}$) for subject 10 of the CIPIC database.

Fig. 1. Example of spectral notches for right-ear HRTF at 0° azimuth and -45° elevation for subject 10 of CIPIC database.

### 2.4 Discrete Wavelet Transform

A wavelet is a limited-duration waveform that is irregular and often non-symmetrical. Its average value equals zero, and it has the capability to describe abnormalities, pulses, and other events. Wavelet analysis involves the decomposition of a signal using an orthonormal group of basis functions, such as the sines and cosines in a Fourier series. Scaling or dilation in wavelet terminology means stretching the wavelet in time, which is related to the frequency in Fourier series terminology. Translation in wavelet terminology is the shifting of the wavelet to the right or left in the time domain. A "mother wavelet" refers to an unstretched wavelet. A Continuous Wavelet Transform (CWT) represents all possible integer factors of shifting and stretching the wavelet, while a Discrete Wavelet Transform (DWT) stretches and shifts in a dyadic scale using powers of 2 (e.g., 2, 4, 8, 16, etc.) [20].

Wavelet decomposition splits a signal into two parts using high-pass and low-pass filters. Using more filters splits the signal into more parts. A low-pass filter (scaling function filter) gives a smoothed version and approximation of the signal, while a high-pass filter (wavelet filter) gives the details. When details and approximations are added together, they can reconstruct the original signal.

Usually, each approximation is split into more approximations and details, and so on. Selecting certain levels of details or approximations can be used to choose certain events or parts of a signal that have a certain range of frequencies. Convolution of the wavelet function ψ(t) with the signal x(t) gives the wavelet transform, T, while the convolution of x(t) with the scaling function ϕ(t) produces the approximation coefficient, S.
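The split into an approximation and a detail, and their exact recombination, can be illustrated with a one-level transform. The sketch below uses the Haar wavelet, chosen here only because its low-pass and high-pass filters are two taps long; the function names and the sample values are hypothetical, and an even-length input is assumed.

```python
import math

def haar_dwt(x):
    """One-level Haar DWT: low-pass and high-pass filtering, then decimation by 2."""
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x) - 1, 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x) - 1, 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse one-level Haar DWT (up-sampling and synthesis filtering)."""
    s = 1.0 / math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.extend([s * (a + d), s * (a - d)])
    return x

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
cA1, cD1 = haar_dwt(signal)

# Reconstruct each branch separately, then add them together.
smooth_part = haar_idwt(cA1, [0.0] * len(cD1))   # approximation only
detail_part = haar_idwt([0.0] * len(cA1), cD1)   # detail only
rebuilt = [a + d for a, d in zip(smooth_part, detail_part)]
```

Here `smooth_part` holds the pairwise averages of the signal and `detail_part` holds the deviations from them, and their sum returns the original samples.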

The discrete wavelet transform (DWT) can be expressed as:

##### (1)
$$T_{m,n} = \int_{-\infty}^{\infty} x(t)\,\psi_{m,n}(t)\,dt$$

The coefficient of the signal approximation at scale m and location n can be expressed as:

##### (2)
$$S_{m,n} = \int_{-\infty}^{\infty} x(t)\,\phi_{m,n}(t)\,dt$$

Fig. 2. A 3-level discrete wavelet transform. Each filter (high pass or low pass) is followed by decimation or down-sampling by 2. cA1 represents the first-level approximation coefficients, cD2 represents the second-level detail coefficients, cA3 represents the third-level approximation coefficients, etc.

For a discrete input signal of finite length and a range of scales 0 < m < M, a discrete approximation of the signal can be expressed as [21]:

##### (3)
$$x_{0}(t) = x_{M}(t) + \sum_{m=1}^{M} d_{m}(t),$$

where $x_{M}(t)$ is the signal approximation at scale M, and the signal detail at scale m is expressed as:

##### (4)
$$d_{m}(t) = \sum_{n=0}^{M-m} T_{m,n}\,\psi_{m,n}(t).$$

Usually, approximations are repeatedly divided into low frequencies (approximations) and high frequencies (details) to find the next level of wavelet analysis using more filters, as shown in Fig. 2. This figure shows a three-level wavelet decomposition as an example. The impulse responses of the low-pass and high-pass filters depend on the chosen wavelet.

There are many kinds of wavelets, such as Haar, Daubechies, Biorthogonal, Symlet, and Coiflet wavelets. Symlets 2 through 8 and Daubechies 2 and 3 wavelets were tested in this study because of their similarity in shape to the HRTF spectral notches at the different directions among the subjects, which makes them suitable for the auto-detection problem. The decomposition analysis was done up to level 6 for each wavelet. Fig. 3 shows examples of some of the wavelets used in this study. The proposed algorithm determines the most suitable HRTF set for an individual from a database, as presented in Fig. 4.

Fig. 3. Examples of some mother wavelets: (a) Symlet2; (b) Symlet3; (c) Symlet5; (d) Daubechies2.

Fig. 4. Proposed algorithm description to choose best individualized HRTF.

## 3. Results and Discussion

Fig. 5 shows an example of an HRTF wavelet decomposition using one of the tested wavelets, Symlet5, up to level 6. Low-level details represent the highest frequencies in the HRTF. The energy of the main notches was noticed in all detail levels from level 1 to level 5 (i.e., $D_{1}$ to $D_{5}$). The first three levels ($D_{1}$, $D_{2}$, and $D_{3}$) have a clear resemblance to the spectral notches compared to other signal information in these three levels. Therefore, these three levels were used for the auto-detection of the main notches in the HRTFs.

Fig. 5. HRTF at 0° azimuth and -45° elevation (AZ=0°, EL= -45°) for subject 10 and its wavelet decomposition up to level 6 using Symlet5.

To give more significance to the highest frequency components, the detail $D_{1}$ coefficients were multiplied by a higher factor. The reconstructed signal from wavelet levels $D_{1}$, $D_{2}$, and $D_{3}$ was used according to the following proposed equation, which gives higher weight to the lower-level details. The weights of each level were selected empirically for the notches of the database subjects to maximize the auto-detection sensitivity:

##### (5)
$$R = \mathrm{IDWT}\left(10\,D_{1} + 8\,D_{2} + 5\,D_{3}\right)$$

where R represents the reconstructed signal using the inverse discrete wavelet transform (IDWT) of the weighted $D_{1}$, $D_{2}$, and $D_{3}$ details.
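One possible reading of Eq. (5) is sketched below in Python: the signal is decomposed to level 3, the detail coefficients are scaled by 10, 8, and 5, the approximation is set to zero, and the result is inverted. The sketch uses the four-tap Daubechies-2 (db2) filter with periodic signal extension. The extension mode, the zeroed approximation, the helper names, and the toy input are all assumptions made for illustration, not details given in the text.

```python
import math

SQ3, SQ2 = math.sqrt(3.0), math.sqrt(2.0)
# Daubechies-2 scaling (low-pass) filter taps.
H = [(1 + SQ3) / (4 * SQ2), (3 + SQ3) / (4 * SQ2),
     (3 - SQ3) / (4 * SQ2), (1 - SQ3) / (4 * SQ2)]
# Matching wavelet (high-pass) filter from the alternating-flip rule.
G = [H[3], -H[2], H[1], -H[0]]

def analyze(x):
    """One periodic DWT level: filter, then keep every second output."""
    n = len(x)
    cA = [sum(H[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    cD = [sum(G[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    return cA, cD

def synthesize(cA, cD):
    """Inverse of analyze(); exact because the periodic filter bank is orthogonal."""
    n = 2 * len(cA)
    x = [0.0] * n
    for i in range(len(cA)):
        for k in range(4):
            x[(2 * i + k) % n] += H[k] * cA[i] + G[k] * cD[i]
    return x

def weighted_detail_reconstruction(x, weights=(10.0, 8.0, 5.0)):
    """Sketch of Eq. (5): scale D1..D3, drop the approximation, invert.

    The input length must be divisible by 2 ** len(weights).
    """
    details, approx = [], list(x)
    for _ in weights:
        approx, d = analyze(approx)
        details.append(d)
    rec = [0.0] * len(approx)  # assumption: level-3 approximation is discarded
    for w, d in zip(reversed(weights), reversed(details)):
        rec = synthesize(rec, [w * c for c in d])
    return rec

# Hypothetical "spectrum": flat response with one narrow dip near bin 100.
toy = [-20.0 * math.exp(-((i - 100) / 2.0) ** 2) for i in range(256)]
R = weighted_detail_reconstruction(toy)
energy = [r * r for r in R]  # squared reconstructed signal
```

Because the transform is linear, scaling the coefficients before inversion is equivalent to summing the separately reconstructed details with the same weights.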

The reconstructed signal from these details for the HRTF example in Fig. 5 is shown in Fig. 6. Locations of the spectral notches are simply auto-detected and marked as local peaks of the squared-absolute reconstructed signal in Figs. 6 and 7. These figures show examples of notch auto-detection at two different directions using Symlet5 for subjects 10 and 11, respectively.

Local peaks were selected from the squared absolute reconstructed signal as frequency samples that are larger than their neighboring samples, with at most one peak allowed in any 1 kHz window, because it is unusual to have more than one main spectral notch within this frequency range. A peak is accepted only if it is higher than 1% of the maximum of the squared absolute reconstructed signal, which serves as the peak amplitude threshold.
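The selection rule can be sketched as follows. The text does not state how competing peaks inside one 1 kHz window are resolved; keeping the strongest peak first and rejecting any weaker peak closer than 1 kHz is an assumption of this sketch, and `pick_peaks` and the toy values are hypothetical.

```python
def pick_peaks(energy, delta_f, window_hz=1000.0, rel_threshold=0.01):
    """Return sorted indices of accepted local maxima in `energy`.

    A sample is a candidate if it exceeds both neighbours and is above
    rel_threshold times the global maximum. Candidates are then accepted
    from strongest to weakest, skipping any that lie closer than window_hz
    to an already accepted peak (assumed tie-breaking rule).
    """
    floor = rel_threshold * max(energy)
    candidates = [i for i in range(1, len(energy) - 1)
                  if energy[i] > energy[i - 1]
                  and energy[i] > energy[i + 1]
                  and energy[i] > floor]
    kept = []
    for i in sorted(candidates, key=lambda idx: energy[idx], reverse=True):
        if all(abs(i - j) * delta_f >= window_hz for j in kept):
            kept.append(i)
    return sorted(kept)

delta_f = 44_100 / 512  # bin spacing from the 512-point transform (Hz)

# Hypothetical squared reconstructed signal with four local maxima.
toy_energy = [0.0] * 200
toy_energy[70] = 100.0   # strongest peak
toy_energy[75] = 40.0    # about 430 Hz from bin 70, inside the same window
toy_energy[120] = 9.0    # separate peak
toy_energy[160] = 0.5    # below 1% of the maximum
peaks = pick_peaks(toy_energy, delta_f)
notch_freqs_hz = [p * delta_f for p in peaks]
```

In this toy case only bins 70 and 120 survive: bin 75 is suppressed by the 1 kHz rule and bin 160 by the 1% threshold.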

With the 512-point Fourier transform applied to the HRIRs, the frequency resolution of the processed data, Δf, is approximately 86 Hz. An analysis was done on CIPIC subjects 3, 8, 9, 10, 11, and 12. Spectral notches located between 4 kHz and 16 kHz were considered for the analysis in this study because pinna cues usually lie in this range of frequencies [22]. Furthermore, this range has essential cues for sound localization [12]. The total number of notches in the selected subjects at the stated locations is 238.

Fig. 6. (a) HRTF at (AZ=0°, EL=-45°) direction for Subject 10; (b) Reconstruction signal according to Eq. (5) using Symlet5; (c) Absolute square of signal in (b) with the auto-detected local peaks as small red circles.

Fig. 7. (a) HRTF at (AZ=0°, EL=-39.375°) direction for Subject 11; (b) Reconstruction signal according to Eq. (5) using Symlet5; (c) Absolute square of signal in (b) with the auto-detected local peaks as small red circles.

The performance of the auto-detection capability of the selected wavelets is presented in Table 1. The results were sorted according to their auto-detection sensitivity. The sensitivity $\textit{S}$ is defined as:

##### (6)
$$S = \frac{TP}{TP + FN} \times 100\%$$

where $\textit{TP}$ and $\textit{FN}$ represent the number of true positives (correctly detected notches) and number of false negatives (missed notches), respectively.
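As a worked illustration of Eq. (6), the short Python sketch below computes the sensitivity from detection counts. The total of 238 notches comes from the text, but the split into 237 detected and 1 missed is a hypothetical example chosen for illustration; the per-wavelet counts are not reported.

```python
def sensitivity(true_positives, false_negatives):
    """Sensitivity in percent, following Eq. (6)."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("at least one reference notch is required")
    return 100.0 * true_positives / total

# Hypothetical counts: 237 of 238 notches detected, 1 missed.
example = sensitivity(237, 1)
all_found = sensitivity(238, 0)
```

With these assumed counts, `example` rounds to 99.6% and `all_found` is exactly 100%.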

Table 1. Performance of wavelets on HRTF spectral notches auto-detection.

| Wavelet | Sensitivity (%) |
|---|---|
| sym2 | 100 |
| db2 | 99.6 |
| sym3 | 92.9 |
| db3 | 92.4 |
| sym4 | 90.8 |
| sym5 | 89.1 |
| sym6 | 87.4 |
| sym7 | 86.1 |
| sym8 | 86.1 |

Around 70% of the auto-detected notches were accurately detected with the exact central frequency compared to the manually examined ones. Around 28% were auto-detected with a difference of ${\pm}$Δf from the actual notch frequency, and 2% were different by ${\pm}$2Δf from the actual central frequency. These values are almost the same among all wavelets tested. A slight difference occurred between the actual notch frequency and the auto-detected one when the actual notch was very shallow and not deep enough to be auto-detected accurately. However, these shallow spectral notches do not play an important role in sound-source localization for humans compared to the deep spectral notches because they are not associated with significant reflections.

All deep spectral notches were auto-detected accurately without any difference from the original notches’ central frequencies. Usually, the measured HRIRs and the calculated HRTFs are normalized, so when the depth of the notch is discussed, we refer to the relative attenuation in the frequency response. A steeper slope and a lower relative amplitude of the spectral notch result in a higher amplitude in the squared reconstructed signal, which gives a direct indication of the drop in the notch amplitude.

Many studies have proposed different algorithms to find individualized HRTFs. Some of these studies describe the relation between anthropometric parameters of the subjects, especially their pinnae, and the HRTF features at different locations [14]. HRTFs are modeled accordingly, given that the HRTF describes the interaction between the sound waves and the human head, torso, and shoulder geometry. This approach is complicated and needs accurate estimation of the anthropometric parameters and a clear understanding of these parameters and how they characterize the HRTF.

Other studies model the HRTFs measured at certain directions and then estimate HRTFs at all other locations using different interpolation methods [6,8]. Most of these studies validated the interpolation in a limited range of directions in terms of azimuth and elevation angles, and some of the models have high computational complexity. Even though the algorithm proposed does not create or model an individualized set of HRTFs for a subject, it can be used to find the closest set of HRTFs among available HRTF databases that have already been measured in different institutions and labs around the world. Thus, it can be used for a subject to save time and effort and to provide a good approximation for individualized HRTFs.
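The database search can be pictured with the following Python sketch. The text does not specify a distance measure, so the mean absolute difference between notch frequencies at shared directions is an assumption, as are the function names, the subject labels, and all frequency values, which are invented for illustration.

```python
def mean_notch_distance(target, candidate):
    """Mean absolute notch-frequency difference (Hz) over shared directions.

    Both arguments map a direction (azimuth, elevation) in degrees to a
    notch frequency in Hz. The distance measure is an assumed choice.
    """
    shared = [d for d in target if d in candidate]
    if not shared:
        return float("inf")
    return sum(abs(target[d] - candidate[d]) for d in shared) / len(shared)

def closest_subject(target, database):
    """Return the database key whose notch frequencies are closest to the target."""
    return min(database, key=lambda sid: mean_notch_distance(target, database[sid]))

# Hypothetical first-notch frequencies (Hz) at two median-plane directions.
new_listener = {(0, -45): 6200.0, (0, 0): 8000.0}
database = {
    "A": {(0, -45): 6500.0, (0, 0): 9000.0},
    "B": {(0, -45): 6100.0, (0, 0): 8200.0},
}
best = closest_subject(new_listener, database)
```

For these invented values, subject "B" is selected because its mean difference (150 Hz) is smaller than that of subject "A" (650 Hz).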

## 4. Conclusions

Spectral notches of HRTFs play important roles as spectral cues for sound-source localization for humans. Accurate auto-detection and estimation of the spectral notches is considered an important step to check the similarity between the HRTFs of a certain subject and the ones in databases in order to find a suitable HRTF set for that subject. Wavelet multi-resolution analysis using the details of the first three decomposition levels with Symlet2 to Symlet8, Daubechies2, and Daubechies3 wavelets has been used successfully to auto-detect the frequencies of spectral notches in the HRTFs.

Symlet2 outperformed the other tested wavelets in terms of auto-detection capability, and it auto-detected all spectral notches in all tested HRTFs. Most of the auto-detected notches were detected at the exact central frequency of the notches. Nevertheless, future work remains to validate the proposed method with a subjective listening test, as well as to test more directions and more subjects.

### REFERENCES

1. Middlebrooks J.C., Green D.M., 1992, Observations on a principal components analysis of head-related transfer functions, J. Acoust. Soc. Am., Vol. 92, No. 1, pp. 597-599
2. Krebber W., Gierlich H.W., Genuit K., 2000, Auditory virtual environments: basics and applications for interactive simulations, Signal Processing, Vol. 80, No. 11, pp. 2307-2322
3. Doerr K.U., Rademacher H., Huesgen S., 2007, Evaluation of a low-cost 3D sound system for immersive virtual reality training systems, IEEE Trans. on Visualization and Computer Graphics, Vol. 13, No. 2, pp. 204-212
4. Jones D.L., Stanney K.M., Foaud H., 2005, An optimized spatial audio system for virtual training simulations: design and evaluation, in Proceedings of Eleventh Meeting of the International Conference on Auditory Display (ICAD 05), Limerick, Ireland, pp. 223-227
5. Xie B., 2013, Head-Related Transfer Function and Virtual Auditory Display, J. Ross Publishing, Plantation, FL, USA
6. Gamper H., 2013, Head-related transfer function interpolation in azimuth elevation and distance, J. Acoust. Soc. Am., Vol. 134, pp. 547-554
7. Al-Sheikh B.W., Matin M.A., Tollin D.J., 2009, All-pole and all-zero models of human and cat head related transfer functions, Proc. SPIE 7444, Mathematics for Signal and Information Processing, 74440X
8. Al-Sheikh B., Matin M.A., Tollin D.J., 2019, Head Related Transfer Function Interpolation Based on Finite Impulse Response Models, Seventh International Conference on Digital Information Processing and Communications (ICDIPC), Trabzon, Turkey, pp. 8-11
9. Algazi V.R., Avendano C., Duda R.O., 2001, Elevation localization and head-related transfer function analysis at low frequencies, J. Acoust. Soc. Am., Vol. 109, No. 3, pp. 1110-1122
10. Raykar V.C., Duraiswami R., Yegnanarayana B., 2005, Extracting the frequencies of the pinna spectral notches in measured head related impulse responses, J. Acoust. Soc. Am., Vol. 118, No. 1, pp. 364-374
11. Hebrank J., Wright D., 1974, Spectral cues used in the localization of sound sources on the median plane, J. Acoust. Soc. Am., Vol. 56, No. 6, pp. 1829-1834
12. Langendijk E.H., Bronkhorst A.W., 2002, Contribution of spectral cues to human sound localization, J. Acoust. Soc. Am., Vol. 112, No. 4, pp. 1583-1596
13. Middlebrooks J.C., 1999, Individual differences in external-ear transfer functions reduced by scaling in frequency, J. Acoust. Soc. Am., Vol. 106, No. 3, pp. 1480-1492
14. Algazi V.R., Duda R.O., Satarzadeh P., 2007, Physical and filter pinna models based on anthropometry, in Proc. AES 122nd Conv.
15. Pal S., Mitra M., 2010, Detection of ECG characteristic points using Multiresolution Wavelet Analysis based Selective Coefficient Method, Measurement, Vol. 43, No. 2, pp. 255-261
16. Sammaiah A., Narsimha B., Suresh E., Reddy M.S., 2011, On the performance of wavelet transform improving Eye blink detections for BCI, International Conference on Emerging Trends in Electrical and Computer Technology, Nagercoil, pp. 800-804
17. Algazi V.R., Duda R.O., Thompson D.M., Avendano C., 2001, The CIPIC HRTF database, IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 21-24
18. Kulkarni A., Isabelle S.K., Colburn H.S., 1999, Sensitivity of human subjects to head-related transfer-function phase spectra, J. Acoust. Soc. Am., Vol. 105, No. 5, pp. 2821-2840
19. Kistler D.J., Wightman F.L., 1992, A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction, J. Acoust. Soc. Am., Vol. 91, No. 3, pp. 1637-1647
20. Fugal D.L., 2009, Conceptual Wavelets in Digital Signal Processing: An In-depth Practical Approach for the Non-Mathematician, Space & Signals Technical Publishing, San Diego, CA
21. Saritha C., Sukanya V., Murthy Y.N., 2008, ECG Signal Analysis Using Wavelet Transforms, Bulg. J. Phys., Vol. 35, pp. 68-77
22. Hebrank J., Wright D., 1974, Spectral cues used in the localization of sound sources on the median plane, J. Acoust. Soc. Am., Vol. 56, No. 6, pp. 1829-1834

## Author

##### Bahaa Al-Sheikh

Bahaa Al-Sheikh received a B.Sc. degree in electronics engineering from Yarmouk University, Jordan, an MSc in electrical engineering from Colorado State University, Colorado, USA, and a PhD in biomedical engineering from the University of Denver, Colorado, USA, in 2000, 2005, and 2009, respectively. Between 2009 and 2015, he worked for Yarmouk University as an assistant professor in the department of Biomedical Systems and Medical Informatics Engineering and served as the department chairman between 2010 and 2012. He served as a part-time consultant for Sand-hill Scientific Inc., Highlands Ranch, Colorado, USA, in biomedical signal processing between 2009 and 2014. Currently, he is an associate professor at the Electrical Engineering Department at the American University of the Middle East in Kuwait. His research interests include digital signal and image processing, biomedical systems modeling, medical instrumentation, and sound-source localization systems.