Introduction
The human auditory system processes sound with an efficiency that no existing machine-hearing system matches. The cochlea, organized tonotopically, is the primary input stage for auditory processing. It decomposes sound waves by frequency, nonlinearly amplifies them, and converts them into electrical signals, which are then transmitted to the nervous system. The cochlea is characterized by an exceptionally wide dynamic range (0-120 dB SPL) and high frequency selectivity. For decades, researchers have strived to engineer machine hearing systems that replicate the function and efficiency of the human auditory system. A fundamental step towards this goal is the development of cochlear models, which vary in complexity and have been implemented in diverse ways.
Auditory Filter Models: Transmission-Line vs. Filterbanks
Cochlear models are broadly categorized into two types: transmission-line (TL) models and auditory filterbanks. TL models represent the cochlear partition as an interconnected mass-spring-damper system, effectively modeling wave propagation on the Basilar Membrane (BM). While TL models accurately simulate BM wave propagation and are physiologically faithful, they present significant computational challenges due to their complex time-domain differential equations.
Auditory filterbank models, conversely, utilize either parallel or cascade filters to simulate BM wave propagation. Parallel filterbank models employ independent filters, such as rounded-exponential (roex) filters, gammatone filters (including gammachirp), or pole-zero filters, all connected in parallel to a single input signal. Cascade filterbank models, like the Cascade of Asymmetric Resonators with Fast-Acting Compression (CAR-FAC) model or biophysical models, use a cascade of filters instead.
Parallel filterbank models primarily focus on replicating observed mechanical responses, often overlooking the cochlea’s biological structure. Some parallel models incorporate automatic gain control (AGC) to mimic channel couplings. In contrast, 2-D parallel filterbanks model both the fluid within the cochlear duct and the BM, considering both longitudinal and vertical wave propagation.
Cascade filterbank models naturally represent the forward propagation of sound as traveling waves in the cochlea. In these models, each filter stage simulates a segment of the non-uniform wave system, with its output serving as the input for the subsequent section. This cascade structure inherently models forward coupling. Certain cascade models, such as the pole-zero filter cascade (PZFC) and CAR-FAC models, include an AGC feedback loop to model bidirectional channel couplings. This paper focuses on the hardware implementation of the CAR-FAC model, a sophisticated cascade filterbank model.
Nonlinearity in Cochlear Function
The biological cochlea operates as a causal, active, and nonlinear system. Figure 1 illustrates the nonlinearity and frequency tuning of a biological cochlea at different sound pressure levels (dB SPL). Gain is determined by BM displacement (or velocity) relative to stapes motion. Notably, responses near the characteristic frequency (CF) (9 kHz) exhibit nonlinear behavior with varying input levels. Additionally, at lower SPLs, responses show steeper high-frequency roll-off, and the peak gain shifts towards lower frequencies as the input level increases.
Figure 1. The frequency response measured from a chinchilla cochlea for various input levels, in dB sound pressure level (SPL), adapted from (Ruggero, 1992). The gain is measured as the BM displacement (or velocity) relative to the stapes motion.
In auditory filterbank models, nonlinearities can be modeled as linear filters with signal-level-dependent parameters. For example, parallel and cascade gammachirp filter models (PrlGC and CasGC), all-pole gammatone filter (APGF) models, and PZFC models demonstrate forward compressive nonlinear response by adjusting pole and/or zero positions. AGC-based models use output level feedback to modify filter parameters, resulting in a compressive input-output function. This feedback nonlinearity is inspired by the function of outer hair cells (OHCs) in the mammalian cochlea. The PZFC analog cochlear model and the CAR-FAC model are prominent examples of such models, effectively capturing the cochlea’s intricate nonlinear dynamics.
Motivation for CAR-FAC Model Implementation
The CAR-FAC model, a digital cascade auditory filter model proposed by Richard Lyon, offers a close approximation of the physiological components of the human cochlea and effectively mimics its qualitative behavior. The CAR part of the model simulates the BM function, translating cochlear fluid pressure waves (converted from sound waves by the middle ear) into maximal displacement positions along its length. Its pole-zero cascade form is parameter-efficient in the z-domain compared to filters like gammatone and gammachirp, providing an excellent fit to human tone detection data in masking noise. The FAC part models the functions of OHCs, inner hair cells (IHCs), and the medial olivocochlear efferent system. It transduces cochlear mechanical vibrations into electronic signals and applies nonlinear gain control feedback to the BM via the OHCs. FAC nonlinear effects include rapid wide-dynamic-range compression and frequency distortions such as cubic difference tones (CDTs) and quadratic difference tones (QDTs), achieved by dynamically adjusting the positions of CAR resonator poles and zeros in the z-plane.
Comparative studies of cochlear models have highlighted the CAR-FAC model’s strong agreement with biological data at a reasonable computational cost. These attributes motivated the development and investigation of the CAR-FAC model’s characteristics and potential applications, particularly in machine hearing.
The goal is to implement a digital ASIC version of the CAR-FAC model for machine hearing applications due to its potential for small size, energy efficiency, and stability compared to analog implementations. For prototyping and validation, an FPGA implementation on an Altera Cyclone V starter kit was chosen. This paper expands on a previous introduction of the CAR-FAC system on FPGA, presenting the complete system and comprehensive measurement results.
Materials and Methods
The CAR-FAC Model: Components and Structure
The CAR-FAC model comprises a cascade of asymmetric resonators (CAR), a digital OHC (DOHC) model, a digital IHC (DIHC) model, and an AGC loop, as depicted in Figure 2. Each resonator Hi in the cascade is connected to the next stage and to the DIHC. It also provides an intermediate velocity variable to the DOHC. The DIHC feeds back to the DOHC via the AGC loop. The DOHC combines the AGC loop output and the velocity, and feeds back to the resonator. The CAR-FAC output includes multi-channel BM outputs (yi) and a DIHC output, which can be converted into neural activity patterns (ri).
Figure 2. Structure of the CAR-FAC model. x is the input sound, H1 to HN are the transfer functions of the CAR part, and y1 to yN represent the CAR-FAC output. The CFs of the CAR resonators decrease from left to right. The DOHC, the DIHC and the AGC loop comprise the FAC part. The neural activity pattern (NAP) rate outputs, r1 to rN, are estimations of average instantaneous nerve firing rates.
Cascade of Asymmetric Resonators (CAR)
The CAR utilizes an asymmetric resonator, a coupled form two-pole-two-zero filter, as shown in Figure 3. The filter’s transfer function in the z-domain is:
$$H(z) = \frac{Y}{X} = g\,\frac{(z - z_{zero})(z - z_{zero}^{*})}{(z - z_{pole})(z - z_{pole}^{*})} = g\,\frac{z^{2} + (-2a_0 + hc_0)\,rz + r^{2}}{z^{2} - 2a_0 rz + r^{2}} \tag{1}$$
Figure 3. Structure of the two-pole-two-zero resonator. a0, c0, and h are the resonator coefficients, r is the pole/zero radius in the z plane, g is the DC gain factor, W0 and W1 are the intermediate variables, x is the input, and y is the output.
The two-pole coupled form features conjugate poles (zpole and zpole*):
$$z_{pole},\ z_{pole}^{*} = \frac{2a_0 r \pm \sqrt{(2a_0 r)^{2} - 4r^{2}}}{2} = r\cos(\theta_R) \pm i\,r\sin(\theta_R) \tag{2}$$
$$a_0 = \cos(\theta_R) \tag{3}$$
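where θR is the pole angle in the z plane. Conjugate zeros (zzero and zzero*) are defined as:
$$z_{zero},\ z_{zero}^{*} = \frac{-(-2a_0 + hc_0)\,r \pm \sqrt{\left((-2a_0 + hc_0)\,r\right)^{2} - 4r^{2}}}{2} = r\cos(\theta_Z) \pm i\,r\sin(\theta_Z) \tag{4}$$
$$a_0 - \frac{hc_0}{2} = \cos(\theta_Z) \tag{5}$$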
where θZ is the zero angle in the z plane. The zero radius is identical to the pole radius, r. The zeros must remain complex, a condition that is particularly relevant for high-frequency channels, where cos(θR) is close to -1:
$$a_0 - \frac{hc_0}{2} > -1 \tag{6}$$
$$h < \frac{2 + 2a_0}{c_0} \tag{7}$$
Coefficient g controls the stage DC gain and is set to maintain a unit DC gain for each filterbank stage:
$$g = \left.\frac{1}{H(1)}\right|_{g=1} = \frac{1 - 2a_0 r + r^{2}}{1 - (2a_0 - hc_0)\,r + r^{2}} \tag{8}$$
This structure allows for simultaneous movement of zeros and poles by varying r while keeping h constant. Zeros are positioned slightly above poles in frequency, with coefficient h determining the distance. Smaller h values place zeros closer to poles, resulting in a steeper (asymmetric) roll-off, enhancing frequency selectivity. Here, h is set to c0, positioning the zero frequency half an octave above the pole frequency.
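The resonator stage described above can be sketched numerically as follows. This is a minimal illustration rather than the authors' code: it evaluates Equation (1) on the unit circle and chooses g so that the stage has unit gain at DC. The text does not give c0 explicitly; the sketch assumes c0 = sin(θR), which is what the coupled form with a0 = cos(θR) suggests, and the function names are hypothetical.

```python
import cmath
import math


def car_stage_coeffs(theta_r, r, h=None):
    """Return (a0, c0, h, g) for one two-pole-two-zero stage.

    Assumption: c0 = sin(theta_r); the text only states a0 = cos(theta_r).
    h defaults to c0, the setting quoted in the text.
    """
    a0 = math.cos(theta_r)  # Equation (3)
    c0 = math.sin(theta_r)  # assumed, see docstring
    if h is None:
        h = c0
    # g is the reciprocal of the un-normalised DC gain, so that H(1) = 1.
    g = (1 - 2 * a0 * r + r * r) / (1 - (2 * a0 - h * c0) * r + r * r)
    return a0, c0, h, g


def car_stage_response(freq_hz, fs_hz, theta_r, r, h=None):
    """Evaluate Equation (1) at z = exp(j*2*pi*freq_hz/fs_hz)."""
    a0, c0, h, g = car_stage_coeffs(theta_r, r, h)
    z = cmath.exp(2j * math.pi * freq_hz / fs_hz)
    num = z * z + (-2 * a0 + h * c0) * r * z + r * r
    den = z * z - 2 * a0 * r * z + r * r
    return g * num / den


# Example: a stage with its pole at 1 kHz, sampled at 44.1 kHz, radius 0.99.
theta = 2 * math.pi * 1000 / 44100
dc_gain = abs(car_stage_response(0.0, 44100, theta, 0.99))
peak_gain = abs(car_stage_response(1000.0, 44100, theta, 0.99))
```

With these example values the DC gain is 1 by construction, and the gain near the pole frequency is larger than 1, which is the resonant peak that r controls.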
Importantly, moving the poles and zeros via r leaves the zero-crossing times of the filter’s impulse response nearly unchanged. This characteristic aligns with physiological observations, in which impulse response zero crossings remain largely constant as the stimulus level varies.
Initial zero and pole positions are set for each cascade stage. Poles of the two-pole-two-zero resonator are chosen to be evenly spaced along the normalized cochlea length based on the Greenwood map function:
$$f = 165.4\,(10^{2.1x} - 1) \tag{9}$$
Here, x represents the normalized position along the cochlea (0 at the apex to 1 at the base), and f is the pole frequency.
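As an illustration of this spacing, the sketch below evaluates Equation (9) at positions evenly spaced in x and lists the resulting pole frequencies from base to apex, so that CF decreases along the cascade. The function names and the default span of x are assumptions made for the example; the text does not state which portion of the cochlear length the implemented channels cover.

```python
def greenwood_hz(x):
    """Equation (9): frequency at normalized position x (0 = apex, 1 = base)."""
    return 165.4 * (10 ** (2.1 * x) - 1)


def channel_pole_frequencies(n_channels, x_base=1.0, x_apex=0.0):
    """Pole frequencies for channels evenly spaced in x, ordered base to apex.

    x_base and x_apex are hypothetical defaults; a practical design would
    restrict them to the frequency range of interest.
    """
    if n_channels == 1:
        return [greenwood_hz(x_base)]
    step = (x_base - x_apex) / (n_channels - 1)
    return [greenwood_hz(x_base - k * step) for k in range(n_channels)]


freqs = channel_pole_frequencies(70)
```

Because the map is exponential in x, equal steps in position give approximately logarithmic frequency spacing at the high-frequency end and closer to linear spacing near the apex.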
In the CAR-FAC model, FAC effects are implemented by adjusting the initial CAR pole and zero positions by varying their radius r.
Digital Outer Hair Cell (DOHC) Model
The DOHC model mimics the function of OHCs, actively and nonlinearly amplifying wave propagation within the cochlea. The CAR-FAC model’s DOHC gain control mechanism integrates both a local instantaneous nonlinearity and a multi-time-scale nonlinearity, as shown in Figure 4. Instantaneous nonlinearity is based on BM velocity (rate of change of W1), while multi-time-scale nonlinearity is derived from DIHC feedback via the AGC loop filter. Both combine to modulate the pole (zero) radius r:
$$r = r_1 + d_{rz}\,(1 - b)\,\mathrm{NLF}(\nu) \tag{10}$$
where r1 is the minimum radius, corresponding to maximum resonator damping. In digital implementation, r1 is:
$$r_1 = 1 - \mathrm{damping} \times \frac{2\pi f}{f_s} \tag{11}$$
The damping coefficient controls the damping factor, f is the CF (Equation 9), and fs is the sampling frequency. r1 prevents damping from reaching zero, avoiding Hopf bifurcation and ensuring bounded damping. The increment of r above r1 is the relative undamping, which is the product of d_rz, the nonlinear function (NLF) of the CAR velocity, and the AGC term (1-b). Coefficient d_rz controls how strongly the velocity and the AGC loop influence the damping, and is set to 0.7 × (1-r1).
Figure 4. Structure of the DOHC model. The instantaneous nonlinearity performs a nonlinear gain control (NLF) on the CAR velocity, which is calculated from the BM coefficient W1. The multi-time-scale dynamic gain-control factor, b, is obtained from the AGC loop. Both gain control factors are combined to change r through Equation (10).
The NLF function within the DOHC is defined as:
$$\mathrm{NLF}(\nu) = \frac{1}{1 + (\nu \times \mathrm{scale} + \mathrm{offset})^{2}} \tag{12}$$
where ν is CAR velocity, scale is 0.1, and offset is 0.04. At high velocities, the velocity-squared function rapidly increases, saturating NLF towards zero and thus saturating damping towards a high-level limit.
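Equations (10) to (12) can be combined into a short numerical sketch of how the DOHC sets the pole/zero radius. The function names are hypothetical and the code is a floating-point illustration only, not the fixed-point hardware datapath; the example values of CF, damping, velocity, and b are chosen arbitrarily.

```python
import math


def min_radius(cf_hz, fs_hz, damping):
    """Equation (11): radius at maximum damping."""
    return 1 - damping * (2 * math.pi * cf_hz / fs_hz)


def nlf(v, scale=0.1, offset=0.04):
    """Equation (12): instantaneous nonlinearity on the CAR velocity."""
    return 1 / (1 + (v * scale + offset) ** 2)


def dohc_radius(v, b, cf_hz, fs_hz, damping):
    """Equation (10): radius from velocity v and AGC feedback b."""
    r1 = min_radius(cf_hz, fs_hz, damping)
    d_rz = 0.7 * (1 - r1)  # value quoted in the text
    return r1 + d_rz * (1 - b) * nlf(v)


# Example: 1 kHz channel, 44.1 kHz sampling, damping 0.5.
r_quiet = dohc_radius(v=0.0, b=0.0, cf_hz=1000, fs_hz=44100, damping=0.5)
r_loud = dohc_radius(v=100.0, b=0.0, cf_hz=1000, fs_hz=44100, damping=0.5)
```

In this example r_quiet sits close to r1 + d_rz, while r_loud falls back towards r1, which is the increase in damping at high velocity that the text describes. Setting b to 1 removes the undamping term entirely.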
This level-dependent damping mechanism introduces frequency distortions. The velocity-squared function includes a double-frequency term interacting with CAR coefficients (a0r and c0r) to generate CDTs. With two tones (f1, f2 where f1 < f2), a third tone at frequency (2f1–f2) emerges and propagates through the filter cascade. The offset in the NLF function introduces a first-order damping factor, interacting with CAR coefficients to generate QDTs (f2–f1).
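The origin of these distortion products can be made explicit with standard product-to-sum identities. The following is a simplified illustration for an idealized two-tone velocity with unit amplitudes, not a derivation of the full closed-loop response; ω1 = 2πf1 and ω2 = 2πf2.

```latex
% Two-tone velocity
\nu(t) = \cos(\omega_1 t) + \cos(\omega_2 t)

% Squared velocity: contains the double-frequency term at 2\omega_1
\nu^2(t) = 1 + \tfrac{1}{2}\cos(2\omega_1 t) + \tfrac{1}{2}\cos(2\omega_2 t)
         + \cos\big((\omega_1 + \omega_2)t\big) + \cos\big((\omega_2 - \omega_1)t\big)

% Double-frequency term multiplying a signal component at \omega_2: cubic difference tone
\cos(2\omega_1 t)\cos(\omega_2 t)
  = \tfrac{1}{2}\cos\big((2\omega_1 - \omega_2)t\big) + \tfrac{1}{2}\cos\big((2\omega_1 + \omega_2)t\big)

% The offset adds a term linear in \nu:
(\nu\,\mathrm{scale} + \mathrm{offset})^2
  = \mathrm{scale}^2\,\nu^2 + 2\,\mathrm{scale}\,\mathrm{offset}\,\nu + \mathrm{offset}^2

% Linear term multiplying a signal component: quadratic difference tone
\cos(\omega_1 t)\cos(\omega_2 t)
  = \tfrac{1}{2}\cos\big((\omega_2 - \omega_1)t\big) + \tfrac{1}{2}\cos\big((\omega_1 + \omega_2)t\big)
```

The third line gives the component at 2f1–f2, and the last line gives the component at f2–f1, which is present only because the offset is non-zero.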
Digital Inner Hair Cell (DIHC) Model
The DIHC model simulates IHC function, comprising a high-pass filter (HPF), a transduction nonlinearity unit, a transducer unit, and two LPFs, as shown in Figure 5. IHCs are mechano-electrical transducers sensing BM vibration, converting mechanical motion to electrical signals, and transmitting them to the nervous system. The DIHC model’s HPF suppresses CAR output frequencies below 20 Hz. The transduction nonlinearity includes a half-wave rectifier (HWR) and a rational sigmoid function:
$$u = \mathrm{HWR}(BM_{hpf} + 0.175) \tag{13}$$
$$n = \frac{u^{3}}{u^{3} + u^{2} + 0.1} \tag{14}$$
where BMhpf is the high-pass filtered CAR output, u is an intermediate variable, and n is the transduction nonlinearity output. HWR mimics the directional sensitivity of IHC transduction, primarily responding in one direction. The constant 0.175 maintains nonlinearity at a fixed value at zero response. The rational sigmoid function (14) provides near-linear response at low amplitudes and saturation at higher amplitudes.
Figure 5. Structure of the DIHC model. It comprises a HPF, a transduction nonlinearity unit, a transducer unit and two LPFs.
The transducer unit detects and amplifies signal onset, then compresses and rapidly reduces response gain post-onset. It is implemented by:
$$m = 1 - q \tag{15}$$
$$y = nm \tag{16}$$
$$q_{new} = (1 - a)\,q + a\,(cy) \tag{17}$$
where m is the adaptive gain of input n, c is set to 20, and q is the LPF state. The first-order FIR LPF time constant is 10 ms. Two final FIR LPFs smooth the output with an 80 μs time constant each.
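A floating-point sketch of Equations (13) to (17) shows the onset emphasis of the transducer unit. Two details are assumptions not given in the text: the coefficient a is derived from the 10 ms time constant as a = 1 - exp(-1/(τ fs)), and the state q starts at zero. The function names are hypothetical, and the high-pass filter and output LPFs are omitted.

```python
import math


def transduction_nonlinearity(bm_hpf):
    """Equations (13) and (14): half-wave rectifier and rational sigmoid."""
    u = max(bm_hpf + 0.175, 0.0)
    return u ** 3 / (u ** 3 + u ** 2 + 0.1)


def transducer(n_samples, fs_hz, tau_s=0.010, c=20.0):
    """Equations (15) to (17) applied sample by sample.

    Assumptions: a = 1 - exp(-1 / (tau_s * fs_hz)) and q starts at 0.
    """
    a = 1 - math.exp(-1 / (tau_s * fs_hz))
    q = 0.0
    out = []
    for n in n_samples:
        m = 1 - q                      # Equation (15): adaptive gain
        y = n * m                      # Equation (16)
        q = (1 - a) * q + a * (c * y)  # Equation (17): LPF state update
        out.append(y)
    return out


# Example: a constant input of 0.5 applied for 2000 samples at 44.1 kHz.
y = transducer([0.5] * 2000, 44100)
```

For a constant input n, the fixed point of Equation (17) is q = cn/(1 + cn), so the output settles to n/(1 + cn). With n = 0.5 and c = 20 the first output sample is 0.5 and the steady-state value is 0.5/11, which illustrates the strong response at onset followed by gain reduction.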
Automatic Gain Control (AGC) Loop
The AGC loop consists of a four-stage cascade FIR LPF, with each stage coupled with its neighbors to form a three-tap spatial LPF (Figure 6). It feeds the DIHC signal back to the DOHC at a lower update rate than other CAR-FAC components, modeling the medial olivocochlear system’s efferent feedback, which exerts AGC on BM vibration via OHCs. Each AGC smoothing filter (SF) stage includes a temporal linear LPF with coefficient c_t and a three-tap spatial LPF. Spatial LPF coefficients [s1, 1–s1-s2, s2] weight the left neighbor, current channel, and right neighbor values, maintaining a total mixing gain of 1. For a 44.1 kHz signal, in the fastest stage, AGC-SF4, c_t is 0.09, s1 is 0.14, and s2 is 0.2. Each AGC-SF input originates from the DIHC and its lower stage accumulation. The AGC-SF4 output b feeds back to the DOHC.
Figure 6. Structure of the AGC loop. Four stages of the temporal smoothing filters (SF) (Upper). Each stage consists of a temporal LPF with a defined time constant (0.002, 0.008, 0.032, and 0.128 s) and a three-tap spatial smoothing filter. The internal structure of an AGC-SF (Lower), the input of the AGC-SF comes from the lower filter stage with the smaller time constant as well as the accumulation of the DIHC. The output goes to the next stage of the temporal filter. The spatial smoothing filter is a three-tap smoothing filter coupled with lateral channels. s1, s2, and 1–s1-s2 are the spatial filter coefficients. c_t is the temporal LPF coefficient calculated from the time constant.
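One update of a single AGC smoothing-filter stage might be sketched as below. The text specifies the coefficients but not the exact update form, so the sketch makes three assumptions: the temporal LPF is a first-order update with coefficient c_t, the edge channels reuse their own value in place of the missing neighbor, and the spatially smoothed values are what the stage stores and passes on. The function name is hypothetical.

```python
def agc_sf_step(state, inputs, c_t, s1, s2):
    """Advance one AGC-SF stage by one (decimated) time step over all channels."""
    # Temporal smoothing per channel (assumed first-order form).
    smoothed = [st + c_t * (x - st) for st, x in zip(state, inputs)]
    # Three-tap spatial smoothing with weights [s1, 1 - s1 - s2, s2]
    # on the left neighbor, the current channel, and the right neighbor.
    n = len(smoothed)
    out = []
    for i in range(n):
        left = smoothed[i - 1] if i > 0 else smoothed[i]       # assumed edge rule
        right = smoothed[i + 1] if i < n - 1 else smoothed[i]  # assumed edge rule
        out.append(s1 * left + (1 - s1 - s2) * smoothed[i] + s2 * right)
    return out


# Example with the AGC-SF4 coefficients quoted in the text:
# activity in the middle channel spreads to both neighbors.
new_state = agc_sf_step([0.0] * 5, [0.0, 0.0, 1.0, 0.0, 0.0], 0.09, 0.14, 0.2)
```

Because the three weights sum to 1, a spatially uniform input passes through the spatial filter unchanged, and the total across channels is preserved away from the edges.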
FPGA Implementation of the CAR-FAC System
The CAR-FAC system is efficiently implemented on an FPGA, offering configurability in filter parameters and channel numbers. Figure 7 illustrates the system architecture, including an audio codec, CAR-FAC module, controller, and interface module. The system supports two audio input methods: the SSM2603 audio codec on the FPGA board and recorded audio files from a PC host via USB 3.0.
Figure 7. Architecture of the CAR-FAC FPGA system. The system consists of an audio codec, a CAR-FAC module, a controller module and an interface module. The FPGA board is hosted by a PC through a USB interface.
The CAR-FAC module implements the components described earlier. The CAR module can operate independently; deactivating the FAC function disables DOHC and AGC loop functions, maintaining CAR coefficients (a0, c0, g, h, and r) at initial values, enabling linear CAR system operation.
The controller module manages system data flow, including writing initial coefficients, inputting audio files to the CAR-FAC module, and routing CAR-FAC module output to the interface module. System output selection is available, allowing choice between BM or DIHC output.
The interface module includes a data synchronization module, external memory, and a USB interface. Data synchronization ensures data consistency between the system clock domain (250 MHz) and interface clock domain (100 MHz). The 1 GB DDR3 SDRAM stores CAR-FAC output data. The USB interface facilitates communication between the FPGA board and PC, transmitting initial coefficients, input audio files, and system output data.
The CAR-FAC model was initially simulated in Python using floating-point numbers, then verified with fixed-point numbers to determine the word length for the FPGA implementation. Word lengths of 20 bits for the BM and DOHC variables and 14 bits for the DIHC and AGC variables were chosen to approximate floating-point CAR-FAC performance and accommodate a 70 dB input dynamic range. Pipelining and time-multiplexing techniques were used to parallelize the CAR, DOHC, and DIHC_AGC modules and to reuse single hardware modules, creating a compact, reconfigurable CAR-FAC system. Figure 8 shows the system design diagram.
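A word-length study of this kind typically compares a floating-point value with its quantized counterpart. The helper below is a generic sketch of signed fixed-point quantization with saturation; the split between integer and fractional bits is not given in the text, so the 18 fractional bits in the example are purely hypothetical.

```python
def to_fixed(x, total_bits, frac_bits):
    """Quantize x to a signed two's-complement fixed-point value.

    Returns the quantized value as a float. Out-of-range inputs saturate.
    """
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    code = max(lo, min(hi, round(x * scale)))
    return code / scale


# Example: a 20-bit word with a hypothetical 18 fractional bits.
x = 0.123456
error = abs(to_fixed(x, 20, 18) - x)
```

For in-range values the error is bounded by half of one least significant bit, here 2^-19. Running the model with such a helper applied to each variable is one way to check how closely a given word length tracks the floating-point reference.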
Figure 8. CAR-FAC system design diagram. The CAR-FAC system is implemented with 20-bit word length for the design coefficients, BM output, and DOHC output, and 14-bit for the DIHC output and the AGC output. The controller state machine determines the cochlear channel to be processed at any particular time and controls the CAR-FAC coefficients and data for that channel. The BM_start signal controls the start of the system through the controller, and it is triggered by the Audio_in_ready signal. The ohc_sel is a selector switch for the CAR/CAR-FAC function. The agc_sel is a switch for the AGC loop function. The CAR state machine calculates the transfer function of Equation (1) and controls the DOHC and DIHC_AGC start in the system. The DOHC state machine calculates Equation (10–12) and feeds back an updated r to the CAR. The DIHC-AGC calculates Equation (13–17), as well as the AGC_loop function shown in Figure 6. The AGC output b feeds back to the DOHC module via Equation (10). The pipelined CAR-FAC timing diagram is shown in lower right.
At a 44.1 kHz sampling frequency, a single CAR-FAC hardware module can implement up to 70 real-time filter channels due to the high speed of digital hardware. The system uses four state machines: a controller state machine (channel processing and coefficient control), a CAR state machine (transfer function calculation), a DOHC state machine (DOHC function and r feedback), and a DIHC-AGC state machine (DIHC and AGC loop functions).
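The real-time channel count can be related to a clock-cycle budget. The sketch below assumes that the CAR-FAC module runs at the 250 MHz system clock mentioned for the system clock domain and that channels are processed one after another on the shared hardware; the actual number of cycles the implementation spends per channel is not stated in the text.

```python
def cycles_per_channel(clock_hz, fs_hz, n_channels):
    """Whole clock cycles available per channel within one audio sample period."""
    cycles_per_sample = clock_hz // fs_hz
    return cycles_per_sample // n_channels


budget = cycles_per_channel(250_000_000, 44_100, 70)  # 80
```

Under these assumptions there are 5668 clock cycles per audio sample, or 80 cycles per channel when 70 channels share one module.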
The BM_start signal, triggered by Audio_in_ready, initiates system operation. Selector switches ohc_sel and agc_sel control CAR/CAR-FAC and AGC loop functions, respectively. When ohc_sel is low, the system operates as a linear CAR. When both ohc_sel and agc_sel are high, the full CAR-FAC function is active.
The CAR state machine triggers DOHC and DIHC_AGC start signals. Pipelined CAR, DOHC, and DIHC_AGC structure is shown in Figure 8 (bottom right). Filter channel outputs are stored in external memory and transmitted to the PC via USB.
Table 1 shows device utilization for a single CAR-FAC module. The Cyclone V FPGA’s capacity allows for up to 210 cochlear channels using three CAR-FAC hardware modules due to low resource utilization.
Table 1. Device utilization summary.
Results
CAR-FAC System Transfer Function Measurements
A real-time digital CAR-FAC system with a 44.1 kHz sampling rate was implemented on a Cyclone V FPGA board, covering an input frequency range up to 22.05 kHz. The system’s channel number is reconfigurable, with more channels leading to greater filter overlap while maintaining the same frequency range. For machine hearing applications, approximately 50% overlap in equivalent rectangular bandwidth (ERB) is considered optimal for sound representation. Psychophysical studies indicate that each ERB corresponds to about 0.89 mm on the BM. For the human BM’s total length (~35 mm), this translates to roughly 78 channels with 50% overlap, or 11 channels per octave based on the Greenwood function. Machine hearing models typically use 60 to 100 channels; this implementation features a 70-channel CAR-FAC system to investigate system characteristics.
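The channel estimate follows from simple arithmetic on the figures quoted above, taking 50% overlap to mean two channels per ERB:

```latex
N_{ERB} \approx \frac{35\ \mathrm{mm}}{0.89\ \mathrm{mm/ERB}} \approx 39.3,
\qquad
N_{channels} \approx 2 \times N_{ERB} \approx 78.7
```

This is consistent with the figure of roughly 78 channels given in the text.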
The measured system transfer function, responding to a -40 dB full scale (FS), 1-second sine tone sweep (20 Hz to 22.05 kHz) with 0.1-second squared-cosine rise and decay times (to minimize spectral splatter), is shown in Figure 9. Input signal intensity is expressed in dB FS relative to a maximum amplitude of FFFFF (20-bit unsigned number), normalized to 1.0 in figures. The upper curves depict the linear CAR response of all 70 channels (FAC function off). The lower curves show the CAR-FAC response. Both CAR and CAR-FAC exhibit increased gain in lower and moderate frequencies and reduced gain in higher frequencies. The FAC function globally compresses the system response gain.
Figure 9. Transfer function of the 70-channel CAR-FAC system to a -40 dB FS, 1 s sine tone sweep from 20 Hz to 22.05 kHz (squared-cosine rise and decay time of 0.1 s to minimize the influence of the spectral splatter). The CAR response (Upper) when the FAC function is switched off; The CAR-FAC response (Lower).
Figure 10 compares CAR and CAR-FAC output in the time domain for 0.5, 1, 2, and 4 kHz tones (10 ms squared-cosine rise and decay) at channels with CFs corresponding to input tones. CAR linearly amplifies input tone amplitude, while CAR-FAC responses demonstrate gradual compressed gain control.
Figure 10. CAR and CAR-FAC output in response to 0.5, 1, 2, and 4 kHz tones with an amplitude of -40 dB FS at the channels of CFs corresponding to the input frequencies.
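Tone stimuli of the kind used in these measurements can be generated as in the sketch below, which converts a level in dB FS to a linear amplitude (full scale normalized to 1.0) and applies squared-cosine onset and offset ramps. The function names are hypothetical, and this is not the authors' stimulus code.

```python
import math


def dbfs_to_amplitude(level_dbfs):
    """Linear amplitude relative to a full scale of 1.0."""
    return 10 ** (level_dbfs / 20)


def ramped_tone(freq_hz, level_dbfs, duration_s, ramp_s, fs_hz=44100):
    """Sine tone with squared-cosine rise and decay of length ramp_s."""
    n = int(round(duration_s * fs_hz))
    n_ramp = int(round(ramp_s * fs_hz))
    amp = dbfs_to_amplitude(level_dbfs)
    samples = []
    for k in range(n):
        # Distance in samples to the nearer end of the tone.
        edge = min(k, n - 1 - k)
        if edge < n_ramp:
            env = math.sin(0.5 * math.pi * edge / n_ramp) ** 2
        else:
            env = 1.0
        samples.append(amp * env * math.sin(2 * math.pi * freq_hz * k / fs_hz))
    return samples


# Example: 100 ms, 1 kHz tone at -40 dB FS with 10 ms ramps.
tone = ramped_tone(1000, -40, 0.1, 0.01)
```

At -40 dB FS the linear amplitude is 0.01 of full scale. The smooth envelope limits the spectral splatter that an abrupt onset would introduce.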
Excitation Patterns and Nonlinear Growth Characteristics
Excitation patterns, representing BM vibration amplitude to a single sound, were calculated as the root-mean-square (RMS) signal at the output of all CAR-FAC channels. The Greenwood function (Equation 9) was used for the position-frequency map.
Figures 11A-E show excitation patterns in response to 100 ms tones (0.5, 1, 2, 4, and 8 kHz; 10 ms squared-cosine rise/decay) at intensities from -65 dB FS to -15 dB FS (10 dB steps). Peak locations in all excitation patterns align with the input tone frequencies via the position-frequency map, indicating that the system captures the human frequency-position map.
Figure 11. Excitation patterns calculated as the RMS output signal of the 70 CAR-FAC channels in response to tones at (A) 0.5 kHz, (B) 1 kHz, (C) 2 kHz, (D) 4 kHz, and (E) 8 kHz with intensities ranging from -65 dB FS to -15 dB FS in steps of 10 dB. The x-axis shows both the frequency and the position-frequency location calculated from Equation (9). (F) The normalized nonlinear response growth of the system to the tones of 0.5, 1, 2, 4, and 8 kHz (squared-cosine rise and decay time of 10 ms) with intensities between -65 dB FS and -15 dB FS in steps of 10 dB.
The BM input/output (I/O) function, which evaluates system nonlinearity and compression, is the ratio between the RMS output at the CF channel (the channel whose CF matches the stimulus frequency) and the stimulus RMS. Figure 11F shows I/O function curves for 100 ms pure tones (0.5, 1, 2, 4, 8 kHz; 10 ms rise/decay) at intensities from -65 dB FS to -15 dB FS (10 dB steps). I/O curves are normalized to the -65 dB FS I/O point. The output shows a compressed intensity range (15 dB) compared to the input range (50 dB), with I/O curves more compressive at moderate CFs (1, 2, 4 kHz) than at lower and higher CFs (0.5, 8 kHz).
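The two measures used in this section reduce to RMS calculations, as the sketch below illustrates. The function names are hypothetical, the ratio is expressed in dB as an assumption about how the curves are plotted, and the inputs are expected to be lists of samples recorded from the system.

```python
import math


def rms(samples):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def excitation_pattern(channel_outputs):
    """RMS of each channel's output; one value per channel."""
    return [rms(ch) for ch in channel_outputs]


def io_gain_db(cf_channel_output, stimulus):
    """Ratio of CF-channel output RMS to stimulus RMS, in dB."""
    return 20 * math.log10(rms(cf_channel_output) / rms(stimulus))


def normalise_to_first(gains_db):
    """Express a series of gains relative to its first (lowest-level) point."""
    return [g - gains_db[0] for g in gains_db]
```

For a linear system the normalized gains would stay at 0 dB for every input level; values that fall below 0 dB as the level increases indicate compression.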
Frequency Selectivity and Q Tuning of CAR-FAC
CAR-FAC frequency selectivity was assessed using system frequency responses calculated via FFT from system impulse responses at channels with CFs corresponding to 0.5, 1, 2, 4, and 8 kHz.
Quality factor (Q factor) tuning in CAR-FAC is achieved by adjusting the damping factor. To investigate Q tuning effects, different damping factors were used, and corresponding Q factors associated with ERB, *Q*ERB, were calculated:
$$Q_{ERB} = \frac{CF}{ERB} \tag{18}$$
ERB was evaluated from the system’s impulse response power spectral density (PSD).
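One way to carry out this calculation is sketched below. It assumes the usual definition of the ERB as the area under the power spectrum divided by its peak value, and it takes the peak frequency of the spectrum as the CF; neither detail is spelled out in the text. A direct DFT is used to keep the example dependency-free; in practice an FFT routine such as numpy.fft.rfft would be the usual choice. The function names are hypothetical.

```python
import cmath
import math


def power_spectrum(impulse_response):
    """One-sided power spectrum computed by a direct DFT."""
    n = len(impulse_response)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(impulse_response))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def q_erb(impulse_response, fs_hz):
    """Return (Q_ERB, ERB in Hz) following Equation (18)."""
    psd = power_spectrum(impulse_response)
    df = fs_hz / len(impulse_response)  # frequency resolution of the DFT
    peak = max(psd)
    erb_hz = sum(psd) * df / peak       # area under the PSD divided by its peak
    cf_hz = psd.index(peak) * df        # peak frequency taken as CF (assumed)
    return cf_hz / erb_hz, erb_hz


# Example: two decaying sinusoids near 4 kHz with different decay rates.
theta = 2 * math.pi * 4000 / 44100
sharp = [0.98 ** i * math.sin(theta * i) for i in range(256)]
broad = [0.90 ** i * math.sin(theta * i) for i in range(256)]
q_sharp, _ = q_erb(sharp, 44100)
q_broad, _ = q_erb(broad, 44100)
```

The slowly decaying response has the narrower spectral peak and therefore the higher Q_ERB, mirroring the relationship between damping and Q_ERB examined in this section.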
Figures 12A-E show system frequency responses at output channels of CFs corresponding to 0.5, 1, 2, 4, and 8 kHz to -20 dB FS, 40 μs condensation clicks, with damping factors of 0.4, 0.5, and 0.7. Lower damping corresponds to higher gain across CFs. Figure 12F shows the calculated QERB for the different damping factors. Higher damping corresponds to lower QERB; at higher damping (0.5 and 0.7), QERB is higher at moderate CFs than at lower and higher CFs.
Figure 12. (A–E) The CAR-FAC system response calculated at the CFs corresponding to 0.5, 1, 2, 4, and 8 kHz with three damping factors (0.4, 0.5, and 0.7) in Equation (11). The x-axis shows both the frequency and the BM location calculated from Equation (9). (F) The corresponding QERB at CFs corresponding to 0.5, 1, 2, 4, and 8 kHz estimated from the BM impulse response PSD at CFs.
The relationship between dB FS and Sound Pressure Level (dB SPL) depends on the damping set-point in the CAR-FAC model. Comparing peak gain at moderate frequencies (1, 2, 4 kHz) with biological cochlea frequency response (Figure 1), a 0.4 damping factor results in ~60 dB peak gain for -20 dB FS input, fitting the 30 dB SPL input intensity curve in Figure 1. Accordingly, at 0.5 damping, -20 dB FS corresponds to 60 dB SPL; at 0.7 damping, -20 dB FS corresponds to 70 dB SPL.
System impulse response characteristics and the intensity dependence of the QERB factor were also investigated. Figure 13 (Left) shows CAR-FAC impulse responses at 1 kHz CF to -50 dB FS, -30 dB FS, and -10 dB FS clicks. Impulse response shape and amplitude vary, while zero-crossing timing remains consistent across stimulus levels. Figure 13 (Right) shows the calculated QERB factor for clicks (-60 dB FS to -10 dB FS, 10 dB steps) at 1 kHz CF. The QERB factor decreases as stimulus intensity increases, indicating reduced frequency response sharpness at higher intensities.
Figure 13. System impulse responses at the 1 kHz CF channel to -50 dB FS, -30 dB FS, -10 dB FS clicks. The arrows mark the amplitude of clicks. The red dashed lines mark two consecutive impulse response zero-crossings (Left). 1 kHz QERB factors derived from impulse responses at relative intensities from -60 dB FS to -10 dB FS in steps of 10 dB (Right).
DIHC Model Output Analysis
To analyze DIHC characteristics, the DIHC response to tones was measured. To ensure a consistent stimulus amplitude for the DIHC, the FAC function was disabled, so that the CAR amplified its input linearly. Tones of 0.5, 1, and 4 kHz were presented, and the CAR output was measured at the corresponding CF channels. Tone amplitudes were adjusted to ensure a 2.28 dB FS CAR output amplitude at each channel. The adjusted tones were then used as input to measure the DIHC output for tones with consistent CAR output amplitude.
Figure 14 shows DIHC output in response to 100 ms tones (0.5, 1, 4 kHz; 10 ms squared-cosine rise/decay). DIHC effectively detects and amplifies signal onset. At lower frequencies (e.g., 0.5 kHz), DIHC output exhibits minimal DC offset and follows the input sinusoidal curve. As input frequency increases, DIHC shows higher offset and reduced gain.
Figure 14. DIHC output and CAR output in response to 100 ms tones of 0.5, 1, and 4 kHz at the channels of CFs corresponding to those tones.
Discussion
This paper details a fully digital implementation of the CAR-FAC cochlear model. Time-multiplexing and pipeline parallelizing techniques enabled a 70-channel real-time CAR-FAC system at 44.1 kHz on a Cyclone V FPGA board. System responses to pure tones and clicks were measured, and CAR-FAC nonlinear growth, excitation patterns, frequency selectivity, and impulse response were analyzed. CAR-FAC Q tuning effects through damping factor adjustment were also investigated, along with DIHC model responses to tones.
Table 2 compares this system to previous silicon cochleae regarding architecture, channel number, frequency range, input range, Q tuning, and power consumption. Power consumption was estimated using Altera’s PowerPlay tool. The CAR-FAC system demonstrates a wide input frequency and dynamic range, with a limited Q tuning range. While FPGA board power consumption is higher than analog silicon cochleae, this fully digital system offers stability, scalability, and ease of use. It also shows strong agreement with biological data and improved SNR, making it a valuable input hardware stage for advanced machine hearing tasks like sound localization, sound segregation, and speech recognition.
Table 2. Comparison with prior silicon cochleae.
Author Contributions
YX, RW, and AvS: conceptualized and designed the FPGA system; YX: data acquisition; YX, TH, RW, and AvS: data evaluation and discussion; YX: manuscript preparation. All authors participated in results discussion, manuscript review, and final approval.
Conflict of Interest Statement
The authors declare no commercial or financial conflicts of interest.
Acknowledgments
This research was supported by the Australian Research Council Grant DP140103001 and inspired by a project at the 2016 Telluride Neuromorphic Workshop. Support from the Altera University Program is gratefully acknowledged.
References
[List of references as in the original article]
Keywords: neuromorphic engineering, electronic cochlea, basilar membrane, inner hair cell, outer hair cell, automatic gain control, medial olivocochlear efferent, FPGAs
Citation: Xu Y, Thakur CS, Singh RK, Hamilton TJ, Wang RM and van Schaik A (2018) A FPGA Implementation of the CAR-FAC Cochlear Model. Front. Neurosci. 12:198. doi: 10.3389/fnins.2018.00198