This is the fifth and main part of the article about the Arduino Pitch Detector. I get to the core of the application: the frequency detection algorithm.

## Frequency and pitch detection (`frequency.cpp`, `pitch.cpp`)

Each musical instrument creates a unique set of harmonics [demo]. In the commonly used Equal Tempered Scale, the *A4* key on the piano corresponds to a fundamental frequency \(f_0=440\mathrm{\ Hz}\). Other frequencies follow as:

$$f=2^{\frac{n}{12}}\times440\mathrm{\ Hz}$$

where \(n\) is the number of half-steps from middle A (A4).

### Designs considered

The main challenge of this project is to detect the fundamental frequency of the notes played using an embedded system. The fundamental frequency is defined as the lowest frequency produced by an oscillation.

The following three methods were considered:

- Using a time-domain feature such as zero crossings. This means measuring the time between one negative-to-positive crossing of the waveform and the next.
- Using autocorrelation to determine the frequency of instrumental sounds as published by Judith C. Brown and Bin Zhang [Brown, Monti]. Autocorrelation is a math tool for finding repeating patterns. It estimates the degree to which a signal and a time lagged version of itself are correlated. A high correlation indicates a periodicity in the signal at the corresponding time lag.
- An alternate method of calculating the autocorrelation uses a Fast Fourier Transform, approaching it similar to convolution. To get cross-correlation instead of convolution, I time-reverse one of the signals before doing the FFT, or take the complex conjugate of one of the signals after the FFT, as in \(R_{xx}(k)=\mathcal{F}^{-1}\left(\mathcal{F}(s)\times\mathcal{F}(s)^*\right)\).

A literature study revealed that using time-domain features (1) will not perform well for musical instruments, such as a clarinet, that produce harmonics that are stronger than the fundamental frequency.

Brown’s method (2) is promising. It calculates the autocorrelation \(R_{xx}\) at lag \(k\) by the equation [wiki, Lyon]

$$R_{xx}(k)=\frac{\sum_{t=1}^N(s_t-\bar{s})(s_{t+k}-\bar{s})}{\sigma^2},\qquad
\bar{s}=\frac{1}{N}\sum_{t=1}^Ns_t,\qquad
\sigma=\sqrt{\frac{1}{N}\sum_{t=1}^N(s_t-\bar{s})^2}$$

where \(s_t\) are the audio samples, \(N\) is the total number of samples, \(k\) is the lag, \(\bar{s}\) is the mean signal value, and \(\sigma^2\) is a normalization factor. However, calculating the autocorrelation requires \(2N\) subtractions, \(N\) additions, \(2N\) multiplications, and \(N\) divisions. This is likely to exceed the design constraints. The image on the right shows the autocorrelation process for samples of a clarinet playing the note C4.

The alternate method of calculating the autocorrelation (3) reduces the processing requirement to \(N\log(N)\), but the algorithm uses significantly more memory. This leaves less memory to store audio samples, thereby reducing the window size and consequently limiting the ability to recognize low frequencies.

The MIDI pitch \(m\) then follows as

$$m=69+12\log_2\frac{f}{440}$$

### Design to find the frequency

To detect the fundamental frequency, I simplified Brown’s method by making three assumptions:

- The signal has no DC bias, so \(\bar{s}=0\).
- We’re only interested in the lag for which the autocorrelation peaks, not the absolute value of the autocorrelation. Therefore, the normalization factor \(\sigma^2\), which is independent of the lag \(k\), can be ignored.
- When the index \(t+k\) extends past the length of the series, the sample is considered to be \(0\).

Based on these assumptions, the autocorrelation can be estimated as:

$$R_{xx}(k)=\sum_{t=1}^Ns_t\,s_{t+k}$$

The figure below shows a visualization of the term \(s_t\,s_{t+k}\). The original waveform is shown in blue, and the time lagged version in red. The black curve shows the multiplication of these signals.

The plot below shows an example of the estimated autocorrelation for \(R_{xx}(k)\) as a function of the lag \(k\). By definition the maximum autocorrelation is at lag \(k=0\). A peak finding algorithm looks for the first peak that exceeds a threshold at \(\frac{2}{3}\) of this value. The corresponding lag \(k\) is considered the period time \(T_0\). The fundamental frequency \(f_0\) is simply the inverse of \(T_0\).

The listing below shows a code fragment from `frequency.cpp` that implements the autocorrelation function.
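In essence, the estimate \(R_{xx}(k)=\sum_t s_t\,s_{t+k}\) reduces to a multiply-accumulate loop. A minimal sketch of such a function follows; the types and interface are assumptions modeled on the peak-detector fragment in the next section, not the exact project code:

```cpp
#include <cstdint>
#include <cstddef>

using autoCorr_t = int32_t;

// Unnormalized autocorrelation estimate R_xx(k) = sum s_t * s_{t+k};
// samples past the end of the window are treated as 0.
autoCorr_t autoCorr(const int8_t* samples, size_t n, size_t lag) {
    autoCorr_t sum = 0;
    for (size_t t = 0; t + lag < n; t++) {  // s_{t+lag} beyond the window counts as 0
        sum += (autoCorr_t)samples[t] * samples[t + lag];
    }
    return sum;
}
```

At lag \(0\) this reduces to the energy of the signal, which is the value the peak detector below uses for its \(\frac{2}{3}\) threshold.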

### Design to find the peak

By definition the autocorrelation is maximum at lag \(k=0\). If I find the lag \(k\) of the next maximum of \(R_{xx}(k)\) where \(0<k<N\), then that \(k\) is the period time \(T_0\). The fundamental frequency is the inverse of \(T_0\). However, this requires calculating \(R_{xx}(k)\) for all values of \(k\).

To make it faster and less compute intensive, I chose to accept the first maximum that is above \(\frac{2}{3}R_{xx}(0)\). The lag of the first peak that exceeds this value is considered the period time \(T_0\). This algorithm is implemented in `frequency.cpp` as a peak detector, as shown in the code fragment below.

```cpp
frequency_t const                                  // returns frequency found, 0 when not found [out]
Frequency::calculate( samples_t const samples )    // pointer to signed 8-bit data samples [in]
{
    float period = 0;

    if ( samples ) {
        samplesLag_t const lagMin = Config::LAG_MIN;       // SAMPLE_LAG_MIN;
        samplesLag_t const lagMax = Config::LAG_MAX;
        autoCorr_t const acMax = _autoCorr( samples, 0 );  // initial peak = measure of the energy in the signal
        autoCorr_t const acThreshold = acMax * 2 / 3;      // empirical value
        autoCorr_t acPrev = 0;
        State state = State::findPosSlope;                 // ensure C++11 is enabled

        for ( samplesLag_t lag = lagMin; (lag < lagMax) && (state != State::secondPeak); lag++ ) {
            autoCorr_t const ac = _autoCorr( samples, lag );  // unnormalized autocorrelation for time "lag"

            switch ( state ) {  // find peak after the initial maximum
                case State::findPosSlope:
                    if ( (ac > acThreshold) && (ac > acPrev) ) {
                        state = State::findNegSlope;
                    }
                    break;
                case State::findNegSlope:
                    if ( ac <= acPrev ) {
                        state = State::secondPeak;
                        period = lag - 1 + _quadInterpAdj( _autoCorr( samples, lag - 2 ), acPrev, ac );
                    }
                    break;
            }
            acPrev = ac;
        }
    }
    return ( period > 0 && Config::SAMPLE_RATE / period < Config::FREQ_MAX )
               ? Config::SAMPLE_RATE / period : 0;
}
```

A complementary simulation in GNU Octave visualizes the algorithm, making the process easier to understand and fine tune. The video below shows the calculation of \(R_{xx}(k)\) and the peak finding algorithm. To run the simulation yourself, load the file `simulation/file2pitch.m` in GNU Octave or Matlab.

### Analyzing accuracy

To analyze the accuracy of the device, I used a spreadsheet. Testing and analysis revealed that the sample rate and window size determine the maximum and minimum frequency that can be recognized. These variables can be configured in `config.h`.

- If the **sample rate** is too low, high frequencies will have only a few audio samples per period, causing these frequencies not to be accurately recognized. Given that the frequency will be rounded to a note pitch, we can allow for an error rate of ±2½%.
- The **window size** is the number of audio samples that are processed at a time in the autocorrelation loop. If the window size is too small, low frequencies cannot be recognized. The number of audio samples should be at least twice the maximum period time for autocorrelation to work.
- The **delay** is caused by sampling the input, calculating the frequency, and the segmentation algorithm. The highest delay occurs at the lowest frequency, approximately 60 milliseconds. This was noticeable but acceptable. I observed that my simple synthesizer software introduced a noticeable additional delay. The delay was minimized by sampling audio while doing the calculations, and by stopping the autocorrelation as soon as the frequency could be determined.

The project’s aim is to recognize the range of notes produced by a B♭ clarinet. The B♭ clarinet is a transposing instrument. When playing a C note, it will sound like a B♭. This clarinet can produce notes from E♭3 to G6, corresponding to a fundamental frequency from \(f_1=155.6\) to \(f_2=1568\mathrm{\ Hz}\).

To stay within the error rate of \(\pm2.5\%\) (a total band of 5%), the sample rate \(f_{s1}\) follows as

$$f_{s1}=\frac{f_2}{2\times2.5\%}=31{,}360\mathrm{\ Hz}$$

The Arduino can only sample an analog signal at multiples of \(16\mathrm{\ MHz}/128/13=9{,}615\mathrm{\ Hz}\), therefore I need to round the sampling rate up to \(f_{s2}\)

$$f_{s2}=38,461\mathrm{\ Hz}$$

The corresponding sample window size \(N_2\) follows

$$N_2=2\frac{f_{s2}}{f_1}=2\frac{38,461}{155.6}\approx495$$

This well exceeds the 200 bytes of SRAM that are available on the Arduino to store audio samples.

### Improving accuracy

I ported my frequency detection code to GNU Octave to enhance my visual understanding of the algorithm. I found this especially helpful in determining the empirical threshold for the peak finding algorithm.

The normal **peak finding algorithm** first determines the lag \(k_2\) of the maximum autocorrelation value. It then fits a parabola through that autocorrelation value, the one before it (\(k_1\)), and the one after it (\(k_3\)), and determines the lag that corresponds to the maximum value of the parabola.

Fitting the data points to a parabola, I can find the correction factor \(\delta\) as described in [Hassen, Lyon]

$$\delta=\frac{k_3-k_1}{2(2k_2-k_1-k_3)}$$

With the interpolation restoring accuracy, the sample rate can be lowered to the Arduino’s base rate \(f_{s3}=9{,}615\mathrm{\ Hz}\). The corresponding sample window size becomes

$$N=2\frac{f_{s3}}{f_1}=2\,\frac{9{,}615}{155.6}\approx124$$

I rounded the window size \(N_3\) up to use all of the available 200 bytes

$$N_3=200$$

### Improving speed

Finding the fundamental frequency requires calculating \(R_{xx}(k)\) for all values of \(k\). However, the possible values of \(k\) are limited by the window size and sample frequency. The window size limits the lowest frequency, while the sample frequency limits the highest frequency. I empirically determined the minimum lag as \(k=6\). Therefore, the speed can be improved by only searching time lags \(k\) in the corresponding range

$$6<k<\frac{N_3}{2}$$

where \(N_3\) is the window size. This corresponds to a frequency range from 96 to 1568 Hz as in

$$\frac{2f_{s3}}{N_3}<f<\frac{f_{s3}}{6.132}$$

However, the frequency range measured was E♭3 (155.6 Hz) to G6 (1568 Hz).

### Improving range

I tried normalizing the autocorrelation for the zeroes introduced by the lag. To do this, I multiplied by a correction factor of \(\frac{N}{N-k}\). The normalized autocorrelation formula is

$$R_{xx}(k)=\frac{N}{N-k}\sum_{t=1}^Ns_t\,s_{t+k}$$
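A sketch of the normalized estimate follows; the types and interface are assumptions, not the project's exact implementation:

```cpp
#include <cstdint>
#include <cstddef>

// Autocorrelation with the N/(N-k) correction for the zeros introduced by the
// lag (requires lag < n).
double autoCorrNorm(const int8_t* s, size_t n, size_t lag) {
    double sum = 0;
    for (size_t t = 0; t + lag < n; t++) {
        sum += (double)s[t] * s[t + lag];
    }
    return sum * (double)n / (double)(n - lag);  // compensate for the shorter overlap
}
```

For a perfectly periodic test signal, the corrected peak at the period lag matches the lag-0 energy, which is what the normalization is meant to achieve.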

This extends the range for piano samples from G2 (98 Hz) to G♭6 (1480 Hz) as shown in the Figure below.

It however also misses a high note for the clarinet as shown in the Figure below.

You can still get that high clarinet note if you lower the threshold from 80% to 71%, as shown in the Figure below.

However, it then misses a low note for piano as shown in the Figure below.

For clarinet, normalization only adds one low note to the range, while causing the frequency to be less accurate. As a result, I decided to not use normalization for clarinet. The results for this are shown in the conclusion.

The next page of this article describes the note level segmentation.
