Conclusion

Describes the conclusions for pitch detection on Arduino: the questions and problems that arose, and the lessons learned. Part of the project Arduino Pitch Detector.

I found that a sample rate of 9615 Hz and a window size of 200 samples combined with interpolation produce correct results for clarinet and piano from 155.6 Hz (Eb3) to 1568 Hz (G6).

On the clarinet, it misses only the very lowest note (D3) and the very highest notes that only select musicians can play (Ab6-B6), as shown in the graph below. For clarinet, normalization adds only one low note to the range, while making the frequency less accurate for high notes. As a result, I decided not to use normalization for clarinet.

For piano, I would use normalization because it adds seven low notes that would otherwise not be recognized. This brings the piano range to 98 Hz (G2) to 1568 Hz (G6).

Segmentation works well at the cost of a slight delay which is noticeable but acceptable. Overall, I found that this project pushed the Arduino to its limits in both processing power and available memory.

To simplify things, I have undone the effect of transposing when processing the test results. The graph below illustrates this.

Clarinet, fs=9615Hz, N=200, with interpolation

Questions and problems that arose

The Arduino did not have enough memory to store the audio samples. Using fewer samples would cause low notes to be missed. Instead, I lowered the sample rate so that there were fewer audio samples to store, but this caused the device to miss high notes. To improve the accuracy of these high notes, I used quadratic interpolation.

I minimized the delay by sampling the audio asynchronously and by limiting the autocorrelation algorithm to frequencies corresponding to the lowest and highest notes of the instrument.

To improve the range of notes recognized, I tried normalizing the autocorrelation for the zeroes introduced by the lag while shifting the waveform. For clarinet, normalization added only one low note while making the frequency less accurate. As a result, I decided not to use normalization for clarinet. For piano, on the other hand, it adds seven extra low notes.

Lessons learned

This was my first big project. It took many iterations to build something that I could be proud of. While some iterations made improvements, many were discouraging. However, reviewing these failures gave me valuable insights. Foremost, I learned to choose a project that I am truly passionate about, which made it easier to stick with. When problems arose, it helped to be patient.

In presenting my project, it worked best to start with a simple introduction and then answer questions. This allowed me to adjust to each person’s interests and background. I met judges who used the same algorithms that I used in my project. Sharing insights with the judges at the Synopsys Silicon Valley Science and Technology Championship and the California State Science Fair was memorable.

In the future, I would make my project smaller and easier to handle. I would create a 1-inch round printed circuit board and would also use the Intel Curie SoC. This SoC can be programmed in a similar manner to the Arduino and includes a Bluetooth interface. This interface would allow me to eliminate wires. I also considered porting the code to Java to run on an Android phone, but I prefer to have a small device that can be clipped to the instrument.


Note level segmentation and buffering

Describes note level segmentation and buffering for pitch detection on Arduino (segment.cpp, segmentbuf.cpp), and the algorithms used for note level segmentation. Part of the project Arduino Pitch Detector.

The segmentation algorithm determines the beginning and duration of each note. The algorithm is based on a rule-based model published by Monti in 2000. Other algorithms were published by Tenney in 1980 and Cambouropoulos in 1997. The target platform is a small embedded system built around an 8-bit RISC-based μController, running at 16 MHz, with 2 kByte SRAM and 32 kByte Flash memory (Arduino UNO R3). The choice of target places limitations on CPU usage and memory footprint.

The algorithm takes two characteristics into consideration:

  1. Note pitch, related to the fundamental frequency (f0) of the signal.
    • The class Frequency determines the fundamental frequency using autocorrelation as described by Brown [Brown, 1990]. The pitch is given by a MIDI number m = 69 + 12 * log2(f/440). With a window size of 200 samples and a modest sample rate of 9,615 S/s, the implementation is able to detect frequencies between 98 Hz (note G2) and about 2,637 Hz (note E7).
    • Another option was to use an FFT and the correlation theorem. While this might be faster, it has a larger memory footprint.
  2. Energy envelope, related to the loudness of the signal.
    • The class Microphone determines the peak-to-peak amplitude of the signal for each window of samples. Using a window size of 200 samples at 9,615 S/s, this results in 48 envelope points per second. This same envelope may be used for beat detection. At a tempo of 160 beats/min, this corresponds to 18 envelope points per beat. Other methods considered were the Hilbert transform and the sum of the root-mean-square amplitudes, both of which were found to be too computationally intensive.
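The envelope measurement can be sketched in a few lines. This is a minimal illustration and not the code from microphone.cpp: the function name peakToPeak and the signed 8-bit sample type (DC bias already removed) are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: peak-to-peak amplitude of one window of samples.
// Assumes signed 8-bit samples with the DC bias already removed.
uint8_t peakToPeak(const int8_t *samples, size_t n)
{
    assert(samples != nullptr && n > 0);
    int8_t lo = samples[0];
    int8_t hi = samples[0];
    for (size_t i = 1; i < n; i++) {
        if (samples[i] < lo) lo = samples[i];
        if (samples[i] > hi) hi = samples[i];
    }
    // The difference is at most 255 for 8-bit input, so it fits in a uint8_t.
    return static_cast<uint8_t>(hi - lo);
}
```

Calling this once per 200-sample window at 9,615 S/s yields the roughly 48 envelope points per second mentioned above.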

Based on these characteristics, the algorithm determines the beginning (onset) and the duration of the notes, by applying rules:

  1. The pitch is considered constant when the fundamental frequency stays within a piano key distance (about +/- 3%). In other words, when it rounds to the same MIDI pitch.
  2. A note onset is recognized when the pitch lasts for a minimum note duration.
    • This eliminates errors concerning spurious notes. The advantage of this approach over using only the amplitude envelope is that even in glissando or legato the onset is easily detected.
    • The disadvantage is that the onset is only recognized after this minimum note duration.
    • Another concern arises during the release. Because the fundamental frequency disappears first, followed by the harmonics one after another, the algorithm may erroneously recognize these harmonics as a new note.
    • A minimum note duration (MIN_SEGMENT_DURATION) of 40 msec corresponds to about two sample windows.
  3. The note duration is the time from its onset to its termination. The termination is determined by either:
    1. The recognition of silence
      • Silence is recognized when the signal energy falls below the noise floor.
      • A noise floor (AUDIBLE_THRESHOLD) of 20 (15% of full scale) seems to work well with an automatic gain control microphone.
    2. The pitch changing
      • A pitch change is recognized when a different pitch remains constant for the minimum note duration. This implies that the system allows the pitch to have different values during the decaying part of a note.
    3. The energy increasing during the decay
      • If the energy rises during the decay phase of the note, I assume that another note with the same pitch has been played. A threshold is applied, so only significant increases in energy will cause a new note to be recognized. A threshold (SEGMENT_ENERGY_INCR_THRESHOLD) of 40% above the energy of the prior window has yielded good results.
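The rules above can be sketched as a small state machine that is fed one MIDI pitch and one energy value per sample window. This is an illustration under stated assumptions, not the code from segment.cpp: the class Segmenter, the struct SegmentResult and the constant names are hypothetical, and the thresholds (20, two windows, 40%) are taken from the text.

```cpp
#include <cassert>
#include <cstdint>

const uint8_t kAudibleThreshold = 20;  // noise floor, from the text
const uint8_t kMinWindows = 2;         // about 40 msec, from the text

struct SegmentResult {
    bool noteOff;   // the sounding note ended in this window
    bool noteOn;    // a new note was recognized in this window
    uint8_t pitch;  // MIDI pitch of the new note (valid when noteOn is true)
};

class Segmenter {
public:
    // Feed one window: the detected MIDI pitch and the peak-to-peak energy.
    SegmentResult update(uint8_t pitch, uint8_t energy)
    {
        assert(pitch <= 127);
        SegmentResult r = {false, false, 0};
        const bool silent = energy < kAudibleThreshold;

        // Rule 2: count consecutive windows with the same rounded pitch.
        if (silent) {
            candidateCount_ = 0;
        } else if (candidateCount_ > 0 && pitch == candidatePitch_) {
            if (candidateCount_ < kMinWindows) candidateCount_++;
        } else {
            candidatePitch_ = pitch;
            candidateCount_ = 1;
        }
        const bool stable = candidateCount_ >= kMinWindows;

        if (sounding_) {
            // Rule 3.3: energy more than 40% above the prior window,
            // but only once the note has started to decay.
            const bool reattack = decaying_ && !silent &&
                                  pitch == currentPitch_ &&
                                  energy * 10 > prevEnergy_ * 14;
            // Rule 3.2: a different pitch held for the minimum duration.
            const bool changed = stable && candidatePitch_ != currentPitch_;
            if (silent || changed || reattack) {  // rule 3.1 is silence
                r.noteOff = true;
                sounding_ = false;
            } else if (energy < prevEnergy_) {
                decaying_ = true;
            }
        }
        if (!sounding_ && stable) {
            r.noteOn = true;
            r.pitch = candidatePitch_;
            currentPitch_ = candidatePitch_;
            sounding_ = true;
            decaying_ = false;
        }
        prevEnergy_ = energy;
        return r;
    }

private:
    bool sounding_ = false;       // a note is currently recognized
    bool decaying_ = false;       // its energy has dropped at least once
    uint8_t currentPitch_ = 0;    // pitch of the sounding note
    uint8_t candidatePitch_ = 0;  // pitch seen in the latest windows
    uint8_t candidateCount_ = 0;  // consecutive windows with that pitch
    uint8_t prevEnergy_ = 0;      // energy of the prior window
};
```

A repeated note shows up as a result with both noteOff and noteOn set. Tracking the decay separately is a design choice of this sketch, so that the rising energy during the attack is not mistaken for a repeated note.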

The next part is the Conclusion.

Frequency and pitch detection

Describes frequency and pitch detection on Arduino (frequency.cpp, pitch.cpp). The core of the application: the frequency detection algorithm. Part of the project Arduino Pitch Detector.

Each musical instrument creates a unique set of harmonics [demo]. In the commonly used equal-tempered scale, the A4 key on the piano corresponds to a fundamental frequency \(f_0=440\mathrm{\ Hz}\). Other frequencies follow as:

$$ f=2^{\frac{n}{12}}\times 440\,\rm{Hz} $$ where \(n\) is the number of half-steps from middle A (A4).
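As a quick check of this formula, the sketch below computes the frequency of a note from its distance to A4. The function name noteFrequency is made up for illustration, and the range check simply keeps the result inside the MIDI note range.

```cpp
#include <cassert>
#include <cmath>

// Frequency in Hz of the note that lies halfSteps above (or below) A4.
double noteFrequency(int halfSteps)
{
    // MIDI notes 0..127 correspond to -69..+58 half-steps from A4.
    assert(halfSteps >= -69 && halfSteps <= 58);
    return 440.0 * std::pow(2.0, halfSteps / 12.0);
}
```

For example, E♭3 lies 18 half-steps below A4, which gives about 155.6 Hz, and G6 lies 22 half-steps above A4, which gives about 1568 Hz.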

Designs considered

The main challenge of this project is to detect the fundamental frequency of the notes played using an embedded system. The fundamental frequency is defined as the lowest frequency produced by an oscillation.

The following three methods were considered:

  1. Using a time-domain feature such as zero crossings. This means that you find the distance between when the waveform goes from negative to positive the first time and when it does that a second time.
  2. Using autocorrelation to determine the frequency of instrumental sounds, as published by Judith C. Brown and Bin Zhang [Brown, Monti]. Autocorrelation is a mathematical tool for finding repeating patterns. It estimates the degree to which a signal and a time-lagged version of itself are correlated. A high correlation indicates a periodicity in the signal at the corresponding time lag.
  3. An alternate method of calculating the autocorrelation uses a Fast Fourier Transform, approaching it similarly to convolution. To get cross-correlation instead of convolution, I time-reverse one of the signals before doing the FFT, or take the complex conjugate of one of the signals after the FFT, as shown in $$ R_{xx}(k) = \mathcal{F}^{-1}\left(\mathcal{F}(s)\times\mathcal{F}(s)^*\right)(k) $$

A literature study revealed that using time-domain features (1) will not perform well for musical instruments, such as a clarinet, that produce harmonics that are stronger than the fundamental frequency.

Brown’s method (2) is more promising. It calculates the autocorrelation \(R_{xx}\) at lag \(k\) by the equation [wiki, Lyon]

$$ \begin{align} R_{xx}(k) & =\frac{1}{\sigma^2} \sum_{t=1}^N(s_t-\bar{s})(s_{t+k}-\bar{s})\\ \rm{where}\quad \bar{s}&=\frac{1}{N}\sum_{t=1}^Ns_t,\quad \sigma=\sqrt{\frac{1}{N}\sum_{t=1}^N(s_t-\bar{s})^2}\nonumber \end{align} $$

The symbols:

  • \(s\) are audio samples
  • \(N\) is the total number of samples
  • \(k\) is the lag
  • \(\bar{s}\) is the mean signal value
  • \(\sigma^2\) is a normalization factor.

However, calculating the autocorrelation requires \(2N\) subtractions, \(N\) additions, \(2N\) multiplications, and \(N\) divisions. This is likely to exceed the design constraints.

The alternate method of calculating autocorrelation (3) reduces the processing requirement to the order of \(N\log(N)\), but the algorithm uses significantly more memory. This leaves less memory to store audio samples, thereby reducing the window size and consequently limiting the ability to recognize low frequencies.

Once it determines the frequency, the MIDI pitch \(m\) follows as

$$ m = 69+12\log_2\frac{f}{440} $$
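In code, this conversion is a one-liner. The sketch below is an illustration with a made-up function name, midiPitch. It rounds to the nearest MIDI note, which is what makes a small frequency error harmless.

```cpp
#include <cassert>
#include <cmath>

// Nearest MIDI pitch for a frequency in Hz (A4 = 440 Hz = MIDI 69).
int midiPitch(double frequency)
{
    assert(frequency > 0.0);
    const double m = 69.0 + 12.0 * std::log2(frequency / 440.0);
    return static_cast<int>(std::lround(m));
}
```

With this, 155.6 Hz maps to MIDI 51 (E♭3) and 1568 Hz maps to MIDI 91 (G6). A frequency that is 2% sharp, such as 448.8 Hz, still rounds to MIDI 69.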

Design to find the frequency

To detect the fundamental frequency, I simplified Brown’s method by making three assumptions

  1. The signal has no DC bias, \(\bar{s}=0\).
  2. We’re only interested in the lag for which the autocorrelation peaks, not the absolute value of the autocorrelation. Therefore, the normalization factor \(\sigma^2\) that is independent of the lag \(k\) can be ignored.
  3. If the term \(t+k\) extends past the length of the series, the series is considered to be \(0\).

Based on these assumptions, the autocorrelation can be estimated as:

$$ R_{xx}(k) = \sum_{t=1}^Ns_t\,s_{t+k} $$

The figure below shows a visualization of the term \(s_t\,s_{t+k}\). The original waveform is shown in blue, and the time-lagged version in red. The black curve shows the multiplication of these signals.

The term s(t) s(t+k) for one value of k

The plot below shows an example of the estimated autocorrelation for \(R_{xx}(k)\) as a function of the lag \(k\). By definition the maximum autocorrelation \(R_{xx}(0)\) is at lag \(k=0\).

I ported my frequency detection code to GNU Octave to enhance my visual understanding of the algorithm. This was especially helpful in determining the empirical threshold for the peak finding algorithm.

A peak finding algorithm looks for the first peak that exceeds a threshold at \(\frac{2}{3}R_{xx}(0)\). The corresponding lag \(k_0\) gives the period time \(T_0=\frac{k_0}{f_s}\), where \(f_s\) is the sample rate. The fundamental frequency \(f_0\) follows as the inverse of \(T_0\).

Pitch Rxx

The listing below shows a code fragment from frequency.cpp that implements the autocorrelation function.

Autocorrelation function
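As a stand-in for the fragment, here is a minimal sketch of the simplified autocorrelation \(R_{xx}(k)=\sum s_t\,s_{t+k}\). It was written for this page and is not copied from frequency.cpp. The function name and the signed 8-bit sample type are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified autocorrelation at lag k: the sum of s[t] * s[t + k].
// Samples past the end of the window count as 0 (assumption 3),
// so the loop simply stops when t + k reaches n.
int32_t autocorrelation(const int8_t *s, size_t n, size_t k)
{
    assert(s != nullptr && k < n);
    int32_t sum = 0;
    for (size_t t = 0; t + k < n; t++) {
        sum += static_cast<int32_t>(s[t]) * s[t + k];
    }
    return sum;
}
```

With 200 samples of 8 bits, the sum stays well within the range of a 32-bit integer. Note that there is no subtraction of the mean and no division, in line with the first two assumptions.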

Design to find the peak

By definition the autocorrelation is maximum at lag \(k=0\). If we find the maximum value for \(R_{xx}(k)\), for \(0 \lt k \lt N\), then \(k\) is the period time. This requires calculating \(R_{xx}(k)\) for all values of \(k\).

To make it faster, I accept the first maximum that is above \(\frac{2}{3}R_{xx}(0)\). The first peak that exceeds this value is considered the period time \(T_0\). The algorithm is shown below.

Peak finding algorithm
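A sketch of such a peak finder is shown here. It is an illustration and not the original listing. The name findFirstPeak is hypothetical, and the autocorrelation is passed in as a callable rxx(k), so that values are only calculated up to the lag where the search stops.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the lag of the first local maximum of rxx(k) that exceeds
// 2/3 of rxx(0), searching kMin..kMax. Returns 0 when no peak is found.
template <typename Rxx>
size_t findFirstPeak(Rxx rxx, size_t kMin, size_t kMax)
{
    assert(kMin > 0 && kMin <= kMax);
    const int32_t threshold = static_cast<int32_t>(rxx(0)) * 2 / 3;
    int32_t prev = rxx(kMin);
    bool rising = false;  // true while the curve is going up
    for (size_t k = kMin + 1; k <= kMax; k++) {
        const int32_t cur = rxx(k);
        if (cur > prev) {
            rising = true;
        } else if (rising && prev > threshold) {
            return k - 1;  // prev was a peak above the threshold
        } else {
            rising = false;
        }
        prev = cur;
    }
    return 0;
}
```

Requiring the curve to rise first prevents the tail of the maximum at lag 0 from being mistaken for a peak. The fundamental frequency then follows as the sample rate divided by the returned lag.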

Simulation

A complementary simulation in GNU Octave visualizes the algorithm, making the process easier to understand and fine-tune. The video below shows the calculation of \(R_{xx}(k)\), and the peak finding algorithm. To run the simulation yourself, load the file simulation/file2pitch.m in GNU Octave or Matlab.

Analyzing accuracy

Analysis revealed that the sample rate and window size determine the maximum and minimum frequency that can be recognized. These variables can be configured in config.h.

  1. If the sample rate is too low, high frequencies will only have a few audio samples per period, causing these frequencies not to be accurately recognized.
  2. The window size is the number of audio samples that are processed at a time in the autocorrelation loop. If the window size is too small, low frequencies cannot be recognized.
  3. The delay is caused by the sampling of the input, calculating the frequency and the segmentation algorithm. The highest delay occurs at the lowest frequency, approximately 60 milliseconds. This was noticeable but acceptable. I observed that my simple synthesizer software introduced a noticeable additional delay. The delay was minimized by sampling audio while doing the calculations, and by stopping the autocorrelation as soon as the frequency could be determined.

Range

The project’s aim is to recognize the range of notes produced by a B♭ clarinet. This clarinet can produce notes from E♭3 to G6, corresponding to the fundamental frequencies \(f_L\) and \(f_H\) $$ \shaded{ \left\{ \begin{align} f_L &= 155.6 \, \rm{Hz} \nonumber \\ f_H &= 1.568 \, \rm{kHz} \nonumber \end{align} \right. } $$

For the 12-note-per-octave equal-tempered scale, each note or semitone is \(5.946309436\%\) “higher” in frequency than the previous note. Given that the frequency will be rounded to a note pitch, we can allow for an error rate \(\varepsilon\) $$ \varepsilon \approx 0.05946 $$

The highest frequency \(f_H\) determines the sample rate. To stay within the error rate \(\varepsilon\), the sample rate \(f_s^\prime\) follows as $$ f_s^\prime = \frac{f_H}{\varepsilon} = \frac{1568}{0.05946} \approx 26.37\,\rm{kHz} $$

The Arduino can only sample a signal at \(2^a\,\frac{16\,\rm{MHz}}{128\times 13}\), where \(a\in \mathbb{N}\). As a consequence, the sampling rate has to be rounded up to \(f_{s}^{\prime\prime}\) $$ f_s^{\prime\prime} = 38.461\,\rm{kHz} $$
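The available sample rates can be listed with a few lines of code. This sketch follows the formula above and assumes a 16 MHz clock, 13 ADC clock cycles per conversion and prescalers from 128 down to 2 (so \(a\) runs from 0 to 6). The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Sample rate in Hz for a given exponent a, i.e. prescaler 128 / 2^a.
uint32_t adcSampleRate(unsigned a)
{
    assert(a <= 6);  // assumed range: prescaler 128 down to 2
    return (static_cast<uint32_t>(16000000) << a) / (128u * 13u);
}

// Lowest available sample rate that is at least `wanted` Hz, or 0 if none.
uint32_t roundUpSampleRate(uint32_t wanted)
{
    for (unsigned a = 0; a <= 6; a++) {
        if (adcSampleRate(a) >= wanted) {
            return adcSampleRate(a);
        }
    }
    return 0;
}
```

The first three rates are 9615, 19230 and 38461 Hz, so a wanted rate of 26.37 kHz rounds up to 38.461 kHz.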

The lowest frequency \(f_L\) determines the window size, where the number of audio samples \(N^\prime\) should cover at least two periods of the lowest frequency \(f_L\) $$ N^\prime = 2\,\frac{f_s^{\prime\prime}}{f_L} = 2\,\frac{38461}{155.6} \approx 495 $$

Each audio sample requires 1 byte of the Arduino’s SRAM. With only about \(200\) bytes left available to store audio samples, \(N^\prime\) will not fit.

Alternative

Instead, we use the available \(200\) bytes to store the samples, so the window size \(N\) is $$ \shaded{ N = 200 } $$

In order to recognize the lowest frequency \(f_L\), the sample frequency \(f_s^{\prime\prime\prime}\) follows as $$ \begin{align} f_s^{\prime\prime\prime} &\le f_L\,\frac{N}{2} \nonumber \\ &\le 155.6\,\frac{200}{2} \nonumber \\ &\le 15.560\,\rm{kHz} \end{align} $$

For the Arduino sample rate scaler, this needs to be rounded down to \(f_s\) $$ \shaded{ f_s = 9.615\,\rm{kHz} } $$

The resulting frequency range can be expressed as $$ \begin{align} \frac{f_s}{N/2} \lt &f \lt \varepsilon\,f_s \nonumber \\ \frac{9615}{200/2} \lt &f \lt 0.0595\times 9615 \nonumber \\ 96.2\,\rm{Hz} \lt &f \lt 572\,\rm{Hz} \end{align} $$

This implies that it will only reach D♭5. Let’s see how we can improve the accuracy.

Measurements

Low notes are measured accurately, but errors increase with frequency.

B♭ Clarinet pianissimo, N=200, S=9615, threshold=67%

Improvements

With the base algorithm in place, time has come to focus on improvements.

Improving speed

Finding the fundamental frequency requires calculating \(R_{xx}\) for all values of \(k\). However, the possible values of \(k\) are limited by the window size and sample frequency.

  • The window size limits the lowest frequency, while
  • the sample frequency limits the highest frequency.

The range for the lag \(k\) is determined by the highest frequency \(f_H\) and the lowest frequency \(f_L\), and cannot exceed half the window size \(N\) $$ \begin{align} \frac{f_s}{f_H} \leq &k \leq \frac{f_s}{f_L} \leq \frac{N}{2} \nonumber \\[0.5em] \implies \frac{9615}{1568} \leq &k \leq \frac{9615}{155.6} \end{align} $$

Rounding down the minimum and rounding up the maximum values, the range for the lag \(k\) follows as $$ \shaded{ 6 \leq k \leq 62 } $$
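The same bounds can be calculated in code. This sketch assumes that the lower bound follows from the highest note, that the upper bound follows from the lowest note, and that the lag never exceeds half the window size. The struct LagRange and the function lagRange are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct LagRange {
    unsigned min;  // smallest lag to evaluate
    unsigned max;  // largest lag to evaluate
};

// Lag range for notes between fLow and fHigh (Hz) at sample rate fs (Hz).
LagRange lagRange(double fs, double fHigh, double fLow, unsigned windowSize)
{
    assert(fLow > 0.0 && fHigh >= fLow && fs >= fHigh);
    LagRange r;
    r.min = static_cast<unsigned>(std::floor(fs / fHigh));  // round down
    r.max = static_cast<unsigned>(std::ceil(fs / fLow));    // round up
    if (r.max > windowSize / 2) {
        r.max = windowSize / 2;  // the lag is limited to N/2
    }
    return r;
}
```

For the clarinet, lagRange(9615.0, 1568.0, 155.6, 200) gives a minimum of 6 and a maximum of 62, so only 57 of the 200 possible lags need to be evaluated.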

Improving accuracy of high notes

We can improve the accuracy, especially of the high notes, by using interpolation. I fit a parabolic curve to the prior, current and next autocorrelation values \(k_1,k_2,k_3\). The value of \(k\) that corresponds to the top of the parabola is the estimated lag \(k_m\).

Interpolation

The difference between the estimated lag \(k_m\) and the lag of the measured peak value \(k_2\) is the correction factor \(\delta\) [Polynomial Interpolation, Abdulkadir Hassen; Cross-Correlation, Douglas Lyon] $$ \delta = \frac{k_3-k_1}{2(2k_2-k_1-k_3)} $$
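The correction can be sketched as follows, with y1, y2 and y3 standing for the autocorrelation values before, at and after the measured peak. The function names peakOffset and interpolatedFrequency are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Offset of the top of the parabola through three equally spaced points,
// relative to the middle point. The result lies between -0.5 and +0.5.
double peakOffset(int32_t y1, int32_t y2, int32_t y3)
{
    assert(y2 >= y1 && y2 >= y3);  // y2 must be the measured peak
    const double denom = 2.0 * (2.0 * y2 - y1 - y3);
    if (denom == 0.0) {
        return 0.0;  // three equal values: no correction possible
    }
    return (static_cast<double>(y3) - y1) / denom;
}

// Frequency in Hz for a peak at lag k with correction delta.
double interpolatedFrequency(double fs, unsigned k, double delta)
{
    assert(k > 0);
    return fs / (k + delta);
}
```

The benefit is largest at small lags. Without interpolation, neighboring lags of 6 and 7 correspond to 1602 Hz and 1374 Hz, which are more than two semitones apart.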

The corresponding sample window size \(N^\prime\) is determined by the lowest frequency \(f_L\) $$ N^\prime = 2\,\frac{f_s}{f_L}=2\,\frac{9615}{155.6}\approx 124 $$

I rounded the window size \(N\) up to 200 samples. $$ \shaded{ N = 200 } $$

Measurements

The accuracy of high notes improves dramatically, as shown below.

B♭ Clarinet pianissimo, N=200, S=9615, interpolation, threshold=67%

Improving accuracy of low notes

As the peak finding algorithm considers higher values of the lag \(k\), the autocorrelation values decrease because of the zeroes introduced in the shifted signal.

I tried normalizing the autocorrelation for these introduced zeroes, by multiplying with a normalization factor of \(\frac{N}{N-k}\). As a result the normalized autocorrelation can be expressed as $$ R_{xx}(k)=\frac{N}{N-k}\sum_{t=1}^Ns_t\,s_{t+k} $$
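A sketch of the normalized version is given here. It is an illustration with a made-up function name and assumes signed 8-bit samples. The only change compared with the simplified autocorrelation is the final scaling by \(\frac{N}{N-k}\).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified autocorrelation at lag k, scaled by N / (N - k) to
// compensate for the k terms that are zero in the shifted signal.
int32_t normalizedAutocorrelation(const int8_t *s, size_t n, size_t k)
{
    assert(s != nullptr && k < n);
    int64_t sum = 0;
    for (size_t t = 0; t + k < n; t++) {
        sum += static_cast<int32_t>(s[t]) * s[t + k];
    }
    // 64-bit intermediate, so the multiplication by n cannot overflow.
    return static_cast<int32_t>(sum * static_cast<int64_t>(n) /
                                static_cast<int64_t>(n - k));
}
```

For a perfectly periodic signal, this restores the peak at one period to the same height as the value at lag 0, which makes it easier for low notes (large lags) to pass the threshold.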

Measurements

For the clarinet, the normalization makes it drop some high notes, as shown in the figure below. The clarinet doesn’t benefit from improving accuracy on low notes, because the lowest note it can play is \(155.6\,\rm{Hz}\), while the Arduino can detect frequencies down to \(96\,\rm{Hz}\).

B♭ Clarinet pianissimo, N=200, S=9615, interpolation, normalization, threshold=80%

Piano

The results for piano samples using interpolation are shown below for reference.

Piano mezzo-forte, N=200, S=9615, interpolation, threshold=80%

For piano, normalization greatly benefits the low notes, as shown below.

Piano mezzo-forte, N=200, S=9615, interpolation, normalization, threshold=80%

Continue reading on the next page to learn about note level segmentation.

Digitizing the analog signal

Describes how the analog signal from the microphone is digitized for pitch detection on Arduino. Part of the project Arduino Pitch Detector.

Digitizing the analog signal (microphone.cpp)

The microphone driver determines the value presented at the analog port, using analog-to-digital conversion (ADC). The driver reads the analog port asynchronously. In this asynchronous approach, the CPU starts the conversion and moves on to do other things. Once a conversion is complete, the ADC interrupts the CPU. The CPU postpones what it is doing, reads the conversion result, and returns to whatever it was doing. This way, the CPU doesn’t have to wait for the ADC conversion to complete (typically about 832 cycles) [Meettechniek, 2013].

This asynchronous approach usually requires two buffers: one buffer is written to asynchronously while the other buffer is being processed. Given that the Arduino UNO has a precious 2048 bytes of SRAM, this code jumps through some hoops so that it needs only one buffer.

The typical sequence of events is:

  1. The application needs audio samples and calls Microphone::getSamples().
    • The first time getSamples() is called, it allocates memory for the samples and starts the sampling. All other times, the samples are probably already available (sampling was restarted in step 3).
    • The method getSamples() waits until all samples are available.
  2. The application processes the samples.
  3. Once the application is done with the samples, it calls Microphone::update().
    • This initiates the interrupt driven process of gathering new samples. It does not wait for the samples.
  4. The application continues (without accessing the samples) e.g. to display the results.
  5. Return to step 1.
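The single-buffer hand-over in this sequence can be illustrated with a hardware-independent sketch. Everything in it is an assumption made for illustration: the class MicrophoneSketch is not the Microphone class from microphone.cpp, the interrupt handler is replaced by an ordinary method that a test can call, and samplesReady() reports availability instead of waiting the way getSamples() does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

const size_t kWindowSize = 200;  // samples per window, from the text

class MicrophoneSketch {
public:
    // Step 3: restart sampling into the single buffer; returns at once.
    void update()
    {
        count_ = 0;
        sampling_ = true;
    }

    // Stand-in for the ADC-complete interrupt handler: stores one sample.
    // On real hardware the shared state would have to be volatile.
    void onConversionComplete(int8_t sample)
    {
        if (!sampling_) {
            return;  // the buffer currently belongs to the application
        }
        buffer_[count_++] = sample;
        if (count_ == kWindowSize) {
            sampling_ = false;  // window complete, stop writing
        }
    }

    // Step 1: true once a complete window is available.
    bool samplesReady() const
    {
        return !sampling_ && count_ == kWindowSize;
    }

    // Step 2: access to the samples while no sampling is in progress.
    const int8_t *samples() const
    {
        assert(samplesReady());
        return buffer_;
    }

private:
    int8_t buffer_[kWindowSize];
    size_t count_ = 0;
    bool sampling_ = false;
};
```

One buffer is enough because the application and the interrupt handler never use it at the same time: the handler only writes between update() and the last sample of the window, and the application only reads after that.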

Once the application has determined the frequency, it starts to take new samples while determining the pitch, performing note level segmentation, and displaying the results. This application uses a sample rate of 9615 samples/s. Refer to autocorr.cpp for details.

The next page describes one of the key algorithms: finding the frequency and pitch.

Hardware

Describes the hardware for pitch detection on Arduino. Part of the project Arduino Pitch Detector.

The Arduino receives input through a microphone, shows results on a display, and outputs MIDI commands through its own USB serial port.

Schematic

This project takes input from an amplified microphone and outputs to a TFT display and a USB-MIDI connection. It reuses the USB connector by replacing the firmware on the ATmega16U2 companion chip as described on the page Sending MIDI Events.

Power

Logic

A few notes:

  • When lit, LED1 indicates that the signal exceeds the maximum level.
  • When connected, JP1 selects USB-MIDI output. Otherwise USB-SERIAL is selected. To upload the sketch, this jumper needs to be open, and the Arduino power cycled.
  • Push button SW1 was used during development to replay stored MIDI notes.
  • Switch SW2 is for a future extension that corrects the pitch for transposing instruments.

Remember to connect the 3.3 Volt output from the Arduino to the AREF input on the Arduino. If you forget this, no notes will be displayed.

Bill of Materials

The price comes down to under $40 based on single quantity, excluding shipping and taxes. However, note that some items, such as the PCB, have minimum order quantities.

| Name | Description | Suggested mfr and part# | Paid |
|------|-------------|-------------------------|------|
| PCB1 | Electret microphone w/ auto gain control | Adafruit 1713 | $7.95 |
| PCB2 | Arduino METRO 328, or Arduino Uno R3 | Adafruit 2488 | $17.50 |
| PCB3 | ST7735R 1.8″ Color TFT display w/ MicroSD shield | Adafruit 802 | $34.95 |
| HDR | Shield stacking headers for Arduino | Adafruit 85 | $1.95 |
| LED1 | LED, Amber Clear 602 nm, 1206 | Lite-On LTST-C150AKT | $0.33 |
| R1 | Resistor, 330 Ω, 1/8 W, 0805 | YAGEO RC0805FR-0768RL | $0.10 |
| JP1 | Connector header, vertical 3 pos, 2.54mm | Metz Connect PR20203VBNN | $0.10 |
| SW1 | Switch tactile, SPST-NO, 0.05A/24V | TE Connectivity 1825910-3 | $0.15 |
| SW2 | Switch rotary dip BCD comp, 100mA/5V | Nidec Copal SH-7030TB | $2.00 |

Considerations

Microphone

For the microphone, I use the Adafruit microphone breakout, because it has a 1.25 Volt DC bias and includes an automatic gain control. The “max gain” is set to 40 dB by connecting the GAIN to 5V. Other microphones will work as long as they have a DC biased output and the output signal is strong enough.

Arduino

The popular Arduino UNO R3 forms the heart of the system.

If you’re going to reprogram the ATmega16U2, you need access to the companion chip header (ICSP1) as marked in the illustration below.

Arduino pins

Display

For the display, I chose a 1.8″ TFT LCD screen. I went back and forth between using the Adafruit breakout and Shield. The advantage of this particular LCD screen is that it comes with a library and includes a μSD card reader. The module connects to the Arduino using the SPI interface. More details about SPI can be found in the article Math Talk.

Replay push button

Occasionally, I use a push button to replay stored MIDI notes. The push button is active low. To use this, you need to enable USB_MIDI in the config.h file.

USB-midi switch

When this switch is closed during power-up, the companion chip functions as a UART/USB-MIDI bridge. Otherwise, it does its usual UART/USB-SERIAL conversion. Refer to MIDI events for details.

Continue reading to learn about the signal path and software modules.

Introduction

By our entrepreneur in residence, Johan Vonk

Describes a device that uses pitch detection built on Arduino. It recognizes the notes played on a musical instrument such as a clarinet. It won 1st place in the Silicon Valley science competition.

Introduction

While playing my clarinet, I realized that it would be fun to hear other instruments playing alongside me. Instruments like guitar, piano or even a choir. It would also be nice if the melodies could be transcribed on paper. All existing solutions to these problems require a bulky computer or a cell phone. I realized that creating a compact device would combine my interest in music with my passion for engineering and math.

This project creates a small, affordable and accurate device that listens to a musical instrument and recognizes the notes played. These notes can then be sent to a synthesizer in the common MIDI format. This allows musicians to hear other instruments playing alongside them, and allows them to store their compositions.

The implementation is in C++ and uses an Arduino UNO, breadboard, microphone and optional display. It displays the music as a piano roll and sends it to an external synthesizer.

If you want to jump straight to the core, read at least the hardware page and download the code through GitHub.

Background

Five years ago, I asked my dad “How do computers do math?”. Ever since, we have spent countless hours learning about semiconductor physics, diode logic, programmable logic and microprocessors. I find it fascinating to learn about physics and engineering. With the help and dedication of my father, I then started programming. He has provided valuable insights and suggestions, allowing me to grow. While it can be frustrating at times, I also find it very enlightening to build programs with ever increasing complexity.

For this project, I conducted the research and development at home. As always, my father supervised the project and helped me with architecture and code reviews to keep it readable and maintainable. He also provided suggestions to organize the code to make modules reusable and testable. In particular, he helped me when the software appeared to have random crashes. He explained heap and stack pointers, and suggested testing for at least 50 bytes of headroom after the data structures.

Goal, design criteria and constraints

The project creates a small embedded monophonic music transcription system. Monophonic means one sound, so you can transcribe one note at a time. Therefore you cannot use this device to analyze what a band is playing without each instrument playing separately.

The device samples an analog audio signal, processes it and creates a digital MIDI output. MIDI is a digital annotation for music that signals the beginning and ending of each note.

I set out to adhere to the following design criteria:

  1. It should detect the correct pitch for notes produced by a B♭ clarinet but also work for other monophonic instruments.
  2. The beginning and duration of each note should be identified, regardless of it being followed by a rest, note change or note repetition. This process is called segmentation.
  3. There should be no noticeable delay between the incoming audio signal and the digital MIDI output.

The cost of the initial prototype, excluding the optional display, should be under $20, and it should fit on a credit-card-size PCB. To address these constraints, I chose to build the device around the commonly available Arduino UNO, an open source prototyping platform based on an 8-bit 16 MHz microcontroller with only 2 Kbyte SRAM. Using such a slow microcontroller with limited memory posed interesting challenges, but also keeps the cost down. By creating a custom PCB, the final product can be approximately 1 inch in diameter, which allows it to be clipped onto an instrument.

Methodology

The common method of detecting a note pitch is by using a fast Fourier transform (FFT). This project shows that similar results can be achieved using autocorrelation. This method has the advantage of requiring only half the amount of memory compared to an FFT. The algorithm incorporates optimizations such as using a low sample rate, simplifying the autocorrelation formula, using interpolation and asynchronously sampling the audio signal.

Testing with public audio samples, I found that all clarinet notes from 155.6 Hz (Eb3) to 1568 Hz (G6) are recognized correctly. Using normalization, this range can be extended to 98 Hz (G2) to 1568 Hz (G6) for the wider range of the piano. This small device enhances the experience of playing an instrument.

The source code is shared through GitHub, which allows others to learn from and improve this product.

The following chapters describe my project in detail, from the hardware schematic to the verification methods. I hope you enjoy learning from my experience.

Continue reading to learn about the Hardware used.

Copyright © 1996-2022 Coert Vonk, All Rights Reserved