## Talk to your old CD player

Have you ever wondered why you should buy new devices just because the old ones don’t support IoT? Any device controlled by an infrared remote can be transformed into an IoT device with an Adafruit ESP8266, an infrared receiver and LED, and your remote control. With the help of an IR code library, I was able to make a program that does just that.

Triggered by the phrase “Hey Google”, the Google Assistant feeds the words following the trigger phrase into its database of applets, one of which is IFTTT. Once an applet recognizes the phrase, it responds with a statement acknowledging the command. In this case the answer phrase is set by the user: the Google Assistant responds by saying the term set in IFTTT, which I set to “IR Code Sent.” The conversation will be something like the one shown above.

## Problem analysis

Our goal is to teach the Google Assistant to control a device with an IR signal. In this case we use a Panasonic SC-HC20 CD player.

We need a device, controlled by Google Assistant, that can mimic an infrared remote control. In this project we will build that device and hook it up to Google Assistant using IFTTT (If This Then That). We build this so people do not have to buy new devices just so they can be controlled by “smart” devices.

## Design considerations

We will give an overview of what we found and explain some of the trade-offs that we made.

Adding Skills to the Google Assistant can be challenging, but IFTTT simplifies the process by setting up the api.ai (Google’s voice recognition platform) part of the equation, normally the tricky part of teaching the Google Assistant. Now all we need to do is configure IFTTT and code the transmitting device.

• If This Then That (IFTTT) adds custom voice commands to Google Assistant and can send information to a device using HTTP.

### Infrared Interface platform

We need to make a device that can accept commands over WiFi and relay them as infrared commands to the CD player.

• Requirements: small, low power, WiFi, GPIO (general purpose i/o pins)
• Contestants: Arduino, ESP8266, Arduino 101, ESP32
• Winner: Adafruit Feather Huzzah ESP8266. Its 1 MByte flash is enough for over-the-air updates, and it has Micro USB for code upload and debugging.

### Infrared Interface infrared codes

Our device needs to be able to mimic the remote in order to control the CD player.

• Requirements: must support the platform; it ideally should recognize the device codes so we don’t have to use raw IR timings.
• Contestants: a fork of Chris Young’s IRLib
• Winner: a fork of Chris Young’s IRLib

### Traversing the WiFi router

The access router in our home is a jack-of-all-trades. Among its many duties, it protects the home network from outside intruders. This comes with a catch: Google Assistant needs access to our home network to control the CD player. This means we need to provide a mechanism through which Google Assistant can contact the device.

Like with anything else on the internet, there are many choices to consider. The two trains of thought are:

• Make an exception in the firewall to accept and forward incoming commands to the IR interface. We can also use this router to handle the more resource-intensive encryption and to scrutinize the commands to prevent code injection. The problem is that anything on the internet can access our web server, so we need an authentication mechanism. Two protocols come to mind: HTTP and MQTT. HTTP is the protocol that brings you the web pages that you visit. We can use it to push messages from Google Assistant to our IR interface. The other protocol, MQTT, is designed to provide low-latency, assured messaging over fragile networks and efficient distribution to one or many receivers. It is less well known but powers Facebook, and is supported by AWS IoT.
• Maintain a connection to a web service, so that it can send messages to the IR interface over that existing connection. When using the HTTP protocol, this requires kludges like long polling or WebSockets. A more elegant solution is Firebase Cloud Messaging, which appears to run on the ESP8266, but requires a substantial amount of memory to handle HTTPS, the UID and the authToken. The problem is that it causes a lot of idle traffic and requires an intermediate server (because IFTTT doesn’t support it directly).

IoT devices are characterized by a small processor, little memory and low power usage. In this respect MQTT would be ideal. However, this requires a bridge such as Ponte or web service to translate between HTTP and MQTT. Instead, we decided to go with the more common HTTPS protocol.

## Design, development and unit testing

The design consists of various blocks as illustrated below. We will describe the design, development and the unit testing for these blocks. We will start the Infrared Interface and work our way up to integrating it with the Google Assistant ecosystem.

All code is available through the repository:

### Infrared Interface

We go straight to the exciting part: the infrared interface.

#### Design

The hardware for this project is straightforward. The microcontroller is an ESP8266-based Feather HUZZAH that already has a USB interface for programming and debugging.

The software uses a framework that provides online WiFi SSID/password configuration, saves fatal exception details to non-volatile memory, and supports over-the-air (OTA) software updates.

#### Development

The ESP8266 is connected to an infrared receiver to decode signals from the CD player’s original remote control. To send infrared codes to the CD player, it uses a transistor to drive an IR LED.

##### Hardware

Build the circuit as shown in the schematic above.

1. Gather the parts shown in the table below. Be careful to use the PN2222A or 2N2222A transistor (not the P2N2222A).
| Label | Part# | Description | Information | We paid |
|---|---|---|---|---|
| IR1 | Vishay TSOP38238 | Infrared Receiver, 38 kHz | datasheet | $1.08 |
| D1 | Everlight EL-SIR333-A | Infrared Diode, 875 nm, 1.5 V drop | datasheet | $0.39 |
| T1 | PN2222ATFR | NPN Transistor | datasheet | $0.49 |
| PCB1 | Feather HUZZAH ESP8266 | Microcontroller board | detail | $16.95 |
| R1 | 1 kΩ, ¼ W | Resistor | | $0.02 |
| R2 | 68 Ω, ¼ W | Resistor | | $0.02 |
• What’s another way to say it? = remote $
• What do you want the Assistant to say in response? = Sending IR Code
• Press Create Trigger
2. Specify the Action by clicking the +that.
   1. Choose action service = Select “Webhooks” and press Connect
   2. Choose action = Make a web request
   3. Fill in the action fields
      • URL = https://ir.home.domain.com/ir?req={{TextField}}
      • Method = GET
      • Content Type = text/plain
      • Press Create Action
   4. Press Finish

## Testing

Now for the fun part:

1. Prepare
   • While still on the IFTTT site, at My Applets > Google Assistant, click Check now.
   • Connect to the serial port of the ESP8266 using the Serial Monitor in the Arduino IDE.
2. Give it a shot
   • Use a Google Home (or another Assistant-enabled device), and say “Hey Google, CD Play”.
   • The Google Home should reply with “Sending IR Code”.
   • On the Serial Monitor, you should see “Sending Play”.
   • The CD player should start.
   • Give yourself a pat on the back and continue testing the other keywords.

I hope this device will help you modernize your home without spending hundreds on replacement devices. To further modernize your home, you can build your own smart alarm clock that syncs alarms and events from Google Calendar: see ESP8266 reads Google Calendar.

## Remote control

We use the IRLib library to decode the signal from the infrared detector. This library supports many common TV remote controls. Use the included example sketch IRecvDump to see if your remote is supported. On the serial monitor you should see something like “Decoded NEC(1): Value:FD808F (32 bits)”. If you have a supported remote control, skip to the “Movements” section further down this page. In our case, we had to add support for our SilverLit remote control.

## SilverLit RC

The SilverLit remote control vehicle protocol is used by the SilverLit RC set that was sold through Costco and included a dump truck, flatbed, bulldozer and crane. We reuse this remote control to direct the movements of the Droid robot.
The protocol was undocumented, so I reverse engineered it. It turns out that the remote control first transmits a header burst of 38 kHz infrared light. The bits are then transferred as a short or longer period with no IR light (space), followed by a short burst of IR light (mark).

| Frame part | Encoding |
|---|---|
| header | 1.778 msec mark |
| data | 13 data bits |
| stop | 1 stop bit |

The bits are encoded as shown below.

| Bit value | Encoding |
|---|---|
| “0” | 0.400 msec space, followed by 0.722 msec mark |
| “1” | 1.037 msec space, followed by 0.722 msec mark |

The data bits are shown in the table below.

| Bit(s) | Meaning |
|---|---|
| [12:11] | vehicle identifier (00 = dump truck, 01 = flatbed, 10 = bulldozer, 11 = crane) |
| [10:9] | no special meaning; 11 when bit[7] is 0, otherwise 00 |
| [8] | 0 for backward |
| [7] | 0 for forward |
| [6] | 1 for right |
| [5] | 1 for left |
| [4] | 1 for down |
| [3] | 1 for up |
| [2] | 1 for light on |
| [1:0] | checksum (see below) |

The checksum is calculated as follows:

checksum = bit[12:11] ^ bit[10] ^ bit[9:8] ^ bit[7:6] ^ bit[5:4] ^ bit[3:2]

We forked Chris Young’s IRLib to add support for the SilverLit remote vehicle protocol. My version of the library is available on GitHub.

## Movements

2BD: describe how instructions trigger movements

Continue reading about Making it dance.

## Audio spectrum

This page describes how the peak values of the frequency bands are retrieved. The spectrum analyzer chip (MSGEQ7) measures the peak voltage in 7 frequency bands. These voltages are multiplexed on one output pin. To read each value, we use the RESET and STROBE pins.

1. A RESET pulse followed by a wait (≥72 μs) resets the multiplexer.
2. On the first falling edge of the strobe signal, the 63 Hz output propagates to OUT. After ≥36 μs, this analog value can be read by the host.
3. Each additional strobe falling edge advances the multiplexer one frequency band (63 → 160 → 400 → 1,000 → 2,500 → 6,250 → 16,000 Hz), and this repeats indefinitely. The multiplexer read rate also controls the output decay time.
Each read decays the value of that frequency band by approximately 10% [datasheet].

The timing is shown below and includes some corrections compared to the datasheet [Maxfield]. With a load of 33 pF // 1 MΩ, the settle time of the output is 36 μs. The output impedance of the MSGEQ7 is 700 Ω. This is well under the 10 kΩ that Arduino recommends for charging the A/D sample-and-hold capacitor. Many others have written about this as well [Hienzsch, Lewis, drrobot, library, sketch].

Continue reading about the Visualization.

## Build environment

The code has been compiled using the Arduino 1.6.5 tool chain and should compile under the Arduino IDE. Another option is the derived Atmel Studio, which supports an In-Circuit Emulator (ICE). However, the IDE flavor should be irrelevant to the compilation.

## Visual Studio IDE

The Visual Studio 2013 Community Edition can be downloaded for free. We supplement it with the tools listed below.

## Configure the tool chain

• Turn on the compiler warnings
  • e.g. Visual Studio > Tools > Visual Micro > Compiler Warning
• Enable C++11 support (to allow enum classes)
  • e.g. add -Wall -std=c++11 to compiler.cpp.flags in hardware/arduino/avr/platform.txt
• Install the libraries
  • Adafruit GFX
  • Adafruit LED backpack
  • Chris Young’s IRLib, or my fork when you use the SilverLit remote (see page ???). Note that you can’t have the original IRLib library installed at the same time, because it will cause a conflict.
  • Either place them in the libraries folder of your Arduino tool chain, or in the libraries folder in your sketchbook path. Note that the Arduino tool chain doesn’t understand #include "dir1/dir2/header.h"
• Reload the tool chain
  • e.g. Visual Studio (Tools » Visual Micro » Reload tool chains)
• Select the correct board type
• Clear the build, and rebuild
• Upload to the Arduino

## Code

The following pages describe the implementation. Continue reading about the Audio spectrum.
## Hardware

This project uses two input sources, a microphone and an IR detector. It also has two outputs, a LED matrix and a robot (Droid). The microphone picks up the music, and the system displays the audio spectrum and makes the Droid dance to the beat. It can also operate in a mode where it receives signals from a remote control to demonstrate the Droid’s movements.

## Schematics

We ended up putting the MSGEQ7 and its glue logic on a piece of proto board. The critical parts are in the oscillation circuit connected to the CKIN input. The odd duck is the 200 kΩ resistor, which is not an E12-series value. You can make an equivalent value by placing 220 kΩ and 2.2 MΩ in parallel, or two 100 kΩ in series. Now that we’re on the subject: all capacitors should be ceramic types.

### Power

### Logic

## Bill of Materials

The price comes down to under $40 based on single quantities, excluding shipping and taxes. However, note that some items, such as the PCB, have minimum order quantities.

| Name | Description | Suggested mfr and part# | Paid |
|---|---|---|---|
| PCB1 | Arduino METRO 328, or Arduino Uno R3 | Adafruit 2488 | $17.50 |
| PCB2 | Bicolor LED Square Pixel Matrix with I2C breakout | Adafruit 902 | $15.95 |
| PCB3 | Electret microphone w/ auto gain control | Adafruit 1713 | $7.95 |
| U1 | Seven Band Graphic Equalizer Display Filter | Mixed Signal MSGEQ7 | $ |
| U2 | MOSFET, P-CH, 12 V/4.3 A, SOT-23 | Infineon IRLML6401TRPBF | $0.53 |
| U3 | Linear voltage regulator, 6 V/1.5 A, TO220-3 | STMicroelectronics L7806CV | $0.69 |
| U4 | IR detector | | $ |
| D1, D2, D3, D4 | Schottky diode, 30 V/200 mA, SOD523 | Onsemi RB520S30T5G | $ |
| R1, R2 | Resistor, 22 kΩ, 1/8 W, 0805 | Yageo RC0805FR-0722KL | $0.10 |
| R3 | Resistor, 200 kΩ, 1/8 W, 0805 | | $ |
| R4 | Resistor, 200 Ω, 1/8 W, 0805 | | $ |
| C1, C2, C3 | Capacitor, 0.1 µF, multi-layer ceramic, 6.3 V, 0805 | KEMET C0805C104M3RACTU | $0.10 |
| C4 | Capacitor, 33 pF, multi-layer ceramic, 6.3 V, 0805 | | $ |
| C5 | Capacitor, 0.33 µF, multi-layer ceramic, 16 V, 0805 | | $ |
| C6 | Capacitor, 0.1 µF, multi-layer ceramic, 16 V, 0805 | | $ |
| J1 | Headphone jack stereo connector, 3.5 mm, kinked pin | Kycon STX-3120-5B | $0.74 |
| J2 | Power connector jack, 2×5.5 mm, kinked pin | CUI PJ-202A | $0.71 |
| Connectors | JST, 1.25 mm pitch (GH) | Kycon | $ |
| M1 | Digital Micro Servo, 4.8–6 V, 0.09 sec/60°, 22.4×12.5×23 mm, 9 g | Turnigy TG9d | $4.12 |
| M2, M3, M4 | Ultra-Micro Digital Servo, 4.8–6 V, 0.08 sec, 16×8×18 mm, 2 g | H-King HKM-282A | $4.50 |

Make sure the MSGEQ7 is authentic. It should draw about $$0.8\,\rm{mA}$$, pin 6 should carry a reference voltage of $$2.5\,\rm{V}$$, and the package should have an indentation near pin 1.

## Materials

The components listed below are available from component warehouses like Mouser and DigiKey, and hobby stores like Adafruit and Sparkfun.

### Microphone

We use the same microphone breakout as we did in the Pitch Detector project. This breakout has an amplifier that automatically controls the gain, up to a 2 Vpp output signal. Other microphones will work, as long as the output signal is strong enough. Alternatively, you can connect directly to a music source using a 3.5 mm phone connector; this outputs about 0.9 Vpp. In all cases, remember to decouple the DC component using the 0.1 μF capacitor.

### Spectrum Analyzer

This MSGEQ7 spectrum analyzer chip requires some analog components as listed below.

• 22 kΩ (red-red-orange)
• 0.1 μF (104, blue multi-layer ceramic capacitor)
• 0.1 μF (104, blue multi-layer ceramic capacitor)
• 33 pF (33, brown ceramic capacitor), correct value is important!
• 200 kΩ by placing 220 kΩ (red-red-yellow) and 2.2 MΩ in parallel (red-red-green)

Not all MSGEQ7 chips are made equally. We found it helpful to solder these components and a chip socket on a little breakout. This makes it easy to try different chips.

The MSGEQ7 is very sensitive to noise on the power rail. Adding a 47 μF capacitor across the power line seems to help.

### Infrared Remote and Detector

We used a SilverLit infrared remote to send control signals to the Arduino, but any remote that is supported by the Arduino IRLib library should work. The infrared (IR) detector demodulates the received IR signal and outputs a pulse stream. For the detector we used a TSOP38238, but again there are many other flavors that may work.

### Arduino

We use the commonly available Arduino UNO R3. Given that you are reading this article, you are probably already familiar with this open-source microcontroller prototyping platform.

Remember to connect 3.3 Volt to the AREF input. The Analog-to-Digital converter uses this as a reference.

### Bi-color 8×8 LED matrix

For the display we chose an I2C 8×8 bicolor LED matrix. This matrix connects using the two-wire I2C (Inter-Integrated Circuit) interface, a protocol similar to the SPI that I described in my article Math Talk.

### Droid

We sacrificed an Android Mini Collectible Figure from the Google store in Mountain View. After cutting off the legs and arms and popping off the head, the assault intensified by drilling out its eyes. We followed the instructions for the pink figure in this instructable. To move the arms and head, we used 3 small ultra-micro servos. These come with a 1.25 mm pitch Molex PicoBlade connector, which requires a small extension cable. The body itself is moved with a larger micro servo that connects using the more traditional 2.5 mm pitch JST-XH connector. For the eyes we used two LEDs in series and a current-limiting resistor.

These ultra-micro servos have a maximum voltage of 4.7 Volt, and the combination of servos can draw up to 500 mA. We supply this 4.7 Volt rail using a separate lab power supply. Signal diodes bring the 5 V output from the Arduino down to about 4.3 Volt.

## Conclusion

Describes the conclusion for pitch detection on Arduino. Questions and problems that arose and lessons learned. Part of the project Arduino Pitch Detector.

I found that a sample rate of 9615 Hz and a window size of 200 samples combined with interpolation produce correct results for clarinet and piano from 155.6 Hz (Eb3) to 1568 Hz (G6).

On the clarinet, it misses the very lowest note (D3) and the very highest notes that only select musicians can play (A♭6–B6), as shown in the visual. For clarinet, normalization only adds one low note to the range, while causing the frequency to be less accurate for high notes. As a result, I decided to not use normalization for clarinet.

For piano, I would use normalization because it adds an extra 7 low notes that would otherwise not be recognized. This brings the piano range to 98 Hz (G2) to 1568 Hz (G6).

Segmentation works well at the cost of a slight delay which is noticeable but acceptable. Overall, I found that this project pushed the Arduino to its limits in both processing power and available memory.

To simplify things, I have undone the effect of transposition when processing the test results. The graph below illustrates this.

## Questions and problems that arose

The Arduino did not have enough memory to store the audio samples. Using fewer samples would cause low notes to be missed. Instead, I lowered the sampling rate so that there were fewer audio samples to store, but this caused the device to miss high notes. To improve the accuracy of these high notes, I used quadratic interpolation.

I minimized the delay by sampling the audio asynchronously and by limiting the autocorrelation algorithm to frequencies corresponding to the lowest and highest notes of the instrument.

To improve the range of notes recognized, I tried normalizing the autocorrelation for the zeroes introduced by the lag while shifting the waveform. For clarinet, normalization only added one low note while making the frequency less accurate; as a result, I decided not to use normalization for clarinet. For piano, on the other hand, it adds seven extra low notes.

## Lessons learned

This was my first big project. It took many iterations to build something that I could be proud of. While some iterations made improvements, many were discouraging. However, reviewing these failures gave me valuable insights. Foremost, I learned to choose a project that I am truly passionate about, which made it easier to stick with it. When problems arose, it helped to be patient.

In presenting my project, it worked best to start with a simple introduction and then answer questions. This allowed me to adjust to each person’s interests and background. I met judges who used the same algorithms as those I used in my project. Sharing insights with the judges at the Synopsys Silicon Valley Science and Technology Championship and the California State Science Fair was memorable.

In the future, I would make my project smaller and easier to handle. I would create a 1-inch round printed circuit board and would also use the Intel Curie SoC. This SoC can be programmed in a similar manner to the Arduino and includes a Bluetooth interface, which would allow me to eliminate wires. I also considered porting the code to Java to run on an Android phone, but I prefer to have a small device that can be clipped to the instrument.

Want to learn about other software projects? Refer to our other Embedded C projects.

## Note level segmentation and buffering

Note level segmentation and buffering for pitch detection on Arduino (segment.cpp, segmentbuf.cpp). Describes the algorithms used for note level segmentation. Part of the project Arduino Pitch Detector.

The segmentation algorithm determines the beginning and duration of each note. The algorithm is based on a rule-based model published by Monty in 2000. Other algorithms were published by Tenney in 1980 and Cambouropoulos in 1997. The target platform is small embedded systems around an 8-bit RISC-based μController, running at 16 MHz, with 2 kByte SRAM and 32 kByte Flash memory (Arduino UNO R3). The choice of target places limitations on CPU usage and memory footprint.

The algorithm takes two characteristics into consideration:

1. Note pitch, related to the fundamental frequency (f0) of the signal.
• The class Frequency determines the fundamental frequency using autocorrelation as described by Brown [Brown, 1990]. The pitch is given by a MIDI number m = 69 + 12 * log2(f/440). With a window size of 200 samples and a modest sample rate of 9,615 S/s, the implementation is able to detect frequencies between 98 Hz (note G2) and about 2,637 Hz (note E6).
• Another option was to use an FFT and the correlation theorem; while this might be faster, it has a larger memory footprint.
2. Energy envelope, related to the loudness of the signal.
• The class Microphone determines the peak-to-peak amplitude of the signal for each window of samples. Using a window size of 200 samples at 9,615 S/s, this results in 48 envelope points per second. This same envelope may be used for beat detection; at a tempo of 160 beats/min, this corresponds to 18 samples/beat. Other methods considered were the Hilbert transform and the sum of the root-mean-square amplitudes, both of which were found to be too computationally intensive.

Based on these characteristics, the algorithm determines the beginning (onset) and the duration of the notes, by applying rules:

1. The pitch is considered constant when the fundamental frequency stays within a piano key distance (+/- 3%). In other words, when it rounds to the same MIDI pitch.
2. A note onset is recognized when the pitch lasts for a minimum note duration.
• This eliminates errors concerning spurious notes. The advantage of this approach over using only the amplitude envelope, is that even in glissando or legato the onset is easily detected.
• The disadvantage is that the onset is only recognized after this minimum note duration.
• Another concern arises during the release: because the fundamental frequency disappears first, followed by the harmonics one after another, the algorithm may erroneously recognize these harmonics as a new note.
• A minimum note duration (MIN_SEGMENT_DURATION) of 40 msec, corresponds to about two sample windows.
3. The note duration is the time from its onset to its termination. The termination is determined by either:
1. The recognition of silence
• Silence is recognized when the signal energy falls below the noise floor.
• A noise floor (AUDIBLE_THRESHOLD) of 20 (15% of full scale) seems to work well with an automatic gain control microphone.
2. The pitch changing
• A pitch change is recognized when a different pitch remains constant for the minimum note duration. This implies that the system allows the pitch to have different values during the decaying part of a note.
3. The energy increasing during the decay
• If the energy rises during the decay phase of the note, I can assume that another note with the same pitch has been played. A threshold is applied, so only significant increases in energy cause a new note to be recognized. A threshold (SEGMENT_ENERGY_INCR_THRESHOLD) of 40% above the energy of the prior window has yielded good results.

The next part is the Conclusion.

## Frequency and pitch detection

Describes frequency and pitch detection for pitch detection on Arduino (frequency.cpp, pitch.cpp). The core of the application: the frequency detection algorithm. Part of the project Arduino Pitch Detector.

Each musical instrument creates a unique set of harmonics [demo]. In the commonly used equal tempered scale, the A4 key on the piano corresponds to a fundamental frequency $$f_0=440\mathrm{\ Hz}$$. Other frequencies follow as:

$$f=2^{\frac{n}{12}}\times 440\,\rm{Hz}$$ where $$n$$ is the number of half-steps from middle A (A4).

## Designs considered

The main challenge of this project is to detect the fundamental frequency of the notes played using an embedded system. The fundamental frequency is defined as the lowest frequency produced by an oscillation.

The following three methods were considered:

1. Using a time-domain feature such as zero crossings. This means measuring the time between one negative-to-positive crossing of the waveform and the next.
2. Using autocorrelation to determine the frequency of instrumental sounds as published by Judith C. Brown and Bin Zhang [Brown, Monti]. Autocorrelation is a math tool for finding repeating patterns. It estimates the degree to which a signal and a time lagged version of itself are correlated. A high correlation indicates a periodicity in the signal at the corresponding time lag.
3. An alternate method of calculating the autocorrelation uses a Fast Fourier Transform, approaching it like a convolution. To get cross-correlation instead of convolution, I time-reverse one of the signals before doing the FFT, or take the complex conjugate of one of the spectra after the FFT, as in $$R_{xx} = \mathcal{F}^{-1}\left(\mathcal{F}(s)\times\mathcal{F}(s)^*\right)$$

A literature study revealed that using time-domain features (1) will not perform well for musical instruments, such as a clarinet, that produce harmonics that are stronger than the fundamental frequency.

Brown’s method (2) is more promising. It calculates the autocorrelation $$R_{xx}$$ at lag $$k$$ by the equation [wiki, Lyon]

\begin{align} R_{xx}(k) & =\frac{1}{\sigma^2} \sum_{t=1}^N(s_t-\bar{s})(s_{t+k}-\bar{s})\\ \rm{where}\quad \bar{s}&=\frac{1}{N}\sum_{t=1}^Ns_t,\quad \sigma=\sqrt{\frac{1}{N}\sum_{t=1}^N(s_t-\bar{s})^2}\nonumber \end{align}

The symbols:

• $$s$$ are audio samples
• $$N$$ is the total number of samples
• $$k$$ is the lag
• $$\bar{s}$$ is the mean signal value
• $$\sigma^2$$ is a normalization factor.

However, calculating the autocorrelation requires $$2N$$ subtractions, $$N$$ additions, $$2N$$ multiplications, and $$N$$ divisions. This is likely to exceed the design constraints.

The alternate method of calculating autocorrelation (3) reduces the processing requirement to $$N\log(N)$$, but the algorithm uses significantly more memory. This leaves less memory to store audio samples, thereby reducing the window size and consequently limiting the ability to recognize low frequencies.

Once it determines the frequency, the MIDI pitch $$m$$ follows as

$$m = 69+12\log_2\frac{f}{440}$$

## Design to find the frequency

To detect the fundamental frequency, I simplified Brown’s method by making the following assumptions:

1. The signal has no DC bias, $$\bar{s}=0$$.
2. We’re only interested in the lag for which the autocorrelation peaks, not the absolute value of the autocorrelation. Therefore, the normalization factor $$\sigma^2$$ that is independent of the lag $$k$$ can be ignored.
3. If the term $$t+k$$ extends past the length of the series, the series is considered to be $$0$$.

Based on these assumptions, the autocorrelation can be estimated as:

$$R_{xx}(k) = \sum_{t=1}^Ns_t\,s_{t+k}$$

The figure below shows a visualization of the term $$s_t\,s_{t+k}$$. The original waveform is shown in blue, and the time lagged version in red. The black curve shows the multiplication of these signals.

The plot below shows an example of the estimated autocorrelation for $$R_{xx}(k)$$ as a function of the lag $$k$$. By definition the maximum autocorrelation $$R_{xx}(0)$$ is at lag $$k=0$$.

I ported my frequency detection code to GNU Octave to enhance my visual understanding of the algorithm. This was especially helpful in determining the empirical threshold for the peak finding algorithm.

A peak finding algorithm looks for the first peak that exceeds a threshold at $$\frac{2}{3}R_{xx}(0)$$. The corresponding lag $$k_0$$ is considered the period time $$T_0$$. The fundamental frequency $$f_0$$ follows as the inverse of $$T_0$$.

The listing below shows a code fragment from frequency.cpp that implements the autocorrelation function.

## Design to find the peak

By definition the autocorrelation is maximum at lag $$k=0$$. If we find the maximum value for $$R_{xx}(k)$$, for $$0 \lt k \lt N$$, then $$k$$ is the period time. This requires calculating $$R_{xx}(k)$$ for all values of $$k$$.

To make it faster, I accept the first maximum that is above $$\frac{2}{3}R_{xx}(0)$$. The first peak that exceeds this value is considered the period time $$T_0$$. The algorithm is shown below.

### Simulation

A complementary simulation in GNU Octave visualizes the algorithm, making the process easier to understand and fine-tune. The video below shows the calculation of $$R_{xx}(k)$$ and the peak finding algorithm. To run the simulation yourself, load the file simulation/file2pitch.m in GNU Octave or MATLAB.

## Analyzing accuracy

Analysis revealed that the sample rate and window size determine the maximum and minimum frequency that can be recognized. These variables can be configured in config.h.

1. If the sample rate is too low, high frequencies will only have a few audio samples per period, causing these frequencies not to be accurately recognized.
2. The window size is the number of audio samples that are processed at a time in the autocorrelation loop. If the window size is too small, low frequencies cannot be recognized.
3. The delay is caused by the sampling of the input, calculating the frequency and the segmentation algorithm. The highest delay occurs at the lowest frequency, approximately 60 milliseconds. This was noticeable but acceptable. I observed that my simple synthesizer software introduced a noticeable additional delay. The delay was minimized by sampling audio while doing the calculations, and by stopping the autocorrelation as soon as the frequency could be determined.

### Range

The project’s aim is to recognize the range of notes produced by a B♭ clarinet. This clarinet can produce notes from E♭3 to G6, corresponding to the fundamental frequencies $$f_L$$ and $$f_H$$ \shaded{ \left\{ \begin{align} f_L &= 155.6 \, \rm{Hz} \nonumber \\ f_H &= 1.568 \, \rm{kHz} \nonumber \end{align} \right. }

For the 12-note-per-octave equal tempered scale, each note or semitone is $$5.946309436\%$$ “higher” in frequency than the previous note. Given that the frequency will be rounded to a note pitch, we can allow for an error rate $$\varepsilon$$ $$\varepsilon \approx 0.05946$$

The highest frequency $$f_H$$, determines the sample rate. To stay within the error rate $$\varepsilon$$, the sample rate $$f’_{s}$$ follows as $$f_s’ = \frac{f_H}{\varepsilon} = \frac{1568}{0.05946} \approx 26.37\,\rm{kHz}$$

The Arduino can only sample a signal at $$2^a\,\frac{16\,\rm{MHz}}{128\times 13}$$, where $$a\in \mathbb{N}$$. As a consequence, the sampling rate has to be rounded up to $$f_{s}^{\prime\prime}$$ $$f_s^{\prime\prime} = 38.461\,\rm{kHz}$$
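The rounding to an achievable rate can be illustrated with a small helper (a hypothetical function, not part of the project code) that picks the smallest Arduino sample rate at or above a target:

```cpp
#include <cassert>

// Available Arduino sample rates are 2^a * 16 MHz / (128 * 13),
// i.e. approximately 9615, 19231, 38462, ... samples/s.
// Return the smallest available rate that is >= the target rate [Hz].
long roundUpSampleRate(double target)
{
    double rate = 16e6 / (128 * 13.0);  // 9615.4 Hz, a = 0
    while (rate < target) {
        rate *= 2;  // next power-of-two scaler step
    }
    return (long)rate;  // truncate to whole Hz
}
```

For the target of $$26.37\,\rm{kHz}$$ derived above, this yields $$38461\,\rm{Hz}$$.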

The lowest frequency $$f_L$$ determines the window size, where the number of audio samples $$N^\prime$$ should cover at least twice the period time of the lowest frequency $$f_L$$ $$N^\prime = 2\,\frac{f_s^{\prime\prime}}{f_L} = 2\,\frac{38461}{155.6} \approx 495$$

Each audio sample requires 1 byte of the Arduino’s SRAM. With only about $$200$$ bytes left available to store audio samples, $$N’$$ samples will not fit.

### Alternative

Instead, we use the available $$200$$ bytes to store the samples, so the window size $$N$$ is $$\shaded{ N = 200 }$$

In order to recognize the lowest frequency $$f_L$$, the sample frequency $$f_s^{\prime\prime\prime}$$ follows as \begin{align} f_s^{\prime\prime\prime} &\le f_L\,\frac{N}{2} \nonumber \\ &\le 155.6\,\frac{200}{2} \nonumber \\ &\le 15.56\,\rm{kHz} \end{align}

For the Arduino sample rate scaler, this needs to be rounded down to $$f_s$$ $$\shaded{ f_s = 9.615\,\rm{kHz} }$$

The resulting frequency range can be expressed as \begin{align} \frac{f_s}{N/2} \lt &f \lt \varepsilon\,f_s \nonumber \\ \frac{9615}{200/2} \lt &f \lt 0.0595\times 9615 \nonumber \\ 96.2\,\rm{Hz} \lt &f \lt 572\,\rm{Hz} \end{align}
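These bounds can be double-checked with two one-line helpers (the names are mine, for illustration only):

```cpp
#include <cassert>
#include <cmath>

// Bounds of the recognizable frequency range, given sample rate fs,
// window size N, and allowed relative error eps.
// Lower bound: the window must cover two periods of the signal.
double lowestFreq(double fs, int N) { return fs / (N / 2); }
// Upper bound: the period must be sampled densely enough to stay
// within one semitone step of error.
double highestFreq(double fs, double eps) { return eps * fs; }
```

With $$f_s = 9615$$, $$N = 200$$ and $$\varepsilon = 0.0595$$ this gives approximately $$96.2$$ and $$572\,\rm{Hz}$$.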

This implies that it will only reach D♭5. Let’s see how we can improve the accuracy.

#### Measurements

Low notes are measured accurately, but errors increase with frequency.

## Improvements

With the base algorithm in place, it is time to focus on improvements.

### Improving speed

Finding the fundamental frequency requires calculating $$R_{xx}(k)$$ for all values of $$k$$. However, the possible values of $$k$$ are limited by the window size and the sample frequency.

• The window size limits the lowest frequency, while
• the sample frequency limits the highest frequency.

The range for the lag $$k$$ is determined by the highest and lowest frequencies $$f_H$$ and $$f_L$$ \begin{align} \frac{f_s}{f_H} \leq &k \leq \frac{f_s}{f_L} \nonumber \\[0.5em] \implies \frac{9615}{1568} \leq &k \leq \frac{9615}{155.6} \end{align}

Rounding down the minimum and rounding up the maximum values, the range for the lag $$k$$ follows as $$\shaded{ 6 \leq k \leq 62 }$$
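For reference, an autocorrelation restricted to such a lag range could look like the sketch below. This is my illustration, not the project's autocorr.cpp:

```cpp
#include <cassert>
#include <cstddef>

// Autocorrelation evaluated only for the lags that can correspond to
// notes in the instrument's range (6 <= k <= 62 in this project).
// samples[] holds N audio samples; rxx[] receives one value per lag,
// starting at kMin. Products past the window count as zero.
void autocorrelate(const signed char samples[], size_t N,
                   size_t kMin, size_t kMax, long rxx[])
{
    for (size_t k = kMin; k <= kMax; k++) {
        long sum = 0;
        for (size_t t = 0; t + k < N; t++) {
            sum += (long)samples[t] * samples[t + k];
        }
        rxx[k - kMin] = sum;
    }
}
```

Skipping lags outside the range cuts the inner loop count substantially compared to evaluating all $$N/2$$ lags.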

### Improving accuracy of high notes

We can improve the accuracy, especially of the high notes, by using interpolation: fit a parabolic curve through the prior, current and next autocorrelation values $$k_1,k_2,k_3$$ around the peak. The lag that corresponds to the top of the parabola is the estimated lag $$k_m$$.

The difference between the estimated lag $$k_m$$ and the lag of the measured peak is the correction factor $$\delta$$ [^1] [^2] $$\delta = \frac{k_3-k_1}{2(2k_2-k_1-k_3)}$$

[^1]: Polynomial Interpolation, Abdulkadir Hassen
[^2]: Cross-Correlation, Douglas Lyon
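The interpolation step is a one-liner. In this sketch (a hypothetical helper, with my naming) the arguments y1, y2, y3 play the role of the autocorrelation values called $$k_1,k_2,k_3$$ above:

```cpp
#include <cassert>
#include <cmath>

// Parabolic interpolation around the autocorrelation peak.
// y1, y2, y3 are the autocorrelation values at lags k-1, k, k+1.
// Returns the fractional offset delta; the interpolated lag is k + delta.
double peakOffset(double y1, double y2, double y3)
{
    return (y3 - y1) / (2 * (2 * y2 - y1 - y3));
}
```

For a symmetric peak the offset is zero; when the right neighbor equals the peak, the true maximum lies halfway between them.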

The corresponding sample window size $$N’$$ is determined by the lowest frequency $$f_L$$ $$N’ = 2\frac{f_{s}}{f_L}=2\,\frac{9615}{155.6}\approx 125$$

I rounded the window size up to $$200$$ samples. $$\shaded{ N = 200 }$$

#### Measurements

The accuracy of high notes improves dramatically, as shown below.

### Improving accuracy of low notes

As the peak finding algorithm considers higher values of the lag $$k$$, the autocorrelation values decrease because of the zeroes introduced in the shifted signal.

I tried normalizing the autocorrelation for these introduced zeroes, by multiplying with a normalization factor of $$\frac{N}{N-k}$$. As a result the normalized autocorrelation can be expressed as $$R_{xx}(k)=\frac{N}{N-k}\sum_{t=1}^Ns_t\,s_{t+k}$$
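The normalization amounts to one scaling step, sketched here with my own naming:

```cpp
#include <cassert>

// Scale an autocorrelation value to compensate for the zeroes that
// shifting introduces: at lag k only N - k products are non-zero,
// so multiply by N / (N - k).
long normalizeRxx(long rxx, long N, long k)
{
    return rxx * N / (N - k);
}
```

Note that the factor grows with $$k$$, which is what boosts the peaks of low notes (large lags) relative to high notes.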

#### Measurements

For the clarinet the normalization makes it drop some high notes, as shown in the figure below. The clarinet doesn’t benefit from improving the accuracy of low notes, because the lowest note it can play is $$155.6\,\rm{Hz}$$, while the Arduino can already detect down to $$96\,\rm{Hz}$$.

##### Piano

The results for piano samples using interpolation are shown below for reference.

For the piano, the normalization greatly benefits the low notes, as shown below.

Continue reading on the next page to learn about note level segmentation.

## Digitizing the analog signal (microphone.cpp)

The microphone driver determines the value presented at the analog port, using analog-to-digital conversion (ADC). The driver reads the analog port asynchronously: the CPU starts the conversion and moves on to other things. Once a conversion is complete, the ADC interrupts the CPU. The CPU postpones what it is doing, reads the conversion result, and returns to whatever it was doing. This way the CPU doesn’t have to wait for the ADC conversion to complete (typically about 832 cycles) [Meettechniek, 2013].

This asynchronous approach usually requires two buffers: one buffer is written asynchronously, while the other is being processed. Given that the Arduino UNO has only a precious 2048 bytes of SRAM, this code jumps through some hoops so that it needs just one buffer.

The typical sequence of events is:

1. The application needs audio samples and calls Microphone::getSamples().
• The first time getSamples() is called, it allocates memory for the samples and starts the sampling. On subsequent calls, the samples are probably already available (started earlier, in step 3).
• The method getSamples() waits until all samples are available.
2. The application processes the samples.
3. Once the application is done with the samples, it calls Microphone::update().
• This initiates the interrupt driven process of gathering new samples. It does not wait for the samples.
4. The application continues (without accessing the samples) e.g. to display the results.
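The contract described above can be mocked on a host machine. The sketch below is illustrative only, it is not the project's microphone.cpp; on the Arduino, the stand-in simulateIsr() is the ADC-complete interrupt handler:

```cpp
#include <cassert>
#include <cstddef>

// Host-runnable mock of the single-buffer sampling contract.
class Microphone {
public:
    static const size_t WINDOW = 200;

    // Step 1: block until the window is full, then hand it out.
    const signed char *getSamples() {
        if (!_started) update();        // first call: start sampling
        while (!_ready) simulateIsr();  // on AVR: wait for the interrupt
        return _buffer;
    }

    // Step 3: caller is done with the buffer; start refilling it.
    void update() {
        _count = 0;
        _ready = false;
        _started = true;
    }

    // Stand-in for the ADC-complete interrupt storing one sample.
    void simulateIsr() {
        _buffer[_count++] = 0;  // a real ISR stores the ADC reading
        if (_count == WINDOW) _ready = true;
    }

private:
    signed char _buffer[WINDOW];
    size_t _count = 0;
    bool _ready = false;
    bool _started = false;
};
```

Because update() only resets the bookkeeping, the same buffer is reused for every window, which is how the driver gets by with one buffer instead of two.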

Once the application determines the frequency, it starts taking new samples while determining the pitch, performing note level segmentation, and displaying the results. This application uses a sample rate of 9615 samples/s. Refer to autocorr.cpp for details.

The next page describes one of the key algorithms: finding the frequency and pitch.

## Signal path

This is the second part of the article about the Arduino Pitch Detector. It describes the signal path starting with the instrument sound and ending with the sound produced by a MIDI synthesizer, possibly driven by auto-accompaniment software.

In the prototype, the signal originates at a musical instrument and ends up being played as MIDI events on an external synthesizer. Various hardware and software components make up the signal path. This page gives a brief description of each of these components.

1. Musical instrument. Playing a clarinet, pan flute or piano.
2. Microphone with automatic gain control.
3. The microcontroller is responsible for:
1. Digitizing the analog signal (microphone.cpp).
2. Detecting the fundamental frequency and pitch of the signal (pitch.cpp).
3. Determining the beginning and duration of the notes. (segment.cpp, segmentbuf.cpp )
4. Displaying the notes on a treble staff or piano roll. (pianoroll.cpp, staff.cpp)
5. Sending MIDI events to an attached synthesizer. (midiout.cpp).
4. An external synthesizer interprets the MIDI messages to create sound. The MIDI signal can also be sent to auto-accompaniment software (e.g. Band-in-a-Box) or Notation software (e.g. MidiEditor).

To test the MIDI functionality of the Arduino, it needs to be connected to a synthesizer, which interprets the MIDI messages to create sound. Synthesizers range from all-in-one hardware devices to software synthesizers (Cool Virtual Midi, Nano, GarageBand). There are many to choose from:

1. Hardware. I found an old SynthStation25 in our garage and used that with an old iPod touch running the Nano app.
2. Microsoft Windows comes with a “Microsoft GS Wavetable Synth”. It has horrible latency and sounds bad.
3. I ended up using an iPad with GarageBand. The messages can be carried over USB (requires the Apple iPad Camera Connection Kit) or possibly over the network (AppleMIDI). When using USB, connect the USB cable first, and then plug the adapter into the iPad. For the connector, refer to the corresponding section above.