Note level segmentation and buffering

Note level segmentation and buffering for pitch detection on Arduino (segment.cpp, segmentbuf.cpp). Describes the algorithms used for note level segmentation. Part of the project Arduino Pitch Detector.

The segmentation algorithm determines the beginning and duration of each note. The algorithm is based on a rule-based model published by Monty in 2000. Other algorithms were published by Tenney in 1980 and Cambouropoulos in 1997. The target platform is a small embedded system built around an 8-bit RISC-based μController, running at 16 MHz, with 2 kByte SRAM and 32 kByte Flash memory (Arduino UNO R3). This choice of target places tight limits on CPU usage and memory footprint.

The algorithm takes two characteristics into consideration:

  1. Note pitch, related to the fundamental frequency (f0) of the signal.
    • The class Frequency determines the fundamental frequency using autocorrelation as described by Brown [Brown, 1990]. The pitch is expressed as a MIDI number m = 69 + 12 * log2(f/440). With a window size of 200 samples and a modest sample rate of 9,615 S/s, the implementation is able to detect frequencies from 98 Hz (note G2) up to about 2,637 Hz (note E6).
    • Another option was to use an FFT and the correlation theorem; while this might be faster, it has a larger memory footprint.
  2. Energy envelope, related to the loudness of the signal.
    • The class Microphone determines the peak-to-peak amplitude of the signal for each window of samples. Using a window size of 200 samples at 9,615 S/s, this results in 48 envelope points per second. This same envelope may be used for beat detection; at a tempo of 160 beats/min, this corresponds to 18 samples/beat. Other methods considered were the Hilbert transform and the sum of the root-mean-square amplitudes, both of which were found to be too computationally intensive.
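The frequency-to-pitch conversion above can be sketched as a small helper. This is a minimal illustration of the formula m = 69 + 12 * log2(f/440), not the actual API of the project's Frequency class; the function name is hypothetical.

```cpp
#include <cmath>

// Map a fundamental frequency f0 (in Hz) to the nearest MIDI note number,
// using m = 69 + 12 * log2(f / 440), where 69 is A4 (440 Hz).
// (Illustrative helper; not the project's actual Frequency class API.)
int frequencyToMidi(float f0)
{
    return (int)lroundf(69.0f + 12.0f * log2f(f0 / 440.0f));
}
```

For example, 98 Hz maps to MIDI 43 (G2) and 440 Hz maps to MIDI 69 (A4), matching the detection range stated above. Rounding to the nearest integer is what makes the ±3% "same piano key" rule in the next section fall out naturally.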
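The peak-to-peak envelope measurement is cheap enough for the 16 MHz target because it needs only one pass and two comparisons per sample. A minimal sketch, assuming 16-bit signed samples (the function name is illustrative, not the project's actual Microphone class API):

```cpp
#include <cstdint>
#include <cstddef>

// One energy-envelope point: the peak-to-peak amplitude over a window
// of samples (e.g. 200 samples at 9,615 S/s gives 48 points/second).
// (Illustrative sketch; not the project's actual Microphone class API.)
int peakToPeak(const int16_t * samples, size_t len)
{
    int16_t lo = samples[0];
    int16_t hi = samples[0];
    for (size_t ii = 1; ii < len; ii++) {
        if (samples[ii] < lo) lo = samples[ii];
        if (samples[ii] > hi) hi = samples[ii];
    }
    return hi - lo;
}
```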

Based on these characteristics, the algorithm determines the beginning (onset) and the duration of the notes by applying the following rules:

  1. The pitch is considered constant when the fundamental frequency stays within a piano key distance (+/- 3%). In other words, when it rounds to the same MIDI pitch.
  2. A note onset is recognized when the pitch lasts for a minimum note duration.
    • This eliminates spurious note detections. The advantage of this approach over using only the amplitude envelope is that onsets are still detected during glissando or legato passages.
    • The disadvantage is that the onset is only recognized after this minimum note duration.
    • Another concern arises during the release. Because the fundamental frequency disappears first, followed by the harmonics one after another, the algorithm may erroneously recognize these harmonics as new notes.
    • A minimum note duration (MIN_SEGMENT_DURATION) of 40 msec corresponds to about two sample windows.
  3. The note duration is the time from its onset to its termination. The termination is determined by either:
    1. The recognition of silence
      • Silence is recognized when the signal energy falls below the noise floor.
      • A noise floor (AUDIBLE_THRESHOLD) of 20 (15% of full scale) seems to work well with an automatic gain control microphone.
    2. The pitch changing
      • A pitch change is recognized when a different pitch remains constant for the minimum note duration. This implies that the system allows the pitch to have different values during the decaying part of a note.
    3. The energy increasing during the decay
      • If the energy rises during the decay phase of a note, I can assume that another note with the same pitch has been played. A threshold is applied, so only significant increases in energy cause a new note to be recognized. A threshold (SEGMENT_ENERGY_INCR_THRESHOLD) of 40% above the energy of the prior window has yielded good results.
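The rules above can be sketched as a per-window state machine. This is an illustrative simplification, not the project's actual Segment class: the constants follow the text, but the structure and names are hypothetical, and rule 3 here fires on any significant energy increase while a note sounds rather than tracking the decay phase explicitly.

```cpp
// Constants from the text; the state machine itself is an
// illustrative sketch, not the project's actual Segment class.
const int AUDIBLE_THRESHOLD = 20;                  // noise floor (peak-to-peak)
const int MIN_SEGMENT_WINDOWS = 2;                 // ~40 msec at 48 windows/s
const float SEGMENT_ENERGY_INCR_THRESHOLD = 1.40f; // +40% over prior window

struct Segmenter {
    int curPitch = -1;   // MIDI pitch of the sounding note (-1 = silence)
    int candPitch = -1;  // candidate new pitch awaiting confirmation
    int candCount = 0;   // windows the candidate pitch has persisted
    int prevEnergy = 0;  // energy of the prior window

    // Feed one analysis window (rounded MIDI pitch + envelope energy);
    // returns the pitch of a newly recognized onset, or -1 otherwise.
    int feed(int pitch, int energy) {
        int onset = -1;
        if (energy < AUDIBLE_THRESHOLD) {            // termination 1: silence
            curPitch = -1;
            candPitch = -1;
            candCount = 0;
        } else if (pitch != curPitch) {              // termination 2: pitch change
            if (pitch == candPitch) candCount++;
            else { candPitch = pitch; candCount = 1; }
            if (candCount >= MIN_SEGMENT_WINDOWS) {  // onset after min duration
                curPitch = pitch;
                candPitch = -1;
                candCount = 0;
                onset = pitch;
            }
        } else if (prevEnergy > 0 &&                 // termination 3: energy
                   energy > prevEnergy * SEGMENT_ENERGY_INCR_THRESHOLD) {
            onset = pitch;                           // re-struck, same pitch
        }
        prevEnergy = energy;
        return onset;
    }
};
```

Note how rule 2 doubles as the onset detector: a new pitch must persist for MIN_SEGMENT_WINDOWS before it both terminates the old note and starts the new one, which is why the onset is only reported after the minimum note duration.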

The next part is the Conclusion.