Under the Hood: Building a Real-Time Chord Recognizer

Lobsters Hottest Tools

Summary

This article explains the technical architecture of a real-time chord recognizer, detailing a four-stage pipeline using pitch-class bitmasks, candidate generation, score normalization, and musical heuristics.


Cached at: 05/19/26, 04:43 PM

# Under the Hood: Building a Real-Time Chord Recognizer

Source: [https://whatchord.earthmanmuons.com/articles/under-the-hood](https://whatchord.earthmanmuons.com/articles/under-the-hood)

## The problem is not a lookup

The first intuition when building a chord recognizer is to build a dictionary. There are only 12 pitch classes, which means there are only `2^12 = 4096` possible pitch-class sets. Store a name for each set, and when a user plays C-E-G, look up {C, E, G} and return “C major.”

The problem is not memory. Four thousand entries is trivial. The problem is meaning. A pitch-class set does not contain enough information to decide what musicians will call it. Piano players often leave out notes that a dictionary entry might expect. Extended chords add notes that no fixed dictionary entry anticipates. And the same set of pitch classes, as discussed in the [companion article](https://whatchord.earthmanmuons.com/articles/chord-naming.html), can legitimately be described as multiple different chords depending on musical context.

What you actually need is a scoring model. It has to evaluate how well any given set of notes fits each chord type, rank all plausible interpretations, and apply musical judgment when scores are close.

## Overview: a four-stage pipeline

Before diving into each component, here is the overall shape of the algorithm. A snapshot of sounding notes enters at the top; a ranked list of chord interpretations comes out at the bottom.

Input: set of sounding pitch classes + lowest (bass) note

1. **Pitch-class bitmask.** 12-bit integer: one bit per semitone in the octave.
2. **Candidate generation.** Each sounding note becomes a candidate root, scored against every chord template, extensions extracted.
3. **Score normalization.** Raw scores are normalized for fair comparison across chord complexities.
4. **Ranking.** Musical heuristics resolve ambiguous scores; hard structural rules override when the score alone would pick the wrong answer.

Output: top ranked chord candidates, result cached in LRU

The rest of this article walks through each stage in detail, ending with a discussion of known limitations.

## Pitch classes and bitmasks

WhatChord models the common 12-tone equal temperament (12-TET) pitch-class framework used by MIDI keyboards, which divides each octave into equal semitone positions. A *pitch class* is the note’s position within that octave, ignoring which octave it’s in, so middle C, the C above it, and the C three octaves below all share pitch class 0. In this engine, pitch classes are numbered 0 (C) through 11 (B).

For analysis, the engine collapses the sounding notes into a set of pitch classes plus the lowest sounding note as bass. The pitch-class set is represented as a 12-bit integer mask where bit *n* is set if pitch class *n* is present. C major (C=0, E=4, G=7) looks like this:

| Bit | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Note | B | A♯ | A | G♯ | G | F♯ | F | E | D♯ | D | C♯ | C |
| Set | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |

```
// Pitch classes: C=0, E=4, G=7
int pcMask = (1 << 0) | (1 << 4) | (1 << 7);
// pcMask == 0b000010010001 == 0x091
```

This representation is compact and fast. Checking whether a pitch class is present is a single bitwise AND. Counting present pitch classes is a popcount. Rotating the set relative to a candidate root is a loop over bits with modular arithmetic. All of these operations are cheap.
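
The bitmask operations described above can be sketched in a few lines of Dart. This is a minimal illustration, not the engine's actual code: the helper names `maskFromMidiNotes`, `hasPitchClass`, and `pitchClassCount` are assumptions made for this example, and the popcount is a plain loop rather than whatever the real analyzer uses.

```dart
// Hypothetical helpers; the names are illustrative and do not come from
// the WhatChord source.

/// Collapses MIDI note numbers into a 12-bit pitch-class mask.
int maskFromMidiNotes(Iterable<int> midiNotes) {
  var mask = 0;
  for (final note in midiNotes) {
    mask |= 1 << (note % 12); // octave is discarded, only the class remains
  }
  return mask;
}

/// True if pitch class [pc] (0 = C ... 11 = B) is set in [mask].
bool hasPitchClass(int mask, int pc) => (mask & (1 << pc)) != 0;

/// Counts the set bits among the low 12 bits (a simple popcount loop).
int pitchClassCount(int mask) {
  var count = 0;
  for (var m = mask & 0xFFF; m != 0; m &= m - 1) {
    count++; // m &= m - 1 clears the lowest set bit each pass
  }
  return count;
}

void main() {
  // C4, E4, G4, C5: the doubled C collapses into a single pitch class.
  final mask = maskFromMidiNotes([60, 64, 67, 72]);
  print(mask == 0x091); // true
  print(hasPitchClass(mask, 4)); // true (E)
  print(hasPitchClass(mask, 5)); // false (F)
  print(pitchClassCount(mask)); // 3
}
```

Because octave doublings disappear in the mask, a four-note voicing of C major and a three-note one produce the same integer, which is also what makes the mask a natural cache key later in the pipeline.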

A key design decision: **only pitch classes actually present in the voicing are tested as candidate roots.** There are no “ghost roots” and the algorithm never proposes an interpretation where the chord is rooted on a note that is not being played. This keeps the candidate count small (bounded by the number of sounding notes, typically 3–7) and avoids obviously wrong readings.

This is a deliberate “solo keyboard” assumption. The current engine is optimized for the common case where the same MIDI stream contains both the harmony and the bass note. A future ensemble mode could relax that rule for settings where another instrument is carrying the bass, allowing rootless voicings to imply roots that are not literally present in the keyboard part.

## Chord templates

Chord qualities are also defined as bitmask templates. Each one describes three sets of intervals relative to the root:

- **Required:** tones that must be present to identify this quality. Missing more than one required tone causes the template to be skipped entirely.
- **Optional:** tones frequently omitted in real voicings (almost always the perfect 5th). Present when played, unremarkable when absent.
- **Penalty:** tones that actively contradict this quality. Having a major 3rd present when you are trying to identify a minor chord hurts the score.
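
One way to picture a template is as three interval masks checked against the played intervals with bitwise AND. The sketch below is an assumption about shape only: `ChordTemplate`, its field names, and the helper functions are hypothetical, and just the Major and Minor definitions (taken from the article's template table) are filled in.

```dart
/// Hypothetical template shape; class, field, and function names are
/// illustrative and are not taken from the WhatChord source.
class ChordTemplate {
  const ChordTemplate(
      this.name, this.requiredMask, this.optionalMask, this.penaltyMask);

  final String name;
  final int requiredMask; // interval bits that should be present
  final int optionalMask; // interval bits that may be omitted
  final int penaltyMask; // interval bits that contradict the quality
}

/// Packs semitone offsets above the root into a 12-bit interval mask.
int intervals(List<int> semitones) =>
    semitones.fold<int>(0, (mask, s) => mask | (1 << s));

/// Counts the set bits among the low 12 bits.
int bitCount(int mask) {
  var count = 0;
  for (var m = mask & 0xFFF; m != 0; m &= m - 1) {
    count++;
  }
  return count;
}

// Major = R, M3 (optional P5), penalized by m3, m7, M7.
// Minor = R, m3 (optional P5), penalized by M3, m7, M7.
final major = ChordTemplate(
    'Major', intervals([0, 4]), intervals([7]), intervals([3, 10, 11]));
final minor = ChordTemplate(
    'Minor', intervals([0, 3]), intervals([7]), intervals([4, 10, 11]));

int missingRequired(ChordTemplate t, int intervalMask) =>
    bitCount(t.requiredMask & ~intervalMask);

int penaltyHits(ChordTemplate t, int intervalMask) =>
    bitCount(t.penaltyMask & intervalMask);

/// A template is skipped when more than one required tone is missing.
bool isViable(ChordTemplate t, int intervalMask) =>
    missingRequired(t, intervalMask) <= 1;

void main() {
  final played = intervals([0, 4, 7]); // root, M3, P5 above the root
  print(missingRequired(major, played)); // 0
  print(penaltyHits(major, played)); // 0
  print(missingRequired(minor, played)); // 1 (no m3)
  print(penaltyHits(minor, played)); // 1 (M3 is present)
  print(isViable(minor, played)); // true: still scored, but it loses points
}
```

The minor template stays viable against a major triad because only one required tone is missing; it is the missing-tone and penalty-tone deductions, not a hard rejection, that push it down the ranking.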

The 22 templates, organized by complexity:

| Quality | Required intervals | Optional | Key penalty tones |
|---|---|---|---|
| Major | R, M3 | P5 | m3, m7, M7 |
| Minor | R, m3 | P5 | M3, m7, M7 |
| Diminished | R, m3, ♭5 | — | M3, P5 |
| Augmented | R, M3, ♯5 | — | m3, P5 |
| Sus2 | R, M2, P5 | — | m3, M3, m7, M7 |
| Sus4 | R, P4, P5 | — | m3, M3, m7, M7 |
| Major 6 | R, M3, M6 | P5 | m3, m7, M7 |
| Minor 6 | R, m3, M6 | P5 | M3, m7, M7 |
| Dominant 7 | R, M3, m7 | P5 | M7, m3 |
| 7sus2 | R, M2, m7 | P5 | m3, M3, P4, M7 |
| 7sus4 | R, P4, m7 | P5 | m3, M3, M7 |
| 7♭5 | R, M3, ♭5, m7 | — | P5, M7, m3 |
| 7♯5 | R, M3, ♯5, m7 | — | P5, M7, m3 |
| Major 7 | R, M3, M7 | P5 | m7, m3 |
| Major 7sus2 | R, M2, M7 | P5 | m3, M3, P4, m7 |
| Major 7sus4 | R, P4, M7 | P5 | m3, M3, M2, m7 |
| Major 7♭5 | R, M3, ♭5, M7 | — | P5, m7, m3 |
| Major 7♯5 | R, M3, ♯5, M7 | — | P5, m7, m3 |
| Minor 7 | R, m3, m7 | P5 | M7, M3 |
| Minor-Major 7 | R, m3, M7 | P5 | M3, m7 |
| Half-Diminished 7 | R, m3, ♭5, m7 | — | P5, M3, M7 |
| Fully Diminished 7 | R, m3, ♭5, d7 | — | m7, P5, M3, M7 |

Notice that the perfect 5th is optional for most chord families. Requiring it would cause the algorithm to miss many idiomatic voicings in common use.

Penalty tones are not hard rejections. The template is still scored, it just loses points. This handles cases where a note might simultaneously belong to one chord and partially fit another, and lets the score reflect the degree of fit rather than producing a binary yes/no.

## Template scoring

For each candidate root (each pitch class present in the voicing), the analyzer rotates the pitch class mask relative to that root to get an interval mask. Then it scores that interval mask against all 22 templates.

```
// Rotate: compute intervals above rootPc for each sounding note
int rotateMaskToRoot(int pcMask, int rootPc) {
  var rel = 0;
  for (var pc = 0; pc < 12; pc++) {
    if ((pcMask & (1 << pc)) == 0) continue;
    // Dart's % already yields a non-negative result for a positive
    // divisor, so the guard below is purely defensive.
    final interval = (pc - rootPc) % 12;
    rel |= (1 << (interval < 0 ? interval + 12 : interval));
  }
  return rel;
}
```

The scoring formula accumulates raw points from several components:

| Component | Weight | Notes |
|---|---|---|
| Each required tone present | +4.0 | Structural foundation |
| Each missing required tone | -6.0 | Max 1 allowed; 2+ causes the template to be rejected |
| Each optional tone present | +1.5 | Adds color without being essential |
| Each penalty tone present | -3.0 | Contradicts the chord quality |
| Each unexplained “extra” tone | -0.5 | Before extension extraction; small because extensions are real |
| Bass is root or inversion tone | +1.0 | Root position, 1st inv, 2nd inv, 3rd inv |
| Bass is color tone (7th-family chord) | +0.75 | Upper-structure voicing, legitimate |
| Bass is extension (triad + slash) | +0.25 | Add-chord slash notation |
| Bass unexplained by template | -0.25 | Arbitrary slash |
| Alteration penalty (any altered extension) | -0.60 | `-0.30` for fully dim7 (see Diminished 7th section below) |
| Lydian-dominant 13th coherence bonus | +2.1 | Applied when root-position dominant has 9, ♯11, and 13 all present |
| 6th chord without 5th (3-note voicing) | -0.60 | Disambiguates C6(no5) from Am7/C |

The raw score is then divided by `sqrt(requiredToneCount)` to normalize across chord complexities:

```
// Assumes: import 'dart:math' as math;
final denom = reqCount > 0 ? math.sqrt(reqCount.toDouble()) : 1.0;
final normalized = raw / denom;
```

Without normalization, 7th chords (which have more required tones) would consistently outscore triads just by having more opportunities to earn the `+4.0` required-tone bonus. A perfectly matched C major triad would lose to a slightly-mismatched C dominant 7th. The square root normalization (rather than linear) preserves meaningful score separation while preventing complex chords from systematically outscoring well-matched simpler ones.

### Diminished 7th penalty

Fully diminished 7th chords receive a softer alteration penalty because their symmetry makes alternate roots score unusually well.
Halving that penalty helps preserve the reading musicians expect when an added tone could otherwise make a rotated diminished interpretation look artificially cleaner.

## Extension extraction

After template scoring, any tone not accounted for by the base template (required + optional + penalty) lands in the “extras” mask. These get converted to named extensions:

- **Alterations** (from the extras mask): flat 9 (semitone 1), sharp 9 (semitone 3), sharp 11 (semitone 6), flat 13 (semitone 8)
- **Natural extensions:** 9 (semitone 2), 11 (semitone 5), 13 (semitone 9)

Whether natural extensions become “9/11/13” or “add9/add11/add13” depends on the stack below them. In this conservative naming model, a 9 needs the 7th, and an 11 needs both the 7th and 9th. A 13 needs the 7th and 9th, but not a sounding 11, matching common chord-symbol practice where the 11th is often omitted. Without that support, the same pitch class is labeled as an add tone instead.

There is one dominant-context exception: with a complete dominant 7th shell, interval 3 is interpreted as a sharp ninth instead of a minor-third penalty. That lets voicings such as G-B-D-F-A♯ score and spell as G7♯9 rather than as a dominant 7th with an unexplained contradictory tone.

## How the weights were tuned

The scoring weights were not established arbitrarily. They were tuned empirically against a set of golden test cases: specific voicings where the expected output was chosen in advance. Most golden cases capture chords a musician would name unambiguously; ambiguous cases pin the intended primary reading for the current scoring and ranking model. The test suite covers major, minor, diminished, dominant, altered, and extended voicings across different inversions and genuinely ambiguous situations.

The tuning loop looked like this:

1. Run the golden test suite.
2. For any case that failed, use the `chord-debug` CLI tool to inspect the full ranked candidate list with score breakdowns.
3. Adjust weights, add rules, or add scoring bonuses until the failing case passed.
4. Re-run the full suite to verify no regressions.

The `chord-debug` tool runs the full analysis pipeline on any set of notes and prints each candidate with its score, individual weight contributions, and the ranking rule that decided its position relative to the previous candidate:

```
$ dart run tool/chord_debug.dart F# Bb C E
pcs: Bb, C, E, F# | bass: F# | key: C major

1) F#7b5 8.50
   members: root=F# major3=A# flat5=C flat7=E
   req+16 bass+1 raw=17.00 / sqrt(4) => 8.50
2) C7b5 / Gb 8.50 Δ +0.00 ~tie (vs prev: Prefer root position)
   members: root=C major3=E flat5=Gb flat7=Bb
   req+16 bass+1 raw=17.00 / sqrt(4) => 8.50
3) C7#11 / F# 6.73 Δ -1.77 (vs prev: Score outside near-tie window)
```

The same diagnostic output also exposes enharmonic spelling decisions: MIDI provides pitch classes, and the engine chooses note names from the winning chord context.

That kind of diagnostic visibility was essential for understanding why the algorithm chose wrong answers and what needed to change. A weight that fixed one case would sometimes break another, and the only way to make progress without regressing was to have the full ranked list visible while making targeted adjustments.

## The ranking problem

The debug output above shows why scoring is only the first half of the problem. Once multiple readings are plausible, the analysis engine needs a separate ranking layer that encodes musical priorities more directly than a single numeric score can.

This is not an isolated case.
Several common note sets produce near-identical scores for multiple plausible interpretations, and the raw score cannot distinguish which one a musician would name:

- C-E-G-A: C6 vs. Am7/C (identical scores; the 6th chord in root position should win)
- B-D-F-A♭: Bdim7 vs. G♯dim7/B vs. Ddim7/C♭ vs. Fdim7/C♭ (C♭ = B enharmonically; all four readings score identically due to dim7 symmetry)

The analyzer handles these ambiguities with two ranking paths: narrow structural overrides for cases where the conventional name should win despite score, and ordered tie-breakers for candidates whose scores are already close.

### Hard rules

Hard rules are intentionally narrow. They cover cases where remote slash-chord interpretations can outscore conventional root-position dominant, altered-dominant, altered-seventh, or diminished-seventh names that musicians are more likely to use. They also cover a small minor-triad color case where a complete minor sharp-eleventh inversion is preferred over a root-position major-7-sus4 name that depends on an altered thirteenth.

### The near-tie window

If none of the hard rules fire and the score difference is greater than `0.20` (the `nearTieWindow` constant), the higher-scoring candidate wins on score alone.

When scores are within the near-tie window, tie-breaker rules are applied sequentially. The first rule that produces a non-tie result decides the ordering:

1. Prefer root-position 6th over inverted 7th (the C6 vs. Am7/C case)
2. Prefer upper-structure dominant 7th slash (color bass with no other alterations)
3. Prefer root-position diminished 7th (symmetrical chords default to bass-as-root)
4. Prefer dominant 7th shell over dim7 slash
5. Prefer fewer altered/tension colors (including natural 11 against a major third)
6. Prefer diatonic chords (given the key signature)
7. Prefer the tonic chord (I) over other diatonic options
8. Prefer I when the bass is the tonic pitch class
9. Prefer natural extensions (9/11/13) over add-tones; then fewer total extensions
10. Prefer root position
11. Prefer 1st inversion over 2nd inversion
12. Prefer 7th chords over triads when both fit the played notes
13. Prefer fewer extensions
14. Avoid suspended chords

If all of these rules still have not produced a winner, there is a deterministic fallback: sort by root pitch class numerically. This ensures the output is always consistent for the same input, even for exotic voicings.

The ordering of these rules encodes musical priorities. Structural clarity (root position, shell tones) comes before contextual preferences (diatonic, tonic). Conventional naming (fewer alterations, natural extensions) comes before complexity. Suspended chords are deprioritized late because they are valid but easy to over-detect when a third is absent, so they should win only when the surrounding evidence supports them.

## Caching for real-time performance

Running the full pipeline (up to 12 candidate roots × 22 templates = 264 template evaluations) on every MIDI state change would be wasteful. In practice, a pianist tends to produce many repeated input states throughout a musical piece.

The engine uses a 512-entry Least Recently Used (LRU) cache implemented as a `LinkedHashMap`. The cache key is a hash of three inputs:

- The pitch class set
- The analysis context (key signature + tonality)
- The `take` parameter (how many candidates to return, default 8)

The context is included in the key because diatonic preference rules depend on it; a different key signature can change which candidate ranks first even for identical voicings.

```
final key = Object.hash(input.cacheKey, context, take);
final cached = _cache[key];
if (cached != null) {
  // Promote on hit so eviction removes LRU, not FIFO
  _cache
    ..remove(key)
    ..[key] = cached;
  return cached;
}
```

The LinkedHashMap preserves insertion order.
On a cache hit, the entry is removed and re-inserted at the end (most recently used). On eviction, the first key is removed (least recently used). This is the standard LRU pattern in Dart without a separate doubly-linked list.

The 512-entry capacity was chosen from benchmarks across random inputs, exhaustive inputs, tonal progressions, and simulated live note transitions. Realistic playing showed high reuse, and larger caches produced no material improvement.

## What the algorithm does not handle

A few things are known limitations or non-goals:

- **Polychords.** Two simultaneous independent harmonies (a D♭ major triad over a C dominant 7th, common in Stravinsky) are not modeled. The algorithm will find the best single-chord description of the combined note set.
- **Temporal context.** Each snapshot of sounding notes is analyzed independently. The algorithm does not track what chord came before and does not use progression history to inform interpretation. Using temporal context to further increase accuracy is a natural direction for future improvement.
- **Non-12-TET tuning.** This engine is built around 12 pitch classes and standard MIDI note numbers. Microtonal intervals, quarter tones, and just-intonation distinctions have no representation in this model.

The scoring heuristics are tuned from experience. They encode accumulated musical convention, but they are adjustable constants, not proven axioms. Edge cases and counterexamples help improve them.

## The codebase

WhatChord is written in [Dart](https://dart.dev/) using the [Flutter](https://flutter.dev/) framework. The chord analysis engine lives entirely in `lib/features/theory/domain/analysis/`, three files with no platform dependencies and a unit test suite that verifies known-correct outputs across major, minor, dominant, altered, extended, and ambiguous chord types.

The project is open source and released under the Zero Clause BSD License, which means you are free to use, modify, and share the code however you like.

If you find a misidentified chord, the best way to report it is to long-press the chord card to open *Analysis Details*, copy the diagnostic output, and [open a GitHub issue](https://github.com/EarthmanMuons/whatchord/issues/new/choose). The diagnostic output includes the exact pitch classes and context that produced the result, which makes it straightforward to reproduce and debug.

### See it in action.

Free for iOS and Android. No subscription, no ads, all analysis on-device.

[Download on the App Store](https://apps.apple.com/us/app/whatchord-midi/id6758409779) · [Get it on Google Play](https://whatchord.earthmanmuons.com/articles/under-the-hood#) · [View source on GitHub](https://github.com/EarthmanMuons/whatchord)
