Computational analysis of fugues

Mathieu Giraud, Richard Groult, Emmanuel Leguy, Florence Levé

A fugue is a polyphonic piece built in imitation, where all voices appear successively sharing the same initial melodic material: a subject and, in most cases, a counter-subject. These patterns are repeated throughout the piece, either in their initial form or more often altered or transposed, building a complex harmonic texture.

We present a computational analysis of fugues that aims to recover their structure automatically. The diagrams below show the results of our algorithms on two sets of fugues:

Note that some subjects (S) or counter-subjects (CS) are badly predicted, and some episodes are missing. If you are looking for a complete musicological analysis of the Bach WTC fugues, follow the links below each diagram to access other analyses. The analyses by S. Bruhn are especially detailed.

The fugue dataset

This dataset gives a reference analysis for the 24 fugues of the first book of Bach's Well-Tempered Clavier (WTC I, BWV 846-869) and the first 12 fugues from Shostakovich's 24 Preludes and Fugues (op. 87, 1952). These annotations are based on several musicological sources as well as on our own analysis. The file gives the symbolic position (measure number and position in measure) of subjects (S) and counter-subjects (CS), as well as cadences and pedals. We also report slight modifications of S/CS (actual start with respect to the time signature, delayed resolutions...).
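As an illustration of the kind of information described above, an annotation can be modelled as a small record holding a label and a symbolic position. This is only a sketch: the class name `Annotation`, its field names, the helper `occurrences`, and the choice of counting the position in beats are all assumptions made for this example and do not describe the actual layout of the dataset file.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Annotation:
    """One annotated event (hypothetical layout, not the dataset's file format)."""
    fugue: str        # free-form identifier of the fugue
    kind: str         # "S", "CS", "cadence" or "pedal"
    measure: int      # measure number
    beat: Fraction    # position inside the measure (assumed unit: beats)

    def position(self):
        """Symbolic position as a sortable (measure, beat) pair."""
        return (self.measure, self.beat)


def occurrences(annotations, kind):
    """Return the annotations of a given kind, ordered by symbolic position."""
    selected = [a for a in annotations if a.kind == kind]
    return sorted(selected, key=Annotation.position)


# Made-up values for illustration only; they are not taken from the dataset.
example = [
    Annotation("example-fugue", "S", 3, Fraction(1)),
    Annotation("example-fugue", "CS", 3, Fraction(1)),
    Annotation("example-fugue", "S", 1, Fraction(3, 2)),
]
for a in occurrences(example, "S"):
    print(a.kind, a.measure, a.beat)
```

Storing the position as an exact fraction avoids rounding problems when an entry starts off the beat, which is one of the "slight modifications" the annotations report.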

As in any analytical work, musicologists may not agree on some analytic elements. This is true even for fundamental elements such as the exact definition of the subject: in 8 of the 24 Bach fugues, at least two sources disagree on the end of the subject. We indicate these alternative subject definitions in the file (but do not report alternative CS).

We collected these data primarily to evaluate our own algorithms for fugue analysis, but they might also be useful in other situations, for instance in evaluating algorithms for pattern extraction or structure analysis.

  • Download the dataset from
  • Changelog
    • 2013.12: First release on 12 Shostakovich fugues + minor updates on Bach fugues (960 annotations)
    • 2013.05: First release on 24 Bach fugues (610 annotations)
  • Other relevant data

The annotations include all complete subjects and counter-subjects, as well as pedals, and, for Bach, cadences. Further releases will also include incomplete occurrences of S/CS. We welcome any feedback or suggestions.

Parameters and results

On the diagrams for Bach and Shostakovich fugues, you can switch between a ground truth analysis and the output of our method. For Bach fugues, clicking on any analysis links to a page where S/CS patterns can be displayed with the VexFlow notation engine. For more information and a discussion of this method and its results, please see the bibliographical references.

  • Subject and counter-subject predictions [1]
    All the output files are computed with a diatonic interval model and a threshold of (2 + 10% length). See the full log for details on candidate subjects and counter-subjects. The end of the subject is exactly predicted in 16 of the 24 fugues (bad predictions: Mozart K.546, and Bach's WTC I fugues #5, #8, #9, #12, #19, #22, #23, #24). For Shostakovich fugues, starting from MIDI files, the pitch equivalence model is +/- 1 semitone. The end of the subject is exactly predicted in 13 of the 24 fugues (bad predictions or no predictions: fugues #2, #3, #4, #6, #7, #8, #11, #12, #13, #16, #21).
  • Partial harmonic sequences in episodes [2]
    Here the interval model is QPI (quantized partially overlapping intervals) [Lemström and Laine 98]. Harmonic sequences in at least two voices are detected. See the full log for details on candidate episodes. In WTC fugues, good coverage of episodes with such sequences is obtained for fugues #2, #3, #7, #10, #18, #21.
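To give a feel for what an interval model combined with a threshold of (2 + 10% length) can mean in practice, here is a minimal sketch. It is not the method of the cited papers: it assumes that the threshold bounds the number of mismatching intervals, that "length" is the number of intervals in the pattern, and that the +/- 1 semitone equivalence means two intervals match when they differ by at most one semitone. The function names `intervals`, `within_threshold` and `is_occurrence` are invented for this example.

```python
def intervals(pitches):
    """Successive intervals of a melody given as a list of pitch numbers.

    The numbers can be diatonic steps (diatonic model) or MIDI pitches
    (semitone model); only differences between neighbours are used.
    """
    return [b - a for a, b in zip(pitches, pitches[1:])]


def within_threshold(mismatches, length):
    """True if mismatches <= 2 + 10% of length.

    Assumption: the threshold counts mismatching intervals. Integer
    arithmetic is used to avoid floating-point rounding.
    """
    return 10 * mismatches <= 20 + length


def is_occurrence(pattern, candidate, tolerance=0):
    """Compare two melodies interval by interval.

    tolerance=0 requires equal intervals (e.g. diatonic model);
    tolerance=1 accepts intervals that differ by one unit (assumed reading
    of the +/- 1 semitone equivalence on MIDI pitches).
    """
    p, c = intervals(pattern), intervals(candidate)
    if len(p) != len(c):
        return False
    mismatches = sum(1 for x, y in zip(p, c) if abs(x - y) > tolerance)
    return within_threshold(mismatches, len(p))


# Hypothetical melody in diatonic steps, and the same melody a fifth higher.
subject = [0, 1, 2, 3, 4, 2, 0]
answer = [4, 5, 6, 7, 8, 6, 4]
print(is_occurrence(subject, answer))
```

Working on intervals rather than absolute pitches makes the comparison transposition-invariant, so a transposed entry of the subject is recognised without any extra step, while the threshold leaves room for a few altered intervals.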


The complete fugue pipeline (S/CS/CS2, cadences, pedals, structure), as well as the reference dataset, is described in the article [1]. Please cite [1] if you use the fugue dataset. Principles for S/CS detection were described in [2], and detection of harmonic sequences in [3].

The sources used to compile the dataset were the following: