What is solfège? The 1,000-year-old system behind modern ear training

Solfège (also written solfeggio in Italian, solfeo in Spanish) is the system of singing each note of a scale on a specific syllable: do, re, mi, fa, sol, la, ti, do. Whatever your previous exposure to it (The Sound of Music, a school choir, a music-theory class), those seven syllables are among the most widely used teaching tools in Western music education, and they remain central to how many aural-skills programs train the ear today.

This article is a plain-English answer to “what is solfège, and why does it actually work?”

The seven syllables and the scale

In a major scale, each scale degree gets its own syllable:

Scale degree	Syllable (English usage)	Syllable (Italian usage)
1	Do	Do
2	Re	Re
3	Mi	Mi
4	Fa	Fa
5	Sol (or So)	Sol
6	La	La
7	Ti (or Si)	Si
8 (octave)	Do	Do

Sing the syllables to a major scale and you are doing solfège. Sing them to a melody — picking out which syllable each note corresponds to — and you are doing the foundational ear-training and sight-singing exercise that conservatories have used for two centuries.

Where solfège came from

Solfège is older than the modern major scale. The syllables are attributed to Guido of Arezzo, an 11th-century Benedictine monk and music teacher, who took them from the first syllables of successive lines of a Latin hymn (Ut queant laxis): ut, re, mi, fa, sol, la. Each line of the hymn began on the next note up the hexachord, so the syllables became permanently associated with the steps of the scale. Ut was later replaced with the more singable do, and a seventh syllable (si, changed to ti in English-speaking traditions) was added when the scale extended to seven notes ^[1].

The system spread through European music education, and its core syllables have stayed recognizable for close to a thousand years. Many of the conservatory and classroom traditions that survive today (Italian, French, Spanish, German, Hungarian, and the broader Anglophone tradition among them) use some form of it.

Fixed-do vs movable-do: the one decision that matters

There are two main flavors of solfège, and they differ on a single question: does do mean a specific pitch, or does it mean the tonic of whatever key you’re in?

Fixed-do treats the syllables as note names. Do always means C. Re always means D. Mi always means E. This is the approach used in Romance-language conservatories (Italian, French, Spanish), and it matches everyday note naming in those languages, where C is simply called “Do”.

Movable-do treats the syllables as functional roles in a key. Do is whatever the tonic of the current key is — C in C major, F in F major, B♭ in B♭ major. The syllable signals the note’s job in the key, not its absolute pitch.

The two systems train different skills:

Fixed-do trains note-name fluency: linking each sung pitch to its absolute note name (what note is this?).
Movable-do trains relative-pitch / functional hearing (what role does this note play?).
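
The difference between the two systems can be made concrete with a small Python sketch. It is illustrative only: the function names are hypothetical, only major keys are handled, and the fixed-do function ignores accidentals and uses the Italian-style "si" for B, which is one common convention rather than the only one.

```python
# Semitones above C for each natural note letter.
PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Fixed-do: the syllable follows the letter name (Italian-style "si" for B).
FIXED_DO = {"C": "do", "D": "re", "E": "mi", "F": "fa",
            "G": "sol", "A": "la", "B": "si"}

# Movable-do: the syllable follows the distance above the tonic, in semitones.
MOVABLE_DO = {0: "do", 2: "re", 4: "mi", 5: "fa", 7: "sol", 9: "la", 11: "ti"}


def pitch_class(note: str) -> int:
    """Convert a name such as 'C', 'F#', or 'Bb' to a pitch class 0..11."""
    pc = PITCH_CLASS[note[0].upper()]
    for accidental in note[1:]:
        if accidental in "#♯":
            pc += 1
        elif accidental in "b♭":
            pc -= 1
        else:
            raise ValueError(f"unrecognized accidental in {note!r}")
    return pc % 12


def fixed_do(note: str) -> str:
    """Syllable by letter name only; accidentals are ignored (an assumption)."""
    return FIXED_DO[note[0].upper()]


def movable_do(note: str, tonic: str) -> str:
    """Syllable by scale degree in the major key built on `tonic`."""
    interval = (pitch_class(note) - pitch_class(tonic)) % 12
    if interval not in MOVABLE_DO:
        raise ValueError(f"{note!r} is not diatonic in {tonic} major")
    return MOVABLE_DO[interval]


# The tonic of each key gets a different fixed-do syllable
# but is always "do" in movable-do.
for key in ("C", "F", "Bb"):
    print(key, "major: tonic is", fixed_do(key), "in fixed-do and",
          movable_do(key, key), "in movable-do")
```

In this sketch the note A is "la" in fixed-do regardless of context, while in movable-do it is "mi" in F major and "la" in C major: the same pitch, labeled by its role.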

The Hungarian (Kodály) tradition and most Anglophone music education, including most American programs, use movable-do because the skill it builds (hearing each note’s role in a key) is what musicians actually use to follow songs by ear, transcribe melodies, and sight-sing. We consider the empirical case for movable-do as the primary ear-training mode to be strong; see our deeper article on why functional ear training transfers more efficiently than isolated interval drilling.

Why solfège actually works

The skill solfège builds is not “remembering syllables.” It is hearing each pitch as a role in the key.

Decades of work by Carol Krumhansl and her collaborators on the tonal hierarchy established that musically trained listeners (and untrained ones, after enough exposure) perceive each pitch in a melody not as an isolated frequency but relative to an internal scaffold of stability built from the cadences and scale material they have just heard ^[2]. A trained ear hears the third of the key as something specifically third-of-key-shaped (warm, stable above the tonic), and that sensation is built up by hours of associating the syllable mi with that role across many keys and melodies.

Movable-do solfège is the pedagogical machine that creates the association. You sing many melodies in many keys; in every one of them, mi always sounds like mi. The syllable becomes inseparable from the functional sensation. With sustained practice, learners often begin to “hear” the syllables of a melody before they consciously think about which syllable applies. That is what musicianship sounds like from the inside.

Hand signs (Curwen / Kodály)

A common pairing with movable-do solfège is the Curwen hand signs: a set of physical gestures, one per scale degree, taught alongside the syllables. Do is a closed fist held low, at about chest height; sol is a flat open hand with the palm facing the singer, held higher; ti is a pointing index finger angled upward, near eye level.

The signs were invented by John Curwen in 19th-century England and were popularized in 20th-century music education by Zoltán Kodály’s Hungarian curriculum. They are not decorative: adding a kinesthetic channel to the auditory and verbal channels is reported to improve children’s pitch-association memory in classroom settings, and the same multi-modal effect is thought to carry over to adult learners ^[3].

In any modern children’s chorus that uses the Kodály method, the hand signs are visible from the first day.

Solfège in modern ear training

The role of solfège in 21st-century ear-training apps and curricula is complicated. Most apps that bill themselves as “ear training” actually drill intervals in isolation (two notes play, identify the distance) — a task that has its place but does not, on its own, build the functional ear that solfège trains.

The apps and curricula that take aural skills seriously (Berklee’s functional aural-skills tracks, the Bruce Arnold “Use Your Ear” method, the Kodály-method classroom programs, the Ward Method), along with the academic ear-training research, all center scale-degree-based hearing as the foundational skill. Solfège syllables are one way to label scale degrees; numbers (1, 2, 3, …) are another. The syllables versus numbers debate matters less than the underlying concept: every pitch is heard as a role in a key, primed by a cadence.

If you are using an app, the highest-leverage adjustment is to ensure every exercise is presented inside an established key. The cost is a few seconds per question; the long-run benefit is the difference between drilling test items and actually building the ear that hears music.
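
As a rough illustration of what "inside an established key" can mean in practice, the Python sketch below builds one question as data: a cadence to establish the key, then a target note to identify. It assumes a I-IV-V-I cadence in root-position triads and represents pitches as MIDI note numbers; both are illustrative choices, and the function names are hypothetical. Playback is left out entirely.

```python
import random

# Semitones above the tonic for major-scale degrees 1..7.
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]
SYLLABLES = ["do", "re", "mi", "fa", "sol", "la", "ti"]


def triad(tonic_midi: int, degree: int) -> list[int]:
    """Root-position triad on a scale degree (1-based), as MIDI note numbers."""
    notes = []
    for step in (0, 2, 4):  # root, third, fifth, counted in scale steps
        octave, position = divmod(degree - 1 + step, 7)
        notes.append(tonic_midi + 12 * octave + MAJOR_SCALE[position])
    return notes


def make_question(tonic_midi: int, rng: random.Random) -> dict:
    """One exercise: a key-establishing cadence, then a target scale degree."""
    cadence = [triad(tonic_midi, degree) for degree in (1, 4, 5, 1)]
    degree = rng.randint(1, 7)
    return {
        "cadence": cadence,                                # play these chords first
        "target": tonic_midi + MAJOR_SCALE[degree - 1],    # then this single note
        "answer": SYLLABLES[degree - 1],                   # expected syllable
    }


question = make_question(60, random.Random())  # 60 = middle C, so C major
print(question)
```

The point of the design is that the target note is never presented on its own: the learner always hears it against the key the cadence has just set up.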

FAQ

Is solfège the same as singing “do re mi”? The syllables are the same. Solfège is the broader pedagogical practice — sight-singing melodies on the syllables, taking dictation in the syllables, and gradually internalizing each syllable as a functional sensation tied to its scale degree.

Do I need to know solfège to learn music? Not strictly: many self-taught musicians never use the syllables. But solfège (or scale-degree numbers, which serve the same function) is among the best-documented paths to a functional ear, and the pedagogical literature is fairly consistent on this point.

Should I learn fixed-do or movable-do? For most self-directed adult learners, movable-do. It builds the relative-pitch / functional skill that musicians actually use. Fixed-do is appropriate if you have absolute pitch and your goal is to pair the syllables with note names directly.

What about chromatic notes? Movable-do extends to chromatic alterations through vowel changes: a raised second is ri, a lowered third is me, a raised fourth is fi, a lowered seventh is te, and so on. These are taught after the diatonic syllables are fluent.
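
The chromatic syllables can be laid out as a small lookup, sketched here in Python. The table uses the commonly taught English-language set (raised degrees take an "i" vowel; lowered degrees take an "e" vowel, except lowered re, which becomes ra). The function name and the decision to reject alterations without a standard syllable are assumptions made for illustration.

```python
DIATONIC = {1: "do", 2: "re", 3: "mi", 4: "fa", 5: "sol", 6: "la", 7: "ti"}

# Raised by a semitone: the vowel changes to "i".
RAISED = {1: "di", 2: "ri", 4: "fi", 5: "si", 6: "li"}

# Lowered by a semitone: the vowel changes to "e" ("ra" for lowered re).
LOWERED = {2: "ra", 3: "me", 5: "se", 6: "le", 7: "te"}


def syllable(degree: int, alteration: int = 0) -> str:
    """Movable-do syllable for a scale degree; alteration is -1, 0, or +1."""
    table = {0: DIATONIC, 1: RAISED, -1: LOWERED}.get(alteration)
    if table is None or degree not in table:
        raise ValueError(f"no syllable for degree {degree}, alteration {alteration}")
    return table[degree]


# The four examples named in the answer above.
print(syllable(2, +1), syllable(3, -1), syllable(4, +1), syllable(7, -1))
```

Degrees 3 and 7 have no raised entry and degrees 1 and 4 have no lowered entry here, because those alterations land on a neighboring diatonic pitch in a major key.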

References

[1] For Guido of Arezzo and the Ut queant laxis origin of the syllables, see Stevens, J. (1986). Words and Music in the Middle Ages. Cambridge University Press, chapters on solmization and Guidonian pedagogy.
[2] Krumhansl, C. L. (1990). Cognitive Foundations of Musical Pitch. Oxford University Press. The probe-tone studies establishing the tonal-hierarchy rating profile are summarized in chapters 2–4.
[3] For multi-modal training effects in early music education, see studies on Kodály-method outcomes in elementary classrooms; the broader literature on kinesthetic-auditory pairing in pitch acquisition supports Curwen’s intuition.