Why your ear works on piano but not on guitar: timbre, transfer, and what to train

A familiar pattern: a learner spends weeks getting comfortable with intervals and chord qualities in their ear-training app, where every example is played by the same clean piano sample. Then they pick up a guitar, or sit down to transcribe a vocal line, and the trained ear seems to vanish. The intervals they could identify with 95% accuracy in the app drop to 60% on a real recording.

This is not a regression. It is a known limitation of perceptual learning called the specificity-of-training effect: a skill trained under one set of conditions often fails to transfer to neighboring conditions unless the training explicitly varies those conditions ^[1]. The same effect appears across vision (training discrimination at one orientation does not always transfer to other orientations) and audition (training with one timbre does not always transfer to others). It is a real cognitive phenomenon and it has direct, actionable implications for ear-training practice.

The McGill timbre studies

A series of studies from Stephen McAdams’ lab at McGill and others has examined how listeners learn to recognize instruments and how that recognition does or does not generalize across pitch register and timbre. McAdams and colleagues (2023) trained nonmusicians to identify instruments at a fixed pitch (C4) and then tested generalization to other pitches. The conclusion: identification generalization over pitch varies a great deal across instruments, and explicit exposure to multiple pitch registers during training was needed for robust generalization ^[2].

A 2023 study on rapid absolute-pitch training, published in Attention, Perception, & Psychophysics, made an important parallel finding for pitch identification specifically: trained adults could partly generalize their pitch-recognition skills to untrained instruments, but the weakest generalization was to notes in higher octaves and to instruments timbrally distant from the training instrument ^[3]. The authors concluded that explicit training across timbres and octaves was necessary for the trained skill to behave like a perceptual ability rather than a memorized response.

These findings are consistent with a much older literature. Goldstone’s (1998) review of perceptual learning concluded that one of the most reliable principles in the field is that perceptual skill is specific to the conditions of training unless training explicitly varies those conditions ^[1]. The brain learns the discrimination it is asked to make, in the context it is asked to make it. Generalization is a separate ability that has to be trained on its own.

Why piano is dominant in ear-training apps — and why that’s a problem

Most ear-training apps default to piano, and there are good reasons. Piano produces clean attacks, stable pitch with no vibrato, equal-tempered tuning, and a wide dynamic range. The intervallic content is unambiguous. This is excellent for initial acquisition of an interval discrimination — the listener has no extra perceptual variability to filter through.

But the very cleanness of piano makes it a narrow training condition. Real music — the music a learner ultimately wants to be able to listen to and transcribe — is full of timbres that piano does not prepare the ear for:

Voice, with vibrato, formants, and pitch instability that the trained piano-ear has never had to discriminate through.
Bowed strings, with continuous pitch and a different attack profile.
Guitar, with a strongly inharmonic attack, sympathetic resonance from open strings, and bending/sliding pitch.
Brass and woodwind, with breath-driven attacks and pitch wobble.
Synthesizers, where the spectral content can vary radically and pitch may even be deliberately unstable.

A learner trained exclusively on piano has built a strong piano-specific representation of each interval. Hearing an interval on a different instrument requires the ear to identify the pitches through an unfamiliar spectral envelope — a separate task that the piano-only training did not exercise.

What the research says to do about it

A few principles emerge from the perceptual-learning and timbre-generalization literatures:

1. Vary timbre during training, especially after initial acquisition. McAdams’ and Hou’s findings both point to explicit training across timbres as the path to generalization ^[2] ^[3]. The first few sessions on a new interval can stay on piano for clarity; subsequent sessions should rotate timbres. Even rotating among 3–4 instruments should give the ear enough variety to start improving cross-instrument transfer.

2. Vary register. A learner who has only practiced intervals in a single octave often fails on the same intervals in higher or lower registers, because spectral cues differ across registers and pitch perception degrades at the extremes of the human pitch range. The explicit fix is to practice the same interval in low, middle, and high registers.

3. Include at least one “messy” timbre. Voice or guitar are particularly important because they introduce the kinds of pitch instability (vibrato, inharmonicity) that real music contains. A learner whose ear can identify a major third on voice has a more transferable skill than one whose ear can only do it on piano.

4. Practice with real music regularly. The most ecologically valid form of cross-timbre training is transcribing actual recordings. This is also the highest-difficulty form, so use it as a complement to (not a replacement for) structured drills. Even five minutes of trying to transcribe a melody from a song you like, after a 15-minute structured session, builds the bridge from drilled skill to musical hearing.
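The first three principles can be sketched as a small session builder. This is a minimal illustration, not the method of any particular app: the timbre names, the register roots, and the `build_session` function are all hypothetical, and the MIDI numbers assume the common convention in which 60 is middle C.

```python
import random
from dataclasses import dataclass

# Hypothetical labels: substitute whatever samples your tool actually ships.
TIMBRES = ["piano", "voice", "guitar", "strings"]
# Lowest root note per register as a MIDI number (assumes MIDI 60 = middle C).
REGISTERS = {"low": 48, "middle": 60, "high": 72}


@dataclass(frozen=True)
class Drill:
    interval_semitones: int
    timbre: str
    register: str
    root_midi: int


def build_session(interval_semitones, n_items=12, acquisition=False, seed=None):
    """Return a list of drills for one interval.

    acquisition=True keeps every item on piano in the middle register;
    otherwise items cycle through a shuffled list of every
    timbre/register pair.
    """
    rng = random.Random(seed)
    if acquisition:
        combos = [("piano", "middle")]
    else:
        combos = [(t, r) for t in TIMBRES for r in REGISTERS]
        rng.shuffle(combos)

    drills = []
    for i in range(n_items):
        timbre, register = combos[i % len(combos)]
        # Move the root around within the register's octave so the
        # learner cannot memorize one fixed pair of pitches.
        root = REGISTERS[register] + rng.randrange(12)
        drills.append(Drill(interval_semitones, timbre, register, root))
    return drills


first_sessions = build_session(4, acquisition=True, seed=0)   # major third, piano only
later_sessions = build_session(4, acquisition=False, seed=0)  # rotated timbre and register
```

With four timbres and three registers, a 12-item session touches every combination once, including the two "messy" timbres (voice and guitar).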

A nuance: pitch-discrimination thresholds aren’t really timbre-specific

A 2022 paper in PNAS added nuance to one part of this picture. Pitch discrimination thresholds (the smallest pitch difference a listener can detect) turned out to be better for synthetic timbres than for natural musical-instrument timbres, regardless of which instrument the listener was familiar with ^[4]. In other words, raw pitch-discrimination acuity is determined more by the physical acoustics of the stimulus than by familiarity with the timbre.

What does not generalize across timbre is categorical interval and chord identification, which is a downstream cognitive task. The discrimination machinery works on any timbre; the trained category labels (“major third,” “dominant seventh”) are what fail to transfer if training was timbre-specific.

This is good news. It means the ear’s underlying perceptual acuity is not a barrier. The barrier is that the trained category-recognition skill needs to be built across timbres, and that is purely a matter of how training is structured.

What this means for an ear-training app

A few concrete suggestions for any tool — or any self-directed practitioner — to apply:

Default to piano for new lessons (clean acquisition).
Add a timbre-rotation toggle that, once enabled, rotates among 3–4 instrument samples per session. This should be available no later than the consolidation stage of any lesson.
Include a “vocal” or “voice” timbre option explicitly. This is the most musically important transfer condition for most learners and the one most apps lack.
Add periodic “real music” drills, where the user transcribes a short phrase from a recording. These can be used as occasional check-ins on whether the trained skill has actually generalized.
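One way to make those check-ins concrete is to log each answer with the timbre it was played on and compare accuracy against the training timbre. The sketch below assumes a hypothetical log format of `(timbre, was_correct)` pairs; the function names are illustrative and the sample data simply mirrors the 95% versus 60% scenario from the introduction.

```python
from collections import defaultdict


def accuracy_by_timbre(results):
    """results: iterable of (timbre, was_correct) pairs."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for timbre, was_correct in results:
        total[timbre] += 1
        correct[timbre] += int(was_correct)
    return {t: correct[t] / total[t] for t in total}


def transfer_gaps(results, trained_timbre="piano"):
    """Accuracy on the trained timbre minus accuracy on each other timbre.

    A large positive gap suggests the skill has not yet generalized
    to that timbre.
    """
    acc = accuracy_by_timbre(results)
    if trained_timbre not in acc:
        raise ValueError(f"no results logged for {trained_timbre!r}")
    base = acc[trained_timbre]
    return {t: base - a for t, a in acc.items() if t != trained_timbre}


# Made-up log: 19 of 20 correct on piano, 12 of 20 correct on guitar.
log = (
    [("piano", True)] * 19 + [("piano", False)]
    + [("guitar", True)] * 12 + [("guitar", False)] * 8
)
gaps = transfer_gaps(log)
```

A gap that shrinks over several weeks of rotated-timbre practice is a reasonable sign that the category labels are starting to transfer.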

Fifths currently runs entirely on a piano-style sample (with synth and other timbres in the audio engine but not surfaced as practice options at the time of writing). A rotating-timbre Pro feature is on the roadmap, in part because the perceptual-learning literature is clear that this is the difference between drilling test items and training a real musical ear.

The takeaway

If you have plateaued on app-based ear training and you suspect your skill is “stuck” at the level of the training stimulus, you are probably right. The fix is not to drill the same exercises harder. The fix is to vary the timbre, vary the register, and start translating the trained skill into actual music — by transcription, by playing along with recordings, by singing along in different keys. The perceptual-learning literature is clear that the ear that hears music is built from training that resembles music, not from training that resembles a textbook.

References

[1] Goldstone, R. L. (1998). Perceptual learning. Annual Review of Psychology, 49, 585–612. On the specificity-of-training effect more broadly: Fahle, M., & Poggio, T. (Eds.) (2002). Perceptual Learning. MIT Press.
[2] McAdams, S., et al. (2023). Timbral cues for learning to generalize musical instrument identity across pitch register. Journal of the Acoustical Society of America. https://www.mcgill.ca/mpcl/files/mpcl/mcadams_2023_jacoustsocam.pdf. PubMed: https://pubmed.ncbi.nlm.nih.gov/36859162/. Open-access version: https://escholarship.mcgill.ca/downloads/q524jv43j
[3] Hou, J., et al. (2023). Generalizing across tonal context, timbre, and octave in rapid absolute pitch training. Attention, Perception, & Psychophysics. https://doi.org/10.3758/s13414-023-02653-0. See also: Learning fast and accurate absolute pitch judgment in adulthood. PMC: https://pmc.ncbi.nlm.nih.gov/articles/PMC12325523/
[4] Pitch discrimination is better for synthetic timbre than natural musical instrument timbres despite familiarity. Proceedings of the National Academy of Sciences. PubMed: https://pubmed.ncbi.nlm.nih.gov/35931555/