Budburst models have mainly been developed to capture the processes of individual trees, and they vary in their complexity and plant physiological realism. We evaluated how well eleven models capture the variation in budburst of birch and Norway spruce in Germany, Austria, the UK and Finland. The comparison was based on the models' performance in relation to their underlying physiological assumptions, using four different calibration schemes. The models were not able to simulate the timing of budburst accurately, especially in the Alps. In general, the models overestimated the temperature effect, so that budburst was simulated too early in the UK and too late in Austria. Among the better-performing models were an empirical model based on spring temperatures and the Alternating model, which combines growing degree days with a dynamic forcing requirement. These were also the models least influenced by the calibration data. For birch, the best calibration scheme was based on one site in Germany; for Norway spruce, the best scheme included multiple sites in Germany. Most model and calibration combinations showed greater bias at higher spring temperatures, mostly simulating budburst earlier than observed.