Coming from me, or from anyone without deep knowledge of the codec, there could be many answers to why it increases as the frequency of the sine sweep rises, and none of them would be accurate. I'm sorry, but any answer I give can only be partially accurate: I can only refer to individual parts of Musepack's analysis and synthesis, not to the sum of the parts (in essence the entire audio coder, psychoacoustic model and all, which you have to be very knowledgeable to understand), and the sum is the only thing to refer to if you want a real, accurate answer.

You could find a detailed specification of the MPEG-1 Layer 2 coding tools and start from there. Musepack was based on it and extended with various tools of its own and its own psymodel. It includes subband-based selectable channel coupling (search for channel coupling info), adaptive noise shaping (or "ANS"; various info is available as well), and clear voice detection (no info available; basically, it helps the psymodel perform better during changes of the base frequency of harmonic signals). The spreading function is non-linear and depends on the mid frequency and sound pressure level of the critical bands. Temporal masking with a variable time constant is used.

To give a superficial answer: it's probably related to the way the psychoacoustic model treats audio differently at different frequencies, to the nature of polyphase quadrature filters, and maybe to the adaptive noise shaping, which may use filters of as high as 5th order when the audio is a pure sine wave. All of that contributes to the non-linear, colorful and variable representation you may see on a graph.
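
To make the filterbank part a bit more concrete, here is a rough Python sketch of how a swept sine moves through the subbands. It assumes Musepack keeps the 32 equal-width subbands of the MPEG-1 Layer 2 polyphase filterbank and that the input is sampled at 44.1 kHz; the function names are made up for illustration and are not taken from the Musepack source. It only shows where a tone lands, not what the psymodel or ANS does with it.

```python
SAMPLE_RATE = 44100   # Hz, assumed CD-rate input
NUM_SUBBANDS = 32     # subband count of the MPEG-1 Layer 2 polyphase filterbank


def subband_width(sample_rate=SAMPLE_RATE, num_subbands=NUM_SUBBANDS):
    """Width in Hz of one equal-width subband (Nyquist / number of subbands)."""
    return (sample_rate / 2) / num_subbands


def subband_index(freq_hz, sample_rate=SAMPLE_RATE, num_subbands=NUM_SUBBANDS):
    """Index (0-based) of the subband that nominally contains freq_hz."""
    if not 0 <= freq_hz < sample_rate / 2:
        raise ValueError("frequency must lie in [0, Nyquist)")
    return int(freq_hz // subband_width(sample_rate, num_subbands))


def distance_to_band_edge(freq_hz, sample_rate=SAMPLE_RATE, num_subbands=NUM_SUBBANDS):
    """Distance in Hz from freq_hz to the nearest subband boundary.

    Near a boundary a pure tone leaks into two adjacent subbands, because the
    polyphase filters overlap; this is one place where a sweep can look uneven.
    """
    width = subband_width(sample_rate, num_subbands)
    offset = freq_hz % width
    return min(offset, width - offset)


if __name__ == "__main__":
    # Step a hypothetical sweep through a few frequencies and report the band.
    for f in (100, 1000, 5000, 10000, 15000, 20000):
        print(f, "Hz -> subband", subband_index(f),
              "| Hz to nearest edge:", round(distance_to_band_edge(f), 1))
```

Under these assumptions each subband is about 689 Hz wide, so a sweep crosses a band boundary roughly every 689 Hz, and whatever the coder decides per subband can change at each crossing.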
