It seems like it should be possible to have some kind of GUI library in which user interface elements can be painted onto the screen frame-by-frame, instead of the common practice of assembling an object-oriented hierarchy of widgets.
I’m thinking of something like (rough ECMA-like pseudocode):
static var clicks = 0;
drawRectangle(0, 0, 100, 100);
drawText(0, 50, "This button has been clicked " + clicks + " times");
drawClickListener(0, 0, 100, 100, function() { clicks++; });
Now we should be able to wrap this “button” inside affine transformations and clipping operations. We would call and draw the button during each frame of animation, and the button would cease to exist on the first frame in which it stopped being called.
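A minimal sketch of that per-frame model in TypeScript, to make the lifetime rule concrete. The `Frame` class and its methods are hypothetical names, not taken from any existing library, and the actual drawing calls are left out so that only the click regions are tracked.

```typescript
// Hypothetical immediate-mode frame: click regions live only for the frame that declared them.
type ClickHandler = () => void;
interface ClickRegion { x: number; y: number; w: number; h: number; onClick: ClickHandler; }

class Frame {
  private regions: ClickRegion[] = [];

  // Start a new frame: everything declared during the previous frame is forgotten.
  begin(): void {
    this.regions = [];
  }

  // Declare a clickable rectangle for the current frame only.
  clickListener(x: number, y: number, w: number, h: number, onClick: ClickHandler): void {
    this.regions.push({ x, y, w, h, onClick });
  }

  // Dispatch a click to the most recently declared region that contains the point.
  click(px: number, py: number): boolean {
    for (const r of this.regions.slice().reverse()) {
      if (px >= r.x && px < r.x + r.w && py >= r.y && py < r.y + r.h) {
        r.onClick();
        return true;
      }
    }
    return false;
  }
}

let clicks = 0;
const frame = new Frame();

// The "button" is just a function that is called once per frame.
function button(f: Frame): void {
  // drawRectangle and drawText would go here; they are omitted in this sketch.
  f.clickListener(0, 0, 100, 100, () => { clicks++; });
}

frame.begin();
button(frame);
frame.click(50, 50); // hits the button

frame.begin();       // next frame: button() is not called
frame.click(50, 50); // nothing to hit, the button no longer exists
```

Because the region list is rebuilt every frame, nothing has to be unregistered: a button that is not drawn simply has no region to click.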
I’m not looking for a revolutionary GUI paradigm; I just want the ability to do some specific, weird effects when the situation calls for it. For example, a car driving by with a clickable button on its side. If the button is behind a tree in the same scene, then it shouldn’t be clickable.
In fact, ideally, drawing user interface elements would use the same path as drawing visible shapes, but would pass in an event handler instead of a color or fill pattern. I could even imagine that there would be one or more channels of callback information alongside the RGB channels.
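One way to picture the callback channel is a toy software canvas in which every pixel stores a handler next to its color. This is only a sketch under that assumption: `PickCanvas`, `fillRect` and `click` are made-up names, colors are plain strings, and shapes are limited to axis-aligned rectangles.

```typescript
// Hypothetical canvas where each pixel carries a color and an optional click handler.
type OnClick = () => void;
interface Pixel { color: string; onClick: OnClick | null; }

class PickCanvas {
  private width: number;
  private height: number;
  private pixels: Pixel[] = [];

  constructor(width: number, height: number) {
    this.width = width;
    this.height = height;
    for (let i = 0; i < width * height; i++) {
      this.pixels.push({ color: "white", onClick: null });
    }
  }

  // Fill a rectangle. The handler is written like any other channel,
  // so shapes drawn later overwrite the handlers of shapes drawn earlier.
  fillRect(x: number, y: number, w: number, h: number, color: string, onClick: OnClick | null = null): void {
    for (let py = Math.max(0, y); py < Math.min(this.height, y + h); py++) {
      for (let px = Math.max(0, x); px < Math.min(this.width, x + w); px++) {
        this.pixels[py * this.width + px] = { color, onClick };
      }
    }
  }

  // Look up the handler stored at the clicked pixel and run it, if there is one.
  click(x: number, y: number): boolean {
    const cx = Math.floor(x);
    const cy = Math.floor(y);
    if (cx < 0 || cy < 0 || cx >= this.width || cy >= this.height) return false;
    const pixel = this.pixels[cy * this.width + cx];
    if (!pixel || pixel.onClick === null) return false;
    pixel.onClick();
    return true;
  }
}

let presses = 0;
const picture = new PickCanvas(200, 100);
picture.fillRect(20, 40, 60, 30, "red", () => { presses++; }); // button on the car
picture.fillRect(50, 0, 20, 100, "green");                     // tree in front, no handler

picture.click(30, 50); // visible part of the button: the handler runs
picture.click(55, 50); // covered by the tree: nothing happens
```

Occlusion falls out of draw order here: the tree overwrites the handler channel exactly where it overwrites the color.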
This is in the category of “things that I can’t possibly be the first person to think of but haven’t actually ever seen done.”

Sounds like a so-called immediate-mode GUI (imgui); see the tutorial here:
http://iki.fi/sol/imgui/
I don’t know what model Flash uses internally, but it sounds similar. At least you can do the button passing behind a tree thing.
If you were to build a GUI in graphics-drawingcombinators (or something based on it), that’s how it would work (except it wouldn’t look like what you wrote because it is a functional specification). It draws with OpenGL and detects clicks with OpenGL picking.
Games can get away with that. More down-to-earth applications need to be more event-based, so that they can sit idle if you are not using them. I think that is the reason for the widget hierarchy.
Maybe there is a happy medium? E.g., build a combinator library that caches its renderings, so that redrawing is very cheap when nothing is changing.
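A rough sketch of the caching idea, assuming a rendering can be treated as a pure function of its input. The `cached` helper is hypothetical, and a string stands in for whatever the real rendering result would be.

```typescript
// Hypothetical caching wrapper: re-render only when the input differs from the last one.
function cached<A>(render: (arg: A) => string): (arg: A) => string {
  let hasValue = false;
  let lastArg: A | undefined;
  let lastResult = "";
  return (arg: A): string => {
    if (!hasValue || arg !== lastArg) {
      lastResult = render(arg);
      lastArg = arg;
      hasValue = true;
    }
    return lastResult;
  };
}

let renderCount = 0;
const label = cached((clicks: number): string => {
  renderCount++; // stands in for expensive drawing work
  return "This button has been clicked " + clicks + " times";
});

label(0); // renders
label(0); // unchanged input: served from the cache
label(1); // input changed: renders again
```

The per-frame code stays the same, but a frame in which nothing changed costs one comparison per cached element instead of a full redraw.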
Cairo is annoying for this because, to detect a click, you have to check as you draw each object whether the click landed on it.
You can do this with JavaFX, too. There you build a scene graph in which you define, for each object, whether and how it reacts to user interaction such as mouse clicks. All properties of the objects (in your example, the position) can be animated easily. The idea is quite nice. Several standard widgets are still missing, but you can of course implement your own.
http://javafx.com/
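The scene-graph approach can be sketched generically as follows. This is not the JavaFX API: `SceneNode`, `pick` and `clickAt` are illustrative names, and nodes are plain rectangles positioned relative to their parent.

```typescript
// Hypothetical retained-mode scene graph; these types are illustrative, not JavaFX classes.
interface SceneNode {
  x: number; y: number;   // position relative to the parent
  w: number; h: number;   // size of the node's own rectangle
  onClick?: () => void;   // optional reaction to mouse clicks
  children: SceneNode[];  // drawn after (on top of) the node itself
}

// Return the topmost node under the point, searching children front to back.
function pick(node: SceneNode, px: number, py: number): SceneNode | null {
  const lx = px - node.x;
  const ly = py - node.y;
  for (const child of node.children.slice().reverse()) {
    const hit = pick(child, lx, ly);
    if (hit !== null) return hit;
  }
  const inside = lx >= 0 && lx < node.w && ly >= 0 && ly < node.h;
  return inside ? node : null;
}

let hits = 0;
const button: SceneNode = { x: 10, y: 10, w: 40, h: 20, children: [], onClick: () => { hits++; } };
const car: SceneNode = { x: 0, y: 50, w: 100, h: 40, children: [button] };
const root: SceneNode = { x: 0, y: 0, w: 300, h: 100, children: [car] };

// Run the handler of whatever node is topmost at the clicked point.
function clickAt(px: number, py: number): void {
  const target = pick(root, px, py);
  if (target !== null && target.onClick) target.onClick();
}

clickAt(20, 70);  // the button covers (10, 60) to (50, 80) in scene coordinates
car.x = 100;      // "animate" the car; the button moves with it
clickAt(20, 70);  // now only background is here
clickAt(120, 70); // the button's new location
```

Unlike the per-frame style, the nodes persist between frames, so the application can sit idle until an event arrives or a property changes.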