I spent some time yesterday profiling roguestar. I do this every few months just to see where things stand, and there are always two culprits at the top of roguestar-gl.prof, every single time:
* typeclass dictionary lookups in inner loops
* conversions through Rational (fromRational . toRational)
In the first case, I think the simplest solution is to INLINE the puppy. Can the GHC inliner be a little more aggressive when it sees dictionary lookups? Inliners are a tricky business, and I’m not sure I see a simple heuristic. Vaguely: leaf functions that require dictionary lookups need to be specialized.
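To make that concrete, here is a minimal sketch of what "INLINE the puppy" and "specialize the leaf function" look like with GHC pragmas. The functions `lerp`, `scaleBy`, and `sumLerp` are made up for illustration and are not from roguestar; whether the pragmas actually help depends on the call sites and optimization flags, so check the profile again afterwards.

```haskell
import Data.List (foldl')

-- Hypothetical leaf function. It is class-polymorphic, so unless GHC
-- inlines or specializes it, each call receives a Num dictionary and
-- looks up (*), (+) and (-) through it.
lerp :: Num a => a -> a -> a -> a
lerp t a b = a * (1 - t) + b * t
-- Option 1: ask GHC to inline it at call sites, where the concrete
-- type (and therefore the dictionary) is usually known.
{-# INLINE lerp #-}

-- Another hypothetical leaf function, this time specialized instead.
scaleBy :: Num a => a -> a -> a
scaleBy k x = k * x
-- Option 2: ask GHC to generate dictionary-free copies for the types
-- that the inner loops actually use.
{-# SPECIALIZE scaleBy :: Double -> Double -> Double #-}
{-# SPECIALIZE scaleBy :: Float -> Float -> Float #-}

-- A stand-in for an inner loop that calls both leaf functions at Double.
sumLerp :: Double -> [Double] -> Double
sumLerp t = foldl' (\acc x -> lerp t acc (scaleBy 1 x)) 0
```

Loaded in GHCi, `sumLerp 0.5 [2, 4]` folds the list through `lerp`. The interesting part is not the result but the generated Core, where the dictionary arguments should disappear at the monomorphic call sites.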
In the second case, Rational can sneak into an unsuspecting program through realToFrac, whose default definition is fromRational . toRational, and it absolutely *kills* performance.
I sit down thinking to myself, "OK, today I'm going to streamline my Super Mumbo Jumbo Widget and get 15% faster performance," or some such goal I set for myself. And I run the profiler and 75% of my time is being spent in fromRational . toRational.
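For reference, here is a small sketch of where that fromRational . toRational comes from and one way to sidestep it. The names `viaRational` and `direct` are hypothetical. `float2Double` is exported by GHC.Float in base. As far as I know, GHC has rewrite rules that shortcut realToFrac between Float and Double, but they can only fire when optimization is on and the types are known at the call site, so polymorphic code can still take the slow path.

```haskell
import GHC.Float (float2Double)

-- What realToFrac does by its default definition: build an exact
-- Rational (a pair of Integers), then convert that back to a float.
viaRational :: Float -> Double
viaRational = fromRational . toRational

-- A direct primitive conversion with no Rational in between.
direct :: Float -> Double
direct = float2Double
```

Both functions return the same value for ordinary finite inputs, so swapping one for the other at a monomorphic hot spot should not change behavior, only the time spent getting there.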