I've been pondering the caveats of the current locking mechanism, which is currently done with one global lock per structure type. This can become really inefficient if, for example, several X509 operations are going on at the same time on X509 data that have no relation whatsoever (at least no memory relation) with each other: they all end up waiting on the same lock. The comment I saw was that locking should really be per object instead of per type, so that several X509 operations (in this example) could take place simultaneously.
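To make the difference concrete, here is a minimal sketch in C using pthreads. DEMO_OBJ and the demo_* functions are names made up for this illustration and are not OpenSSL's actual structures or API; the reference counter just stands in for whatever shared state the lock would protect.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for an X509-like object (not OpenSSL's real struct). */
typedef struct demo_obj {
    pthread_mutex_t lock;   /* per-object lock, used only by the per-object scheme */
    int references;         /* the shared state being protected */
} DEMO_OBJ;

/* Per-type scheme: one global lock shared by every DEMO_OBJ in the process. */
static pthread_mutex_t demo_type_lock = PTHREAD_MUTEX_INITIALIZER;

DEMO_OBJ *demo_obj_new(void)
{
    DEMO_OBJ *o = malloc(sizeof(*o));

    if (o == NULL)
        return NULL;
    /* The per-object scheme pays for one mutex per allocated object. */
    if (pthread_mutex_init(&o->lock, NULL) != 0) {
        free(o);
        return NULL;
    }
    o->references = 1;
    return o;
}

void demo_obj_free(DEMO_OBJ *o)
{
    int rc;

    if (o == NULL)
        return;
    rc = pthread_mutex_destroy(&o->lock);
    assert(rc == 0);
    (void)rc;               /* silence "unused" warnings when NDEBUG is set */
    free(o);
}

/* Per-type: threads working on unrelated objects still serialize here. */
int demo_up_ref_per_type(DEMO_OBJ *o)
{
    int r;

    pthread_mutex_lock(&demo_type_lock);
    r = ++o->references;
    pthread_mutex_unlock(&demo_type_lock);
    return r;
}

/* Per-object: only threads touching the same object contend. */
int demo_up_ref_per_object(DEMO_OBJ *o)
{
    int r;

    pthread_mutex_lock(&o->lock);
    r = ++o->references;
    pthread_mutex_unlock(&o->lock);
    return r;
}
```

Both variants are shown side by side only for comparison; real code would have to pick one scheme per object type, since mixing them on the same object protects nothing.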
The drawback with moving locking to be per object is that we would be creating a huge number of locks. I can't imagine that it's too bad on Unix using pthreads, since a pthread mutex is (as far as I understand it) just a memory structure containing some bits of data. However, under some other operating systems, like Windows, I understand that mutexes are handled through so-called handles, and that those are a limited resource (a bit like file descriptors on Unix?). I'm sure one can cough up more drawbacks...