What you're talking about is binary translation. Here are a few pointers:

  - DynamoRIO: http://www.dynamorio.org/
  - Intel PIN: https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool
  - Rosetta: https://www.apple.com/asia/rosetta/
  - Transmeta Crusoe: http://en.wikipedia.org/wiki/Transmeta_Crusoe
  - Binary Translation using Peephole Superoptimization: http://theory.stanford.edu/~aiken/publications/papers/osdi08.pdf

You might also find program specialization interesting and related:

  - SynthesisOS: http://valerieaurora.org/synthesis/SynthesisOS/

I've personally written three binary translators, and worked on DynamoRIO as well as a Linux kernel variant of DynamoRIO. It's hard to do well, let alone correctly, but it's also fun. If you're interested, it might be worth starting with an existing system to get some good ideas about how to structure yours, and perhaps even how not to. It might also give you a sense of a niche worth targeting. Your list is interesting, but tackling all of it in a general way is non-trivial. However, if you restrict yourself to some subset of programs / use cases, then you can make an impact.

Creating faster code is incredibly tricky. That was the original motivation for things like DynamoRIO (and HP's Dynamo before it), and it's mostly not worth it. You can sometimes do it for specific workloads, in a way that is similar to feedback-directed optimization. You can also sometimes do it with inlining / outlining (e.g. moving cold code out of the hot path so it doesn't take up icache space) / specializing (creating specialized versions of code for specific inputs, use sites, etc.).
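
To make the specializing idea a bit more concrete, here's a toy sketch in Python rather than at the machine-code level. It is not how DynamoRIO or any real translator does it; the names (power, specialize_power, SpecializingDispatcher) and the threshold of 3 are made up for illustration. The general routine runs until some input value turns out to be hot, and then a version with the loop unrolled for that one value is generated and used instead:

```python
def power(base, exponent):
    """General version: runs the loop on every call."""
    result = 1
    for _ in range(exponent):
        result *= base
    return result


def specialize_power(exponent):
    """Generate a version of power() unrolled for one fixed exponent.

    Hypothetical helper: it builds Python source text and compiles it with
    exec(), standing in for emitting specialized machine code.
    """
    body = " * ".join(["base"] * exponent) if exponent > 0 else "1"
    source = f"def specialized_power(base):\n    return {body}\n"
    namespace = {}
    exec(source, namespace)
    return namespace["specialized_power"]


class SpecializingDispatcher:
    """Count how often each exponent is seen and specialize the hot ones."""

    def __init__(self, threshold=3):
        self.threshold = threshold   # calls before an exponent counts as hot
        self.counts = {}             # exponent -> calls seen so far
        self.specialized = {}        # exponent -> generated function

    def __call__(self, base, exponent):
        fast = self.specialized.get(exponent)
        if fast is not None:
            return fast(base)        # hot path: no loop, no counting
        self.counts[exponent] = self.counts.get(exponent, 0) + 1
        if self.counts[exponent] >= self.threshold:
            self.specialized[exponent] = specialize_power(exponent)
        return power(base, exponent)  # cold path: general code


if __name__ == "__main__":
    fast_power = SpecializingDispatcher(threshold=3)
    for _ in range(5):
        fast_power(2, 10)
    print(sorted(fast_power.specialized))
```

The counting is the feedback-directed part: nothing is specialized until a particular exponent has actually shown up a few times. The catch, and part of why this is mostly not worth it, is that the profiling and code generation have a cost of their own, so the specialized version has to run often enough to pay that back.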