Let the LLVM and Unregisterized backends lower primops wider than the native word size
When the LLVM or Unregisterized backend compiles primops on types wider than the native word size (for example, 64-bit primops on a 32-bit target), we should skip our C stubs. LLVM or the C compiler can lower these operations inline, which is more efficient.
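The native code generator (NCG) has to call comparatively slow out-of-line C functions through the FFI, in effect "borrowing" the C compiler's implementation of these operations. The LLVM and C backends have no such need, because they can lower the operations themselves.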
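The following minimal C sketch shows the difference between calling a stub and lowering inline. It assumes a 64-bit addition primop as the example. The names `stub_add_int64`, `sum_via_stub` and `sum_inline` are hypothetical and are not GHC symbols. The GCC/Clang `noinline` attribute stands in for the fact that a real stub lives in a separately compiled object, so the compiler cannot inline it.

```c
#include <stdint.h>

/* Force the stub to stay out of line (GCC/Clang extension), mimicking a
 * helper compiled in a separate object file that cannot be inlined. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Hypothetical C stub for a 64-bit addition primop (illustrative name). */
NOINLINE int64_t stub_add_int64(int64_t a, int64_t b)
{
    return a + b;
}

/* NCG-style lowering: every 64-bit add becomes a call to the stub. */
int64_t sum_via_stub(const int64_t *xs, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc = stub_add_int64(acc, xs[i]);
    return acc;
}

/* LLVM/C-style lowering: the addition is written directly, so the compiler
 * can emit it inline (typically an add plus add-with-carry on a 32-bit
 * target) and optimise the surrounding loop. */
int64_t sum_inline(const int64_t *xs, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += xs[i];
    return acc;
}
```

Both functions compute the same result. Only the lowering strategy differs: one pays for a call per operation, and the other lets the compiler treat the operation as ordinary arithmetic.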