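Using a 16-bit immediate can slow down the predecoder because of length-changing-prefix (LCP) stalls: the 0x66 operand-size prefix changes the length of the immediate from 4 bytes to 2. This is why we have UseStoreImmI16. If the flag is unset, we should load the short into a register and do a cmpl instead of comparing against a 16-bit immediate.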
https://stackoverflow.com/questions/65530097/does-a-length-changing-prefix-lcp-incur-a-stall-on-a-simple-x86-64-instruction
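
For illustration, here is a rough sketch of the two encodings being contrasted. It assumes the short lives at (%rdi), is compared against a constant, and can be zero-extended into %eax; a signed compare would use a sign-extending load instead. The helper names are hypothetical and are not HotSpot's assembler API; only the byte sequences are standard x86-64 encodings.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration only; not HotSpot's assembler API.
using Bytes = std::vector<std::uint8_t>;

// cmpw $imm16, (%rdi)  ->  66 81 /7 iw
// The 0x66 prefix shrinks the immediate from 4 bytes to 2, which is the
// length-changing case the predecoder can stall on.
Bytes encode_cmpw_mem_imm16(std::uint16_t imm) {
    return {0x66, 0x81, 0x3F,
            static_cast<std::uint8_t>(imm & 0xFF),
            static_cast<std::uint8_t>(imm >> 8)};
}

// movzwl (%rdi), %eax ; cmpl $imm32, %eax
// Assumption: the value is treated as unsigned, so it is zero-extended and
// the constant is widened to 32 bits with zero upper bytes.
Bytes encode_load_then_cmpl(std::uint16_t imm) {
    return {0x0F, 0xB7, 0x07,  // movzwl (%rdi), %eax
            0x3D,              // cmpl $imm32, %eax
            static_cast<std::uint8_t>(imm & 0xFF),
            static_cast<std::uint8_t>(imm >> 8),
            0x00, 0x00};
}

// True if the sequence begins with the 0x66 operand-size prefix.
bool starts_with_operand_size_prefix(const Bytes& code) {
    return !code.empty() && code[0] == 0x66;
}
```

The second sequence is three bytes longer and needs a scratch register, but neither of its instructions carries an operand-size prefix in front of an immediate.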