A good amount of silicon goes to video processing compression and more recently to neural network inference: this gives us very fast compound instructions.
An example of the former is https://www.felixcloutier.com/x86/psadbw which computes the L1 norm between two byte-vectors.
The latter, part of VNNI, is https://www.felixcloutier.com/x86/vpdpbusd which is a convolution of a byte vector with an extra addition.
They're similar in that they compute similarity, and I believe this can be exploited in the window threshold-generalization and metrics like jaccard, cosine, and mutual_information.
A good amount of silicon goes to video processing compression and more recently to neural network inference: this gives us very fast compound instructions.
An example of the former is https://www.felixcloutier.com/x86/psadbw which computes the L1 norm between two byte-vectors.
The latter, part of VNNI, is https://www.felixcloutier.com/x86/vpdpbusd which is a convolution of a byte vector with an extra addition.
They're similar in that they compute similarity, and I believe this can be exploited in the
windowthreshold-generalization and metrics likejaccard,cosine, andmutual_information.