77 Commits

Author SHA1 Message Date
MerryMage
dfdec797e3 A64: Implement SADDLP 2018-07-15 18:50:09 +01:00
MerryMage
3bb6a432d8 A64: Implement UADDLP 2018-07-15 18:26:54 +01:00
MerryMage
c103a28386 A64: Implement EXT 2018-07-15 17:47:32 +01:00
Lioncash
5b5d79144e emit_x64_vector: Vectorize fallback path for EmitVectorMaxU32() 2018-07-14 08:17:46 +01:00
Lioncash
402032d107 emit_x64_vector: Deduplicate a bit of code in EmitVectorSetElement{8, 32, 64} functions
Given both branches are the same, we can hoist out the common code.
2018-07-07 14:49:14 +01:00
Lioncash
c4be14d5bf emit_x64_vector: Deduplicate a bit of code within EmitVectorGetElement8()
Given both branches use the same destination register size, we can hoist
the common code out.
2018-07-06 23:00:58 +01:00
Lioncash
06c1cf6721 emit_x64_vector: Vectorize fallback case in EmitVectorMultiply64()
Gets rid of the need to perform a fallback.
2018-05-26 21:33:46 +01:00
Lioncash
b747b67354 emit_x64_vector: Add break to final case in EmitVectorRoundingHalvingAddUnsigned()
This doesn't alter behavior but does make the code better if anything
else is ever added to this function in the future.
2018-05-26 21:25:14 +01:00
Lioncash
2652e92928 ir: Add opcodes for performing rounding halving adds 2018-05-26 11:48:56 +01:00
Lioncash
990a569b7a emit_x64_vector: Simplify AVX-512 codepath in EmitVectorMultiply64
I realized I introduced a helper for simple AVX operation emitting, so
use that instead of writing it all out long-form.
2018-05-23 08:02:12 +01:00
Lioncash
7cd0ff18bf emit_x64_vector: Emit VPMULLQ in EmitVectorMultiply64 on AVX-512{DQ, VL} capable CPUs
Shortens code-gen down to a single instruction in the 64-bit path.
2018-05-14 23:09:31 +01:00
Lioncash
d7951233bd ir: Add opcodes for signed absolute differences 2018-05-12 11:16:42 +01:00
Lioncash
a0e3943ade ir: Add opcodes for performing vector halving subtracts 2018-05-07 19:04:10 +01:00
Lioncash
97a9fce094 emit_x64_vector: Use VPOPCNTB in EmitVectorPopulationCount() if AVX-512 BITALG is available 2018-05-07 16:40:28 +01:00
Lioncash
1da4671b53 ir: Add opcodes for performing halving adds 2018-05-07 16:39:17 +01:00
Lioncash
fae2d940f2 emit_x64_vector: Emit VPMINSQ and VPMINUQ for 64-bit vector min operations if AVX-512VL is available 2018-05-04 07:52:22 +01:00
Lioncash
ec66d94121 emit_x64_vector: Emit VPMAXSQ and VPMAXUQ for 64-bit vector max operations if AVX-512VL is available 2018-05-04 07:52:22 +01:00
Lioncash
4ca546ce4d emit_x64_vector: Emit VPABSQ in EmitVectorAbs() for the 64-bit case if AVX-512VL is available 2018-05-03 23:22:18 +01:00
Lioncash
c9a6d6264e emit_x64_vector: Use VPSRAQ in EmitVectorArithmeticShiftRight64() if AVX-512VL is available 2018-05-03 16:13:11 +01:00
Lioncash
75d0b1ebd8 emit_x64_vector: Vectorize fallback path of EmitVectorMaxS32() 2018-05-02 17:14:07 +01:00
Lioncash
7f3cfff647 emit_x64_vector: Vectorize fallback path of EmitVectorMaxS8() 2018-05-02 17:14:07 +01:00
Lioncash
9607376f2f emit_x64_vector: Vectorize fallback path in EmitVectorMinU32() 2018-05-02 17:13:41 +01:00
Lioncash
1162be609d emit_x64_vector: Vectorize fallback path in EmitVectorMinU16() 2018-05-02 17:13:41 +01:00
Lioncash
7cab5c2e03 emit_x64_vector: Vectorize fallback path in EmitVectorMinS32() 2018-05-02 17:13:41 +01:00
Lioncash
7fa0d2765a emit_x64_vector: Vectorize fallback path in EmitVectorMinS8() 2018-05-02 17:13:41 +01:00
Lioncash
73477f04dd emit_x64_vector: Remove unnecessary if constexpr expression in LogicalVShift
This can simply be merged with the previous one.
2018-05-01 19:31:48 +01:00
Lioncash
f90a476546 emit_x64_vector: Avoid left shift of negative value in LogicalVShift
Now that we handle the signed variants, we also have to be careful about left shifts with negative values,
as this is considered undefined behavior.
2018-05-01 19:31:48 +01:00
Lioncash
3e3ce37eb8 backend_x64/ir: Amend generic LogicalVShift() template to also handle signed variants
Also adds IR opcodes to dispatch said variants
2018-04-30 22:41:17 +01:00
Lioncash
5e20e5a44a emit_x64_vector: Vectorize fallback path in EmitVectorMaxU16() 2018-04-28 19:19:06 +01:00
Lioncash
f8a96dbe5f emit_x64_vector: Vectorize non-SSE4.1 fallback path for VectorMultiply32() 2018-04-28 19:17:18 +01:00
Lioncash
43a3873dd2 emit_x64_vector: Use VBPROADCAST where applicable and available
Uses the instruction that does what it says in its name if available. Allows avoiding the use
of a scratch register in EmitVectorBroadcast8() and EmitVectorBroadcastLower8()'s SSSE3 path.
2018-04-26 22:28:09 +01:00
Lioncash
9c2550ac72 ir: Add opcodes for performing vector deinterleaving 2018-04-26 08:49:51 +01:00
Lioncash
e96bfd1133 emit_x64_vector: Emit PMAXUD in EmitVectorMaxU32 on SSE4.1-capable CPUs 2018-04-20 23:34:32 +01:00
Lioncash
f85e2681fc emit_x64_vector: Emit PMINUD in EmitVectorMinU32 on SSE4.1-capable CPUs 2018-04-20 23:34:32 +01:00
Lioncash
2fb3ddd2cd emit_x64_vector: Emit PMINSD in EmitVectorMinS32 on SSE4.1-capable CPUs
Provides a better alternative to a fallback operation.
2018-04-20 23:34:32 +01:00
Lioncash
be48d44132 emit_x64_vector: Get rid of some magic numbers in loop bounds 2018-04-20 18:54:29 +01:00
Lioncash
a3c8ab61ea emit_x64_vector: Generify variable shift functions 2018-04-20 18:54:14 +01:00
Lioncash
f921005a70 ir: Add opcode for reversing bits in a vector 2018-04-03 21:18:03 +01:00
Lioncash
032b09cbdf ir: Add opcodes for performing vector absolute values 2018-04-03 07:49:08 +01:00
MerryMage
14d3d72aac IR: Implement VectorExtract, VectorExtractLower IR instructions 2018-04-02 21:52:46 +01:00
Lioncash
734447ef3d ir: Add opcodes for performing vector unsigned absolute differences 2018-04-02 19:08:20 +01:00
Lioncash
1e8fe95cc4 IR: Add opcodes for interleaving upper-order bytes/halfwords/words/doublewords
I should have added this when I introduced the functions for interleaving
low-order equivalents for consistency in the interface.
2018-03-31 11:01:38 +01:00
Lioncash
6278f83560 emit_x64_vector: Fix typo in VectorShuffleImpl
This is supposed to be pshufd, not pshufw (which only allows a 64-bit operand)
2018-03-23 19:51:09 +00:00
Lioncash
f62a258945 ir: Add IR opcodes for emitting vector shuffles
This uses the ARM terminology for sizes (Halfword -> 2 bytes, Word -> 4 bytes)
as opposed to the x86 terminology of (Word -> 2 bytes, Double word -> 4 bytes)
2018-03-21 15:40:03 +00:00
MerryMage
7673933a9b A64: Implement USHL 2018-02-20 19:48:15 +00:00
MerryMage
3184edf4a9 IR: Add IR instruction ZeroVector 2018-02-20 15:41:07 +00:00
MerryMage
616056d9a3 constant_pool: Add frame parameter 2018-02-20 14:04:48 +00:00
MerryMage
2880eb3da1 IR: Implement Vector{Max,Min}{Signed,Unsigned} 2018-02-13 17:56:46 +00:00
MerryMage
6d4f14e876 IR: Implement VectorGreaterSigned 2018-02-13 15:47:52 +00:00
MerryMage
a7e4202828 IR: Implement VectorSignExtend 2018-02-11 16:24:33 +00:00