MerryMage
dfdec797e3
A64: Implement SADDLP
2018-07-15 18:50:09 +01:00
MerryMage
3bb6a432d8
A64: Implement UADDLP
2018-07-15 18:26:54 +01:00
MerryMage
c103a28386
A64: Implement EXT
2018-07-15 17:47:32 +01:00
Lioncash
5b5d79144e
emit_x64_vector: Vectorize fallback path for EmitVectorMaxU32()
2018-07-14 08:17:46 +01:00
Lioncash
402032d107
emit_x64_vector: Deduplicate a bit of code in EmitVectorSetElement{8, 32, 64} functions
...
Given both branches are the same, we can hoist out the common code.
2018-07-07 14:49:14 +01:00
Lioncash
c4be14d5bf
emit_x64_vector: Deduplicate a bit of code within EmitVectorGetElement8()
...
Given both branches use the same destination register size, we can hoist
the common code out.
2018-07-06 23:00:58 +01:00
Lioncash
06c1cf6721
emit_x64_vector: Vectorize fallback case in EmitVectorMultiply64()
...
Gets rid of the need to perform a fallback.
2018-05-26 21:33:46 +01:00
Lioncash
b747b67354
emit_x64_vector: Add break to final case in EmitVectorRoundingHalvingAddUnsigned()
...
This doesn't alter behavior but does make the code better if anything
else is ever added to this function in the future.
2018-05-26 21:25:14 +01:00
Lioncash
2652e92928
ir: Add opcodes for performing rounding halving adds
2018-05-26 11:48:56 +01:00
Lioncash
990a569b7a
emit_x64_vector: Simplify AVX-512 codepath in EmitVectorMultiply64
...
I realized I introduced a helper for simple AVX operation emitting, so
use that instead of writing it all out long-form.
2018-05-23 08:02:12 +01:00
Lioncash
7cd0ff18bf
emit_x64_vector: Emit VPMULLQ in EmitVectorMultiply64 on AVX-512{DQ, VL} capable CPUs
...
Shortens code-gen down to a single instruction in the 64-bit path.
2018-05-14 23:09:31 +01:00
Lioncash
d7951233bd
ir: Add opcodes for signed absolute differences
2018-05-12 11:16:42 +01:00
Lioncash
a0e3943ade
ir: Add opcodes for performing vector halving subtracts
2018-05-07 19:04:10 +01:00
Lioncash
97a9fce094
emit_x64_vector: Use VPOPCNTB in EmitVectorPopulationCount() if AVX-512 BITALG is available
2018-05-07 16:40:28 +01:00
Lioncash
1da4671b53
ir: Add opcodes for performing halving adds
2018-05-07 16:39:17 +01:00
Lioncash
fae2d940f2
emit_x64_vector: Emit VPMINSQ and VPMINUQ for 64-bit vector min operations if AVX-512VL is available
2018-05-04 07:52:22 +01:00
Lioncash
ec66d94121
emit_x64_vector: Emit VPMAXSQ and VPMAXUQ for 64-bit vector max operations if AVX-512VL is available
2018-05-04 07:52:22 +01:00
Lioncash
4ca546ce4d
emit_x64_vector: Emit VPABSQ in EmitVectorAbs() for the 64-bit case if AVX-512VL is available
2018-05-03 23:22:18 +01:00
Lioncash
c9a6d6264e
emit_x64_vector: Use VPSRAQ in EmitVectorArithmeticShiftRight64() if AVX-512VL is available
2018-05-03 16:13:11 +01:00
Lioncash
75d0b1ebd8
emit_x64_vector: Vectorize fallback path of EmitVectorMaxS32()
2018-05-02 17:14:07 +01:00
Lioncash
7f3cfff647
emit_x64_vector: Vectorize fallback path of EmitVectorMaxS8()
2018-05-02 17:14:07 +01:00
Lioncash
9607376f2f
emit_x64_vector: Vectorize fallback path in EmitVectorMinU32()
2018-05-02 17:13:41 +01:00
Lioncash
1162be609d
emit_x64_vector: Vectorize fallback path in EmitVectorMinU16()
2018-05-02 17:13:41 +01:00
Lioncash
7cab5c2e03
emit_x64_vector: Vectorize fallback path in EmitVectorMinS32()
2018-05-02 17:13:41 +01:00
Lioncash
7fa0d2765a
emit_x64_vector: Vectorize fallback path in EmitVectorMinS8()
2018-05-02 17:13:41 +01:00
Lioncash
73477f04dd
emit_x64_vector: Remove unnecessary if constexpr expression in LogicalVShift
...
This can simply be merged with the previous one.
2018-05-01 19:31:48 +01:00
Lioncash
f90a476546
emit_x64_vector: Avoid left shift of negative value in LogicalVShift
...
Now that we handle the signed variants, we also have to be careful about left shifts with negative values,
as this is considered undefined behavior.
2018-05-01 19:31:48 +01:00
Lioncash
3e3ce37eb8
backend_x64/ir: Amend generic LogicalVShift() template to also handle signed variants
...
Also adds IR opcodes to dispatch said variants
2018-04-30 22:41:17 +01:00
Lioncash
5e20e5a44a
emit_x64_vector: Vectorize fallback path in EmitVectorMaxU16()
2018-04-28 19:19:06 +01:00
Lioncash
f8a96dbe5f
emit_x64_vector: Vectorize non-SSE4.1 fallback path for VectorMultiply32()
2018-04-28 19:17:18 +01:00
Lioncash
43a3873dd2
emit_x64_vector: Use VPBROADCAST where applicable and available
...
Uses the instruction that does what it says in its name if available. Allows avoiding the use
of a scratch register in EmitVectorBroadcast8() and EmitVectorBroadcastLower8()'s SSSE3 path.
2018-04-26 22:28:09 +01:00
Lioncash
9c2550ac72
ir: Add opcodes for performing vector deinterleaving
2018-04-26 08:49:51 +01:00
Lioncash
e96bfd1133
emit_x64_vector: Emit PMAXUD in EmitVectorMaxU32 on SSE4.1-capable CPUs
2018-04-20 23:34:32 +01:00
Lioncash
f85e2681fc
emit_x64_vector: Emit PMINUD in EmitVectorMinU32 on SSE4.1-capable CPUs
2018-04-20 23:34:32 +01:00
Lioncash
2fb3ddd2cd
emit_x64_vector: Emit PMINSD in EmitVectorMinS32 on SSE4.1-capable CPUs
...
Provides a better alternative to a fallback operation.
2018-04-20 23:34:32 +01:00
Lioncash
be48d44132
emit_x64_vector: Get rid of some magic numbers in loop bounds
2018-04-20 18:54:29 +01:00
Lioncash
a3c8ab61ea
emit_x64_vector: Generify variable shift functions
2018-04-20 18:54:14 +01:00
Lioncash
f921005a70
ir: Add opcode for reversing bits in a vector
2018-04-03 21:18:03 +01:00
Lioncash
032b09cbdf
ir: Add opcodes for performing vector absolute values
2018-04-03 07:49:08 +01:00
MerryMage
14d3d72aac
IR: Implement VectorExtract, VectorExtractLower IR instructions
2018-04-02 21:52:46 +01:00
Lioncash
734447ef3d
ir: Add opcodes for performing vector unsigned absolute differences
2018-04-02 19:08:20 +01:00
Lioncash
1e8fe95cc4
IR: Add opcodes for interleaving upper-order bytes/halfwords/words/doublewords
...
I should have added this when I introduced the functions for interleaving
low-order equivalents for consistency in the interface.
2018-03-31 11:01:38 +01:00
Lioncash
6278f83560
emit_x64_vector: Fix typo in VectorShuffleImpl
...
This is supposed to be pshufd, not pshufw (which only allows a 64-bit operand)
2018-03-23 19:51:09 +00:00
Lioncash
f62a258945
ir: Add IR opcodes for emitting vector shuffles
...
This uses the ARM terminology for sizes (Halfword -> 2 bytes, Word -> 4 bytes)
as opposed to the x86 terminology of (Word -> 2 bytes, Double word -> 4 bytes)
2018-03-21 15:40:03 +00:00
MerryMage
7673933a9b
A64: Implement USHL
2018-02-20 19:48:15 +00:00
MerryMage
3184edf4a9
IR: Add IR instruction ZeroVector
2018-02-20 15:41:07 +00:00
MerryMage
616056d9a3
constant_pool: Add frame parameter
2018-02-20 14:04:48 +00:00
MerryMage
2880eb3da1
IR: Implement Vector{Max,Min}{Signed,Unsigned}
2018-02-13 17:56:46 +00:00
MerryMage
6d4f14e876
IR: Implement VectorGreaterSigned
2018-02-13 15:47:52 +00:00
MerryMage
a7e4202828
IR: Implement VectorSignExtend
2018-02-11 16:24:33 +00:00