MerryMage
14dd45eed9
Fix VShift terminology
...
An arithmetic shift is by definition a signed shift, and a logical shift is by definition an unsigned shift.
- Rename VectorLogicalVShiftS* -> VectorArithmeticVShift*
- Rename VectorLogicalVShiftU* -> VectorLogicalVShift*
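
The distinction behind the rename can be sketched on a single 8-bit lane. This is only an illustration of the terminology, not the emitter's code: the function names below are hypothetical, and clamping of out-of-range shift amounts is an assumption made to keep the example well defined.

```cpp
#include <cstdint>

// Logical right shift: the lane is treated as unsigned, vacated bits become zero.
constexpr std::uint8_t LogicalShiftRight8(std::uint8_t value, unsigned amount) {
    return amount > 7 ? std::uint8_t{0} : static_cast<std::uint8_t>(value >> amount);
}

// Arithmetic right shift: the lane is treated as signed, vacated bits copy the sign bit.
// Written on the unsigned bit pattern so it does not rely on how the compiler
// shifts negative values.
constexpr std::int8_t ArithmeticShiftRight8(std::int8_t value, unsigned amount) {
    if (amount > 7) {
        amount = 7;  // assumption: shifting further only repeats the sign bit
    }
    const std::uint8_t bits = static_cast<std::uint8_t>(value);
    const std::uint8_t sign_fill =
        value < 0 ? static_cast<std::uint8_t>(0xFFu << (8 - amount)) : std::uint8_t{0};
    return static_cast<std::int8_t>(static_cast<std::uint8_t>((bits >> amount) | sign_fill));
}
```

The same bit pattern 0x80 therefore shifts to 0x40 logically and to 0xC0 (that is, -64) arithmetically.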
2018-09-23 10:50:39 +01:00
Lioncash
f69893345f
ir: Add opcodes for unsigned saturating left shifts
2018-09-19 12:13:22 +01:00
MerryMage
1ec40ef6ed
IR: Add fbits argument to FPVectorFrom{Signed,Unsigned}Fixed
2018-09-18 21:46:17 +01:00
MerryMage
6513595c09
opcodes.inc: Align columns to a tabstop of 4
2018-09-18 20:37:03 +01:00
MerryMage
6b0d2b529e
IR: Add fbits argument to FixedToFP-related opcodes
2018-09-18 20:36:37 +01:00
Lioncash
532762582b
ir: Add opcodes for left signed saturated shifts
2018-09-18 18:22:03 +01:00
MerryMage
4a2c5962c7
IR: Add VectorSignedSaturatedDoublingMultiplyLong
2018-09-15 13:38:17 +01:00
MerryMage
59dc33ef12
emit_x64_vector: Changes to VectorSignedSaturatedDoublingMultiply
...
* Return both the upper and lower parts of the multiply if required
* SSE2 does not support the pmuldq instruction, so apply a sign correction to an unsigned result instead
* Improve port utilisation where possible (punpck instructions were a bottleneck)
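
The sign correction mentioned above relies on a standard identity: a signed 32x32 to 64-bit product can be recovered from the unsigned product (which SSE2 does provide, via pmuludq) by subtracting a correction term from the upper half. A scalar sketch of that identity follows. The function name is hypothetical, and the actual instruction sequence used by the emitter is not shown here.

```cpp
#include <cstdint>

// Computes the signed 64-bit product of two 32-bit values using only an
// unsigned multiply plus a correction, all modulo 2^64.
constexpr std::int64_t SignedMultiplyViaUnsigned(std::int32_t a, std::int32_t b) {
    const std::uint32_t ua = static_cast<std::uint32_t>(a);
    const std::uint32_t ub = static_cast<std::uint32_t>(b);
    const std::uint64_t unsigned_product = std::uint64_t{ua} * ub;

    // Reinterpreting a negative operand as unsigned adds 2^32 to it, so the
    // unsigned product overshoots by the other operand shifted into the upper half.
    std::uint64_t correction = 0;
    if (a < 0) {
        correction += ub;
    }
    if (b < 0) {
        correction += ua;
    }
    return static_cast<std::int64_t>(unsigned_product - (correction << 32));
}
```

Only the upper 32 bits change under the correction, so the lower half of the unsigned result can be used as it is.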
2018-09-15 09:55:25 +01:00
MerryMage
bbaebeb217
IR: Implement Vector{Signed,Unsigned}Multiply{16,32}
2018-09-15 09:55:25 +01:00
Lioncash
0bb908fb53
ir: Add opcodes for vector CLZ operations
...
We could optimize these cases further with a fair bit of shuffling via
pshufb and the use of masks, but given how uncommon this instruction is,
I don't consider the extra code worth it over a simple, manageable naive
solution like this.
If we ever do hit a case where vectorized CLZ happens to be a
bottleneck, then we can revisit this. At least with AVX-512CD, this can
be done with a single instruction for the 32-bit word case.
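
For reference, a naive per-lane count of leading zeros might look like the sketch below. It is written as plain scalar C++ over a 16-byte array to show the semantics only. The type and function names are hypothetical, and this is not the code the commit emits.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Counts the zero bits above the highest set bit of one 8-bit lane.
// An all-zero lane yields 8.
constexpr std::uint8_t CountLeadingZeros8(std::uint8_t value) {
    std::uint8_t count = 0;
    for (unsigned mask = 0x80; mask != 0 && (value & mask) == 0; mask >>= 1) {
        ++count;
    }
    return count;
}

// Hypothetical stand-in for a 128-bit vector of sixteen 8-bit lanes.
using Vector16x8 = std::array<std::uint8_t, 16>;

// Applies the scalar count to every lane independently.
inline Vector16x8 VectorCountLeadingZeros8(const Vector16x8& input) {
    Vector16x8 result{};
    for (std::size_t i = 0; i < input.size(); ++i) {
        result[i] = CountLeadingZeros8(input[i]);
    }
    return result;
}
```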
2018-09-10 18:30:40 +01:00
Lioncash
28424c7ad1
ir: Add opcodes for unsigned saturated accumulations of signed values
2018-09-09 17:06:03 +01:00
Lioncash
4c0adbb7f1
ir: Add opcodes for signed saturated accumulations of unsigned values
2018-09-09 17:06:03 +01:00
Lioncash
4f3bde5f12
ir: Add opcodes for performing unsigned reciprocal square root estimates
2018-09-09 00:37:28 +01:00
Lioncash
622b60efd6
ir: Add opcodes for unsigned reciprocal estimate
2018-09-09 00:37:28 +01:00
Lioncash
06062a91c5
A64: Add opcodes for signed saturating negations
2018-09-08 11:23:32 +01:00
Lioncash
af8bea59d5
ir: Add opcodes for scalar signed saturated doubling multiplies
2018-09-06 20:35:43 +01:00
Lioncash
72eb6ad362
ir: Add opcodes for signed saturated doubling multiplies
2018-09-06 20:35:43 +01:00
Lioncash
f978c445fa
ir: Add opcodes for signed saturated absolute values
2018-09-06 15:49:25 +01:00
MerryMage
8067ab9553
IR: Add VectorTable and VectorTableLookup IR instructions
2018-08-18 21:59:44 +01:00
MerryMage
0e0e839ba0
opcodes: Cleanup opcodes table
...
* Remove T:: prefix from types.
* Add another column for a 4th argument.
2018-08-18 19:39:59 +01:00
Lioncash
a278775c43
ir: Add opcodes for performing rounding left shifts
2018-08-18 14:23:29 +01:00
Lioncash
fc96d512c9
A64: Implement ISB
...
Given we want to ensure that all instructions are fetched again, we can
treat an ISB instruction as a code cache flush.
2018-08-18 13:30:54 +01:00
MerryMage
6d236d459f
system: Implement MRS CNTFRQ_EL0
2018-08-16 09:58:34 +01:00
Lioncash
18a8151684
ir: Add opcodes for unsigned saturating add and subtract
2018-08-14 08:48:06 +01:00
MerryMage
8f4777338e
IR: Implement FPMulX IR instruction
2018-08-02 14:11:14 +01:00
MerryMage
8f46c26d26
IR: Initial implementation of FPVectorRoundInt
2018-07-30 13:31:51 +01:00
MerryMage
ce58863903
IR: Generalise SignedSaturated{Add,Sub} to support more bitwidths
2018-07-30 11:01:36 +01:00
Lioncash
1dfb29fc14
ir: Add opcodes for vector paired maximum and minimums
...
For the time being, we can just do a naive implementation which avoids
falling back to the interpreter a bit. Horizontal operations aren't
necessarily x86 SIMD's forte anyways.
2018-07-30 08:40:32 +01:00
Lioncash
aae22eec26
ir: Add opcodes for performing scalar integral min/max
2018-07-30 08:39:33 +01:00
Lioncash
6ef3af3bc9
A64: Implement PMULL{2}
2018-07-29 10:04:58 +01:00
Lioncash
656a4042a2
ir: Add opcode for performing polynomial multiplication
2018-07-26 16:16:30 +01:00
MerryMage
0f9bc2d391
IR: Implement FPVectorTo{Signed,Unsigned}Fixed
2018-07-26 12:48:36 +01:00
MerryMage
76f0ca04d6
IR: Implement FPVector{Max,Min}
2018-07-26 09:31:56 +01:00
MerryMage
2d2ca5ebc1
IR: Implement FPRecipStepFused, FPVectorRecipStepFused
2018-07-25 19:14:23 +01:00
MerryMage
c5a14ab21b
IR: Implement FPVectorRecipEstimate
2018-07-25 18:55:40 +01:00
MerryMage
186e52ca50
IR: Implement FPRecipEstimate
2018-07-25 18:36:40 +01:00
MerryMage
b1e3616de2
IR: Implement FPVectorNeg
2018-07-25 13:25:35 +01:00
MerryMage
93eeb25fac
IR: Implement FPVectorMulAdd
2018-07-25 13:19:48 +01:00
MerryMage
ff025e88d0
IR: Implement A64OrQC
2018-07-24 19:04:40 +01:00
MerryMage
759289ec5c
A64: Implement UQXTN (vector)
2018-07-24 18:31:32 +01:00
MerryMage
0682353626
A64: Implement SQXTN (vector)
2018-07-24 17:59:14 +01:00
MerryMage
d9b59c69de
A64: Implement SQXTUN
2018-07-24 16:32:10 +01:00
MerryMage
f7052ae04d
A64: Implement FRSQRTS (vector), single/double variant
2018-07-23 22:58:52 +01:00
MerryMage
0925ef6248
A64: Implement FRSQRTE (vector), single/double variant
2018-07-23 22:46:12 +01:00
MerryMage
4ef864e81c
IR: Implement FPRSqrtStepFused
2018-07-23 22:05:17 +01:00
MerryMage
7ed089fd8e
IR: Implement FPRSqrtEstimate
2018-07-22 18:35:43 +01:00
MerryMage
39958434b6
A64: Implement FABD in terms of existing IR instructions
...
Fixes NaN issue. Closes #306.
2018-07-16 16:51:16 +01:00
MerryMage
48166d80cd
IR: Implement FPRoundInt
2018-07-16 14:10:53 +01:00
MerryMage
59e78dc57e
A64: Implement FADDP (vector)
2018-07-15 22:49:58 +01:00
MerryMage
dfdec797e3
A64: Implement SADDLP
2018-07-15 18:50:09 +01:00