207 Commits

Author SHA1 Message Date
MerryMage
14dd45eed9 Fix VShift terminology
An arithmetic shift is by definition a signed shift, and a logical shift is by definition an unsigned shift.

- Rename VectorLogicalVShiftS* -> VectorArithmeticVShift*
- Rename VectorLogicalVShiftU* -> VectorLogicalVShift*
2018-09-23 10:50:39 +01:00
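The distinction drawn in the commit message above can be illustrated on a single 32-bit lane. This is only a sketch of the terminology: `LogicalShiftRight` and `ArithmeticShiftRight` are hypothetical helper names, not the project's IR opcodes or emitter code, and the clamping of large shift amounts is an assumption made to keep the example well defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: logical (unsigned) shift right.
// Vacated high bits are filled with zeros.
constexpr std::uint32_t LogicalShiftRight(std::uint32_t value, unsigned amount) {
    return amount >= 32 ? 0 : value >> amount;
}

// Hypothetical helper: arithmetic (signed) shift right.
// Vacated high bits are filled with copies of the sign bit.
// Written on the unsigned bit pattern so it does not rely on how the
// compiler treats >> for negative signed values.
constexpr std::int32_t ArithmeticShiftRight(std::int32_t value, unsigned amount) {
    if (amount >= 32) {
        amount = 31;  // assumption: saturate, so the result is all sign bits
    }
    const std::uint32_t bits = static_cast<std::uint32_t>(value);
    const std::uint32_t sign_fill = value < 0 ? ~(~std::uint32_t{0} >> amount) : 0;
    return static_cast<std::int32_t>((bits >> amount) | sign_fill);
}
```

The same bit pattern, 0x80000000, becomes 0x08000000 under a logical shift by 4 and 0xF8000000 under an arithmetic shift by 4, which is why a "logical signed" shift is a contradiction in terms.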
Lioncash
f69893345f ir: Add opcodes for unsigned saturating left shifts 2018-09-19 12:13:22 +01:00
MerryMage
1ec40ef6ed IR: Add fbits argument to FPVectorFrom{Signed,Unsigned}Fixed 2018-09-18 21:46:17 +01:00
MerryMage
6513595c09 opcodes.inc: Align columns to a tabstop of 4 2018-09-18 20:37:03 +01:00
MerryMage
6b0d2b529e IR: Add fbits argument to FixedToFP-related opcodes 2018-09-18 20:36:37 +01:00
Lioncash
532762582b ir: Add opcodes for left signed saturated shifts 2018-09-18 18:22:03 +01:00
MerryMage
4a2c5962c7 IR: Add VectorSignedSaturatedDoublingMultiplyLong 2018-09-15 13:38:17 +01:00
MerryMage
59dc33ef12 emit_x64_vector: Changes to VectorSignedSaturatedDoublingMultiply
* Return both the upper and lower parts of the multiply if required
* SSE2 does not support the pmuldq instruction, do sign correction to an unsigned result instead
* Improve port utilisation where possible (punpck instructions were a bottleneck)
2018-09-15 09:55:25 +01:00
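The sign correction mentioned in the commit above can be sketched in scalar form. The signed 32x32 to 64-bit multiply (pmuldq) arrived with SSE4.1, while the unsigned one (pmuludq) is available in SSE2, so a signed product has to be recovered from an unsigned one. The function below shows the arithmetic identity only; `SignedMulViaUnsigned` is a hypothetical name and this is not the emitter's actual instruction sequence.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: compute a signed 32x32 -> 64-bit product using only an
// unsigned multiply, then correct for the signs of the inputs.
//
// Reading a negative int32 as unsigned adds 2^32 to it, so the unsigned
// product is too large by (other operand << 32) for each negative input.
// Subtracting those terms modulo 2^64 yields the signed product.
constexpr std::int64_t SignedMulViaUnsigned(std::int32_t a, std::int32_t b) {
    const std::uint32_t ua = static_cast<std::uint32_t>(a);
    const std::uint32_t ub = static_cast<std::uint32_t>(b);
    std::uint64_t product = static_cast<std::uint64_t>(ua) * ub;
    if (a < 0) {
        product -= static_cast<std::uint64_t>(ub) << 32;
    }
    if (b < 0) {
        product -= static_cast<std::uint64_t>(ua) << 32;
    }
    return static_cast<std::int64_t>(product);
}
```

Only the upper 32 bits of the product are affected by the correction; the lower 32 bits of a signed and an unsigned multiply are identical.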
MerryMage
bbaebeb217 IR: Implement Vector{Signed,Unsigned}Multiply{16,32} 2018-09-15 09:55:25 +01:00
Lioncash
0bb908fb53 ir: Add opcodes for vector CLZ operations
We can optimize these cases further with the use of a fair bit of
shuffling via pshufb and the use of masks, but given the uncommon use of
this instruction, I wouldn't consider it to be beneficial in terms of
amount of code to be worth it over a simple manageable naive solution
like this.

If we ever do hit a case where vectorized CLZ happens to be a
bottleneck, then we can revisit this. At least with AVX-512CD, this can
be done with a single instruction for the 32-bit word case.
2018-09-10 18:30:40 +01:00
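A "simple manageable naive solution" of the kind the commit above describes might look like the following per-lane loop. This is a sketch under assumptions: the lane width (32 bits), the lane count (4) and the function names are chosen for illustration and are not taken from the commit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: count leading zero bits of one 32-bit lane by
// scanning from the most significant bit. Returns 32 for an input of zero.
constexpr unsigned CountLeadingZeros32(std::uint32_t value) {
    unsigned count = 0;
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (value & mask) == 0; mask >>= 1) {
        ++count;
    }
    return count;
}

// Hypothetical helper: apply the scalar routine to each lane in turn,
// with no shuffles or mask tricks.
inline std::array<std::uint32_t, 4> VectorCountLeadingZeros32(
        const std::array<std::uint32_t, 4>& lanes) {
    std::array<std::uint32_t, 4> result{};
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        result[i] = CountLeadingZeros32(lanes[i]);
    }
    return result;
}
```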
Lioncash
28424c7ad1 ir: Add opcodes for unsigned saturated accumulations of signed values 2018-09-09 17:06:03 +01:00
Lioncash
4c0adbb7f1 ir: Add opcodes for signed saturated accumulations of unsigned values 2018-09-09 17:06:03 +01:00
Lioncash
4f3bde5f12 ir: Add opcodes for performing unsigned reciprocal square root estimates 2018-09-09 00:37:28 +01:00
Lioncash
622b60efd6 ir: Add opcodes for unsigned reciprocal estimate 2018-09-09 00:37:28 +01:00
Lioncash
06062a91c5 A64: Add opcodes for signed saturating negations 2018-09-08 11:23:32 +01:00
Lioncash
af8bea59d5 ir: Add opcodes for scalar signed saturated doubling multiplies 2018-09-06 20:35:43 +01:00
Lioncash
72eb6ad362 ir: Add opcodes for signed saturated doubling multiplies 2018-09-06 20:35:43 +01:00
Lioncash
f978c445fa ir: Add opcodes for signed saturated absolute values 2018-09-06 15:49:25 +01:00
MerryMage
8067ab9553 IR: Add VectorTable and VectorTableLookup IR instructions 2018-08-18 21:59:44 +01:00
MerryMage
0e0e839ba0 opcodes: Cleanup opcodes table
* Remove T:: prefix from types.
* Add another column for a 4th argument.
2018-08-18 19:39:59 +01:00
Lioncash
a278775c43 ir: Add opcodes for performing rounding left shifts 2018-08-18 14:23:29 +01:00
Lioncash
fc96d512c9 A64: Implement ISB
Given we want to ensure that all instructions are fetched again, we can
treat an ISB instruction as a code cache flush.
2018-08-18 13:30:54 +01:00
MerryMage
6d236d459f system: Implement MRS CNTFRQ_EL0 2018-08-16 09:58:34 +01:00
Lioncash
18a8151684 ir: Add opcodes for unsigned saturating add and subtract 2018-08-14 08:48:06 +01:00
MerryMage
8f4777338e IR: Implement FPMulX IR instruction 2018-08-02 14:11:14 +01:00
MerryMage
8f46c26d26 IR: Initial implementation of FPVectorRoundInt 2018-07-30 13:31:51 +01:00
MerryMage
ce58863903 IR: Generalise SignedSaturated{Add,Sub} to support more bitwidths 2018-07-30 11:01:36 +01:00
Lioncash
1dfb29fc14 ir: Add opcodes for vector paired maximum and minimums
For the time being, we can just do a naive implementation which avoids
falling back to the interpreter a bit. Horizontal operations aren't
necessarily x86 SIMD's forte anyways.
2018-07-30 08:40:32 +01:00
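To make the "horizontal operation" in the commit above concrete, here is a naive sketch of a paired maximum. It assumes the usual A64 pairwise semantics, in which adjacent element pairs of the first operand fill the lower half of the result and pairs of the second operand fill the upper half; `PairedMax` is a hypothetical name and not the project's implementation.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: paired (pairwise) maximum of two vectors.
// Each result lane combines two adjacent lanes of the same input, which is
// what makes the operation horizontal rather than lane-by-lane.
template <typename T, std::size_t N>
std::array<T, N> PairedMax(const std::array<T, N>& a, const std::array<T, N>& b) {
    static_assert(N % 2 == 0, "lane count must be even");
    std::array<T, N> result{};
    for (std::size_t i = 0; i < N / 2; ++i) {
        result[i] = std::max(a[2 * i], a[2 * i + 1]);          // pairs from a
        result[N / 2 + i] = std::max(b[2 * i], b[2 * i + 1]);  // pairs from b
    }
    return result;
}
```

A paired minimum follows the same shape with `std::min` in place of `std::max`.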
Lioncash
aae22eec26 ir: Add opcodes for performing scalar integral min/max 2018-07-30 08:39:33 +01:00
Lioncash
6ef3af3bc9 A64: Implement PMULL{2} 2018-07-29 10:04:58 +01:00
Lioncash
656a4042a2 ir: Add opcode for performing polynomial multiplication 2018-07-26 16:16:30 +01:00
MerryMage
0f9bc2d391 IR: Implement FPVectorTo{Signed,Unsigned}Fixed 2018-07-26 12:48:36 +01:00
MerryMage
76f0ca04d6 IR: Implement FPVector{Max,Min} 2018-07-26 09:31:56 +01:00
MerryMage
2d2ca5ebc1 IR: Implement FPRecipStepFused, FPVectorRecipStepFused 2018-07-25 19:14:23 +01:00
MerryMage
c5a14ab21b IR: Implement FPVectorRecipEstimate 2018-07-25 18:55:40 +01:00
MerryMage
186e52ca50 IR: Implement FPRecipEstimate 2018-07-25 18:36:40 +01:00
MerryMage
b1e3616de2 IR: Implement FPVectorNeg 2018-07-25 13:25:35 +01:00
MerryMage
93eeb25fac IR: Implement FPVectorMulAdd 2018-07-25 13:19:48 +01:00
MerryMage
ff025e88d0 IR: Implement A64OrQC 2018-07-24 19:04:40 +01:00
MerryMage
759289ec5c A64: Implement UQXTN (vector) 2018-07-24 18:31:32 +01:00
MerryMage
0682353626 A64: Implement SQXTN (vector) 2018-07-24 17:59:14 +01:00
MerryMage
d9b59c69de A64: Implement SQXTUN 2018-07-24 16:32:10 +01:00
MerryMage
f7052ae04d A64: Implement FRSQRTS (vector), single/double variant 2018-07-23 22:58:52 +01:00
MerryMage
0925ef6248 A64: Implement FRSQRTE (vector), single/double variant 2018-07-23 22:46:12 +01:00
MerryMage
4ef864e81c IR: Implement FPRSqrtStepFused 2018-07-23 22:05:17 +01:00
MerryMage
7ed089fd8e IR: Implement FPRSqrtEstimate 2018-07-22 18:35:43 +01:00
MerryMage
39958434b6 A64: Implement FABD in terms of existing IR instructions
Fixes NaN issue. Closes #306.
2018-07-16 16:51:16 +01:00
MerryMage
48166d80cd IR: Implement FPRoundInt 2018-07-16 14:10:53 +01:00
MerryMage
59e78dc57e A64: Implement FADDP (vector) 2018-07-15 22:49:58 +01:00
MerryMage
dfdec797e3 A64: Implement SADDLP 2018-07-15 18:50:09 +01:00