Lioncash
80320bfed0
frontend/ir/ir_emitter: Add A32 equivalent to A64's SetCheckBit
...
This will be used in a subsequent change to implement ARMv6T2's CBZ/CBNZ
Thumb-1 instructions.
2019-05-03 19:13:21 -04:00
Lioncash
f3679e6278
A32: Implement barrier instructions introduced in ARMv7
...
Provides basic implementations of the barrier instructions introduced
in ARMv7. Currently these simply mirror the behavior of the AArch64
equivalents.
2019-04-27 08:29:49 -04:00
Merry
6377fd9866
Merge pull request #482 from lioncash/fixedfp
...
A64: Handle half-precision variants of FP->Fixed instructions
2019-04-15 20:08:01 +01:00
Lioncash
1d3fc42bfe
frontend/ir_emitter: Add half-precision->fixed-point opcodes
2019-04-15 00:55:46 -04:00
Lioncash
793b3b38d4
frontend/ir_emitter: Add half-precision opcode variant of FPVectorRSqrtStepFused
2019-04-14 21:12:54 -04:00
Lioncash
db4d134726
frontend/ir_emitter: Add half-precision opcode variant of FPRSqrtStepFused
2019-04-14 21:12:49 -04:00
Merry
eea732febf
Merge pull request #478 from lioncash/stepfused
...
A64: Handle half-precision variants of FRECPE and FRECPS
2019-04-14 12:40:18 +01:00
Lioncash
c55db96819
frontend/ir_emitter: Add half-precision opcode for FPVectorRecipEstimate
2019-04-14 06:14:19 -04:00
Lioncash
1615ba0adc
frontend/ir_emitter: Add half-precision opcode for FPRecipEstimate
2019-04-14 06:14:19 -04:00
Lioncash
4ae0a27ea4
frontend/ir_emitter: Add half-precision opcode for FPVectorRecipStepFused
2019-04-14 06:14:19 -04:00
Lioncash
065143b395
frontend/ir_emitter: Add half-precision opcode for FPRecipStepFused
2019-04-14 06:14:18 -04:00
Lioncash
5edbb415c5
frontend/ir_emitter: Add half-precision opcode variant for FPVectorRSqrtEstimate
2019-04-14 06:12:20 -04:00
Lioncash
f1e556632c
frontend/ir_emitter: Add half-precision opcode variant for FPRSqrtEstimate
2019-04-14 06:11:45 -04:00
Lioncash
722daae0d4
frontend/ir_emitter: Add half-precision variant of FPVectorRoundInt
2019-04-13 17:49:04 -04:00
Lioncash
8243705134
frontend/ir_emitter: Add half-precision variant of FPRoundInt
2019-04-13 17:44:37 -04:00
Merry
b34f42575d
Merge pull request #475 from lioncash/muladd
...
A64: Enable half-precision variants of floating-point multiply-add instructions
2019-04-13 14:30:03 +01:00
Lioncash
278c7ae744
ir/frontend: Add half-precision opcode for FPVectorMulAdd
2019-04-13 01:42:35 -04:00
Lioncash
13b41525cb
frontend/ir_emitter: Add half-precision opcode for FPMulAdd
2019-04-13 00:18:09 -04:00
Lioncash
31867874de
frontend/ir_emitter: Add opcodes for signed saturated left shifts with unsigned saturation
2019-04-12 19:44:23 -04:00
Merry
d5263c17cb
Merge pull request #458 from lioncash/float-op
...
A64: Handle half-precision floating point in FABS, FNEG, and scalar FMOV
2019-03-24 11:23:21 +00:00
Lioncash
eb09ae27db
frontend/ir_emitter: Add half->{single, double} and {double, single}->half conversion opcodes
2019-03-23 14:16:44 -04:00
Lioncash
c75f73785d
frontend/ir_emitter: Add half-precision variant of FPAbs
2019-03-23 13:38:09 -04:00
Lioncash
fd71df5efd
frontend/ir_emitter: Add half-precision variant of FPNeg
2019-03-23 13:21:59 -04:00
Lioncash
db67a42244
frontend/ir/ir_emitter: Amend FPRecipExponent to handle half-precision floating point
2019-03-09 20:08:01 -05:00
Merry
40339b1278
Merge pull request #447 from lioncash/flag
...
A64: Implement CFINV, RMIF, AXFlag and XAFlag
2019-03-07 16:17:13 +00:00
Merry
04f09eb644
Merge pull request #442 from lioncash/fcvtxn
...
A64: Implement scalar and vector variants of FCVTXN
2019-03-06 20:27:59 +00:00
Lioncash
27af30d7c3
ir: Add A64-specific opcodes for getting and setting raw NZCV values
...
This will be necessary to implement the flag manipulation and flag
format instructions.
2019-03-06 14:17:27 -05:00
Lioncash
f44cafe3ca
frontend/ir/ir_emitter: Alter parameters of FPDoubleToSingle() and FPSingleToDouble() to pass along desired rounding mode
...
This will be necessary to special-case the non-IEEE round-to-odd
(Von Neumann) rounding mode.
2019-03-06 12:05:20 -05:00
Lioncash
3e5a52cc66
frontend/ir: Add opcodes for vector square roots
2019-03-04 13:24:36 -05:00
Lioncash
7b004e7230
frontend/ir/ir_emitter: Add opcodes for floating point reciprocal exponents
2019-03-02 23:31:30 -05:00
MerryMage
333d3b734d
IR: Implement FPVectorMulX
2018-11-17 21:31:22 +00:00
MerryMage
14dd45eed9
Fix VShift terminology
...
An arithmetic shift is by definition a signed shift, and a logical shift is by definition an unsigned shift.
- Rename VectorLogicalVShiftS* -> VectorArithmeticVShift*
- Rename VectorLogicalVShiftU* -> VectorLogicalVShift*
2018-09-23 10:50:39 +01:00
Lioncash
f69893345f
ir: Add opcodes for unsigned saturating left shifts
2018-09-19 12:13:22 +01:00
MerryMage
1ec40ef6ed
IR: Add fbits argument to FPVectorFrom{Signed,Unsigned}Fixed
2018-09-18 21:46:17 +01:00
MerryMage
6513595c09
opcodes.inc: Align columns to a tabstop of 4
2018-09-18 20:37:03 +01:00
MerryMage
6b0d2b529e
IR: Add fbits argument to FixedToFP-related opcodes
2018-09-18 20:36:37 +01:00
Lioncash
532762582b
ir: Add opcodes for left signed saturated shifts
2018-09-18 18:22:03 +01:00
MerryMage
4a2c5962c7
IR: Add VectorSignedSaturatedDoublingMultiplyLong
2018-09-15 13:38:17 +01:00
MerryMage
59dc33ef12
emit_x64_vector: Changes to VectorSignedSaturatedDoublingMultiply
...
* Return both the upper and lower parts of the multiply if required
* SSE2 does not support the pmuldq instruction, so do sign correction to an unsigned result instead
* Improve port utilisation where possible (punpck instructions were a bottleneck)
2018-09-15 09:55:25 +01:00
MerryMage
bbaebeb217
IR: Implement Vector{Signed,Unsigned}Multiply{16,32}
2018-09-15 09:55:25 +01:00
Lioncash
0bb908fb53
ir: Add opcodes for vector CLZ operations
...
We can optimize these cases further with a fair bit of
shuffling via pshufb and the use of masks, but given how uncommon
this instruction is, I wouldn't consider the extra
code to be worth it over a simple, manageable naive solution
like this.
If we ever do hit a case where vectorized CLZ happens to be a
bottleneck, then we can revisit this. At least with AVX-512CD, this can
be done with a single instruction for the 32-bit word case.
2018-09-10 18:30:40 +01:00
Lioncash
28424c7ad1
ir: Add opcodes for unsigned saturated accumulations of signed values
2018-09-09 17:06:03 +01:00
Lioncash
4c0adbb7f1
ir: Add opcodes for signed saturated accumulations of unsigned values
2018-09-09 17:06:03 +01:00
Lioncash
4f3bde5f12
ir: Add opcodes for performing unsigned reciprocal square root estimates
2018-09-09 00:37:28 +01:00
Lioncash
622b60efd6
ir: Add opcodes for unsigned reciprocal estimate
2018-09-09 00:37:28 +01:00
Lioncash
06062a91c5
A64: Add opcodes for signed saturating negations
2018-09-08 11:23:32 +01:00
Lioncash
af8bea59d5
ir: Add opcodes for scalar signed saturated doubling multiplies
2018-09-06 20:35:43 +01:00
Lioncash
72eb6ad362
ir: Add opcodes for signed saturated doubling multiplies
2018-09-06 20:35:43 +01:00
Lioncash
f978c445fa
ir: Add opcodes for signed saturated absolute values
2018-09-06 15:49:25 +01:00
MerryMage
8067ab9553
IR: Add VectorTable and VectorTableLookup IR instructions
2018-08-18 21:59:44 +01:00