Lioncash
6c877ff8db
emit_x64_vector: Make EmitVectorUnsignedSaturatedAccumulateSigned() internally linked
Given this is just an internal helper function, it can be marked static.
2018-09-16 08:16:54 +01:00
Lioncash
4b5926dcab
perf_map: Use std::string_view instead of std::string for PerfMapRegister()
We can just use a non-owning view into a string in this case instead of
potentially allocating a std::string instance.
2018-09-16 08:16:43 +01:00
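A minimal sketch of the idea, using a hypothetical helper `FormatPerfMapLine` (not dynarmic's `PerfMapRegister`). The name parameter is a `std::string_view`, so a caller passing a string literal or a substring does not have to build a temporary `std::string` just for the call. The line layout assumes perf's map format, `START SIZE symbolname`, with both numbers in hex.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper for illustration only; not dynarmic's PerfMapRegister().
// `name` is a non-owning view: literals and substrings are passed without
// allocating a std::string at the call site.
std::string FormatPerfMapLine(std::uintptr_t start, std::size_t size, std::string_view name) {
    char prefix[48];  // two 64-bit hex numbers plus separators fit comfortably
    const int len = std::snprintf(prefix, sizeof(prefix), "%llx %llx ",
                                  static_cast<unsigned long long>(start),
                                  static_cast<unsigned long long>(size));
    std::string line(prefix, static_cast<std::size_t>(len));
    // A string_view is not guaranteed to be null-terminated, so copy by length.
    line.append(name.data(), name.size());
    line.push_back('\n');
    return line;
}
```

Passing `full.substr(0, 10)` for some `std::string_view full` hands over a view of the first ten characters without copying them first.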
MerryMage
74459479b9
A64: Implement SQRDMULH (vector), vector variant
2018-09-15 14:04:42 +01:00
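For reference, a scalar model of what SQRDMULH computes on one 16-bit element: double the product, add a rounding constant of 2^15, keep the high half, and saturate. This is a sketch of the instruction's semantics assuming 16-bit elements, not the emitted x64 code; `Sat16` and the function name are hypothetical, and `>>` on a negative value is assumed to be an arithmetic shift (what mainstream compilers do, and guaranteed since C++20).

```cpp
#include <cstdint>
#include <limits>

struct Sat16 {
    std::int16_t value;
    bool saturated;  // stands in for setting FPSR.QC
};

// SQRDMULH on one 16-bit element: round(2*a*b / 2^16), saturated to int16_t.
constexpr Sat16 SignedSaturatedRoundingDoublingMultiplyHigh16(std::int16_t a, std::int16_t b) {
    // 2*a*b plus the rounding constant easily fits in 64 bits.
    const std::int64_t rounded = 2 * std::int64_t{a} * std::int64_t{b} + (std::int64_t{1} << 15);
    const std::int64_t high = rounded >> 16;
    // Only a == b == INT16_MIN can exceed the range (it yields +32768).
    if (high > std::numeric_limits<std::int16_t>::max()) {
        return {std::numeric_limits<std::int16_t>::max(), true};
    }
    return {static_cast<std::int16_t>(high), false};
}
```

With inputs `1` and `16384` this returns 1; without the rounding constant (the SQDMULH behaviour) the same inputs give 0.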
MerryMage
03b80f2ebe
A64: Implement SQDMULL (vector), vector variant
2018-09-15 13:38:37 +01:00
MerryMage
4a2c5962c7
IR: Add VectorSignedSaturatedDoublingMultiplyLong
2018-09-15 13:38:17 +01:00
MerryMage
59dc33ef12
emit_x64_vector: Changes to VectorSignedSaturatedDoublingMultiply
* Return both the upper and lower parts of the multiply if required
* SSE2 lacks the pmuldq instruction (it was introduced in SSE4.1), so perform sign correction on an unsigned result instead
* Improve port utilisation where possible (punpck instructions were a bottleneck)
2018-09-15 09:55:25 +01:00
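The sign-correction idea can be checked with a scalar model. `PMULUDQ`, which SSE2 does have, gives an unsigned 32x32 to 64-bit product per lane. Reading an operand with its sign bit set as unsigned adds 2^32 to it, so the unsigned product exceeds the signed one by 2^32 times the other operand, which only affects the upper 32 bits modulo 2^64. The function below is a hypothetical sketch of that arithmetic, not the emitted instruction sequence.

```cpp
#include <cstdint>

// Signed 32x32 -> 64-bit product computed from an unsigned product plus a
// correction of the upper half (the arithmetic behind the SSE2 fallback).
constexpr std::int64_t SignedMultiplyViaUnsigned(std::int32_t a, std::int32_t b) {
    const auto ua = static_cast<std::uint32_t>(a);
    const auto ub = static_cast<std::uint32_t>(b);
    const std::uint64_t product = std::uint64_t{ua} * ub;  // unsigned product
    // a_signed = ua - 2^32 when a < 0, so subtract 2^32 * ub, i.e. subtract ub
    // from the high half; likewise for b. Everything wraps modulo 2^32.
    std::uint32_t hi = static_cast<std::uint32_t>(product >> 32);
    if (a < 0) hi -= ub;
    if (b < 0) hi -= ua;
    const std::uint64_t corrected = (std::uint64_t{hi} << 32) | (product & 0xFFFFFFFFu);
    return static_cast<std::int64_t>(corrected);  // two's-complement reinterpretation
}
```

The low half never needs correcting, which is why the fix-up can be done with a few masked subtractions on the high lanes.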
MerryMage
bbaebeb217
IR: Implement Vector{Signed,Unsigned}Multiply{16,32}
2018-09-15 09:55:25 +01:00
Lioncash
baac5a8810
backend_x64/a64_interface: Re-enable the constant folding pass
This was disabled for debugging but never re-enabled. To be safe, it was
tested downstream in yuzu to confirm that re-enabling it doesn't break
anything, and no breakage was found.
2018-09-14 12:14:19 +01:00
MerryMage
e78ca1947b
emit_x64_vector_floating_point: Hardware FMA implementation for RSqrtStepFused
2018-09-12 21:01:06 +01:00
MerryMage
8a5ae9a366
emit_x64_vector_floating_point: Hardware FMA implementation of FPVectorRecipStepFused
2018-09-12 20:45:39 +01:00
MerryMage
39818f98e8
emit_x64_floating_point: Hardware FMA implementation of FPRSqrtStepFused
2018-09-12 16:10:18 +01:00
MerryMage
3d0a0b432b
emit_x64_floating_point: Hardware FMA implementation of FPRecipStepFused{32,64}
2018-09-12 14:58:09 +01:00
MerryMage
2293dff6d8
emit_x64_vector: SSE implementation of VectorSignedSaturatedAccumulateUnsigned{8,16,32}
2018-09-11 19:57:31 +01:00
Lioncash
2047683777
emit_x64_vector: Correct static asserts for < 64-bit type checks in saturated accumulate fallbacks
...
I had initially meant to use BitSize() here, not sizeof()
2018-09-11 07:08:32 +01:00
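Why the original assert checked nothing: `sizeof` counts bytes, so `sizeof(T) < 64` holds for every integer type, 64-bit ones included. A sketch with a locally defined `BitSize<T>()` (standing in for the helper the message mentions) and a hypothetical widening saturating add of the kind such a fallback might perform:

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// Illustrative BitSize<T>() helper: the width of T in bits.
template <typename T>
constexpr std::size_t BitSize() {
    return sizeof(T) * CHAR_BIT;
}

// Saturating signed add that widens to int64_t, so it is only valid for
// types narrower than 64 bits.
template <typename T>
constexpr T SaturatedAddViaWidening(T a, T b) {
    static_assert(std::is_signed_v<T>, "signed lanes only in this sketch");
    // sizeof(T) < 64 would also pass for int64_t; the bit width is what matters.
    static_assert(BitSize<T>() < 64, "T must be narrower than 64 bits");
    const std::int64_t sum = std::int64_t{a} + std::int64_t{b};
    const std::int64_t lo = std::numeric_limits<T>::min();
    const std::int64_t hi = std::numeric_limits<T>::max();
    return static_cast<T>(sum < lo ? lo : (sum > hi ? hi : sum));
}
```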
MerryMage
55e9e401aa
emit_x64_vector: EmitVectorSignedSaturatedAccumulateUnsigned64: SSE implementation
2018-09-10 22:39:30 +01:00
MerryMage
1076651426
emit_x64_vector: Simplify fpsr_qc related code
Move the bool conversion into A64JitState::GetFpsr so we don't have to continuously
pay the cost of conversion for every saturation instruction.
2018-09-10 21:24:07 +01:00
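A simplified sketch of the idea, assuming a hypothetical `JitStateSketch` in place of `A64JitState` and a 32-bit `fpsr_qc` word in which any nonzero value means saturation occurred. The only architectural fact relied on is that FPSR.QC is bit 27.

```cpp
#include <cstdint>

// Hypothetical, simplified state; not dynarmic's actual A64JitState layout.
struct JitStateSketch {
    std::uint32_t fpsr = 0;     // FPSR bits other than QC
    std::uint32_t fpsr_qc = 0;  // sticky; any nonzero value means "saturated"

    // Code for a saturating instruction can OR in whatever nonzero value it
    // already has, without first normalising it to 0 or 1.
    constexpr void AccumulateSaturation(std::uint32_t nonzero_if_saturated) {
        fpsr_qc |= nonzero_if_saturated;
    }

    // The conversion to a single bit happens once, when the FPSR is read.
    constexpr std::uint32_t GetFpsr() const {
        constexpr std::uint32_t qc_bit = std::uint32_t{1} << 27;  // FPSR.QC
        return (fpsr & ~qc_bit) | (fpsr_qc != 0 ? qc_bit : 0u);
    }
};
```

Reads of the FPSR are far rarer than saturating operations, so moving the normalisation to the read side is the cheaper placement.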
Lioncash
4039030234
A64: Implement CLZ's vector variant
2018-09-10 18:30:40 +01:00
Lioncash
0bb908fb53
ir: Add opcodes for vector CLZ operations
We could optimize these cases further with a fair bit of shuffling via
pshufb and the use of masks, but given how rarely this instruction is
used, I don't consider the extra code worth it over a simple, manageable
naive solution like this.
If we ever do hit a case where vectorized CLZ happens to be a
bottleneck, then we can revisit this. At least with AVX-512CD, this can
be done with a single instruction for the 32-bit word case.
2018-09-10 18:30:40 +01:00
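The naive approach amounts to applying a scalar count to each lane independently. A scalar model of those semantics (not the emitted x64 code), assuming unsigned lanes; as with A64 CLZ, a zero lane yields the lane width. The function names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Count leading zeros of one unsigned lane, bit by bit from the top.
template <typename T>
constexpr T CountLeadingZeros(T value) {
    constexpr int bits = static_cast<int>(sizeof(T) * 8);
    T count = 0;
    for (int i = bits - 1; i >= 0; --i) {
        if ((value >> i) & 1) {
            break;
        }
        ++count;
    }
    return count;  // equals the lane width when value == 0
}

// Naive vector CLZ: the scalar operation applied to every lane.
template <typename T, std::size_t N>
constexpr std::array<T, N> VectorCountLeadingZeros(const std::array<T, N>& v) {
    std::array<T, N> result{};
    for (std::size_t i = 0; i < N; ++i) {
        result[i] = CountLeadingZeros(v[i]);
    }
    return result;
}
```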
MerryMage
3b13259630
A64/translate: VectorZeroUpper for V(64) stores
Ensures correctness.
2018-09-09 19:59:02 +01:00
MerryMage
1931d44495
simd_two_register_misc: FNEG (vector) with Q == 0 had dirty upper
2018-09-09 19:55:37 +01:00
Lioncash
a0790f02d0
emit_x64_vector: Remove unnecessary [[maybe_unused]] attributes
These were unintentionally left in when introducing SUQADD and USQADD
2018-09-09 19:30:14 +01:00
Lioncash
b0e1eb5a15
A64: Implement USQADD's scalar and vector variants
2018-09-09 17:06:03 +01:00
Lioncash
28424c7ad1
ir: Add opcodes for unsigned saturated accumulations of signed values
2018-09-09 17:06:03 +01:00
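USQADD adds a signed value to an unsigned accumulator and saturates the result to the unsigned range, so it can clamp in either direction. A scalar sketch for one 8-bit element, where `saturated` stands in for setting FPSR.QC; `SatU8` and the function name are hypothetical.

```cpp
#include <cstdint>

struct SatU8 {
    std::uint8_t value;
    bool saturated;  // stands in for setting FPSR.QC
};

// USQADD on one 8-bit element: unsigned accumulator + signed addend,
// clamped to [0, 255].
constexpr SatU8 UnsignedSaturatedAccumulateSigned8(std::uint8_t acc, std::int8_t addend) {
    const int sum = int{acc} + int{addend};  // exact in int
    if (sum < 0) {
        return {0, true};
    }
    if (sum > 255) {
        return {255, true};
    }
    return {static_cast<std::uint8_t>(sum), false};
}
```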
Lioncash
9923ea0b71
A64: Implement SUQADD's scalar and vector variants
2018-09-09 17:06:03 +01:00
Lioncash
4c0adbb7f1
ir: Add opcodes for signed saturated accumulations of unsigned values
2018-09-09 17:06:03 +01:00
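SUQADD is the mirror case: an unsigned value added to a signed accumulator, saturated to the signed range. Because the addend is never negative, the sum can only overflow upwards. A scalar sketch for one 8-bit element; `SatS8` and the function name are hypothetical, and `saturated` stands in for setting FPSR.QC.

```cpp
#include <cstdint>
#include <limits>

struct SatS8 {
    std::int8_t value;
    bool saturated;  // stands in for setting FPSR.QC
};

// SUQADD on one 8-bit element: signed accumulator + unsigned addend,
// clamped to the int8_t range. Only the upper bound can be exceeded.
constexpr SatS8 SignedSaturatedAccumulateUnsigned8(std::int8_t acc, std::uint8_t addend) {
    const int sum = int{acc} + int{addend};  // always >= -128
    if (sum > std::numeric_limits<std::int8_t>::max()) {
        return {std::numeric_limits<std::int8_t>::max(), true};
    }
    return {static_cast<std::int8_t>(sum), false};
}
```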
Lioncash
799bfed2df
A64: Implement SMLAL{2}, SMLSL{2}, UMLAL{2}, and UMLSL{2}'s vector by-element variants
We can simply modify the general function made for SMULL{2} and
UMULL{2}'s by-element variants to also handle the other multiply-based
by-element variants.
2018-09-09 13:55:40 +01:00
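A scalar sketch of that generalisation for the signed 16-bit-lane case: one function, with a hypothetical `ExtraBehavior` parameter choosing between SMULL (plain widening multiply), SMLAL (accumulate) and SMLSL (subtract). The names are illustrative, and the `{2}` forms, which read the upper half of the source register, are not modelled.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical selector for the step that follows the widening multiply.
enum class ExtraBehavior { None, Accumulate, Subtract };

// Signed 16 -> 32-bit widening multiply of each lane of `n` by one selected
// element, optionally added to or subtracted from `acc`. These instructions
// are non-saturating, so the add/subtract wraps modulo 2^32.
template <std::size_t N>
constexpr std::array<std::int32_t, N> SignedMultiplyLongByElement(
    const std::array<std::int32_t, N>& acc, const std::array<std::int16_t, N>& n,
    std::int16_t element, ExtraBehavior behavior) {
    std::array<std::int32_t, N> result{};
    for (std::size_t i = 0; i < N; ++i) {
        const std::int32_t product = std::int32_t{n[i]} * std::int32_t{element};
        const auto p = static_cast<std::uint32_t>(product);
        const auto a = static_cast<std::uint32_t>(acc[i]);
        switch (behavior) {
        case ExtraBehavior::None:
            result[i] = product;
            break;
        case ExtraBehavior::Accumulate:
            result[i] = static_cast<std::int32_t>(a + p);
            break;
        case ExtraBehavior::Subtract:
            result[i] = static_cast<std::int32_t>(a - p);
            break;
        }
    }
    return result;
}
```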
Lioncash
94451ec321
A64: Implement UMULL{2}'s vector by-element variant
2018-09-09 13:55:40 +01:00
Lioncash
45867deac9
A64: Implement SMULL{2}'s vector by-element variant
2018-09-09 13:55:40 +01:00
Lioncash
02357939ac
ir/value: Replace includes with forward declarations
enum classes are still considered complete types when forward declared
(as the compiler knows the exact size of the type from the declaration
alone). The only difference in this case is that the enumerators of the
enum class aren't visible. Given we don't use the enumerators within this
header in any way, we can simply forward declare the enums here and remove
the inclusions.
2018-09-09 09:04:22 +01:00
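A self-contained sketch of the point: an opaque enum declaration is enough to use the type by value, because a scoped enum always has a fixed underlying type (`int` unless stated). `ValueSketch`, the `std::uint16_t` underlying type and the enumerators are assumptions for illustration, not the contents of dynarmic's headers.

```cpp
#include <cstdint>

// Opaque declarations, as a header might contain them.
enum class Cond;                  // underlying type defaults to int
enum class Type : std::uint16_t;  // explicitly fixed underlying type

// Both enums are complete types here, so by-value members are fine even
// though no enumerator is visible yet.
struct ValueSketch {
    Type type;
    Cond cond;
};

// The full definitions, normally in their own headers. A redeclaration must
// repeat the same underlying type as the opaque declaration.
enum class Cond { EQ, NE };
enum class Type : std::uint16_t { Void, U32 };
```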
Lioncash
450f721df5
ir/cond: Migrate to C++17 nested namespace specifiers
2018-09-09 09:03:42 +01:00
Lioncash
e649988cd6
CMakeLists: Add missing cond.h header to file listing
Allows the file to show up within IDEs more easily.
2018-09-09 09:03:42 +01:00
Lioncash
d20e7694dd
A64: Implement URSQRTE
2018-09-09 00:37:28 +01:00
Lioncash
4f3bde5f12
ir: Add opcodes for performing unsigned reciprocal square root estimates
2018-09-09 00:37:28 +01:00
Lioncash
cfeeaec1c6
A64: Implement URECPE
2018-09-09 00:37:28 +01:00
Lioncash
622b60efd6
ir: Add opcodes for unsigned reciprocal estimate
2018-09-09 00:37:28 +01:00
Lioncash
d17599af40
Update Xbyak to 5.71
Merge commit 'f7c26e9f7ace572f440b80b0e71625295755c38b'
2018-09-08 17:09:25 -04:00
Lioncash
f7c26e9f7a
Squashed 'externals/xbyak/' changes from 671fc805..1de435ed
1de435ed bf uses Label class
613922bd add Label L() for convenience
43e15583 fix typo
93579ee6 add protect-re.cpp
60004b5c fix url of protect-re.cpp
348b2709 fix typo of doc
f34f6ed5 update manual
232110be update test
82b78bf0 add setProtectMode
dd8b290f put warning message if pageSize != 4096
64775ca2 a little refactoring
7c3e7b85 fix wrong VSIB encoding with idx >= 16
git-subtree-dir: externals/xbyak
git-subtree-split: 1de435ed04c8e74775804da944d176baf0ce56e2
2018-09-08 16:52:55 -04:00
Lioncash
8782b69c93
travis: Make macOS build with Xcode 9.4.1
Builds against the latest release version of the Xcode toolchain
2018-09-08 13:21:58 +01:00
Lioncash
b575b23ea9
A64: Implement SQNEG's scalar and vector variants
2018-09-08 11:23:32 +01:00
Lioncash
06062a91c5
A64: Add opcodes for signed saturating negations
2018-09-08 11:23:32 +01:00
Lioncash
1c40579de5
emit_x64_vector: Simplify "position == 0" case for EmitVectorExtract()
In the event position is zero, we can just treat it as a NOP, given
there's no need to move the data.
2018-09-08 11:23:32 +01:00
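A byte-granular model of an EXT-style extract shows why position zero needs no work: the result is 16 consecutive bytes of the concatenation `hi:lo` starting at `position`, so at zero it is simply `lo`. Taking the position in bytes is an assumption of this sketch, and `VectorExtractModel` is a hypothetical name, not the emitter.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec16 = std::array<std::uint8_t, 16>;

// EXT-style extract: bytes position..position+15 of the 32-byte value hi:lo
// (lo holds the low bytes). Assumes position < 16.
constexpr Vec16 VectorExtractModel(const Vec16& lo, const Vec16& hi, std::size_t position) {
    if (position == 0) {
        return lo;  // no bytes move: the result is the first operand as-is
    }
    Vec16 result{};
    for (std::size_t i = 0; i < 16; ++i) {
        const std::size_t src = position + i;
        result[i] = src < 16 ? lo[src] : hi[src - 16];
    }
    return result;
}
```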
Lioncash
e335050886
emit_x64_vector: Simplify "position == 0" case for EmitVectorExtractLower()
In the event position == 0, we can just treat it as a simple movq,
clearing the upper half of the XMM register. This also makes that case
use only one register.
2018-09-08 11:23:32 +01:00
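The lower-half variant can be modelled the same way: the low 8 result bytes come from the concatenated low halves starting at `position`, and the upper 8 bytes are zero. At position zero that is exactly "copy the low 64 bits and clear the rest", which is what `MOVQ xmm, xmm` does. Byte positions and the function name are assumptions of this sketch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Extract on the lower 64-bit halves: result bytes 0..7 are bytes
// position..position+7 of hi[0..7]:lo[0..7]; bytes 8..15 are zero.
// Assumes position < 8.
constexpr Bytes16 VectorExtractLowerModel(const Bytes16& lo, const Bytes16& hi, std::size_t position) {
    Bytes16 result{};  // upper 8 bytes stay zero
    for (std::size_t i = 0; i < 8; ++i) {
        const std::size_t src = position + i;
        result[i] = src < 8 ? lo[src] : hi[src - 8];
    }
    return result;
}
```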
Lioncash
8b13421bac
A64: Implement SQDMULH's by-element scalar variant
2018-09-08 11:23:32 +01:00
Lioncash
9122a6e19e
A64: Implement SQDMULH's by-element vector variant
2018-09-08 11:23:32 +01:00
MerryMage
176e60ebb1
backend/x64: Do not clear fast_dispatch_table if not enabled
There is no need to pay for the cost of setting a large block of memory if we're not using it.
2018-09-08 11:23:32 +01:00
MerryMage
959446573f
A64: Implement FastDispatchHint
2018-09-07 22:07:44 +01:00
MerryMage
2be95f2b3b
A32: Implement FastDispatchHint
2018-09-07 22:07:44 +01:00
MerryMage
96f23acd00
ir/terminal: Add FastDispatchHint
2018-09-07 21:29:47 +01:00
Lioncash
f5ca9e9e4a
A64: Implement SQDMULH's scalar variant
2018-09-06 20:35:43 +01:00
Lioncash
af8bea59d5
ir: Add opcodes for scalar signed saturated doubling multiplies
2018-09-06 20:35:43 +01:00
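For reference, a scalar model of SQDMULH on one 32-bit element: the high half of the doubled product, truncated rather than rounded (SQRDMULH is the rounding form), saturating only when both inputs are the most negative value. `Sat32` and the function name are hypothetical, and `>>` on a negative value is assumed to be an arithmetic shift, which mainstream compilers do and C++20 guarantees.

```cpp
#include <cstdint>
#include <limits>

struct Sat32 {
    std::int32_t value;
    bool saturated;  // stands in for setting FPSR.QC
};

// SQDMULH on one 32-bit element: high 32 bits of 2*a*b, saturated.
constexpr Sat32 SignedSaturatedDoublingMultiplyHigh32(std::int32_t a, std::int32_t b) {
    constexpr std::int32_t min = std::numeric_limits<std::int32_t>::min();
    constexpr std::int32_t max = std::numeric_limits<std::int32_t>::max();
    // INT32_MIN * INT32_MIN is the only input whose doubled product (2^63) overflows.
    if (a == min && b == min) {
        return {max, true};
    }
    const std::int64_t doubled = 2 * (std::int64_t{a} * std::int64_t{b});
    return {static_cast<std::int32_t>(doubled >> 32), false};
}
```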