Add AMD Threadripper Pro based on AMD Zen4 #625

Merged
merged 4 commits into from
Sep 9, 2024
8 changes: 4 additions & 4 deletions groups/zen4/CACHE.txt
@@ -6,7 +6,7 @@ FIXC2 MAX_CPU_CLOCK
 PMC0 RETIRED_INSTRUCTIONS
 PMC1 CPU_CLOCKS_UNHALTED
 PMC2 DATA_CACHE_ACCESSES
-PMC3 DATA_CACHE_REFILLS_ALL
+PMC3 ANY_DATA_CACHE_FILLS_ALL

 METRICS
 Runtime (RDTSC) [s] time
@@ -23,9 +23,9 @@ LONG
 Formulas:
 data cache requests = DATA_CACHE_ACCESSES
 data cache request rate = DATA_CACHE_ACCESSES / RETIRED_INSTRUCTIONS
-data cache misses = DATA_CACHE_REFILLS_ALL
-data cache miss rate = DATA_CACHE_REFILLS_ALL / RETIRED_INSTRUCTIONS
-data cache miss ratio = DATA_CACHE_REFILLS_ALL / DATA_CACHE_ACCESSES
+data cache misses = ANY_DATA_CACHE_FILLS_ALL
+data cache miss rate = ANY_DATA_CACHE_FILLS_ALL / RETIRED_INSTRUCTIONS
+data cache miss ratio = ANY_DATA_CACHE_FILLS_ALL / DATA_CACHE_ACCESSES
 -
 This group measures the locality of your data accesses with regard to the
 L1 cache. Data cache request rate tells you how data intensive your code is
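To illustrate the CACHE group formulas after the event rename, here is a minimal C sketch that derives the three ratios from raw counter readings. The names `CacheMetrics`, `counter_div` and `cache_metrics` are hypothetical and not part of this pull request, and returning 0.0 for a zero denominator is an assumption made only for this example.

```c
#include <stdint.h>

/* Derived metrics of the CACHE group (struct name is hypothetical). */
typedef struct {
    double request_rate; /* DATA_CACHE_ACCESSES / RETIRED_INSTRUCTIONS */
    double miss_rate;    /* ANY_DATA_CACHE_FILLS_ALL / RETIRED_INSTRUCTIONS */
    double miss_ratio;   /* ANY_DATA_CACHE_FILLS_ALL / DATA_CACHE_ACCESSES */
} CacheMetrics;

/* Divide two counter values; a zero denominator yields 0.0 (assumption). */
static double counter_div(uint64_t num, uint64_t den)
{
    return den ? (double)num / (double)den : 0.0;
}

/* pmc0 = RETIRED_INSTRUCTIONS, pmc2 = DATA_CACHE_ACCESSES,
 * pmc3 = ANY_DATA_CACHE_FILLS_ALL, following the EVENTSET of the group. */
CacheMetrics cache_metrics(uint64_t pmc0, uint64_t pmc2, uint64_t pmc3)
{
    CacheMetrics m;
    m.request_rate = counter_div(pmc2, pmc0);
    m.miss_rate    = counter_div(pmc3, pmc0);
    m.miss_ratio   = counter_div(pmc3, pmc2);
    return m;
}
```

The miss rate is normalized to retired instructions, while the miss ratio is normalized to cache accesses, so the two values can differ widely for the same run.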
19 changes: 7 additions & 12 deletions groups/zen4/L3CACHE.txt
@@ -3,22 +3,21 @@ SHORT L3 cache miss rate/ratio (experimental)
 EVENTSET
 PMC0 RETIRED_INSTRUCTIONS
 PMC1 CPU_CLOCKS_UNHALTED
-CPMC0 L3_CACHE_REQ
-CPMC1 L3_MISS_REQ
-CPMC2 L3_CACHE_REQ_MISS
+CPMC0 L3_LOOKUP_STATE_ALL_TYPES
+CPMC1 L3_LOOKUP_STATE_MISS

 METRICS
 Runtime (RDTSC) [s] time
 CPI FIXC1/FIXC0
 L3 request rate CPMC0/PMC0
-L3 miss rate CPMC2/PMC0
-L3 miss ratio CPMC2/CPMC0
+L3 miss rate CPMC1/PMC0
+L3 miss ratio CPMC1/CPMC0

 LONG
 Formulas:
-L3 request rate = L3_CACHE_REQ/RETIRED_INSTRUCTIONS
-L3 miss rate = L3_CACHE_REQ_MISS/RETIRED_INSTRUCTIONS
-L3 miss ratio = L3_CACHE_REQ_MISS/L3_CACHE_REQ
+L3 request rate = L3_LOOKUP_STATE_ALL_TYPES/RETIRED_INSTRUCTIONS
+L3 miss rate = L3_LOOKUP_STATE_MISS/RETIRED_INSTRUCTIONS
+L3 miss ratio = L3_LOOKUP_STATE_MISS/L3_LOOKUP_STATE_ALL_TYPES
 -
 This group measures the locality of your data accesses with regard to the
 L3 cache. L3 request rate tells you how data intensive your code is
@@ -28,7 +27,3 @@ cache lines from memory. And finally L3 miss ratio tells you how many of your
 memory references required a cache line to be loaded from a higher level.
 While the data cache miss rate might be given by your algorithm you should
 try to get data cache miss ratio as low as possible by increasing your cache reuse.
-AMD defines two events for L3 cache misses:
-- The performance metrics table uses L3_CACHE_REQ_MISS (0x0300C00000400104)
-- The official event for L3 misses is L3_MISS_REQ (0x0300C00000401F9a)
-
23 changes: 11 additions & 12 deletions groups/zen4/NUMA.txt
@@ -1,12 +1,12 @@
-SHORT L2 cache bandwidth in MBytes/s (experimental)
+SHORT Socket interconnect and NUMA traffic

 EVENTSET
 FIXC1 ACTUAL_CPU_CLOCK
 FIXC2 MAX_CPU_CLOCK
-PMC0 DATA_CACHE_REFILLS_LOCAL_ALL
-PMC1 DATA_CACHE_REFILLS_REMOTE_ALL
+PMC0 ANY_DATA_CACHE_FILLS_LOCAL_ALL
+PMC1 ANY_DATA_CACHE_FILLS_REMOTE_ALL
 PMC2 HWPREF_DATA_CACHE_FILLS_LOCAL_ALL
-PMC3 HWPREF_DATA_CACHE_FILLS_REMOTE_ALL
+PMC3 HWPREF_DATA_CACHE_FILLS_REMOTE_DRAM

 METRICS
 Runtime (RDTSC) [s] time
@@ -22,14 +22,13 @@ Total data volume [GBytes] 1.0E-09*(PMC0+PMC2+PMC1+PMC3)*64.0

 LONG
 Formulas:
-Local bandwidth [MBytes/s] = 1.0E-06*(DATA_CACHE_REFILLS_LOCAL_ALL+HWPREF_DATA_CACHE_FILLS_LOCAL_ALL)*64.0/time
-Local data volume [GBytes] = 1.0E-09*(DATA_CACHE_REFILLS_LOCAL_ALL+HWPREF_DATA_CACHE_FILLS_LOCAL_ALL)*64.0
-Remote bandwidth [MBytes/s] = 1.0E-06*(DATA_CACHE_REFILLS_REMOTE_ALL+HWPREF_DATA_CACHE_FILLS_REMOTE_ALL)*64.0/time
-Remote data volume [GBytes] = 1.0E-09*(DATA_CACHE_REFILLS_REMOTE_ALL+HWPREF_DATA_CACHE_FILLS_REMOTE_ALL)*64.0
-Total bandwidth [MBytes/s] = 1.0E-06*(DATA_CACHE_REFILLS_LOCAL_ALL+HWPREF_DATA_CACHE_FILLS_LOCAL_ALL+DATA_CACHE_REFILLS_REMOTE_ALL+HWPREF_DATA_CACHE_FILLS_REMOTE_ALL)*64.0/time
-Total data volume [GBytes] = 1.0E-09*(DATA_CACHE_REFILLS_LOCAL_ALL+HWPREF_DATA_CACHE_FILLS_LOCAL_ALL+DATA_CACHE_REFILLS_REMOTE_ALL+HWPREF_DATA_CACHE_FILLS_REMOTE_ALL)*64.0
+Local bandwidth [MBytes/s] = 1.0E-06*(ANY_DATA_CACHE_FILLS_LOCAL_ALL+HWPREF_DATA_CACHE_FILLS_LOCAL_ALL)*64.0/time
+Local data volume [GBytes] = 1.0E-09*(ANY_DATA_CACHE_FILLS_LOCAL_ALL+HWPREF_DATA_CACHE_FILLS_LOCAL_ALL)*64.0
+Remote bandwidth [MBytes/s] = 1.0E-06*(ANY_DATA_CACHE_FILLS_REMOTE_ALL+HWPREF_DATA_CACHE_FILLS_REMOTE_ALL)*64.0/time
+Remote data volume [GBytes] = 1.0E-09*(ANY_DATA_CACHE_FILLS_REMOTE_ALL+HWPREF_DATA_CACHE_FILLS_REMOTE_ALL)*64.0
+Total bandwidth [MBytes/s] = 1.0E-06*(ANY_DATA_CACHE_FILLS_LOCAL_ALL+HWPREF_DATA_CACHE_FILLS_LOCAL_ALL+ANY_DATA_CACHE_FILLS_REMOTE_ALL+HWPREF_DATA_CACHE_FILLS_REMOTE_ALL)*64.0/time
+Total data volume [GBytes] = 1.0E-09*(ANY_DATA_CACHE_FILLS_LOCAL_ALL+HWPREF_DATA_CACHE_FILLS_LOCAL_ALL+ANY_DATA_CACHE_FILLS_REMOTE_ALL+HWPREF_DATA_CACHE_FILLS_REMOTE_ALL)*64.0
 -
 Profiling group to measure NUMA traffic. The data sources range from
 local L2, CCX and memory for the local metrics and remote CCX and memory
-for the remote metrics. There are also events that measure the software
-prefetches from local and remote domain but AMD Zen provides only 4 counters.
+for the remote metrics.
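
The NUMA formulas can be read as "count cache lines, multiply by 64 bytes, scale to MBytes/s or GBytes". The following C sketch spells that out for the four counters of the group. `NumaMetrics` and `numa_metrics` are hypothetical names, not part of this pull request, and reporting zero bandwidth for a non-positive runtime is an assumption made for the example.

```c
#include <stdint.h>

#define CACHE_LINE_BYTES 64.0 /* factor 64.0 used in the group formulas */

/* Derived metrics of the NUMA group (struct name is hypothetical). */
typedef struct {
    double local_bw, remote_bw, total_bw;    /* MBytes/s */
    double local_vol, remote_vol, total_vol; /* GBytes */
} NumaMetrics;

/* pmc0/pmc2: demand and hardware-prefetch fills from local sources,
 * pmc1/pmc3: demand and hardware-prefetch fills from remote sources,
 * time_s:    runtime in seconds. */
NumaMetrics numa_metrics(uint64_t pmc0, uint64_t pmc1,
                         uint64_t pmc2, uint64_t pmc3, double time_s)
{
    double local_bytes  = ((double)pmc0 + (double)pmc2) * CACHE_LINE_BYTES;
    double remote_bytes = ((double)pmc1 + (double)pmc3) * CACHE_LINE_BYTES;
    double total_bytes  = local_bytes + remote_bytes;
    NumaMetrics m;

    m.local_vol  = 1.0E-09 * local_bytes;
    m.remote_vol = 1.0E-09 * remote_bytes;
    m.total_vol  = 1.0E-09 * total_bytes;

    /* Assumption: a non-positive runtime reports zero bandwidth. */
    m.local_bw  = time_s > 0.0 ? 1.0E-06 * local_bytes  / time_s : 0.0;
    m.remote_bw = time_s > 0.0 ? 1.0E-06 * remote_bytes / time_s : 0.0;
    m.total_bw  = time_s > 0.0 ? 1.0E-06 * total_bytes  / time_s : 0.0;
    return m;
}
```

For example, 1,000,000 local fills and 500,000 remote fills over 2 seconds correspond to roughly 32 MBytes/s local and 16 MBytes/s remote bandwidth.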
1 change: 1 addition & 0 deletions src/access-daemon/accessDaemon.c
@@ -3752,6 +3752,7 @@ int main(void)
 case ZEN4_RYZEN:
 case ZEN4_RYZEN2:
 case ZEN4_EPYC:
+case ZEN4_RYZEN_PRO:
 allowed = allowed_amd19_zen4;
 break;
 default:
5 changes: 3 additions & 2 deletions src/includes/perfmon_perfevent.h
@@ -193,7 +193,7 @@ int perfmon_init_perfevent(int cpu_id)
 active_cpus += 1;
 }
 perf_event_num_cpus = cpuid_topology.numHWThreads;
-if (cpuid_info.family == ZEN3_FAMILY && (cpuid_info.model == ZEN4_RYZEN || cpuid_info.model == ZEN4_RYZEN2 || cpuid_info.model == ZEN4_EPYC))
+if (cpuid_info.family == ZEN3_FAMILY && (cpuid_info.model == ZEN4_RYZEN || cpuid_info.model == ZEN4_RYZEN2 || cpuid_info.model == ZEN4_RYZEN_PRO || cpuid_info.model == ZEN4_EPYC ))
 {
 perfEventOptionNames[EVENT_OPTION_TID] = "threadmask";
 perfEventOptionNames[EVENT_OPTION_CID] = "coreid";
@@ -901,7 +901,8 @@ int perf_uncore_setup(struct perf_event_attr *attr, RegisterType type, PerfmonEv
 }
 }
 }
-if (type != POWER && cpuid_info.family == ZEN3_FAMILY && (cpuid_info.model == ZEN4_RYZEN || cpuid_info.model == ZEN4_RYZEN2 || cpuid_info.model == ZEN4_EPYC))
+
+if (type != POWER && cpuid_info.family == ZEN3_FAMILY && (cpuid_info.model == ZEN4_RYZEN || cpuid_info.model == ZEN4_RYZEN2 || cpuid_info.model == ZEN4_RYZEN_PRO || cpuid_info.model == ZEN4_EPYC))
 {
 int got_cid = 0;
 int got_slices = 0;
1 change: 1 addition & 0 deletions src/includes/sysFeatures_amd.h
@@ -350,6 +350,7 @@ static _SysFeatureList* amd_k19_cpu_feature_inputs[] = {

 static _HWArchFeatures amd_arch_features[] = {
 {ZEN3_FAMILY, ZEN4_RYZEN, amd_k19_cpu_feature_inputs},
+{ZEN3_FAMILY, ZEN4_RYZEN_PRO, amd_k19_cpu_feature_inputs},
 {ZEN3_FAMILY, ZEN4_EPYC, amd_k19_cpu_feature_inputs},
 {-1, -1, NULL},
 };
1 change: 1 addition & 0 deletions src/includes/topology.h
@@ -154,6 +154,7 @@ struct topology_functions {
 #define ZEN3_RYZEN3 0x50
 #define ZEN3_EPYC_TRENTO 0x30
 #define ZEN4_RYZEN 0x61
+#define ZEN4_RYZEN_PRO 0x08
 #define ZEN4_RYZEN2 0x74
 #define ZEN4_EPYC 0x11

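The new model ID is checked by hand at each call site in this diff (switch cases and `||` chains), and the sites shown here do not all list the same set of models. As a sketch only, a single predicate could express the same test in one place. `is_zen4_model` is a hypothetical helper that does not exist in this pull request, and the value `0x19` for `ZEN3_FAMILY` is an assumption (AMD family 19h), since the diff does not show that definition.

```c
#include <stdbool.h>
#include <stdint.h>

#define ZEN3_FAMILY    0x19 /* assumption: AMD family 19h, not shown in the diff */
#define ZEN4_RYZEN     0x61
#define ZEN4_RYZEN_PRO 0x08
#define ZEN4_RYZEN2    0x74
#define ZEN4_EPYC      0x11

/* Hypothetical helper: true if family/model identify one of the Zen4
 * models listed in topology.h after this change. */
bool is_zen4_model(uint32_t family, uint32_t model)
{
    if (family != ZEN3_FAMILY)
    {
        return false;
    }
    switch (model)
    {
        case ZEN4_RYZEN:
        case ZEN4_RYZEN2:
        case ZEN4_RYZEN_PRO:
        case ZEN4_EPYC:
            return true;
        default:
            return false;
    }
}
```

A call such as `is_zen4_model(cpuid_info.family, cpuid_info.model)` would then stand in for the long conditions, under the assumption that every site is meant to accept all four models.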
2 changes: 2 additions & 0 deletions src/perfmon.c
@@ -1290,6 +1290,7 @@ perfmon_init_maps(void)
 case ZEN4_RYZEN:
 case ZEN4_RYZEN2:
 case ZEN4_EPYC:
+case ZEN4_RYZEN_PRO:
 eventHash = zen4_arch_events;
 perfmon_numArchEvents = perfmon_numArchEventsZen4;
 counter_map = zen4_counter_map;
@@ -1953,6 +1954,7 @@ perfmon_init_funcs(int* init_power, int* init_temp)
 case ZEN4_RYZEN:
 case ZEN4_RYZEN2:
 case ZEN4_EPYC:
+case ZEN4_RYZEN_PRO:
 initThreadArch = perfmon_init_zen4;
 initialize_power = TRUE;
 perfmon_startCountersThread = perfmon_startCountersThread_zen4;
1 change: 1 addition & 0 deletions src/power.c
@@ -215,6 +215,7 @@ power_init(int cpuId)
 case ZEN4_RYZEN:
 case ZEN4_RYZEN2:
 case ZEN4_EPYC:
+case ZEN4_RYZEN_PRO:
 cpuid_info.turbo = 0;
 power_info.hasRAPL = 1;
 power_info.statusRegWidth = 64;
2 changes: 1 addition & 1 deletion src/sysFeatures_amd_rapl.c
@@ -910,7 +910,7 @@ int amd_rapl_l3_test()
 }
 topo = get_cpuTopology();
 info = get_cpuInfo();
-if (info->family == ZEN3_FAMILY && (info->model == ZEN4_RYZEN || info->model == ZEN4_EPYC))
+if (info->family == ZEN3_FAMILY && (info->model == ZEN4_RYZEN || info->model == ZEN4_RYZEN_PRO || info->model == ZEN4_EPYC))
 {
 for (int i = 0; i < topo->numSockets; i++)
 {
1 change: 1 addition & 0 deletions src/topology.c
@@ -1144,6 +1144,7 @@ topology_setName(void)
 case ZEN4_RYZEN:
 case ZEN4_RYZEN2:
 case ZEN4_EPYC:
+case ZEN4_RYZEN_PRO:
 cpuid_info.name = amd_zen4_str;
 cpuid_info.short_name = short_zen4;
 break;