Skip to content

Commit

Permalink
[trace][intel pt] Simple detection of infinite decoding loops
Browse files Browse the repository at this point in the history
The low-level decoder might fall into an infinite decoding loop for
various reasons, the simplest being an infinite direct loop reached due
to wrong handling of self-modified code in the kernel, e.g. it might
reach

```
0x0A: pause
0x0C: jump to 0x0A
```

In this case, all the code is sequential and requires no packets to be
decoded. The low-level decoder would produce an output like the
following

```
0x0A: pause
0x0C: jump to 0x0A
0x0A: pause
0x0C: jump to 0x0A
0x0A: pause
0x0C: jump to 0x0A
... infinite amount of times
```

These cases require stopping the decoder to avoid infinite work and signal this
at least as a trace error.

- Add a check that breaks decoding of a single PSB once 500k instructions have been decoded since the last packet was processed.
- Add a check that looks for infinite loops after certain amount of instructions have been decoded since the last packet was processed.
- Add some `settings` properties for tweaking the thresholds of the checks above. This is also nice because it does the basic work needed for future settings.
- Add an AnomalyDetector class that inspects the DecodedThread and the libipt decoder in search for anomalies. These anomalies are then signaled as fatal errors in the trace.
- Add an ErrorStats class that keeps track of all the errors in a DecodedThread, with a special counter for fatal errors.
- Add an entry for decoded thread errors in the `dump info` command.

Some notes are added in the code and in the documention of the settings,
so please read them.

Besides that, I haven't been unable to create a test case in LLVM style, but
I've found an anomaly in the thread #12 of the trace
72533820-3eb8-4465-b8e4-4e6bf0ccca99 at Meta. We have to figure out how to
artificially create traces with this kind of anomalies in LLVM style.

With this change, that anomalous thread now shows:

```
(lldb)thread trace dump instructions 12 -e -i 23101

thread #12: tid = 8
    ...missing instructions
    23101: (error) anomalous trace: possible infinite loop detected of size 2
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 5 [inlined] rep_nop at processor.h:13:2
    23100: 0xffffffff81342785    pause
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 7 at panic.c:87:2
    23099: 0xffffffff81342787    jmp    0xffffffff81342785        ; <+5> [inlined] rep_nop at processor.h:13:2
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 5 [inlined] rep_nop at processor.h:13:2
    23098: 0xffffffff81342785    pause
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 7 at panic.c:87:2
    23097: 0xffffffff81342787    jmp    0xffffffff81342785        ; <+5> [inlined] rep_nop at processor.h:13:2
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 5 [inlined] rep_nop at processor.h:13:2
    23096: 0xffffffff81342785    pause
  vmlinux-5.12.0-0_fbk8_clang_6656_gc85768aa64da`panic_smp_self_stop + 7 at panic.c:87:2
    23095: 0xffffffff81342787    jmp    0xffffffff81342785        ; <+5> [inlined] rep_nop at processor.h:13:2
```

It used to be in an infinite loop where the decoder never stopped.

Besides that, the dump info command shows

```
(lldb) thread trace dump info 12

 Errors:
    Number of individual errors: 32
      Number of fatal errors: 1
      Number of other errors: 31
```

and in json format

```
(lldb) thread trace dump info 12 -j

 "errors": {
      "totalCount": 32,
      "libiptErrors": {},
      "fatalErrors": 1,
      "otherErrors": 31
    }
```

Differential Revision: https://reviews.llvm.org/D136557
  • Loading branch information
walter-erquinigo committed Oct 25, 2022
1 parent c34de60 commit c49d14a
Show file tree
Hide file tree
Showing 11 changed files with 453 additions and 38 deletions.
7 changes: 6 additions & 1 deletion lldb/include/lldb/Core/PluginManager.h
Original file line number Diff line number Diff line change
Expand Up @@ -342,7 +342,8 @@ class PluginManager {
llvm::StringRef name, llvm::StringRef description,
TraceCreateInstanceFromBundle create_callback_from_bundle,
TraceCreateInstanceForLiveProcess create_callback_for_live_process,
llvm::StringRef schema);
llvm::StringRef schema,
DebuggerInitializeCallback debugger_init_callback);

static bool
UnregisterPlugin(TraceCreateInstanceFromBundle create_callback);
Expand Down Expand Up @@ -487,6 +488,10 @@ class PluginManager {
Debugger &debugger, const lldb::OptionValuePropertiesSP &properties_sp,
ConstString description, bool is_global_property);

static bool CreateSettingForTracePlugin(
Debugger &debugger, const lldb::OptionValuePropertiesSP &properties_sp,
ConstString description, bool is_global_property);

static lldb::OptionValuePropertiesSP
GetSettingForObjectFilePlugin(Debugger &debugger, ConstString setting_name);

Expand Down
18 changes: 14 additions & 4 deletions lldb/source/Core/PluginManager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1051,9 +1051,10 @@ struct TraceInstance
llvm::StringRef name, llvm::StringRef description,
CallbackType create_callback_from_bundle,
TraceCreateInstanceForLiveProcess create_callback_for_live_process,
llvm::StringRef schema)
llvm::StringRef schema, DebuggerInitializeCallback debugger_init_callback)
: PluginInstance<TraceCreateInstanceFromBundle>(
name, description, create_callback_from_bundle),
name, description, create_callback_from_bundle,
debugger_init_callback),
schema(schema),
create_callback_for_live_process(create_callback_for_live_process) {}

Expand All @@ -1072,10 +1073,10 @@ bool PluginManager::RegisterPlugin(
llvm::StringRef name, llvm::StringRef description,
TraceCreateInstanceFromBundle create_callback_from_bundle,
TraceCreateInstanceForLiveProcess create_callback_for_live_process,
llvm::StringRef schema) {
llvm::StringRef schema, DebuggerInitializeCallback debugger_init_callback) {
return GetTracePluginInstances().RegisterPlugin(
name, description, create_callback_from_bundle,
create_callback_for_live_process, schema);
create_callback_for_live_process, schema, debugger_init_callback);
}

bool PluginManager::UnregisterPlugin(
Expand Down Expand Up @@ -1506,6 +1507,7 @@ CreateSettingForPlugin(Debugger &debugger, ConstString plugin_type_name,
static const char *kDynamicLoaderPluginName("dynamic-loader");
static const char *kPlatformPluginName("platform");
static const char *kProcessPluginName("process");
static const char *kTracePluginName("trace");
static const char *kObjectFilePluginName("object-file");
static const char *kSymbolFilePluginName("symbol-file");
static const char *kJITLoaderPluginName("jit-loader");
Expand Down Expand Up @@ -1559,6 +1561,14 @@ bool PluginManager::CreateSettingForProcessPlugin(
properties_sp, description, is_global_property);
}

bool PluginManager::CreateSettingForTracePlugin(
Debugger &debugger, const lldb::OptionValuePropertiesSP &properties_sp,
ConstString description, bool is_global_property) {
return CreateSettingForPlugin(debugger, ConstString(kTracePluginName),
ConstString("Settings for trace plug-ins"),
properties_sp, description, is_global_property);
}

lldb::OptionValuePropertiesSP
PluginManager::GetSettingForObjectFilePlugin(Debugger &debugger,
ConstString setting_name) {
Expand Down
13 changes: 12 additions & 1 deletion lldb/source/Plugins/Trace/intel-pt/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,14 @@ lldb_tablegen(TraceIntelPTCommandOptions.inc -gen-lldb-option-defs
SOURCE TraceIntelPTOptions.td
TARGET TraceIntelPTOptionsGen)

lldb_tablegen(TraceIntelPTProperties.inc -gen-lldb-property-defs
SOURCE TraceIntelPTProperties.td
TARGET TraceIntelPTPropertiesGen)

lldb_tablegen(TraceIntelPTPropertiesEnum.inc -gen-lldb-property-enum-defs
SOURCE TraceIntelPTProperties.td
TARGET TraceIntelPTPropertiesEnumGen)

add_lldb_library(lldbPluginTraceIntelPT PLUGIN
CommandObjectTraceStartIntelPT.cpp
DecodedThread.cpp
Expand All @@ -38,4 +46,7 @@ add_lldb_library(lldbPluginTraceIntelPT PLUGIN
)


add_dependencies(lldbPluginTraceIntelPT TraceIntelPTOptionsGen)
add_dependencies(lldbPluginTraceIntelPT
TraceIntelPTOptionsGen
TraceIntelPTPropertiesGen
TraceIntelPTPropertiesEnumGen)
37 changes: 31 additions & 6 deletions lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -170,34 +170,36 @@ DecodedThread::GetNanosecondsRangeByIndex(uint64_t item_index) {
return prev(next_it)->second;
}

uint64_t DecodedThread::GetTotalInstructionCount() const {
return m_insn_count;
}

void DecodedThread::AppendEvent(lldb::TraceEvent event) {
CreateNewTraceItem(lldb::eTraceItemKindEvent).event = event;
m_events_stats.RecordEvent(event);
}

void DecodedThread::AppendInstruction(const pt_insn &insn) {
CreateNewTraceItem(lldb::eTraceItemKindInstruction).load_address = insn.ip;
m_insn_count++;
}

void DecodedThread::AppendError(const IntelPTError &error) {
CreateNewTraceItem(lldb::eTraceItemKindError).error =
ConstString(error.message()).AsCString();
m_error_stats.RecordError(/*fatal=*/false);
}

void DecodedThread::AppendCustomError(StringRef err) {
void DecodedThread::AppendCustomError(StringRef err, bool fatal) {
CreateNewTraceItem(lldb::eTraceItemKindError).error =
ConstString(err).AsCString();
m_error_stats.RecordError(fatal);
}

lldb::TraceEvent DecodedThread::GetEventByIndex(int item_index) const {
return m_item_data[item_index].event;
}

void DecodedThread::LibiptErrorsStats::RecordError(int libipt_error_code) {
libipt_errors_counts[pt_errstr(pt_errcode(libipt_error_code))]++;
total_count++;
}

const DecodedThread::EventsStats &DecodedThread::GetEventsStats() const {
return m_events_stats;
}
Expand All @@ -207,6 +209,29 @@ void DecodedThread::EventsStats::RecordEvent(lldb::TraceEvent event) {
total_count++;
}

uint64_t DecodedThread::ErrorStats::GetTotalCount() const {
uint64_t total = 0;
for (const auto &[kind, count] : libipt_errors)
total += count;

return total + other_errors + fatal_errors;
}

void DecodedThread::ErrorStats::RecordError(bool fatal) {
if (fatal)
fatal_errors++;
else
other_errors++;
}

void DecodedThread::ErrorStats::RecordError(int libipt_error_code) {
libipt_errors[pt_errstr(pt_errcode(libipt_error_code))]++;
}

const DecodedThread::ErrorStats &DecodedThread::GetErrorStats() const {
return m_error_stats;
}

lldb::TraceItemKind
DecodedThread::GetItemKindByIndex(uint64_t item_index) const {
return static_cast<lldb::TraceItemKind>(m_item_kinds[item_index]);
Expand Down
62 changes: 49 additions & 13 deletions lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
Original file line number Diff line number Diff line change
Expand Up @@ -61,15 +61,6 @@ class DecodedThread : public std::enable_shared_from_this<DecodedThread> {
public:
using TSC = uint64_t;

// Struct holding counts for libipts errors;
struct LibiptErrorsStats {
// libipt error -> count
llvm::DenseMap<const char *, int> libipt_errors_counts;
size_t total_count = 0;

void RecordError(int libipt_error_code);
};

/// A structure that represents a maximal range of trace items associated to
/// the same TSC value.
struct TSCRange {
Expand Down Expand Up @@ -125,16 +116,38 @@ class DecodedThread : public std::enable_shared_from_this<DecodedThread> {
bool InRange(uint64_t item_index) const;
};

// Struct holding counts for events;
// Struct holding counts for events
struct EventsStats {
/// A count for each individual event kind. We use an unordered map instead
/// of a DenseMap because DenseMap can't understand enums.
std::unordered_map<lldb::TraceEvent, size_t> events_counts;
size_t total_count = 0;
///
/// Note: We can't use DenseMap because lldb::TraceEvent is not
/// automatically handled correctly by DenseMap. We'd need to implement a
/// custom DenseMapInfo struct for TraceEvent and that's a bit too much for
/// such a simple structure.
std::unordered_map<lldb::TraceEvent, uint64_t> events_counts;
uint64_t total_count = 0;

void RecordEvent(lldb::TraceEvent event);
};

// Struct holding counts for errors
struct ErrorStats {
/// The following counters are mutually exclusive
/// \{
uint64_t other_errors = 0;
uint64_t fatal_errors = 0;
// libipt error -> count
llvm::DenseMap<const char *, uint64_t> libipt_errors;
/// \}

uint64_t GetTotalCount() const;

void RecordError(int libipt_error_code);

void RecordError(bool fatal);
};

DecodedThread(
lldb::ThreadSP thread_sp,
const llvm::Optional<LinuxPerfZeroTscConversion> &tsc_conversion);
Expand Down Expand Up @@ -194,12 +207,22 @@ class DecodedThread : public std::enable_shared_from_this<DecodedThread> {
/// The load address of the instruction at the given index.
lldb::addr_t GetInstructionLoadAddress(uint64_t item_index) const;

/// \return
/// The number of instructions in this trace (not trace items).
uint64_t GetTotalInstructionCount() const;

/// Return an object with statistics of the trace events that happened.
///
/// \return
/// The stats object of all the events.
const EventsStats &GetEventsStats() const;

/// Return an object with statistics of the trace errors that happened.
///
/// \return
/// The stats object of all the events.
const ErrorStats &GetErrorStats() const;

/// The approximate size in bytes used by this instance,
/// including all the already decoded instructions.
size_t CalculateApproximateMemoryUsage() const;
Expand All @@ -221,7 +244,14 @@ class DecodedThread : public std::enable_shared_from_this<DecodedThread> {
void AppendError(const IntelPTError &error);

/// Append a custom decoding.
void AppendCustomError(llvm::StringRef error);
///
/// \param[in] error
/// The error message.
///
/// \param[in] fatal
/// If \b true, then the whole decoded thread should be discarded because a
/// fatal anomaly has been found.
void AppendCustomError(llvm::StringRef error, bool fatal = false);

/// Append an event.
void AppendEvent(lldb::TraceEvent);
Expand Down Expand Up @@ -289,10 +319,16 @@ class DecodedThread : public std::enable_shared_from_this<DecodedThread> {
/// TSC -> nanos conversion utility.
llvm::Optional<LinuxPerfZeroTscConversion> m_tsc_conversion;

/// Statistics of all tracing errors.
ErrorStats m_error_stats;

/// Statistics of all tracing events.
EventsStats m_events_stats;
/// Total amount of time spent decoding.
std::chrono::milliseconds m_total_decoding_time{0};

/// Total number of instructions in the trace.
uint64_t m_insn_count = 0;
};

using DecodedThreadSP = std::shared_ptr<DecodedThread>;
Expand Down
Loading

0 comments on commit c49d14a

Please sign in to comment.