Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lldb/DWARF] Search fallback to the manual index in GetFullyQualified… #102123

Merged
merged 1 commit into from
Aug 12, 2024

Conversation

labath
Copy link
Collaborator

@labath labath commented Aug 6, 2024

…Type

This is needed to ensure we find a type if its definition is in a CU that wasn't indexed. This can happen if the definition is in some precompiled code (e.g. the c++ standard library) which was built with different flags than the rest of the binary.

…Type

This is needed to ensure we find a type if its definition is in a CU
that wasn't indexed. This can happen if the definition is in some
precompiled code (e.g. the c++ standard library) which was built with
different flags than the rest of the binary.
@llvmbot
Copy link
Collaborator

llvmbot commented Aug 6, 2024

@llvm/pr-subscribers-lldb

Author: Pavel Labath (labath)

Changes

…Type

This is needed to ensure we find a type if its definition is in a CU that wasn't indexed. This can happen if the definition is in some precompiled code (e.g. the c++ standard library) which was built with different flags than the rest of the binary.


Full diff: https://github.com/llvm/llvm-project/pull/102123.diff

2 Files Affected:

  • (modified) lldb/source/Plugins/SymbolFile/DWARF/DebugNamesDWARFIndex.cpp (+1)
  • (added) lldb/test/Shell/SymbolFile/DWARF/x86/mixed-debug-names-complete-type-search.test (+35)
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DebugNamesDWARFIndex.cpp b/lldb/source/Plugins/SymbolFile/DWARF/DebugNamesDWARFIndex.cpp
index 7e66b3dccf97f..32d8a92305aaf 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DebugNamesDWARFIndex.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DebugNamesDWARFIndex.cpp
@@ -371,6 +371,7 @@ void DebugNamesDWARFIndex::GetFullyQualifiedType(
         !ProcessEntry(entry, callback))
       return;
   }
+  m_fallback.GetFullyQualifiedType(context, callback);
 }
 
 bool DebugNamesDWARFIndex::SameParentChain(
diff --git a/lldb/test/Shell/SymbolFile/DWARF/x86/mixed-debug-names-complete-type-search.test b/lldb/test/Shell/SymbolFile/DWARF/x86/mixed-debug-names-complete-type-search.test
new file mode 100644
index 0000000000000..71da8fad00165
--- /dev/null
+++ b/lldb/test/Shell/SymbolFile/DWARF/x86/mixed-debug-names-complete-type-search.test
@@ -0,0 +1,35 @@
+REQUIRES: lld, python
+
+RUN: split-file %s %t
+RUN: %clang --target=x86_64-pc-linux -g -gpubnames -c %t/file1.c -o %t-1.o
+RUN: %clang --target=x86_64-pc-linux -g -gno-pubnames -c %t/file2.c -o %t-2.o
+RUN: llvm-dwarfdump %t-1.o --debug-names | FileCheck %s --check-prefix=PUBNAMES
+RUN: llvm-dwarfdump %t-2.o --debug-names | FileCheck %s --check-prefix=NOPUBNAMES
+RUN: ld.lld %t-1.o %t-2.o -o %t.out
+RUN: %lldb %t.out -s %t/commands -o exit | FileCheck %s
+
+// Precondition check: The first file should contain a debug_names index, but no
+// entries for MYSTRUCT.
+PUBNAMES: Name Index @ 0x0 {
+PUBNAMES-NOT: MYSTRUCT
+
+// The second file should not contain an index.
+NOPUBNAMES-NOT: Name Index
+
+// Starting from the variable in the first file, we should be able to find the
+// declaration of the type in the first unit, and then match that with the
+// definition in the second unit.
+CHECK:      (lldb) script
+CHECK:      struct MYSTRUCT {
+CHECK-NEXT:   int x;
+CHECK-NEXT: }
+
+#--- commands
+script lldb.target.FindFirstGlobalVariable("struct_ptr").GetType().GetPointeeType()
+#--- file1.c
+struct MYSTRUCT *struct_ptr;
+#--- file2.c
+struct MYSTRUCT {
+  int x;
+};
+struct MYSTRUCT struct_;

@clayborg
Copy link
Collaborator

clayborg commented Aug 9, 2024

LGTM.

@labath labath merged commit 21ef272 into llvm:main Aug 12, 2024
9 checks passed
@felipepiovezan
Copy link
Contributor

Isn't this going to degrade the performance of all negative queries?
I don't remember off the top of my head what is "m_fallback" when we have an accelerator table, but this worries me.

@labath
Copy link
Collaborator Author

labath commented Aug 12, 2024

Sort of yes (*), but only so much as it is necessary to make it correct, and the level of slowdown depends on how much non-indexed code you have. The fallback index will only index those units that aren't covered by the debug_names index, so if debug_names covers everything, the fallback is a noop -- if there's one non-indexed unit, you only pay the cost of indexing that unit.

(*) except not really. The most expensive fallback operation is building the manual index (afterwards, it may even be faster than debug_names), and this happens first time that anything queries it. Since all of the other debug_names methods are already calling the fallback index, it will most likely be already built by the time we get to this point and the fallback call should be fast.

@felipepiovezan
Copy link
Contributor

The fallback index will only index those units that aren't covered by the debug_names index, so if debug_names covers everything, the fallback is a noop

Got it, this is what I was hoping that would happen! Thanks for explaining it.

(future ideas) This makes me wonder if a better design would have been to make whoever owns this instances of Index classes have instead a collection of these indices, instead of having an implementation of an index have to remember to call a fallback mechanism.
To answer this question with another question: since Index classes are not allowed to have false negatives / positives, we would never need more than one index (for a give CU). Which begs the question of why we have a design that allows us to reach the problem this PR is fixing? It feels off that we have two objects: a manual index and a dwarf index, and that the dwarf index has to remember to call the manual index when the dwarf index is empty.

labath added a commit that referenced this pull request Aug 12, 2024
…ualified… (#102123)"

The test appears to be flaky. Revert it while I investigate.

This reverts commits 7027cc6 and
21ef272.
labath added a commit that referenced this pull request Aug 12, 2024
…Qualified… (#102123)"

This reverts commit 38b67c5.

I reverted the wrong patch -- sorry :(
@labath labath deleted the mixed branch August 12, 2024 16:47
@labath
Copy link
Collaborator Author

labath commented Aug 12, 2024

The current setup makes sense to me, but I guess that's expected as I'm the one who created it. I can also imagine something like what you propose, but it doesn't seem like a clear win to me. These objects are owned by SymbolFileDWARF, and we probably don't want to have it do the work of juggling these (it has a lot on its plate already), so it would probably have to be a separate object (basically another implementation of the "dwarf index" interface). That would be a lot of boilerplate (though maybe we could use some template trickery to reduce it). We'd also need to come up with a less ad-hoc way communicate which things are supposed to be indexed by who, but we also wouldn't want to make it too generic (== more code), since there are basically only three index configurations we care about (and these could easily be reduced to two).

@felipepiovezan
Copy link
Contributor

The current setup makes sense to me, but I guess that's expected as I'm the one who created it. I can also imagine something like what you propose, but it doesn't seem like a clear win to me. These objects are owned by SymbolFileDWARF, and we probably don't want to have it do the work of juggling these (it has a lot on its plate already), so it would probably have to be a separate object (basically another implementation of the "dwarf index" interface). That would be a lot of boilerplate (though maybe we could use some template trickery to reduce it). We'd also need to come up with a less ad-hoc way communicate which things are supposed to be indexed by who, but we also wouldn't want to make it too generic (== more code), since there are basically only three index configurations we care about (and these could easily be reduced to two).

I guess what I was proposing was more about deleting code than adding it. We must always have at most one index, but we have a lot of code allowing for the situation of two indices (manual & {apple, dwarf}).

@clayborg
Copy link
Collaborator

The main issue that this covers is when you have N .o files compiled with .debug_names, and M .o files compiled without it, and then you link N+M into a final executable. The linker will simply concatenate all of the .o files section together, but the linker doesn't know anything about DWARF, so any of the M files that didn't have indexes, won't have .debug_names indexes. So the fallback stuff is needed here. The fallback doesn't re-index existing files, it just manually indexes any files that didn't have an accelerator table. So we must have this code because of the way things are compiled and linked on linux.

@labath
Copy link
Collaborator Author

labath commented Aug 13, 2024

I guess what I was proposing was more about deleting code than adding it. We must always have at most one index, but we have a lot of code allowing for the situation of two indices (manual & {apple, dwarf}).

Okay. I think I understand now what you meant -- which was to have each compile unit to store some sort of a reference to "its" index (is that right)? If that's the case, then I don't think that could work. While it's true that every CU is covered by exactly one index, a single index can (and usually does) cover more than one CU, and there isn't a good way to partition it along CU boundaries. The apple indexes such shove everything into a single table, which is indexed by name (a hash of it). There's no way to efficiently query only for entries belonging to a single unit. With per-CU debug_names indexes, we could do that, but we also have multi-cu indexes, where we'd have the same problem. (For manual indexes, we could do what we want, but we'd need to be careful to not regress performance, as we currently get a large speed boost from the paralelization of the indexing step).

On top of that, this isn't really a good match for the way indexes are used. When querying the index, we usually don't know which CU we want to look at (if we did, we may not even need the index). We just have a name, and we want the CU (and the DIE within that CU) which defines that name. Even if we created something like this, it would mean that each query would basically become a loop over all CUs to see which one contains the thing we're looking for.

bwendling pushed a commit to bwendling/llvm-project that referenced this pull request Aug 15, 2024
llvm#102123)

…Type

This is needed to ensure we find a type if its definition is in a CU
that wasn't indexed. This can happen if the definition is in some
precompiled code (e.g. the c++ standard library) which was built with
different flags than the rest of the binary.
bwendling pushed a commit to bwendling/llvm-project that referenced this pull request Aug 15, 2024
…ualified… (llvm#102123)"

The test appears to be flaky. Revert it while I investigate.

This reverts commits 7027cc6 and
21ef272.
bwendling pushed a commit to bwendling/llvm-project that referenced this pull request Aug 15, 2024
…Qualified… (llvm#102123)"

This reverts commit 38b67c5.

I reverted the wrong patch -- sorry :(
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants