Document Tips for Debugging C Extensions #35100

Merged
merged 16 commits into from
Dec 10, 2020
93 changes: 93 additions & 0 deletions doc/source/development/debugging_extensions.rst
@@ -0,0 +1,93 @@
.. _debugging_c_extensions:

{{ header }}

======================
Debugging C extensions
======================

Pandas uses select C extensions for high-performance IO operations. If you need to debug segfaults or other issues with those extensions, the following steps may be helpful.

First, be sure to compile the extensions with the appropriate flags to generate debug symbols and remove optimizations. This can be achieved as follows:

.. code-block:: sh

    python setup.py build_ext --inplace -j4 --with-debugging-symbols
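
As a rough illustration of what the flag changes, the sketch below mirrors the branch that this change touches in ``setup.py``: when debugging symbols are requested, ``-g``, ``-UNDEBUG`` and ``-O0`` are appended to the compiler arguments. The function name ``debug_compile_args`` is hypothetical and only covers the gcc/clang case; it is not the actual build code.

```python
def debug_compile_args(debugging_symbols_requested, base_args=()):
    """Return compiler arguments, adding debug flags when requested.

    Hypothetical helper for illustration only; setup.py appends these
    flags inline rather than through a function like this one.
    """
    args = list(base_args)
    if debugging_symbols_requested:
        args.append("-g")        # emit debug symbols
        args.append("-UNDEBUG")  # undefine NDEBUG so C assert() stays active
        args.append("-O0")       # turn optimizations off
    return args


print(debug_compile_args(True))  # ['-g', '-UNDEBUG', '-O0']
```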

Using a debugger
================

Assuming you are on a Unix-like operating system, you can use either lldb or gdb to debug. The choice between them largely depends on your compilation toolchain: typically you would use lldb with clang and gdb with gcc. macOS users should note that on modern systems ``gcc`` is an alias for ``clang``, so if you are using Xcode you would usually opt for lldb. Whichever debugger you choose, refer to your operating system's instructions on how to install it.

After installing a debugger you can create a script that hits the extension module you are looking to debug. For demonstration purposes, let's assume you have a script called ``debug_testing.py`` with the following contents:

.. code-block:: python

    import pandas as pd

    pd.DataFrame([[1, 2]]).to_json()
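
If you also want a Python-level traceback when the interpreter crashes, one option is the standard library's ``faulthandler`` module. The sketch below is an optional variation on ``debug_testing.py`` and not part of the workflow described here; ``prepare_for_debugging`` and ``main`` are hypothetical names. It also prints the process ID so that a debugger can attach to the running process.

```python
import faulthandler
import os
import sys


def prepare_for_debugging(stream=None):
    """Enable faulthandler and report the PID so a debugger can attach."""
    if stream is None:
        stream = sys.stderr
    # Dump the Python traceback to `stream` on SIGSEGV and similar signals
    faulthandler.enable(file=stream)
    pid = os.getpid()
    print(f"PID {pid}: attach with 'lldb -p {pid}' or 'gdb -p {pid}'", file=stream)
    return pid


def main():
    prepare_for_debugging()
    # Imported here so the helper above can be used without pandas installed
    import pandas as pd

    pd.DataFrame([[1, 2]]).to_json()
```

Calling ``main()`` at the bottom of the script reproduces the original example with the extra diagnostics enabled.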

Place the ``debug_testing.py`` script in the project root and launch a Python process under your debugger. If using lldb:

.. code-block:: sh

    lldb python

If using gdb:

.. code-block:: sh

    gdb python

Before executing our script, let's set a breakpoint in our JSON serializer in its entry function called ``objToJSON``. The lldb syntax would look as follows:

.. code-block:: sh

    breakpoint set --name objToJSON

Similarly for gdb:

.. code-block:: sh

    break objToJSON

.. note::

    You may get a warning in lldb that this breakpoint cannot be resolved. gdb may give a similar warning and prompt you to make the breakpoint pending on a future library load, to which you should answer yes. This should only happen on the very first invocation, as the module you wish to debug has not yet been loaded into memory.

Now go ahead and execute your script:

.. code-block:: sh

    run <the_script>.py

Code execution will halt at the breakpoint defined or at the occurrence of any segfault. LLDB's `GDB to LLDB command map <https://lldb.llvm.org/use/map.html>`_ provides a listing of debugger commands that you can execute using either debugger.
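
The interactive steps above (launch, set a breakpoint, run) can also be collapsed into a single command line. The following is a minimal sketch, assuming lldb's ``-o`` option and gdb's ``-ex`` option for running startup commands; ``debugger_command`` is a hypothetical helper that only assembles the argument list and does not launch anything.

```python
import shlex


def debugger_command(debugger, target_args, breakpoint="objToJSON"):
    """Build an argv list that sets a breakpoint and starts the program.

    Hypothetical helper; `debugger` must be "lldb" or "gdb".
    """
    if debugger == "lldb":
        # -o runs a one-line lldb command once the target has been loaded
        setup = ["-o", f"breakpoint set --name {breakpoint}", "-o", "run"]
        return ["lldb", *setup, "--", *target_args]
    if debugger == "gdb":
        # Let the breakpoint stay pending until the extension module loads
        setup = [
            "-ex", "set breakpoint pending on",
            "-ex", f"break {breakpoint}",
            "-ex", "run",
        ]
        return ["gdb", *setup, "--args", *target_args]
    raise ValueError(f"unsupported debugger: {debugger!r}")


# Print a copy-pasteable shell command for each debugger
for name in ("lldb", "gdb"):
    print(shlex.join(debugger_command(name, ["python", "debug_testing.py"])))
```

Building the list rather than a string keeps quoting explicit, which matters because the breakpoint commands contain spaces.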

Another option to execute the entire test suite under lldb would be to run the following:

.. code-block:: sh

    lldb -- python -m pytest

Or for gdb:

.. code-block:: sh

    gdb --args python -m pytest
Member


$ gdb --args python3 -m pytest
[...]
"~/.pyenv/shims/python3": not in executable format: File format not recognized
(gdb) run
Starting program:  -m pytest
No executable file specified.
Use the "file" or "exec-file" command.

Member Author


Can you try gdb -ex r --args python3 -m pytest? Taking that from this link:

https://wiki.python.org/moin/DebuggingWithGdb

Member Author


Yea looks like gdb -ex r --args python3 -m pytest pandas/tests was working for me if you want to try on yours and confirm

Member


$ gdb -ex r --args python3 -m pytest pandas/tests
[...]
"~/.pyenv/shims/python3": not in executable format: File format not recognized
Starting program:  -m pytest pandas/tests
No executable file specified.
Use the "file" or "exec-file" command.
(gdb) 

Member Author


Are you using pyenv for development or Conda?

Member


You've already gone above and beyond helping me debug this; I'll spend some more time on this and ping you if I find anything new

Member Author


All good - I think this is generally helpful to hash out together so thanks for the input.

It looks like this might be specific to pyenv and how it manages the python executable:

https://stackoverflow.com/questions/48141135/cannot-start-dbg-on-my-python-c-extension

Member


gdb -ex r --args bash pytest pandas/tests --skip-slow --skip-db tentatively looks like a winner

Member


is the command from my previous comment specific to my case, or relevant to the document?

Member Author


I'm trying to avoid adding too much detail here since this issue is more of a pyenv thing than a debugger issue


Once the process launches, simply type ``run`` and the test suite will begin, stopping at any segmentation fault that may occur.
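
The review discussion on this change hit a ``not in executable format`` error from gdb when ``python3`` resolved to a pyenv shim. Assuming the shim is a shell script rather than a binary, one possible workaround is to hand the debugger the path of the real interpreter instead. The sketch below shows one way to find it; ``is_script``, ``real_interpreter`` and ``debugger_target`` are hypothetical helper names, and the shebang check is a simple heuristic.

```python
import os
import shutil
import sys


def is_script(path):
    """Return True if the file starts with a shebang, i.e. is not a binary."""
    with open(path, "rb") as fh:
        return fh.read(2) == b"#!"


def real_interpreter():
    """Return the resolved path of the interpreter running this code."""
    return os.path.realpath(sys.executable)


def debugger_target(name="python"):
    """Pick an interpreter path that a debugger can load directly."""
    found = shutil.which(name)
    if found is None or is_script(found):
        # Fall back to the binary behind the current process
        return real_interpreter()
    return found


if __name__ == "__main__":
    print(debugger_target())
```

The printed path can then be used in place of ``python`` in the ``lldb --`` or ``gdb --args`` invocations.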

Checking memory leaks with valgrind
===================================

You can use `Valgrind <https://www.valgrind.org>`_ to check for and log memory leaks in extensions. For instance, to check for a memory leak in a test from the suite you can run:

.. code-block:: sh

    PYTHONMALLOC=malloc valgrind --leak-check=yes --track-origins=yes --log-file=valgrind-log.txt python -m pytest <path_to_a_test>

Note that code execution under valgrind will take much longer than usual. While you can run valgrind against extensions compiled with any optimization level, it is suggested to turn optimizations off in compiled extensions to reduce the number of false positives. The ``--with-debugging-symbols`` flag passed during package setup will do this for you automatically.
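
Valgrind logs can be long, so it may help to pull out just the leak totals. The sketch below is a hedged example that assumes the usual ``LEAK SUMMARY`` line layout (``==PID==    definitely lost: N bytes in M blocks``); ``leak_summary`` is a hypothetical helper, not something shipped with pandas or Valgrind.

```python
import re

# Assumed layout of a summary line, e.g.
# "==1234==    definitely lost: 1,024 bytes in 2 blocks"
_SUMMARY_LINE = re.compile(
    r"^==\d+==\s+"
    r"(?P<kind>definitely lost|indirectly lost|possibly lost|still reachable|suppressed)"
    r": (?P<bytes>[\d,]+) bytes in (?P<blocks>[\d,]+) blocks"
)


def leak_summary(log_text):
    """Map each leak kind (e.g. 'definitely lost') to its byte count.

    If the log contains several summaries, the last one wins.
    """
    summary = {}
    for line in log_text.splitlines():
        match = _SUMMARY_LINE.match(line)
        if match:
            # Valgrind prints thousands separators, so strip the commas
            summary[match["kind"]] = int(match["bytes"].replace(",", ""))
    return summary
```

Passing the contents of ``valgrind-log.txt`` to ``leak_summary`` gives a small dictionary that is easy to compare between runs.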

.. note::

    For best results, you should use a Python installation configured with Valgrind support (``--with-valgrind``).
1 change: 1 addition & 0 deletions doc/source/development/index.rst
@@ -17,6 +17,7 @@ Development
maintaining
internals
test_writing
debugging_extensions
extending
developer
policies
2 changes: 2 additions & 0 deletions setup.py
@@ -421,6 +421,8 @@ def run(self):
        extra_compile_args.append("-Werror")
    if debugging_symbols_requested:
        extra_compile_args.append("-g")
        extra_compile_args.append("-UNDEBUG")
        extra_compile_args.append("-O0")

# Build for at least macOS 10.9 when compiling on a 10.9 system or above,
# overriding CPython distutils behaviour which is to target the version that