hbond pmap_mpi output format #1644

Llauset · 2023-12-05T11:00:20Z

Hello,

Version: 2.0.5

Using pt.hbond and pt.distance works really well for my case study, and I would like to use the parallel version, pmap_mpi, to accelerate the calculation.

Parallelizing pt.distance is straightforward, but I'm struggling with the parallelization of pt.hbond.
The output format differs from that of the sequential version.
Sequential: <pytraj.hbonds.DatasetHBond donor_acceptor pairs : 6938>
Parallel: [(OrderedDict([('total_solute_hbonds', array([706, 681, 670, ..., 692, 702, 680], dtype=int32)),
'PHE1472_O-LEU1475_N-H', array([1, 0, 0, ..., 1, 0, 0], dtype=int32)) ....

To obtain the same format, I added dtype as follows:
hb_parallelized = pt.pmap_mpi(pt.hbond, traj, distance=3.0, angle=135, dtype="hbond")
In that case I get the following error:
File "//.conda/envs/openmpi_test_3.7/lib/python3.7/site-packages/pytraj/parallel/base.py", line 51, in concat_hbond
all_keys.update(partial_data[0].keys())
AttributeError: 'DatasetHBond' object has no attribute 'keys'

I tried to work around the problem by returning the data_collection object directly and accessing data_collection[0][0].get_amber_mask()[0], but that way I do not get all the hbonds.

Could you please tell me whether there is a way to convert the parallel output format to the sequential one, or whether a parameter exists that yields the sequential format directly in the parallel version?

Thank you in advance for your help.

hainm · 2023-12-06T02:13:19Z

Could you please tell me whether there is a way to convert the parallel output format to the sequential one, or whether a parameter exists that yields the sequential format directly in the parallel version?

Dear @Llauset, unfortunately there is currently no way to do either of the things you mentioned. We will keep this in mind, though; I think it would be nice to make it work.

For reference: what kind of information do you want from pytraj.hbonds.DatasetHBond?

Llauset · 2023-12-07T10:17:21Z

Dear Hainm,

Thanks for your responsiveness and your answer.
I want to retrieve the list of hydrogen bonds, defined by a distance and an angle, that pt.hbond computes and that the sequential version exposes through the get_amber_mask() method of the pytraj.hbonds.DatasetHBond object. As the trajectory is long, I would like to run pt.hbond in parallel.

If developing the parallel functionality will take time, I would like to try to develop a workaround.
Is it possible to rebuild this list 'easily' from the object returned by hb_parallelized = pt.pmap_mpi(pt.hbond, traj, distance=3.0, angle=135)?
How should one proceed? If this is a functionality that may interest you, once implemented, we could share it with you.

hainm · 2023-12-07T14:36:03Z

I want to retrieve the list of hydrogen bonds

Dear @Llauset: for the parallel version, the returned data is a dict whose keys are total_solute_hbonds plus every hbond formed during the simulation.

Here is an example:

In [1]: import pytraj as pt

In [2]: traj = pt.datafiles.load_trpcage()[:]

In [3]: d = pt.hbond(traj, dtype='dict')

In [4]: d.keys()
Out[4]: odict_keys(['total_solute_hbonds', 'ASN1_O-GLN5_N-H', 'ARG16_O-TRP6_NE1-HE1', 'TYR3_O-LEU7_N-H', 'ILE4_O-LYS8_N-H', 'LEU7_O-GLY10_N-H', 'ASP9_O-SER14_OG-HG', 'SER14_O-ARG16_N-H', 'ASP9_OD2-ARG16_NH1-HH12', 'ASP9_OD2-ARG16_NH2-HH22', 'LEU2_O-TRP6_N-H', 'GLN5_OE1-LYS8_NZ-HZ1', 'ASN1_O-ILE4_N-H', 'TRP6_O-GLY11_N-H', 'SER20_OXT-SER20_OG-HG', 'ASN1_O-TYR3_N-H', 'GLY11_O-SER14_OG-HG', 'ASP9_OD2-ARG16_NE-HE', 'ASN1_OD1-LEU2_N-H', 'ASP9_OD1-LYS8_NZ-HZ1', 'ASP9_OD2-ARG16_NH2-HH21', 'SER20_O-SER20_OG-HG', 'GLY10_O-SER13_N-H', 'GLY10_O-SER13_OG-HG', 'ASP9_OD1-SER14_OG-HG', 'PRO12_O-GLY15_N-H', 'PRO19_O-SER20_OG-HG', 'GLY11_O-SER14_N-H', 'SER13_O-SER13_OG-HG', 'GLN5_O-ASP9_N-H', 'ASP9_OD2-SER14_OG-HG', 'ASP9_OD2-ARG16_NH1-HH11'])

In [5]: d['ASP9_OD2-ARG16_NE-HE']
Out[5]: array([0, 0, 0, ..., 0, 0, 0], dtype=int32)

d['ASP9_OD2-ARG16_NE-HE'] returns an int array whose values are 0 or 1, indicating the absence or presence of that specific hbond in each frame.
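Since each value is a per-frame 0/1 series, one thing this layout makes simple is the occupancy of each hbond, i.e. the fraction of frames in which it is present. The following is only a minimal sketch: `hbond_occupancy` and the toy dictionary are hypothetical and not part of pytraj. With real data, `d` would be the dict from the example above, and its NumPy arrays should work here as well because the helper only iterates over the values.

```python
def hbond_occupancy(d):
    """Return {hbond_key: fraction of frames in which the bond is present}."""
    occupancy = {}
    for key, series in d.items():
        if key == "total_solute_hbonds":
            # Per-frame count of hbonds, not a 0/1 presence series: skip it.
            continue
        n_frames = len(series)
        occupancy[key] = sum(int(v) for v in series) / n_frames if n_frames else 0.0
    return occupancy


# Toy stand-in for the dict returned by the hbond analysis (4 frames).
toy = {
    "total_solute_hbonds": [2, 1, 1, 0],
    "ASP9_OD2-ARG16_NE-HE": [1, 0, 1, 0],
    "ASN1_O-GLN5_N-H": [1, 1, 0, 0],
}
occ = hbond_occupancy(toy)
print(occ)
```
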

Please let me know if that works for you.

hainm · 2023-12-07T14:38:00Z

Is it possible to rebuild this list 'easily' from the object returned by hb_parallelized = pt.pmap_mpi(pt.hbond, traj, distance=3.0, angle=135)?

So the answer is "yes, it's easy"
(d comes from the example above)

print(list(set(d) - {"total_solute_hbonds"}))
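One caveat with the set difference is that a set does not preserve the order of the keys. If the order in which the hbonds appear in the dict matters, a list comprehension is a small alternative. This is only a sketch on a toy dictionary standing in for `d`; the variable names are illustrative.

```python
from collections import OrderedDict

# Toy stand-in for the dict of per-frame hbond series.
d_toy = OrderedDict([
    ("total_solute_hbonds", [2, 1]),
    ("ASN1_O-GLN5_N-H", [1, 0]),
    ("ARG16_O-TRP6_NE1-HE1", [1, 1]),
])

# Keep every key except the per-frame total, in the dict's own order.
hbond_names = [key for key in d_toy if key != "total_solute_hbonds"]
print(hbond_names)
```
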

hainm · 2023-12-07T14:38:49Z

If this is a functionality that may interest you, once implemented, we could share it with you.

Yes, any contribution to the code is always welcome. Thanks.

Llauset · 2023-12-08T17:25:46Z

Thank you for your help.

I wrote this function to transform the output of the parallel hbond into Amber masks, and it solves my problem:

import pytraj as pt


def from_hbond_parallel_to_amber_mask(hb_parallelized):
    """
    Convert the keys of the hb_parallelized dictionary to amber masks
    :param hb_parallelized: dictionary keyed by hydrogen bond name
    :return: two tuples, the distance masks and the angle masks
    :rtype: tuple
    """
    # get all the keys from the hb_parallelized dictionary
    keys = list(hb_parallelized.keys())
    # remove the key 'total_solute_hbonds'
    keys.remove('total_solute_hbonds')
    # change the key format from HIE4_O-LYS8_NZ-HZ2 to HIE_4@O-LYS_8@NZ-HZ2
    for i in range(len(keys)):
        keys[i] = keys[i].replace("_", " ").replace("-", " ").split()
        # split the residue label after 3 characters
        # (this assumes every residue name is exactly 3 characters long)
        keys[i][0] = keys[i][0][:3] + '_' + keys[i][0][3:]
        keys[i][2] = keys[i][2][:3] + '_' + keys[i][2][3:]
        acceptor_mask = '@'.join((keys[i][0], keys[i][1]))
        donor_mask = '@'.join((keys[i][2], keys[i][3]))
        keys[i] = '-'.join((acceptor_mask, donor_mask, keys[i][4]))
    # use the to_amber_mask function to convert the keys to amber masks
    amber_masks = list(pt.hbond_analysis.to_amber_mask(keys))
    # split the list of tuples into two separate sequences
    distance_masks, angle_masks = list(zip(*amber_masks))
    return distance_masks, angle_masks
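The fixed three-character slice in the function above works when every residue name is three characters long. For systems with shorter or longer names (nucleic acids, for instance), the key-rewriting step could instead use a regular expression. The following is a sketch under stated assumptions, not pytraj code: `to_hbond_label` is a hypothetical helper, the residue number is taken to be the run of digits directly before the underscore, and atom names are assumed not to contain a hyphen. A residue name that itself ends in a digit remains ambiguous.

```python
import re

# <resname><resid>_<atom>-<resname><resid>_<atom>-<hydrogen>
_KEY_RE = re.compile(
    r"^(?P<acc_res>.+?)(?P<acc_id>\d+)_(?P<acc_atom>[^-]+)"
    r"-(?P<don_res>.+?)(?P<don_id>\d+)_(?P<don_atom>[^-]+)"
    r"-(?P<h_atom>.+)$"
)


def to_hbond_label(key):
    """Rewrite 'ASP9_OD2-ARG16_NE-HE' as 'ASP_9@OD2-ARG_16@NE-HE'."""
    match = _KEY_RE.match(key)
    if match is None:
        raise ValueError(f"unrecognized hbond key: {key!r}")
    g = match.groupdict()
    acceptor = f"{g['acc_res']}_{g['acc_id']}@{g['acc_atom']}"
    donor = f"{g['don_res']}_{g['don_id']}@{g['don_atom']}"
    return f"{acceptor}-{donor}-{g['h_atom']}"


labels = [to_hbond_label(k) for k in ("ASP9_OD2-ARG16_NE-HE", "DA5_O4'-DT12_N3-H3")]
print(labels)
```

The resulting labels have the same shape as the strings the function above builds before handing them to to_amber_mask.
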

hainm · 2023-12-08T17:43:15Z

Thanks @Llauset for the code. Cheers.

hainm added the improvement label Dec 6, 2023
