hbond pmap_mpi output format #1644

Open
Llauset opened this issue Dec 5, 2023 · 7 comments

Comments

@Llauset

Llauset commented Dec 5, 2023

Hello,

Version: 2.0.5

Using pt.hbond and pt.distance works really well for my case study, and I would like to use the parallel version, pmap_mpi, to accelerate the calculation.

Parallelizing pt.distance is straightforward, but I'm struggling with the parallelization of pt.hbond.
The output format differs from that of the sequential version.
Sequential: <pytraj.hbonds.DatasetHBond donor_acceptor pairs : 6938>
Parallel: [(OrderedDict([('total_solute_hbonds', array([706, 681, 670, ..., 692, 702, 680], dtype=int32)),
'PHE1472_O-LEU1475_N-H', array([1, 0, 0, ..., 1, 0, 0], dtype=int32)) ....

To obtain the same format, I added dtype as follows:
hb_parallelized = pt.pmap_mpi(pt.hbond, traj, distance=3.0, angle=135, dtype="hbond")
In that case I get the following error:
File "//.conda/envs/openmpi_test_3.7/lib/python3.7/site-packages/pytraj/parallel/base.py", line 51, in concat_hbond
all_keys.update(partial_data[0].keys())
AttributeError: 'DatasetHBond' object has no attribute 'keys'

I tried to work around the problem by returning the data_collection object directly and accessing data_collection[0][0].get_amber_mask()[0], but that does not give me all the hbonds.

Could you please tell me whether there is a way to convert the parallel output format to the sequential one, or whether a parameter exists that makes the parallel version return the same format as the sequential one?

Thank you in advance for your help.

@hainm
Contributor

hainm commented Dec 6, 2023

Can you please tell me if there is a way to change the parallel outputting format to the sequential one or if exist a parameter to obtain directly the same format in the parallel version than in the sequential.

Dear @Llauset, unfortunately there is currently no way to do either of the things you mentioned. We will keep this in mind, though; I think it would be nice to make it work.

For information: what kind of data do you want from pytraj.hbonds.DatasetHBond?

@Llauset
Author

Llauset commented Dec 7, 2023

Dear Hainm,

Thanks for your responsiveness and your answer.
I want to retrieve the list of hydrogen bonds, defined by a distance and an angle cutoff, that pt.hbond computes and that the get_amber_mask() method of the pytraj.hbonds.DatasetHBond object returns. Because the trajectory is long, I would like to run pt.hbond in parallel.

If developing the parallel functionality will take time, I would like to try to write a workaround.
Is it possible to rebuild this list 'easily' from the object returned by hb_parallelized = pt.pmap_mpi(pt.hbond, traj, distance=3.0, angle=135)?
How should one proceed? If this functionality is of interest to you, we could share it with you once it is implemented.

@hainm
Contributor

hainm commented Dec 7, 2023

I want to retrieve the list of hydrogen bonds

Dear @Llauset: for the parallel version, the returned data is a dict whose keys are total_solute_hbonds plus every hbond formed during the simulation.

Here is an example:

In [1]: import pytraj as pt

In [2]: traj = pt.datafiles.load_trpcage()[:]

In [3]: d = pt.hbond(traj, dtype='dict')

In [4]: d.keys()
Out[4]: odict_keys(['total_solute_hbonds', 'ASN1_O-GLN5_N-H', 'ARG16_O-TRP6_NE1-HE1', 'TYR3_O-LEU7_N-H', 'ILE4_O-LYS8_N-H', 'LEU7_O-GLY10_N-H', 'ASP9_O-SER14_OG-HG', 'SER14_O-ARG16_N-H', 'ASP9_OD2-ARG16_NH1-HH12', 'ASP9_OD2-ARG16_NH2-HH22', 'LEU2_O-TRP6_N-H', 'GLN5_OE1-LYS8_NZ-HZ1', 'ASN1_O-ILE4_N-H', 'TRP6_O-GLY11_N-H', 'SER20_OXT-SER20_OG-HG', 'ASN1_O-TYR3_N-H', 'GLY11_O-SER14_OG-HG', 'ASP9_OD2-ARG16_NE-HE', 'ASN1_OD1-LEU2_N-H', 'ASP9_OD1-LYS8_NZ-HZ1', 'ASP9_OD2-ARG16_NH2-HH21', 'SER20_O-SER20_OG-HG', 'GLY10_O-SER13_N-H', 'GLY10_O-SER13_OG-HG', 'ASP9_OD1-SER14_OG-HG', 'PRO12_O-GLY15_N-H', 'PRO19_O-SER20_OG-HG', 'GLY11_O-SER14_N-H', 'SER13_O-SER13_OG-HG', 'GLN5_O-ASP9_N-H', 'ASP9_OD2-SER14_OG-HG', 'ASP9_OD2-ARG16_NH1-HH11'])

In [5]: d['ASP9_OD2-ARG16_NE-HE']
Out[5]: array([0, 0, 0, ..., 0, 0, 0], dtype=int32)

d['ASP9_OD2-ARG16_NE-HE'] returns an array of ints, each either 0 or 1, indicating the absence or presence of that specific hbond in each frame.

Please let me know if that works for you.
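
As an illustration of how the per-frame 0/1 arrays described above might be used, here is a minimal sketch that computes the fraction of frames in which each hbond is present. The dict `d` below is a hand-made stand-in with made-up values, not real pytraj output, and `hbond_occupancy` is a hypothetical helper name, not part of pytraj.

```python
import numpy as np

# Hypothetical stand-in for a dict like the one from pt.hbond(traj, dtype='dict'):
# one int32 array per key, one entry per frame. The values are invented.
d = {
    "total_solute_hbonds": np.array([2, 1, 1, 2], dtype=np.int32),
    "ASP9_OD2-ARG16_NE-HE": np.array([1, 0, 0, 1], dtype=np.int32),
    "ASN1_O-GLN5_N-H": np.array([1, 1, 1, 1], dtype=np.int32),
}


def hbond_occupancy(hbond_dict):
    """Return {hbond key: fraction of frames in which the bond is present}."""
    return {
        key: float(np.mean(values))
        for key, values in hbond_dict.items()
        if key != "total_solute_hbonds"  # skip the per-frame total
    }


occupancy = hbond_occupancy(d)
print(occupancy)
```

Because each entry is 0 or 1, the mean of an array is the occupancy of that hbond over the frames in the array.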

@hainm
Contributor

hainm commented Dec 7, 2023

Is it possible to rebuild this list 'easily' from the object returned by hb_parallelized = pt.pmap_mpi(pt.hbond, traj, distance=3.0, angle=135)?

So the answer is "yes, it's easy"
(d comes from the example above)

print(list(set(d) - {"total_solute_hbonds"}))
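
One thing to keep in mind with the set-based one-liner: a set does not preserve the order of the keys. If the original key order matters, a list comprehension is a possible alternative. The sketch below uses a small hand-made OrderedDict with invented values in place of the real result.

```python
from collections import OrderedDict

# Hypothetical stand-in for the hbond dict; the values are invented.
d = OrderedDict([
    ("total_solute_hbonds", [2, 1]),
    ("ASN1_O-GLN5_N-H", [1, 1]),
    ("ASP9_OD2-ARG16_NE-HE", [1, 0]),
])

# Keep every key except the per-frame total, in the dict's own order.
hbond_keys = [key for key in d if key != "total_solute_hbonds"]
print(hbond_keys)
```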

@hainm
Contributor

hainm commented Dec 7, 2023

If this is a functionality that may interest you, once implemented, we could share it with you.

Yes, any contribution to the code is always welcome. Thanks.

@Llauset
Author

Llauset commented Dec 8, 2023

Thank you for your help.

Here is the function I wrote to convert the output of the parallel hbond into the amber masks; it solves my problem:

import pytraj as pt


def from_hbond_parallel_to_amber_mask(hb_parallelized):
    """
    Convert the keys of the hb_parallelized dictionary to amber masks
    :param hb_parallelized: dictionary with the keys of the hydrogen bonds
    :return: the distance masks and the angle masks, one entry per hydrogen bond
    :rtype: tuple of two tuples
    """
    # get all the keys from hb_parallelized dictionary 
    keys = list(hb_parallelized.keys())
    # remove the key 'total_solute_hbonds'
    keys.remove('total_solute_hbonds')
    # change format of keys from HIE4_O-LYS8_NZ-HZ2 to HIE_4@O-LYS_8@NZ-HZ2
    for i in range(len(keys)):
        keys[i] = keys[i].replace("_", " ").replace("-", " ").split()
        # split the residue labels after 3 characters (assumes 3-letter residue names)
        keys[i][0] = keys[i][0][:3] + '_' + keys[i][0][3:]
        keys[i][2] = keys[i][2][:3] + '_' + keys[i][2][3:]
        acceptor_mask = '@'.join((keys[i][0], keys[i][1]))
        donor_mask = '@'.join((keys[i][2], keys[i][3]))
        keys[i] = '-'.join((acceptor_mask, donor_mask, keys[i][4]))
    # Use function to_amber_mask to convert the keys to amber mask
    amber_masks = list(pt.hbond_analysis.to_amber_mask(keys))
    # split the list of tuples to two independent lists
    distance_masks, angle_masks = list(zip(*amber_masks))
    return distance_masks, angle_masks
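
The slicing above assumes every residue name is exactly three characters long. A possible variant of the key-rewriting step, sketched below, splits the residue label at its trailing digits instead, so names of other lengths would also be handled. `split_residue` and `to_legacy_key` are hypothetical helper names, not pytraj functions, and the sketch assumes that atom names contain no '-' and that residue names do not end in a digit.

```python
import re

# Residue label = name followed by the residue number (trailing digits).
_RESIDUE = re.compile(r"^(?P<name>.+?)(?P<number>\d+)$")


def split_residue(label):
    """Split e.g. 'ARG16' into ('ARG', '16')."""
    match = _RESIDUE.match(label)
    if match is None:
        raise ValueError(f"cannot parse residue label: {label!r}")
    return match.group("name"), match.group("number")


def to_legacy_key(key):
    """Rewrite 'HIE4_O-LYS8_NZ-HZ2' as 'HIE_4@O-LYS_8@NZ-HZ2'."""
    # Assumes exactly three '-'-separated parts: acceptor, donor, hydrogen.
    acceptor, donor, hydrogen = key.split("-")
    parts = []
    for atom_label in (acceptor, donor):
        residue, atom = atom_label.split("_", 1)
        name, number = split_residue(residue)
        parts.append(f"{name}_{number}@{atom}")
    return "-".join(parts + [hydrogen])


print(to_legacy_key("HIE4_O-LYS8_NZ-HZ2"))
```

The rewritten keys could then be passed to pt.hbond_analysis.to_amber_mask in the same way as in the function above.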

@hainm
Contributor

hainm commented Dec 8, 2023

thanks @Llauset for the code. Cheers.
