Imputation functionality #21
Comments
Squashed commit of the following:

commit d7794fd
Merge: 03c3589 2e94403
Author: sciome-bot <software.tools@sciome.com>
Date: Thu Dec 28 15:10:33 2023 -0500
Merge branch 'master' of ssh://sciome-bot/stat/prestogp

commit 03c3589
Merge: d2a2e3a 961880a
Author: sciome-bot <software.tools@sciome.com>
Date: Thu Dec 28 15:08:24 2023 -0500
Merge branch 'to-git' of ssh://sciome-bot/stat/prestogp

commit 2e94403
Merge: 710520c 8dac095
Author: Shail Choksi <shail.choksi@sciome.com>
Date: Thu Dec 28 15:07:40 2023 -0500
Pull request #20: R CMD check fixes
Merge in STAT/prestogp from build-workflow to master
* commit '8dac09511058405d630121ca13c589781c932bf4':
  R CMD check fixes
  Add new files to Collate section in DESCRIPTION file

commit 8dac095
Author: sciome-bot <software.tools@sciome.com>
Date: Thu Dec 28 15:06:38 2023 -0500
R CMD check fixes

commit f037a0b
Author: sciome-bot <software.tools@sciome.com>
Date: Thu Dec 28 14:22:52 2023 -0500
Add new files to Collate section in DESCRIPTION file

commit 710520c
Merge: d2a2e3a 11771e4
Author: Shail Choksi <shail.choksi@sciome.com>
Date: Thu Dec 28 14:09:40 2023 -0500
Pull request #18: Added missing imports and ran auto-formatter in vscode for R and C++
Merge in STAT/prestogp from build-workflow to master
* commit '11771e4ccbd6311aa35334c5ab4b7e4299a8db56':
  Added missing imports
  Ran auto-formatter/linter for R and C++ in vscode. Added some missing imports

commit 11771e4
Author: sciome-bot <software.tools@sciome.com>
Date: Thu Dec 28 14:07:28 2023 -0500
Added missing imports

commit 73aed75
Merge: 7c0bbfe d2a2e3a
Author: sciome-bot <software.tools@sciome.com>
Date: Thu Dec 28 13:58:37 2023 -0500
Merge branch 'master' of ssh://sciome-bot/stat/prestogp into build-workflow

commit d2a2e3a
Merge: bea8382 85517e7
Author: sciome-bot <software.tools@sciome.com>
Date: Thu Dec 28 13:57:14 2023 -0500
Merge branch 'master' of ssh://sciome-bot/stat/prestogp

commit bea8382
Merge: 17f5284 1e4361d
Author: sciome-bot <software.tools@sciome.com>
Date: Thu Dec 28 13:55:26 2023 -0500
Merge branch 'main-sciome' of sciome-bot-git:Spatiotemporal-Exposures-and-Toxicology/PrestoGP

commit 7c0bbfe
Author: sciome-bot <software.tools@sciome.com>
Date: Thu Dec 28 13:52:27 2023 -0500
Ran auto-formatter/linter for R and C++ in vscode. Added some missing imports

commit 85517e7
Merge: 17f5284 2c9c2ed
Author: Shail Choksi <shail.choksi@sciome.com>
Date: Thu Dec 28 13:12:00 2023 -0500
Pull request #16: Add UBSAN/ASAN sanitizers
Merge in STAT/prestogp from build-workflow to master
* commit '2c9c2ede3d40e6e5af24b38a6acfc2dfb2994975':
  Add USAN/ASAN pipeline
  Pull request #14: Additional Testing
  Additional Testing
  Add UBSAN/ASAN sanitizers

commit 2c9c2ed
Merge: 6af43f4 56cdbf2
Author: sciome-bot <software.tools@sciome.com>
Date: Thu Dec 28 13:11:14 2023 -0500
Merge branch 'to-git' of ssh://sciome-bot/stat/prestogp into build-workflow

commit 6af43f4
Merge: e530819 17f5284
Author: sciome-bot <software.tools@sciome.com>
Date: Thu Dec 28 13:07:50 2023 -0500
Merge branch 'master' of ssh://sciome-bot/stat/prestogp into build-workflow

commit e530819
Author: sciome-bot <software.tools@sciome.com>
Date: Thu Dec 28 13:06:21 2023 -0500
Add USAN/ASAN pipeline

commit 17f5284
Merge: ea78ffa 895100d
Author: Eric Bair <eric.bair@sciome.com>
Date: Fri Dec 1 10:49:09 2023 -0500
Pull request #12: Testing
Merge in STAT/prestogp from testing to master
* commit '895100db78d89c5082ddfcf1411dad69bdf1b6c5':
  Fixed another sparseNN bug and updated tests
  Fixed a bug in SparseNN
  Added more likelihood and maxmin ordering tests

commit 895100d
Merge: 7f3ca3c fb060ff
Author: Eric Bair <eric.bair@sciome.com>
Date: Wed Nov 29 17:02:10 2023 -0500
Merge branch 'to-git' of http://192.168.167.103:7990/bitbucket/scm/stat/prestogp into testing
Conflicts:
  DESCRIPTION
  man/prestogp_fit-PrestoGPModel-method.Rd
  tests/testthat/test-Log_Likelihood.R
Fixed merge conflicts.

commit 7f3ca3c
Author: Eric Bair <eric.bair@sciome.com>
Date: Wed Nov 29 16:43:40 2023 -0500
Fixed another sparseNN bug and updated tests

commit 1ca9586
Author: Eric Bair <eric.bair@sciome.com>
Date: Tue Nov 28 18:40:36 2023 -0500
Fixed a bug in SparseNN

commit 80a9818
Author: Eric Bair <eric.bair@sciome.com>
Date: Tue Nov 21 17:49:19 2023 -0500
Added more likelihood and maxmin ordering tests

... and 8 more commits
@brian-bk22 @ericbair-sciome The overall functionality was added with PR #56. For the pesticide work, we need to allow for a variable LOD. Currently it appears that each outcome can only have 1 LOD: Also, we don't have to implement this now if it is not trivial, but I think a more standard or straightforward approach
I had been meaning to ask you about that. I thought you had said something about that in one of our meetings, but I wasn't sure. I'll go ahead and change this. It should be an easy fix. By the way, I have finished my more detailed testing of the imputation algorithm. It seems to work very well for MAR missingness, but there is some bias in the LOD case. (And it gets steadily worse as the proportion of missing data increases.) I'm hoping to send a new version to Shail tonight. (Everything is done other than documentation at this point plus the aforementioned change, which should be easy.) My plan was to fix the show/accessor methods after that (since that is important for model interpretation) and then double back to see if we can figure out a way to improve the LOD imputation.
@ericbair-sciome Thanks for the quick reply and fix. It is good to hear that it will be an easy fix. For the LOD imputation effectiveness, it is also good to hear that it is working for the random case. It is expected that it will get worse as the proportion of missingness increases. One thing I mentioned in an email was the idea of multiple imputation. I'm developing the pesticide analysis through the As an example, here is the current working version of the visualization of the targets pipeline. I have pre-processing, exploratory analysis, testing on a vanilla glmnet model, and a sub-sampled dataset for testing PrestoGP, which is where it is currently failing:
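For readers unfamiliar with the multiple imputation idea mentioned above, here is a minimal sketch of how results from several imputed datasets are commonly pooled using Rubin's rules. This is not PrestoGP code: the function name `pool_rubin` and the input values are hypothetical, and the sketch assumes each imputed dataset has already produced a point estimate and its variance.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation results with Rubin's rules (illustrative only).

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    pooled = mean(estimates)          # average of the point estimates
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical estimates from three imputed datasets.
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total)
```

The between-imputation term is what carries the extra uncertainty from the missing values, so it grows as the imputations disagree more.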
@ericbair-sciome Also, a question related to the
No, there is one scaling input for all outcomes for basically the reason you just said. :)
As an FYI, in the latest version, each outcome can have a separate LOD. Let me know if the syntax is unclear and I will try to improve the documentation. I'm going to keep this issue open for now. While imputation is implemented in the current release, it seems to be biased when the percentage of missing data due to LOD is very high. We are working on some alternative approaches that seem to work better. If all goes well, we should have an improved imputation algorithm implemented in the next few weeks.
This took much longer than I hoped, but the imputation functionality is significantly improved in the new version. There are still issues that need to be worked out. In particular, I think the new version is going to be prohibitively slow when the number of missing values is high. But I think I will close this issue and create some new issues for the new unresolved problems.
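To make the difference between a single LOD per outcome and a variable LOD concrete, here is a small sketch of the data layout. It is not the PrestoGP syntax, which is not shown in this thread: the function `censor_below_lod` and the outcome names are hypothetical, and the sketch only marks values below their limit of detection as missing so that an imputation step could fill them in later.

```python
import math


def censor_below_lod(values, lods):
    """Mark measurements below their limit of detection as missing (NaN).

    values: dict mapping outcome name -> list of measurements
    lods:   dict mapping outcome name -> one LOD for the whole outcome,
            or a list with one LOD per observation (variable LOD)
    """
    censored = {}
    for name, obs in values.items():
        lod = lods[name]
        # Expand a scalar LOD so both cases are handled the same way.
        per_obs = list(lod) if isinstance(lod, (list, tuple)) else [lod] * len(obs)
        if len(per_obs) != len(obs):
            raise ValueError(f"LOD length mismatch for outcome {name!r}")
        censored[name] = [
            math.nan if y < limit else y for y, limit in zip(obs, per_obs)
        ]
    return censored


# Hypothetical data: outcome "a" has one LOD, outcome "b" has a LOD per sample.
result = censor_below_lod(
    {"a": [0.5, 2.0], "b": [1.0, 3.0]},
    {"a": 1.0, "b": [0.5, 4.0]},
)
print(result)
```

Accepting either a scalar or a per-observation list keeps the simple case simple while still covering measurements whose detection limits differ from sample to sample.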
@ericbair-sciome Thanks and no worries. I'm also taking longer on getting the pesticide data processed for analysis. I'll check out the new version ASAP. One question with regard to computational time. Does the new imputation method allow the user to control the number of iterations or the tolerance in re-estimating
At the moment, the number of iterations and tolerance are hard-coded. I should probably give users the option of changing that. I'll put that on the to-do list.
@ericbair-sciome Great - thank you. Whatever it is set to right now can be an easy default, but control would be good if someone wants to sacrifice some precision for speed. A
Good suggestions. I'll put that on the list.
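As an illustration of the kind of control requested above, here is a generic sketch of an iterative routine that exposes its iteration cap and convergence tolerance as arguments with defaults. It is not the PrestoGP imputation loop: the function name, the parameter names `max_iter` and `tol`, and the default values are all assumptions, and the update step is a stand-in (a square-root iteration) rather than a model re-estimation.

```python
def iterate_until_converged(update, x0, max_iter=100, tol=1e-6):
    """Repeat `update` until successive values differ by less than `tol`.

    Returns (value, iterations_used, converged). The defaults here are
    placeholders, not the values hard-coded in any real package.
    """
    x = x0
    for i in range(1, max_iter + 1):
        x_new = update(x)
        if abs(x_new - x) < tol:
            return x_new, i, True
        x = x_new
    return x, max_iter, False


def sqrt2_step(x):
    """Stand-in update: one Babylonian step toward sqrt(2)."""
    return 0.5 * (x + 2.0 / x)


# Default settings favor precision.
value, n_iter, ok = iterate_until_converged(sqrt2_step, 1.0)
# A looser tolerance and a small iteration cap trade precision for speed.
fast_value, fast_iter, fast_ok = iterate_until_converged(
    sqrt2_step, 1.0, max_iter=2, tol=1e-1
)
print(value, n_iter, ok, fast_value, fast_iter, fast_ok)
```

Returning the convergence flag alongside the result lets a caller who lowers `max_iter` see whether the loop stopped early rather than converged.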
Functions to impute data (due to limit of detection or otherwise) need to be added to the package.