CLV Distribution RVs not Model-Specific #128
Comments
We should add those variants to generate data. Do you want to assign this issue to yourself? |
I consider this a prerequisite for #127, so I'll get started on adding a ParetoNBD distribution sometime next week. The existing distributions should also be revised to make clear that they were written with the BG/NBD model in mind. I can also look into vectorization for |
I'll be creating a PR for this soon. @larryshamalama I see you made the original commit for the |
Sorry, this message slipped by my attention. I did not use any research articles since I did not find any when I was writing out the likelihood... Perhaps there is one out there that I'm unaware of. I can write out the likelihood derivation again if it would help |
@larryshamalama Let's refactor
|
Sounds like a good plan, thanks for laying out a bullet-point action plan. I'm away until early March; we can chat once I'm back at work. |
Hi @ColtAllen, I am just getting back to work and browsing the progress that has been made. My understanding is that your focus is, for now, #177 and #176. I can start with #98 and we can see from there. How does that sound?
Edit: Re-reading your original comment in opening this issue, I see where you are coming from. I'm still wondering if there's a better way to generalize the model building blocks and make them robust for all (or at least most) model types. |
Sounds great 👍
My main interest in model-specific distribution blocks is for use within the model like I'm doing in #177, unlocking additional functionality. That said, it could be interesting to test how well the ParetoNBD model converges on data generated from a BG/NBD process, and vice-versa. If there isn't interest in adding an individual-level BG/NBD model, we don't have a means of generating raw transaction data yet, so that could be a better way to repurpose that particular distribution block. |
Shall we modify the building blocks to be specific to CLV models? E.g. IIRC, we opted against this because all we needed was the |
@larryshamalama let's rework https://github.com/ColtAllen/btyd/blob/main/btyd/generate_data.py#L75 The reason I suggest this is because if you recall our last weekly project meeting, @twiecki wants all https://github.com/ColtAllen/marketing-case-study/blob/main/case-study.ipynb And in time, the notebook itself added to the docs. The first thing we need is a raw transaction block to generate the synthetic data. We should create issues for the other |
I'm considering using the RVs in the CLV Distributions module to generate synthetic data for testing the Pareto/NBD model. However, after looking at the `rng_fn` for both classes, I'm concerned the RVs may not be robust across all model types, and the distribution classes could have similar pathologies.
As currently defined, the `sim_data` method in `rng_fn` is using a binomial RV within a while loop for the dropout probability. This works well for the Modified BG/NBD model, but I do not see a provision for the BG/NBD assumption that all non-repeat customers are alive with probability 1. The Pareto/NBD also does not use a binomial RV at all - instead it uses an exponential RV to predict the dropout time period prior to the while loop.
The data generation functions in Lifetimes/BTYD are a useful reference:
https://github.com/ColtAllen/btyd/blob/main/btyd/generate_data.py
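To make the three dropout mechanisms described above concrete, here is a minimal sketch for a single customer. It is not the project's `rng_fn` or `sim_data`: the function name `simulate_purchases`, its arguments, and the model labels are hypothetical, and it uses Python's standard `random` module rather than the library's RV machinery. It also takes the customer-level parameters (`lam`, `p`, `mu`) as given instead of drawing them from their population-level priors.

```python
import random


def simulate_purchases(model, T, lam, p=None, mu=None, rng=None):
    """Return one customer's repeat-purchase times within the window [0, T].

    Hypothetical helper for illustration only.
    model: "bgnbd", "mbgnbd", or "pareto_nbd"
    T:     length of the observation window
    lam:   purchase rate (exponential inter-purchase times)
    p:     per-transaction dropout probability (BG/NBD variants)
    mu:    dropout rate of the exponential lifetime (Pareto/NBD)
    """
    if model not in ("bgnbd", "mbgnbd", "pareto_nbd"):
        raise ValueError(f"unknown model: {model!r}")
    rng = rng or random.Random()

    horizon, alive = T, True
    if model == "pareto_nbd":
        # Dropout time is drawn once, before the loop, from an exponential.
        horizon = min(T, rng.expovariate(mu))
    elif model == "mbgnbd":
        # Modified BG/NBD: the customer may already drop out at time zero.
        alive = rng.random() >= p
    # BG/NBD: no check here, so a customer with no repeat purchases
    # is alive with probability 1.

    times, t = [], 0.0
    while alive:
        t += rng.expovariate(lam)  # exponential inter-purchase time
        if t > horizon:
            break
        times.append(t)
        if model != "pareto_nbd":
            # Bernoulli dropout check after each repeat purchase.
            alive = rng.random() >= p
    return times


if __name__ == "__main__":
    rng = random.Random(42)
    for name in ("bgnbd", "mbgnbd", "pareto_nbd"):
        times = simulate_purchases(name, T=52.0, lam=0.3, p=0.2, mu=0.05, rng=rng)
        print(name, len(times))
```

The only structural difference between the BG/NBD and Modified BG/NBD branches is the extra dropout check at time zero, while the Pareto/NBD branch needs no per-transaction check at all because its lifetime is fixed before the loop starts.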