Deciding the Project's Future #423

Open
Archmonger opened this issue Jan 19, 2022 · 27 comments

Comments

@Archmonger
Contributor

Archmonger commented Jan 19, 2022

Summary

Right now we're at a bit of an impasse. The original readme notes that django-dbbackup "...tries to use the traditional dump & restore mechanisms." It's possible that this used to be true earlier in the project's history. However, the current implementation appears to rely heavily on custom connectors to facilitate data dumps.

My proposal is to implement some breaking changes in order to adhere to the original project description, and to limit breaking issues caused by django-dbbackup internals.

Suggested Path Forward

The suggestions below would convert django-dbbackup to more of an "upgraded" dumpdata/loaddata rather than a completely different kind of backup engine.

  1. Utilize Django's dumpdata and loaddata for doing the heavy-lifting in terms of serializing data
    • Running dumpdata without -o writes the serialized data to stdout as a character stream that we can utilize (-o redirects it to a file instead).
  2. Pass through all of Django's integrated dump/load features, such as multiple compression types and export formats
  3. Add encryption support on top of all this
  4. Add in "bonus features", such as...
    • Natively backing up to remote storage locations.
    • Post-processing scripts (probably an array within settings.py, similar to Django middleware)
    • Parallel execution of backup/restore on multiple databases by using subprocesses/threads
    • Back up/restore all databases by default, but also allow for backing up specific databases
    • Convenient helper functions for supporting scheduled backups (via Celery/Huey)
    • Automatically delete old backups once the configured maximum number of backups is exceeded
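
The serialization, compression, and retention items in this list can be sketched roughly as follows. This is only an illustration under stated assumptions: `write_compressed_backup` and `prune_old_backups` are hypothetical names that are not part of django-dbbackup, and the serializer is passed in as a callable so the example runs without a Django project. Inside a configured project, that callable could wrap Django's `call_command("dumpdata", stdout=stream)`.

```python
import gzip
from pathlib import Path
from typing import Callable, List, TextIO


def write_compressed_backup(dump: Callable[[TextIO], None], target: Path) -> Path:
    """Stream serialized data from `dump` into a gzip-compressed file."""
    # gzip.open in "wt" mode yields a text stream, matching the character
    # stream a management command writes to its stdout.
    with gzip.open(target, "wt", encoding="utf-8") as stream:
        dump(stream)
    return target


def prune_old_backups(directory: Path, keep: int, pattern: str = "*.json.gz") -> List[Path]:
    """Delete all but the `keep` newest backups and return the deleted paths."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    # Assumption: file names embed a sortable timestamp, so sorting by name
    # orders the backups from oldest to newest.
    backups = sorted(directory.glob(pattern))
    stale = backups[:-keep] if keep else backups
    for path in stale:
        path.unlink()
    return stale


# Assumed usage inside a configured Django project (not executed here):
#
#   from django.core.management import call_command
#   write_compressed_backup(
#       lambda out: call_command("dumpdata", stdout=out),
#       Path("backups/backup-20220119.json.gz"),
#   )
#   prune_old_backups(Path("backups"), keep=10)
```

Passing the serializer as a callable keeps the file handling independent of Django, which also makes it easy to swap in a different dump mechanism later.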

Thoughts, Comments, and Remarks

I'm opening this up for anyone to voice their opinion on the project direction. This would be a breaking change, so if there's a general consensus that this isn't the ideal project direction then we can reassess.

@Archmonger Archmonger pinned this issue Jan 19, 2022
@johnthagen
Contributor

@Archmonger I think this is a great path forward for django-dbbackup. I think the reality is that the project hasn't been maintained for a while, so if you have a vision and the time to execute in a manner that keeps the project healthy and lets us build upon existing libraries, it would be great for this project.

@Archmonger
Contributor Author

I likely won't have time to develop this until somewhere around April, so until then this ticket will remain open for people to voice their opinions.

@stunaz

stunaz commented Apr 4, 2022

New future user here: I'd like to use the project to back up/restore my db/media. A completely new approach fully integrated with Django itself feels OK to me.

@benjaoming
Contributor

Hi! I used to maintain this project years back, and I think that the proposals sound great. I'm only here to show some sign of life, since I'm receiving emails from Read the Docs with a warning about the project being abandoned, which I don't think it is. No one has contacted me in this regard.

@johnthagen
Contributor

@benjaoming Thank you for posting! Could you please give @Archmonger and me permission to push to RTD? We are the current maintainers and would like to get out a new stable release.

@Archmonger
Contributor Author

Archmonger commented Apr 4, 2022

Hey @benjaoming, sorry about that. I tried reaching out to jonathan-s and ZuluPro to gain access to the RTD. Neither responded, so I reached out to RTD themselves for help.

As I found out this morning, RTD put in an abandonment check to see if they could give access to the docs for johnthagen and myself.

Let me know if you can add us as RTD maintainers, as it would be much appreciated.

@benjaoming
Contributor

@johnthagen @Archmonger - absolutely! I'll just need your RTD usernames to do that 👍

@benjaoming
Contributor

Aha! It seems that this is already in order, there was already a jonathan-s added as maintainer? And I've added Archmonger supposing it's you?

@Archmonger
Contributor Author

Thanks! I've confirmed I've been added as a maintainer. I'll add in the Jazzband bot and johnthagen as soon as I get a chance.

@benjaoming
Contributor

Wishing you the best with this project, thanks for being in Jazzband 💯

@banagale

@benjaoming Thank you for posting and your work to help this project move along.

@Archmonger I like the ideas you've presented and wanted to voice my encouragement to act with boldness and not be too weighed down by breaking changes.

@pkkid
Contributor

pkkid commented Apr 11, 2022

I made it here because I was notified that I am no longer a collaborator on the PyPI project I created! No big deal, I don't think I've made any significant contribution to this for many years now. It's been amazing to see the project grow beyond anything I imagined, and I owe a huge thanks to @benjaoming and @ZuluPro for taking over when I had moved on. I'm personally fine with any direction the current maintainer wants to take this package. Since I don't really consider myself a maintainer anymore, my voice shouldn't carry much weight.

To clarify, the line "tries to use the traditional dump & restore mechanisms" originally meant that we use pg_dump for Postgres, mysqldump for MySQL, and so on, rather than Django's loaddata and dumpdata. The reason is that the database-specific tools are generally better tailored to working with the database files, especially when those databases reach much larger sizes. This project was originally created because I needed an easy way to back up database files that were several hundred GB in size, and Django's serializer was not up to the task at the time. Admittedly, I do not know whether Django has made improvements here. But I have to imagine that pg_dump is still far superior to Django's dumpdata (and likewise for other databases and their own tools).
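
For readers unfamiliar with the native-tool approach described here, the following is a rough sketch of how a PostgreSQL dump command might be assembled from a Django-style `DATABASES` entry. It is not the project's actual connector code: `build_pg_dump_command` and `run_pg_dump` are hypothetical helpers, and the choice of the custom archive format is an assumption. The `pg_dump` options and the `PGPASSWORD` environment variable themselves are standard PostgreSQL features.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Tuple


def build_pg_dump_command(db: Mapping[str, object], outfile: str) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, extra_env) for dumping one PostgreSQL database.

    `db` follows the shape of a Django DATABASES entry (NAME, USER, HOST,
    PORT, PASSWORD); only NAME is required.
    """
    # Assumption: the custom archive format is wanted; it is compressed and
    # can be restored with pg_restore.
    argv = ["pg_dump", "--format=custom", f"--file={outfile}"]
    if db.get("HOST"):
        argv.append(f"--host={db['HOST']}")
    if db.get("PORT"):
        argv.append(f"--port={db['PORT']}")
    if db.get("USER"):
        argv.append(f"--username={db['USER']}")
    argv.append(str(db["NAME"]))

    # Pass the password through the environment rather than the command line.
    env: Dict[str, str] = {}
    if db.get("PASSWORD"):
        env["PGPASSWORD"] = str(db["PASSWORD"])
    return argv, env


def run_pg_dump(db: Mapping[str, object], outfile: str) -> None:
    """Run pg_dump; requires the pg_dump binary and a reachable server."""
    argv, extra_env = build_pg_dump_command(db, outfile)
    subprocess.run(argv, env={**os.environ, **extra_env}, check=True)
```

Because the database server's own tool does the work, the dump does not pass through Django's serializer, which is the property that matters for very large databases.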

Again, I'll state that I am totally fine with any direction, but I suspect we may have interpreted the phrase "traditional dump & restore" to mean opposite things.

@johnthagen
Contributor

@pkkid Thank you for sharing this valuable historical context!

@Archmonger
Contributor Author

Apologies @pkkid!

I'm trying to get all active maintainers funneled through the Jazzband org.

Within this GitHub org, PyPI & RTD access is really only needed for emergencies; everything else is handled by the Jazzband-Bot. To limit potential security vulnerabilities (e.g. hacked PyPI accounts), I'm trying to keep that list short.

If you'd like to maintain control over the project I can put you in as a project lead. Just let me know!

Also, thanks for the context and clarification! Dumpdata is pretty solid for my use cases, but admittedly I haven't tried it on giant datasets. I'll take a stab at a side-by-side performance comparison.

@pkkid
Contributor

pkkid commented Apr 11, 2022

Ha, no need to apologize. The project is in great hands, and I appreciate you and @johnthagen taking the reins to keep this project alive. Thank you!

@isedwards

I've just seen and read this pinned issue after raising #468 yesterday (where I suggested a generic Python backup package that returned to using "traditional dump & restore mechanisms").

Is there a place for such a package at Jazzband? This would be a fork of the current project, under a different name, that went in a different direction.

@Archmonger
Contributor Author

Archmonger commented Nov 21, 2022

Jazzband typically only hosts Django related packages, so I would doubt it.

Technically, it is fully possible to move the current connectors out of Django-DBBackup and have them be standalone.

However, if we implement the changes I suggested in this issue we would be firmly tied to Django and unable to separate any of that functionality.

@benjaoming
Contributor

@isedwards I don't think you can start a new project in Jazzband based on a concept. See especially the section Viability here: https://jazzband.co/about/guidelines#viability

@Archmonger
Contributor Author

Archmonger commented Apr 24, 2023

@johnthagen I'm thinking of spinning this repo out of Jazzband. I've been hesitant to make major changes or test/CI changes due to how slowly things are moving in Jazzband, which has spiraled into practically nothing getting done.

What are your thoughts on this, and would you assist in maintaining the package under either a new or old org?

@johnthagen
Contributor

I think spinning it out would be a fine idea. I'd help out with basic maintenance under a new org.

@johnthagen
Contributor

It would be nice to transfer the repo so we keep the stars.

@banagale

It is fascinating that in this case Jazzband as an organization seems to have the same challenges that independent open-source projects do.

@Archmonger
Contributor Author

Jazzband is currently a centralized org with only one admin. So naturally, if that one admin becomes busy then things don't move forward.

@WillNilges

👋🏻 Hey folks, wondering if any progress has been made towards using Django's dumpdata/loaddata with this plugin?

@pkkid
Contributor

pkkid commented Apr 1, 2024

May I suggest that, if we want this to work with dumpdata/loaddata, we create a new database type instead of replacing what we have? As mentioned above, the original intent of the project was specifically to make it easy to use pg_dump and its variants. Dumpdata is a more generic solution developed by Django, but it also comes with downsides for larger projects. However, I think it might slot nicely into a new db type (maybe generically at db/django.py). We would keep the old connectors and support Django's way of doing things, with the choice left to the user.
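
One way to picture the "new db type" idea is a connector lookup in which the native tools remain the default and the dumpdata-based type is something the user opts into. The sketch below is purely illustrative: `choose_connector`, the connector labels, and the fallback to dumpdata for engines without a native tool are all assumptions, not django-dbbackup's existing behavior. Only the engine strings are real Django backend paths.

```python
from typing import Mapping, Optional

# Hypothetical labels standing in for connector classes.
NATIVE_CONNECTORS = {
    "django.db.backends.postgresql": "pg_dump",
    "django.db.backends.mysql": "mysqldump",
}
DJANGO_CONNECTOR = "dumpdata"  # the proposed generic, Django-based type


def choose_connector(db_settings: Mapping[str, str], override: Optional[str] = None) -> str:
    """Pick a connector label for one database.

    An explicit `override` (the user's choice) always wins. Otherwise the
    native tool for the engine is used, and engines without a native
    connector fall back to the Django-based one (an assumed policy).
    """
    if override is not None:
        return override
    return NATIVE_CONNECTORS.get(db_settings["ENGINE"], DJANGO_CONNECTOR)
```

With this shape, existing users keep the native dump tools unless they explicitly request the Django-based type, so adding it would not have to be a breaking change.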

@Archmonger
Contributor Author

That's a really good idea. I agree that it's best represented as an optional DB type.

I don't know when I'll have time to develop this, I've been stretched pretty thin lately.

@ZuluPro
Contributor

ZuluPro commented Apr 1, 2024

@WillNilges
I agree with both @pkkid and @Archmonger
dumpdata/loaddata could be just an additional DB type in dbbackup.


9 participants