SCM doesn't reflect remote status. #1769

Closed
daavoo opened this issue May 25, 2022 · 17 comments
Labels
A: trees (Area: SCM and DVC-tracked trees), blocked (Issue or pull request blocked due to other dependencies or issues), priority-p2 (Future feature, less priority for now)

Comments

@daavoo
Contributor

daavoo commented May 25, 2022

The SCM section contains an action for dvc push, but it reflects only the workspace status, not the remote status.

I believe showing the remote status is relevant and the lack of it might cause confusion and incorrect workflows for users unfamiliar with DVC.

VS Code could use dvc status -c to query the remote status.
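A rough sketch of what such a query could look like is below. It assumes that dvc status -c accepts --json and prints a flat JSON object mapping each out-of-sync path to a status string; that output shape and all helper names are assumptions for illustration, not something confirmed in this thread.

```python
import json
import subprocess
from typing import Callable, Dict


def run_dvc_cloud_status(cwd: str) -> str:
    """Run `dvc status -c --json` in `cwd` and return its raw stdout."""
    result = subprocess.run(
        ["dvc", "status", "-c", "--json"],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_cloud_status(raw: str) -> Dict[str, str]:
    """Parse the output into {path: status}.

    Assumption: the output is a flat JSON object; anything else is
    treated as "nothing to report".
    """
    raw = raw.strip()
    if not raw:
        return {}
    data = json.loads(raw)
    if not isinstance(data, dict):
        return {}
    return {str(path): str(state) for path, state in data.items()}


def remote_is_in_sync(
    cwd: str, runner: Callable[[str], str] = run_dvc_cloud_status
) -> bool:
    """True when the cloud status reports no differing paths."""
    return not parse_cloud_status(runner(cwd))
```

The runner is injectable so the parsing can be exercised without a DVC repository or network access.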

Example workflow:


1. Start with nothing changed.

2. I run an experiment.

[Screenshot, 2022-05-25 at 17:22:29]

I see changes to dvc-tracked files and git-tracked files.

3. Following my usual workflow, I stage and commit the Git-tracked files.

I get a clear signal by Git SCM that my workspace and remote are out of sync:

[Screenshot, 2022-05-25 at 17:26:59]

However, the DVC SCM is empty. Since I am familiar with the Git SCM, I interpret this to mean there is nothing left to do with DVC.

4. I click on the big sync changes button.


At this point, anyone who clones the repo will get a broken state (with respect to DVC-tracked files), because the data was never pushed to the DVC remote.

daavoo added the discussion, 🎨 design, and A: trees labels May 25, 2022
@daavoo
Contributor Author

daavoo commented May 25, 2022

Side note about step 2.
Might be just me, but I find it confusing that the DVC section contains a title and actions/buttons that affect the Git section and don't have any effect or reflect any updates on the DVC section. Referring to:

[Screenshot, 2022-05-25 at 17:33:37]

@daavoo
Contributor Author

daavoo commented May 25, 2022

Judging by how often it comes up on support duty, this workflow pitfall is commonly hit by (usually new) DVC users even without the extension.

My worry is that the extension doesn't prevent it, but rather contributes to it.

If dvc status -c is too expensive and the optimization can't land in time for the release, a "hotfix" could be to have some UI around the pre-push Git hook. For example, recommend dvc install at setup time, or even install at least the pre-push hook by default (might be too aggressive?).
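As a minimal sketch of the "recommend dvc install" idea, the extension could first check whether a DVC pre-push hook already exists. The check below assumes hooks live in the default .git/hooks directory (so it ignores core.hooksPath and worktrees) and that a hook written by dvc install mentions "dvc" somewhere in its body; the function name is hypothetical.

```python
from pathlib import Path


def has_dvc_pre_push_hook(repo_root: str) -> bool:
    """Return True if .git/hooks/pre-push exists and mentions dvc.

    Assumptions: default hooks location, and a plain substring match
    is a good enough signal that the hook was set up for DVC.
    """
    hook = Path(repo_root) / ".git" / "hooks" / "pre-push"
    if not hook.is_file():
        return False
    try:
        content = hook.read_text(encoding="utf-8", errors="replace")
    except OSError:
        return False
    return "dvc" in content
```

If this returns False at setup time, the extension could suggest running dvc install rather than installing anything silently.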

daavoo removed the 🎨 design label May 25, 2022
@mattseddon
Member

Related to #922.

Unfortunately, we cannot currently access the native button because it is only in the proposed API. I will look again and see if anything has changed since the last time I checked.

@mattseddon
Member

I'll also check in with a few people in the VS Code community to see if there is anything that can be done to get the action button into the stable API.

@dberenbaum
Contributor

> If dvc status -c is too expensive

I don't think it makes sense to check the remote constantly, but doing it once after each git commit could make sense.
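One way to sketch "once after each git commit" is to remember the last seen HEAD and only trigger the remote check when it changes. The class and callback names below are hypothetical, and a HEAD change also happens on checkout or reset, so this is an approximation of "after each commit" rather than an exact detector.

```python
import subprocess
from typing import Callable, Optional


def read_git_head(cwd: str) -> str:
    """Return the commit hash that HEAD currently points to."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


class CommitWatcher:
    """Calls `on_new_head` once each time HEAD differs from the last poll."""

    def __init__(
        self, get_head: Callable[[], str], on_new_head: Callable[[str], None]
    ) -> None:
        self._get_head = get_head
        self._on_new_head = on_new_head
        self._last_head: Optional[str] = None

    def poll(self) -> bool:
        head = self._get_head()
        # The first poll only records a baseline; nothing is triggered.
        changed = self._last_head is not None and head != self._last_head
        self._last_head = head
        if changed:
            # This is where a remote status check would be scheduled.
            self._on_new_head(head)
        return changed
```

A caller would construct it with something like CommitWatcher(lambda: read_git_head(repo), check_remote) and call poll() from whatever file watcher or timer is already in place.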

@shcheklein
Member

@mattseddon is it possible to put a comment in the section? :) Maybe we can just explain this for now?

@alex000kim

How about adding a refresh button that'll run dvc status -c on demand?

@shcheklein
Member

Discussed this with the team. One idea to consider initially is to have a single notification / button / status from which people can understand that some files are missing in the remote storage. At first it could even live outside the SCM panel, perhaps in the status bar?

@alkatar21

alkatar21 commented Nov 17, 2022

I don't use the sync button, but some indicator that the DVC remote is not in sync with the local status would be helpful.
Maybe it could also show which files, but currently I can't think of a solution other than adding a tab like Commits/Branches etc. where you can see the differences between local and remote.

@mattseddon
Member

Could we use data status --json --granular --unchanged --not-in-remote --no-remote-refresh for this?

For the initial implementation, we could let users trigger the remote refresh on demand but ideally, the information would be "auto-updated" whenever the CLI interacted with the remote. E.g. on exp push/push or pull (I could be doing something wrong but it doesn't seem like this is the case).

Is this something that we should start looking at @dberenbaum? We can talk about it tomorrow if that works for you.

@mattseddon
Member

@efiop is the above behaviour expected for data status?

@efiop

efiop commented Jul 5, 2023

@mattseddon Not quite sure I get how scm is related here, a bit confused about the whole issue.

@dberenbaum
Contributor

@efiop Sorry, I asked @mattseddon to ping you. The question is here:

> Could we use data status --json --granular --unchanged --not-in-remote --no-remote-refresh for this?
>
> For the initial implementation, we could let users trigger the remote refresh on demand but ideally, the information would be "auto-updated" whenever the CLI interacted with the remote. E.g. on exp push/push or pull (I could be doing something wrong but it doesn't seem like this is the case).

Can that command (with --no-remote-refresh) still have remote info indexed from what's already been pushed/pulled by the user?

@efiop

efiop commented Jul 5, 2023

@dberenbaum Thanks for clarifying! Totally, if dvc push/pull was invoked before.

@mattseddon
Member

mattseddon commented Jul 5, 2023

This is the behaviour that I'm seeing with 3.4.0:

/demo @914b0f67 ❯ dvc data status --json --granular --unchanged --not-in-remote --no-remote-refresh
{
  "not_in_remote": [
    "hist.csv",
    "model.pt",
    "training/plots/",
    "training/plots/images/misclassified.jpg",
    "training/plots/sklearn/confusion_matrix.json",
    "training/plots/metrics/train/loss.tsv",
    "training/plots/metrics/train/acc.tsv",
    "training/plots/metrics/test/acc.tsv",
    "training/plots/metrics/test/loss.tsv"
  ],
  "committed": {
    "modified": [
      "hist.csv",
      "model.pt",
      "training/plots/",
      "training/plots/images/misclassified.jpg",
      "training/plots/sklearn/confusion_matrix.json",
      "training/plots/metrics/train/loss.tsv",
      "training/plots/metrics/train/acc.tsv",
      "training/plots/metrics/test/acc.tsv",
      "training/plots/metrics/test/loss.tsv"
    ]
  },
  "unchanged": [
    "data/MNIST/raw/train-labels-idx1-ubyte.gz",
    "data/MNIST/raw/t10k-images-idx3-ubyte.gz",
    "data/MNIST/raw/t10k-labels-idx1-ubyte.gz",
    "data/MNIST/raw/t10k-images-idx3-ubyte",
    "data/MNIST/raw/train-images-idx3-ubyte.gz",
    "data/",
    "data/MNIST/raw/train-images-idx3-ubyte",
    "data/MNIST/raw/t10k-labels-idx1-ubyte",
    "data/MNIST/raw/train-labels-idx1-ubyte"
  ]
}
/demo @914b0f67 !4 ❯ dvc push
9 files pushed
/demo @914b0f67 !4 ❯ dvc data status --json --granular --unchanged --not-in-remote --no-remote-refresh
{
  "not_in_remote": [
    "model.pt",
    "hist.csv",
    "training/plots/",
    "training/plots/sklearn/confusion_matrix.json",
    "training/plots/images/misclassified.jpg",
    "training/plots/metrics/train/loss.tsv",
    "training/plots/metrics/train/acc.tsv",
    "training/plots/metrics/test/loss.tsv",
    "training/plots/metrics/test/acc.tsv"
  ],
  "committed": {
    "modified": [
      "model.pt",
      "hist.csv",
      "training/plots/",
      "training/plots/sklearn/confusion_matrix.json",
      "training/plots/images/misclassified.jpg",
      "training/plots/metrics/train/loss.tsv",
      "training/plots/metrics/train/acc.tsv",
      "training/plots/metrics/test/loss.tsv",
      "training/plots/metrics/test/acc.tsv"
    ]
  },
  "unchanged": [
    "data/",
    "data/MNIST/raw/train-images-idx3-ubyte.gz",
    "data/MNIST/raw/t10k-labels-idx1-ubyte",
    "data/MNIST/raw/t10k-labels-idx1-ubyte.gz",
    "data/MNIST/raw/train-images-idx3-ubyte",
    "data/MNIST/raw/t10k-images-idx3-ubyte",
    "data/MNIST/raw/train-labels-idx1-ubyte",
    "data/MNIST/raw/train-labels-idx1-ubyte.gz",
    "data/MNIST/raw/t10k-images-idx3-ubyte.gz"
  ]
}

Only running dvc data status --json --granular --unchanged --not-in-remote (that is, without --no-remote-refresh) triggers the update of the remote index.
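The two outputs above list the same paths in a different order, so a comparison has to ignore ordering. A minimal sketch of that comparison follows; it uses only the flags and the "not_in_remote" key shown above, while the function names are hypothetical.

```python
import json
from typing import FrozenSet, List

BASE_ARGS = [
    "dvc", "data", "status",
    "--json", "--granular", "--unchanged", "--not-in-remote",
]


def data_status_args(refresh_remote: bool) -> List[str]:
    """Build the command line, with or without a remote index refresh."""
    args = list(BASE_ARGS)
    if not refresh_remote:
        args.append("--no-remote-refresh")
    return args


def not_in_remote(raw_json: str) -> FrozenSet[str]:
    """Extract the "not_in_remote" paths as an order-independent set."""
    return frozenset(json.loads(raw_json).get("not_in_remote", []))


def push_was_reflected(before: str, after: str) -> bool:
    """True if at least one path left the "not in remote" set."""
    return bool(not_in_remote(before) - not_in_remote(after))
```

Applied to the two outputs above, push_was_reflected would return False, which matches the observation that dvc push did not update what --no-remote-refresh reports.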

@efiop

efiop commented Jul 5, 2023

@mattseddon Yeah, that's expected. The remote index is currently only automatically updated by push/pull/fetch for cloud versioning. We are ready to do that for regular clouds in fetch (and pull), but push needs some work: iterative/dvc#9333

@mattseddon
Member

For the record, for this issue, we're trying to solve a targeted use case where a user has just run some experiment which has generated some new artefacts. These will initially show up as "not in remote", but after dvc push or dvc exp push origin <exp-name> it would be good to drop the data from the "not in remote" status. This would enable us to let users know when their remote is not currently up to date.
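To illustrate the user-facing side, here is a small sketch that turns a dvc data status --json payload into a one-line warning. It assumes directory entries end with "/" (as in the output earlier in this thread) and skips them so files are not counted twice; the function names and the message wording are made up for illustration.

```python
import json
from typing import List


def files_not_in_remote(data_status_json: str) -> List[str]:
    """Return the file paths listed under "not_in_remote", sorted."""
    status = json.loads(data_status_json)
    paths = status.get("not_in_remote", [])
    # Directory entries end with "/" in the output shown in this thread;
    # skip them because their files are listed individually as well.
    return sorted(p for p in paths if not p.endswith("/"))


def remote_warning(data_status_json: str) -> str:
    """Return a short warning, or an empty string when nothing is missing."""
    files = files_not_in_remote(data_status_json)
    if not files:
        return ""
    noun = "file" if len(files) == 1 else "files"
    return f"{len(files)} DVC-tracked {noun} not in remote"
```

An empty string means there is nothing to show, so the indicator can simply be hidden in that case.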

mattseddon added the blocked label Jul 6, 2023
mattseddon closed this as not planned Mar 3, 2024
7 participants