while embedding_concat() runs, I got "Killed" #32

Open
wonchul-kim opened this issue Aug 19, 2021 · 5 comments
@wonchul-kim

I think I have too much data...?

What do you think?

@haobo827

I think I have too much data...?

What do you think?

Try cropsize=224

@michaelstuffer98

If it is still relevant: this can happen if you run out of RAM. It happened for me with large datasets.

@RichardChangCA

If I use this model on a larger dataset, running out of RAM is a big issue. Can anybody solve it? I guess this is an inherent shortcoming of the method.

@michaelstuffer98

@RichardChangCA What I did was to call embedding_concat() repeatedly, each time a fixed number of samples has been loaded, so that the index selection is performed while the data is being loaded, which reduces memory usage. I wrote a small function that you call after, e.g., every 1000 samples have been processed. Define it inside main so that it can access the idx tensor and an embeddings = [] list, both of which you have to declare yourself in main. After calling store_embeddings, re-initialize the train_outputs/test_outputs dict as empty to free the memory.

def store_embeddings(model_outputs):
    # Concatenate the per-batch activations collected for each layer
    for k, v in model_outputs.items():
        model_outputs[k] = torch.cat(v, 0)
    # Build the embedding volume starting from the first layer
    embedding_vectors = model_outputs[layers[0]]
    for layer_name in layers[1:]:
        embedding_vectors = embedding_concat(embedding_vectors, model_outputs[layer_name])
    # Keep only the randomly selected channels (idx) to reduce memory usage
    embedding_vectors = torch.index_select(embedding_vectors, 1, idx)
    embeddings.append(embedding_vectors)
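
Roughly, the call pattern could look like the sketch below (not part of the original comment; the 1000-sample threshold, the dataloader unpacking, and the loop details are assumptions about the surrounding script, which is expected to provide model, device, train_dataloader, layers, idx, and train_outputs):

from collections import OrderedDict
import torch

embeddings = []            # filled by store_embeddings()
samples_since_flush = 0

for (x, _, _) in train_dataloader:
    with torch.no_grad():
        _ = model(x.to(device))                 # forward hooks collect activations into train_outputs
    samples_since_flush += x.size(0)

    if samples_since_flush >= 1000:             # flush periodically to bound RAM usage
        store_embeddings(train_outputs)
        train_outputs = OrderedDict((layer, []) for layer in layers)   # reset to free memory
        samples_since_flush = 0

# flush whatever is left after the last full chunk
if any(len(v) > 0 for v in train_outputs.values()):
    store_embeddings(train_outputs)

embedding_vectors = torch.cat(embeddings, 0)    # final concatenated embedding tensor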

@RichardChangCA


Thanks @michaelstuffer98. Do you know how to calculate the covariance matrix when the dataset is too large? I have optimized the CPU memory usage of the other parts; only the covariance matrix calculation is left. I have to store the embeddings of all normal training data to calculate the covariance matrix, but my machine cannot handle it.
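
For reference, one way around storing every embedding is to accumulate the covariance incrementally over chunks, keeping only running sums of x and x xᵀ. A rough sketch (not from this thread; the class and method names are illustrative, and the small eps * I regularization mirrors what PaDiM-style implementations typically add):

import torch

class RunningCovariance:
    # Accumulates mean and covariance over chunks of shape (N, C)
    def __init__(self, dim):
        self.n = 0
        self.sum = torch.zeros(dim)          # running sum of x
        self.outer = torch.zeros(dim, dim)   # running sum of x x^T

    def update(self, chunk):
        # chunk: (N, C) embedding vectors, e.g. for one spatial position
        self.n += chunk.shape[0]
        self.sum += chunk.sum(dim=0)
        self.outer += chunk.t() @ chunk

    def finalize(self, eps=0.01):
        mean = self.sum / self.n
        # unbiased covariance: (sum of x x^T - n * mu mu^T) / (n - 1)
        cov = (self.outer - self.n * torch.outer(mean, mean)) / (self.n - 1)
        return mean, cov + eps * torch.eye(cov.shape[0])   # small regularization term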
