Logs_AD.py - Finding the Most Different String in Log Files

This Python script, Logs_AD.py, finds the most different string in a log file by comparing embeddings generated with a SentenceTransformer model. It is aimed at large datasets, where GPU acceleration can significantly speed up the embedding step.

Features

GPU Availability Check: Checks whether a GPU is available on your system. If so, it displays the number of GPUs and their names; otherwise it falls back to the CPU.
Model Loading: Loads a SentenceTransformer model. SentenceTransformer is a Python framework for state-of-the-art sentence, text and image embeddings. The model used here is 'all-MiniLM-L6-v2'.
Text Embedding: Defines a function embeddings(text) to create embeddings for a given text. This function is applied to the 'SQL' column of a DataFrame df, creating a new column 'SQL_Embedded' with the embeddings.
Storing Embeddings: Stores the embeddings on disk in a pickle file named 'embeddings_test.pkl', so that they don't have to be recomputed on every run.
Loading Embeddings: Provides commented-out code for loading the embeddings from the pickle file if needed.
Calculating Average Embedding: Calculates the average value of the embeddings using multithreading for efficiency. This is particularly useful if the dataset is large. The average embedding is then stored in the variable embedding_avg.
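
The feature list ends at the average embedding. One plausible way to get from there to the "most different" string is to rank every row by its cosine similarity to that average and pick the lowest. The following is a minimal sketch under that assumption, not the code in Logs_AD.py: the function names `average_embedding` and `most_different`, the chunk count, and the toy vectors are all hypothetical, and the script's actual scoring may differ.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def average_embedding(embeddings: np.ndarray, n_chunks: int = 4) -> np.ndarray:
    """Mean vector of all rows, with per-chunk sums computed in threads."""
    chunks = [c for c in np.array_split(embeddings, n_chunks) if len(c) > 0]
    with ThreadPoolExecutor() as pool:
        partial_sums = list(pool.map(lambda c: c.sum(axis=0), chunks))
    return np.sum(partial_sums, axis=0) / len(embeddings)


def most_different(texts, embeddings):
    """Return (index, text) of the row least similar (cosine) to the mean."""
    avg = average_embedding(embeddings)
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(avg)
    cosine_sim = embeddings @ avg / norms
    idx = int(np.argmin(cosine_sim))
    return idx, texts[idx]


# Toy 2-D vectors standing in for real sentence embeddings (made-up data).
texts = ["SELECT a", "SELECT b", "SELECT c", "DROP TABLE x"]
vectors = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [-1.0, 0.5]])
idx, text = most_different(texts, vectors)
print(idx, text)
```

With real data, `vectors` would be the stacked contents of the 'SQL_Embedded' column. Summing in chunks keeps each thread's work inside NumPy, which is where threads can actually run in parallel.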

Dependencies

PyTorch: Used to detect and interact with the GPU.
SentenceTransformer: Used to generate the text embeddings (provided by the sentence-transformers package).
concurrent.futures: Used for multithreading. Part of the Python standard library.
numpy: Used for mathematical operations on the embeddings.
pickle: Used to store and load the embeddings. Part of the Python standard library.
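
Before running the script, it can help to confirm that the modules listed above are importable. The snippet below is a small sketch of such a check; the helper `missing_modules` and the list `REQUIRED_MODULES` are hypothetical names, not part of Logs_AD.py.

```python
from importlib.util import find_spec

# Import names for the dependencies listed above: PyTorch is imported as
# `torch`, and sentence-transformers as `sentence_transformers`.
REQUIRED_MODULES = [
    "torch",
    "sentence_transformers",
    "numpy",
    "concurrent.futures",
    "pickle",
]


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised, for example, when the parent package of a dotted name is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```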

Usage

To use this script, run it in a Python environment that has the dependencies above installed. The script checks for GPU availability, loads the SentenceTransformer model, creates the text embeddings, stores them, calculates the average embedding, and prints the results.
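
Because the embeddings are stored in a pickle file, a later run can reuse them instead of recomputing. The sketch below shows one way to wire that up; `load_or_compute` and `fake_embeddings` are hypothetical names, and the stand-in function only imitates the real embedding step so the example runs without a model.

```python
import os
import pickle
import tempfile


def load_or_compute(path, compute):
    """Load pickled embeddings from `path`, or compute and store them first."""
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    embeddings = compute()
    with open(path, "wb") as fh:
        pickle.dump(embeddings, fh)
    return embeddings


calls = []


def fake_embeddings():
    """Stand-in for the real embedding step; records each call."""
    calls.append(1)
    return [[0.1, 0.2], [0.3, 0.4]]


# A temporary directory keeps the demo from touching real files.
with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "embeddings_test.pkl")
    first = load_or_compute(cache, fake_embeddings)   # computes and writes
    second = load_or_compute(cache, fake_embeddings)  # reads from disk

print(len(calls))
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.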

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

lordpba/Unsupervised_Logs_Anomaly_Detection