Fine-tuning questions and the dataset splits method? #123

Open
HongdaChen opened this issue Jul 3, 2023 · 0 comments

Scripts

With the help of ChatGPT, I wrote the following script, which reads several *.txt image-set files and prints the size of their intersection:

def read_file(filename):
    """Read one image ID per line and return the unique IDs as a set."""
    with open(filename, 'r') as file:
        lines = file.readlines()
        # Strip the trailing newline (and any surrounding whitespace) from each ID
        numbers = [line.strip() for line in lines]
        result = set(numbers)
        print(f"{filename} has {len(result)} images")
        return result

def find_intersection(files):
    """Return the set of image IDs that appear in every file in `files`."""
    if len(files) < 2:
        raise ValueError("At least two files are required for finding the intersection.")

    sets = [read_file(file) for file in files]
    # set.intersection(*sets) keeps only the IDs common to all of the sets
    intersection = set.intersection(*sets)
    print(f"intersection num of {files} is {len(intersection)}")
    return intersection

# Example usage
# files = ['t1_train.txt', 't2_train.txt', 't3_train.txt', 't4_train.txt']  # replace with the actual paths to your files
files = ['t2_ft.txt', 't3_ft.txt']
# files = ['t2_train.txt', 't2_ft.txt']
intersection = find_intersection(files)

Find the dataset split method under the hood

root@46a2a355a17d:/owod_master/datasets/OWOD_imagesets# python find_intersection.py 
t1_train.txt has 16551 images
t2_train.txt has 45520 images
t3_train.txt has 39402 images
t4_train.txt has 40260 images
intersection num of ['t1_train.txt', 't2_train.txt', 't3_train.txt', 't4_train.txt'] is 0
root@46a2a355a17d:/owod_master/datasets/OWOD_imagesets# python find_intersection.py 
t1_train.txt has 16551 images
t2_train.txt has 45520 images
t2_ft.txt has 1743 images
intersection num of ['t1_train.txt', 't2_train.txt', 't2_ft.txt'] is 0
root@46a2a355a17d:/owod_master/datasets/OWOD_imagesets# python find_intersection.py 
t2_train.txt has 45520 images
t2_ft.txt has 1743 images
intersection num of ['t2_train.txt', 't2_ft.txt'] is 1330
root@46a2a355a17d:/owod_master/datasets/OWOD_imagesets# python find_intersection.py 
t1_train.txt has 16551 images
t2_ft.txt has 1743 images
intersection num of ['t1_train.txt', 't2_ft.txt'] is 413
root@46a2a355a17d:/owod_master/datasets/OWOD_imagesets# python find_intersection.py 
t2_train.txt has 45520 images
t3_ft.txt has 2361 images
intersection num of ['t2_train.txt', 't3_ft.txt'] is 1402
root@46a2a355a17d:/owod_master/datasets/OWOD_imagesets# python find_intersection.py 
t1_train.txt has 16551 images
t3_ft.txt has 2361 images
intersection num of ['t1_train.txt', 't3_ft.txt'] is 374
root@46a2a355a17d:/owod_master/datasets/OWOD_imagesets# python find_intersection.py 
t3_train.txt has 39402 images
t3_ft.txt has 2361 images
intersection num of ['t3_train.txt', 't3_ft.txt'] is 938
root@46a2a355a17d:/owod_master/datasets/OWOD_imagesets# python find_intersection.py 
t2_ft.txt has 1743 images
t3_ft.txt has 2361 images
intersection num of ['t2_ft.txt', 't3_ft.txt'] is 107
root@46a2a355a17d:/owod_master/datasets/OWOD_imagesets# 
  • Why does t2_ft.txt not contain $N_{ex} \times 20 = 1000$ images from t1_train.txt, with $N_{ex}=50$ as suggested in the paper? Only 413 of its 1743 images come from t1_train.txt.
  • Why does t3_ft.txt contain only 2361 images, fewer than the sum of its overlaps with the training splits, |t1_train $\cap$ t3_ft| + |t2_train $\cap$ t3_ft| + |t3_train $\cap$ t3_ft| = 374 + 1402 + 938 = 2714?
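
One observation on the numbers above: for t2_ft.txt the two overlaps add up exactly (413 + 1330 = 1743), whereas for t3_ft.txt the sum (2714) exceeds the file size (2361). That would be consistent with some t3_ft images being listed in more than one training split, since an empty four-way intersection of the training files does not imply that they are pairwise disjoint. This is only a guess, not something the outputs confirm. A minimal sketch of how it could be checked follows; `pairwise_overlaps` and `coverage` are hypothetical helper names, and the toy sets stand in for the sets returned by `read_file`.

```python
from itertools import combinations

def pairwise_overlaps(named_sets):
    """Return {(name_a, name_b): intersection size} for every pair of sets."""
    return {
        (a, b): len(named_sets[a] & named_sets[b])
        for a, b in combinations(sorted(named_sets), 2)
    }

def coverage(target, sources):
    """Compare the sum of per-source overlaps with the number of distinct IDs covered."""
    hits = {name: target & ids for name, ids in sources.items()}
    covered = set().union(*hits.values()) if hits else set()
    return {
        "per_source": {name: len(h) for name, h in hits.items()},
        "sum_of_hits": sum(len(h) for h in hits.values()),  # counts shared IDs more than once
        "covered": len(covered),                            # counts each ID once
        "uncovered": len(target - covered),                 # IDs found in no source
    }

# Toy data (hypothetical IDs): the three-way intersection is empty,
# yet t1/t2 and t2/t3 each share one ID.
train = {"t1": {"a", "b", "c"}, "t2": {"c", "d"}, "t3": {"d", "e"}}
ft = {"a", "c", "d", "z"}

print(pairwise_overlaps(train))
print(coverage(ft, train))
```

With the real files, `train` would map each name to `read_file('t1_train.txt')` and so on, and `ft` would be `read_file('t3_ft.txt')`. If `sum_of_hits` is larger than `covered`, the gap is explained by images shared between training splits; a non-zero `uncovered` would mean some fine-tuning images come from none of the listed splits.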