Regards to the image distribution #1

Open

yuunam97 opened this issue Feb 19, 2022 · 1 comment

Comments

@yuunam97

Hey there! I came across your repository through my interest in age estimation with PyTorch.

The Kaggle dataset given in the link provides a training set (12,611 images) and a test set (200 images), but no validation set. Does this mean that you created a separate folder/dataset for validation?
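For reference, this is roughly how I imagined carving a validation set out of the Kaggle training data (my own sketch, not code from your repo; the CSV filename and split sizes are assumptions):

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical CSV name; adjust to wherever the Kaggle labels live.
bones_df = pd.read_csv('boneage-training-dataset.csv')

# Hold out ~20% of the 12,611 training rows, then split that holdout
# evenly into validation and internal test sets.
train_df, holdout_df = train_test_split(bones_df, test_size=0.2, random_state=42)
val_df, test_df = train_test_split(holdout_df, test_size=0.5, random_state=42)

print(len(train_df), len(val_df), len(test_df))  # roughly 10088 / 1261 / 1262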

Also, regarding your train/validation/test dataset split:

import glob
import random

import cv2
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

# train_dataset_path, test_dataset_path and csv_path are presumably
# defined earlier in the original script.

# Estimate the dataset mean/std from a random sample of k images.
k = 100
size = 500
image_filenames = glob.glob(train_dataset_path + '*.png')
random_images = random.sample(population=image_filenames, k=k)

means = []
stds = []

for filename in random_images:
    image = cv2.imread(filename, 0)  # read as grayscale
    image = cv2.resize(image, (size, size))
    mean, std = cv2.meanStdDev(image)
#    mean /= 255
#    std /= 255

    means.append(mean[0][0])
    stds.append(std[0][0])

avg_mean = np.mean(means)
avg_std = np.mean(stds)

print('Approx. Mean of Images in Dataset: ', avg_mean)
print('Approx. Standard Deviation of Images in Dataset: ', avg_std)

# To reproduce results use below values
#avg_mean = 52.96
#avg_std = 26.19
#%%

# Split Train Validation Test
# Train - 10000 images
# Val   -  1261 images
# Test  -  1261 images

dataset_size = len(image_filenames)
val_size = dataset_size + 1261

bones_df = pd.read_csv(csv_path)
bones_df.iloc[:, 1:3] = bones_df.iloc[:, 1:3].astype(np.float64)

# Slice the dataframe into contiguous train / validation / test ranges.
train_df = bones_df.iloc[:dataset_size, :]
val_df = bones_df.iloc[dataset_size:val_size, :]
test_df = bones_df.iloc[val_size:, :]

age_max = np.max(bones_df['boneage'])
age_min = np.min(bones_df['boneage'])
#%%
class BonesDataset(Dataset):
    def __init__(self, dataframe, image_dir, transform=None):
        self.dataframe = dataframe
        self.image_dir = image_dir
        self.transform = transform

    def __len__(self):
        return self.dataframe.shape[0]

    def __getitem__(self, idx):
        # Column 0 holds the image id, column 1 the bone age, column 2 the gender.
        img_name = self.image_dir + str(self.dataframe.iloc[idx, 0]) + '.png'
        image = cv2.imread(img_name, 0)  # grayscale
        image = image.astype(np.float64)
        gender = np.atleast_1d(self.dataframe.iloc[idx, 2])
        bone_age = np.atleast_1d(self.dataframe.iloc[idx, 1])

        sample = {'image': image, 'gender': gender, 'bone_age': bone_age}

        if self.transform:
            sample = self.transform(sample)

        return sample

#%%
# Custom transforms for image and numerical data
# Resize and convert numpy arrays to tensors
class ToTensor(object):

    def __call__(self, sample):
        image, gender, bone_age = sample['image'], sample['gender'], sample['bone_age']

        image = cv2.resize(image, (size, size))
        image = np.expand_dims(image, axis=0)  # add a channel dimension: (1, H, W)

        # Convert everything to FloatTensors so the dtype matches the model weights.
        return {'image': torch.from_numpy(image).float(),
                'gender': torch.from_numpy(gender).float(),
                'bone_age': torch.from_numpy(bone_age).float()}

# Normalize images and bone age
class Normalize(object):

    def __init__(self, img_mean, img_std, age_min, age_max):
        self.img_mean = img_mean
        self.img_std = img_std
        self.age_min = age_min
        self.age_max = age_max

    def __call__(self, sample):
        image, gender, bone_age = sample['image'], sample['gender'], sample['bone_age']

        # Standardize pixel intensities with the sampled dataset statistics.
        image -= self.img_mean
        image /= self.img_std

        # Min-max scale bone age to [0, 1].
        bone_age = (bone_age - self.age_min) / (self.age_max - self.age_min)

        return {'image': image,
                'gender': gender,
                'bone_age': bone_age}

data_transform = transforms.Compose([
    Normalize(avg_mean, avg_std, age_min, age_max),
    ToTensor()
])
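# Note (added for clarity): a prediction made on the normalized target can be
# mapped back to months by inverting the scaling above:
#   bone_age = normalized * (age_max - age_min) + age_min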
    
#%%
train_dataset = BonesDataset(dataframe=train_df, image_dir=train_dataset_path, transform=data_transform)
val_dataset = BonesDataset(dataframe=val_df, image_dir=train_dataset_path, transform=data_transform)
test_dataset = BonesDataset(dataframe=test_df, image_dir=test_dataset_path, transform=data_transform)

# Sanity Check
print(train_dataset[199])

train_data_loader = DataLoader(train_dataset, batch_size=4, shuffle=False, num_workers=4)
val_data_loader = DataLoader(val_dataset, batch_size=4, shuffle=False, num_workers=4)
test_data_loader = DataLoader(test_dataset, batch_size=4, shuffle=False, num_workers=4)

# Sanity Check 2
sample_batch = next(iter(train_data_loader))
print(sample_batch)

I ran the code through # Sanity Check 2, but it failed with:

line 175, in <module>
    sample_batch = next(iter(test_data_loader))
StopIteration

Could you please explain how I can resolve this?
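From a quick print of the split sizes (my own debugging sketch, reusing the dataframes defined above), I suspect test_df may simply end up empty, since dataset_size = len(image_filenames) already spans every row of the CSV:

# Debugging sketch: inspect how many rows land in each split.
print(len(bones_df), len(train_df), len(val_df), len(test_df))
# With the 12,611-row training CSV this prints: 12611 12611 0 0
# train_df takes every row, val_df and test_df are empty, and
# next(iter(test_data_loader)) raises StopIteration on its first batch.

If that is what is happening, capping dataset_size at the intended 10,000 training rows (per the comments in the split cell) would leave non-empty validation and test slices.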

I am new to deep learning, so please bear with me :)

@karelbecerra

Hi @yuunam97, did you find a way to make it work?
