Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 1291: Implement storage of ClamAV scan results #1321

Merged
merged 16 commits into from
Sep 24, 2021

Conversation

jtwillis92
Copy link

@jtwillis92 jtwillis92 commented Sep 14, 2021

Summary of Changes

Closes issue #1291

Adds the capability for the tdrs-backend application to store all results from files scanned by ClamAV and adds an associated LogEntry that is viewable in the Django admin.

This is done through the use of a new model ClamAVFileScan which represents an individual file scan performed by ClamAV. When the ClamAVClient.scan_file method is called it will now create an instance of this model along with a LogEntry instance that is tied to the ClamAVFileScan instance. In the event the scan is successful and a DataFile is produced, it will also be linked to this ClamAVFileScan so that the scanned file can be retrieved as necessary for auditing purposes. In the event the scan is rejected, we will not store the file itself so the data_file link will be empty for these records.

Additionally, adds a new Django app called security to house all of the ClamAV related code and the new model. This app is intended to be used for any security auditing requirements (such as the upcoming #1161)

How to Test

cd tdrs-frontend && docker-compose -f docker-compose.yml -f docker-compose.local.yml up -d
cd tdrs-backend && docker-compose -f docker-compose.yml -f docker-compose.local.yml up -d 

This change is only visible in Django Admin to OFA Admin and OFA System Admin users. For the purposes of testing this, it is simplest to just work through as an OFA Admin.

  1. Open http://localhost:3000/ and sign in as an OFA Admin user.
  2. Navigate to the Data Files tab and submit a valid file which you wouldn't expect to be rejected by ClamAV
  3. Still on Data Files, refresh the page and then attempt to submit an infected file
  4. Navigate to http://localhost:8080/admin/ and click in to the Log Entries list
  5. Filter results by the content type security | Clam AV File Scans:
    Screenshot from 2021-09-14 14-57-28
  6. Ensure that a CLEAN and INFECTED result exist for the files submitted.
  7. Click on the object link to view each of the ClamAVFileScan instances:
    log-entries-clamav-file-scans
  8. Ensure the INFECTED file has no attached DataFile instance and has all relevant details:
    Screenshot from 2021-09-14 15-00-41
  9. Ensure the CLEAN file has a linked DataFile instance and all relevant details:
    Screenshot from 2021-09-14 15-01-26

Log Entries can be filtered by month if more than one month is present in the logs:
log-entries-by-month

Clam AV File Scans can also be viewed altogether in their own admin list:
Screenshot from 2021-09-14 15-46-30

Deliverable 1: Accepted Features

Performance Standard(s): At the beginning of each sprint, the Product Owner and development team will collaborate to define a set of user stories to be completed during the sprint. Acceptance criteria for each story will also be defined. The development team will deliver code and functionality to satisfy these user stories.

Acceptable Quality Level: Delivered code meets the acceptance criteria for each user story. Incomplete stories will be assessed and considered for inclusion in the next sprint.

  • Look up the acceptance criteria in the related issue; paste ACs below in checklist format.
  • Check against the criteria:
    • Logs from Clam AV scan results during Data File uploads.
    • Link log entry to file object that will allow us to reference the file and user (links through intermediary model ClamAVFileScan)
    • Will use custom model filters in DAC to find CLAM AV-only records and can then further filter by month
    • We can store these in TDP Cloud.gov S3 and have admins users to access them via a link in Django admin.

As Product Owner, @lfrohlich will decide if ACs are met.

Deliverable 2: Tested Code

Performance Standard(s): Code delivered under the order must have substantial test code coverage. Version-controlled HHS GitHub repository of code that comprises products that will remain in the government domain.

Acceptable Quality Level: Minimum of 90% test coverage of all code. All areas of code are meaningfully tested.

  • Are all areas of code introduced in this PR meaningfully tested?
    • If this PR introduces backend code changes, are they meaningfully tested?
    • If this PR introduces frontend code changes, are they meaningfully tested?
  • Are code coverage minimums met?

Deliverable 3: Properly Styled Code

Performance Standard(s): GSA 18F Front- End Guide

Acceptable Quality Level: 0 linting errors and 0 warnings

  • Are backend code style checks passing on CircleCI?
  • Are frontend code style checks passing on CircleCI?
  • Does this PR change any linting or CI settings?

Deliverable 4: Accessible

Performance Standard(s): Web Content Accessibility Guidelines 2.1 AA standards

Acceptable Quality Level: 0 errors reported using an automated scanner and 0 errors reported in manual testing

  • Does this PR complete the epic?
  • Are links included to any other gov-approved PRs associated with epic?
  • Does PR include documentation for Raft's a11y review?
  • Did automated and manual testing with @iamjolly and @ttran-hub using Accessibility Insights reveal any errors introduced in this PR?

n/a

Deliverable 5: Deployed

Performance Standard(s): Code must successfully build and deploy into the staging environment.

Acceptable Quality Level: Successful deployment by assigning a 'Deploy with CircleCI - {env}' label

  • Was the code successfully deployed via automated CircleCI process to development on Cloud.gov?

Deliverable 6: Documented

Performance Standard(s): Summary of user stories completed every two weeks. All dependencies are listed and the licenses are documented. Major functionality in the software/source code is documented, including system diagram. Individual methods are documented inline in a format that permits the use of tools such as JSDoc. All non-inherited 800-53 system security controls are documented in the Open Control or OSCAL format and HHS Section 508 Product Assessment Template (PAT) are updated as appropriate.

Acceptable Quality Level: Combination of manual review and automated testing, if available

  • If this PR introduces backend code, is that code documented both inline and overall?
  • If this PR introduces frontend code, is that code documented both inline and overall?
  • If this PR introduces dependencies, are their licenses documented?

Deliverable 7: Secure

Performance Standard(s): Open Web Application Security Project (OWASP) Application Security Verification Standard 3.0

Acceptable Quality Level: Code submitted must be free of medium- and high-level static and dynamic security vulnerabilities

  • Does the OWASP Scan pass on CircleCI?
  • Do manual code review and manual testing detect any security issues?
    no new issues detected

Deliverable 8: Context

  • Performance Standard(s): Code must be understandable and contextualized for the reviewers possess the knowledge and background necessary for analysis and constructive criticism to take place.
  • Acceptable Quality Level: Code submitted in the pull request has context.*
  • Does this pull request contain sufficient inline comments providing relevant context for code snippets?
  • Is the code understandable and lucid?
  • Does this pull request provide background for why coding decisions were made?
  • Can you as a reviewer explain and take ownership of these elements presented in this code review?

reviewed by @ADPennington

@jtwillis92 jtwillis92 added backend security dev compliance OCIO-related compliance tasks labels Sep 14, 2021
write_only=True,
validators=[validate_data_file]
)
file = serializers.FileField(write_only=True)
Copy link
Author

@jtwillis92 jtwillis92 Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This validation was moved to the serializers validate_file method instead so we could access serializer context.

data_file = DataFile.create_new_version(validated_data)

# Determine the matching ClamAVFileScan for this DataFile.
av_scan = ClamAVFileScan.objects.filter(
Copy link
Author

@jtwillis92 jtwillis92 Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DataFile doesn't get created until after the scan is complete and recorded so we have to retroactively link these records if the creation was successful.

"""Perform all validation steps on a given file."""
file, user = attrs['file'], attrs['user']
validate_file_extension(file.name)
validate_file_infection(file, user)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic previously existed in the validators validate_data_file method.

"""Validate file is not infected by scanning with ClamAV."""
av_client = ClamAVClient()
try:
is_file_clean = av_client.scan_file(file, file.name)
is_file_clean = av_client.scan_file(file, uploaded_by)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to track the user that uploaded the file or we won't have anything to link the LogEntry to.


# Log and create audit records with the results of this scan
logger.debug(msg)
ClamAVFileScan.objects.record_scan(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The execution flow was altered here so we only return at the end, after creating the ClamAVFileScan instance. This allows us to track all results from ClamAV - not just the CLEAN files.

except ConnectionError as err:
logger.debug(f'Encountered error scanning file: {err}')
logger.error(f'ClamAV connection failure: {err}')
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case we can't connect to ClamAV we will not store a result for the scan.

class ClamAVFileScanManager(models.Manager):
"""Extends object manager functionality with common operations."""

def record_scan(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a helper function to easily create a ClamAVFileScan instance at the same time as a linked LogEntry.

)

data_file = models.ForeignKey(
DataFile,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is nullable, since we don't store INFECTED or ERROR files we will only have this link if the file scan was CLEAN.

from django.db import migrations

from tdpservice.users.permissions import (
add_permissions_q,
create_perms,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulled this logic out so it can be re-used as new models are added.


operations = [
# We need to call this again to ensure permissions are created for
# the newly added ClamAVFileScan model.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the comment above


def set_clamav_file_scan_permissions(apps, schema_editor):
"""Set permissions for the ClamAVFileScan model for relevant groups."""
ofa_admin = apps.get_model('auth', 'Group').objects.get(name='OFA Admin')
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only OFA Admin and OFA System Admin will be able to see these results. No one may edit them other than the superuser.

@jtwillis92 jtwillis92 added the raft review This issue is ready for raft review label Sep 14, 2021

def validate_file(self, file):
"""Perform all validation steps on a given file."""
user = self.context.get('user')
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This context needs to be passed in by the implementing view(s) in the get_serializer_context method.

"""If a serializer has valid data it will return a valid object."""
create_serializer = DataFileSerializer(data=data_file_data)
create_serializer = DataFileSerializer(
context={'user': user},
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to manually pass this context in tests as it is used in validation steps.

def get_serializer_context(self):
"""Retrieve additional context required by serializer."""
context = super().get_serializer_context()
context['user'] = self.request.user
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what gives the validate_file method on the serializer access to the user.

@codecov
Copy link

codecov bot commented Sep 14, 2021

Codecov Report

Merging #1321 (92cf09e) into raft-tdp-main (81e2150) will increase coverage by 1.01%.
The diff coverage is 99.15%.

Impacted file tree graph

@@                Coverage Diff                @@
##           raft-tdp-main    #1321      +/-   ##
=================================================
+ Coverage          98.40%   99.42%   +1.01%     
=================================================
  Files                 38       41       +3     
  Lines                939     1036      +97     
  Branches              47       56       +9     
=================================================
+ Hits                 924     1030     +106     
+ Misses                10        2       -8     
+ Partials               5        4       -1     
Flag Coverage Δ
dev-backend 99.42% <99.15%> (+1.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
tdrs-backend/tdpservice/core/admin.py 100.00% <ø> (ø)
tdrs-backend/tdpservice/security/models.py 98.33% <98.33%> (ø)
tdrs-backend/tdpservice/data_files/serializers.py 100.00% <100.00%> (ø)
tdrs-backend/tdpservice/data_files/validators.py 100.00% <100.00%> (+15.38%) ⬆️
tdrs-backend/tdpservice/data_files/views.py 98.18% <100.00%> (+0.14%) ⬆️
tdrs-backend/tdpservice/security/admin.py 100.00% <100.00%> (ø)
tdrs-backend/tdpservice/security/apps.py 100.00% <100.00%> (ø)
tdrs-backend/tdpservice/security/clients.py 100.00% <100.00%> (ø)
tdrs-backend/tdpservice/settings/common.py 100.00% <100.00%> (ø)
tdrs-backend/tdpservice/users/permissions.py 100.00% <100.00%> (ø)
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 81e2150...92cf09e. Read the comment docs.

@andrew-jameson
Copy link
Collaborator

Ok, got some time to review here more deeply and looks good. Will approve/move to QASP once you have been able to resolve that 500 error we're seeing once deployed.

@jtwillis92
Copy link
Author

Ok, got some time to review here more deeply and looks good. Will approve/move to QASP once you have been able to resolve that 500 error we're seeing once deployed.

I'm not sure why it was missing, but that error was due to a missing AV_SCAN_URL environment variable. I added that back and everything worked as expected in sandbox.

@andrew-jameson andrew-jameson added QASP Review and removed raft review This issue is ready for raft review labels Sep 21, 2021
@ADPennington ADPennington added Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI and removed Deploy with CircleCI-sandbox labels Sep 21, 2021
@ADPennington ADPennington added Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI and removed Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI labels Sep 24, 2021
Copy link
Collaborator

@ADPennington ADPennington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great @jtwillis92! reference re: our async discussion about frontend alert message for other unsuccessful submissions is captured in #831 here. as discussed, we'll also need to handle that scenario where multiple files are submitted, with at least one infected and one clean.

ticked off qasp checklist in main PR summary, so @abottoms-coder ready to merge!

@ADPennington ADPennington added Ready to Merge and removed QASP Review Deploy with CircleCI-qasp Deploy to https://tdp-frontend-qasp.app.cloud.gov through CircleCI labels Sep 24, 2021
Co-authored-by: Alex P.  <63075587+ADPennington@users.noreply.github.com>
@andrew-jameson andrew-jameson merged commit 49c7736 into raft-tdp-main Sep 24, 2021
@andrew-jameson andrew-jameson deleted the backend/1291-clam-av-scan-logs branch September 24, 2021 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants