OCD model migration #240

Merged
merged 67 commits into from
Jun 27, 2019
f3fe835
reset migrations
fgregg Nov 17, 2018
b337366
update django, start to OCD the models
fgregg Nov 17, 2018
ba7766c
council members coming into view
fgregg Nov 26, 2018
9c088f6
nearly pages working except about and search
fgregg Nov 30, 2018
63ba9e2
bring back old migrations
fgregg Dec 1, 2018
adb584e
search and about page
fgregg Dec 1, 2018
ff7c696
flake8, all pages working in Chicago
fgregg Dec 1, 2018
316f6f5
still using dateutil
fgregg Dec 1, 2018
2e06e07
signals
fgregg Apr 17, 2019
e27b618
Merge pull request #242 from datamade/ocd_signals
fgregg Apr 17, 2019
6fe938f
Merge branch 'ocd' of github.com:datamade/django-councilmatic into ocd
fgregg May 13, 2019
240d124
restore old migrations
fgregg May 13, 2019
fc9bd86
fix up on_delete in older migrations for django 2.0
fgregg May 13, 2019
e507f84
comment out proxy related fields
fgregg May 13, 2019
30503de
migrate the councilmatic_core tables
fgregg May 13, 2019
9896aea
try to get travis tests working
fgregg May 13, 2019
f3b624f
install flak8 for tests
fgregg May 13, 2019
d86dc1b
reduce changes
fgregg May 13, 2019
f23cd63
remove incremental haystack update
fgregg May 13, 2019
1de8113
flake8 coming
fgregg May 13, 2019
a6dd9a4
not using activity stream right now
fgregg May 13, 2019
49525d8
password_reset is part of django_notifications app:
fgregg May 13, 2019
0dd2577
__init__.py in signals folder
fgregg May 13, 2019
14a7c58
postgis on travis
fgregg May 13, 2019
1dff240
try relative path
fgregg May 13, 2019
ffce100
full path
fgregg May 13, 2019
223fb4b
maybe travis is missing the __init__
fgregg May 13, 2019
4c8e1bf
find_packages
fgregg May 13, 2019
daad426
Try overriding relation in manager, add back template in search indexes
hancush May 22, 2019
946b350
Return datetime from prepare_bill_action
hancush May 30, 2019
8db6c8c
Enable proxy relationships, update Membership manager to handle null …
hancush May 30, 2019
04c8499
Repair committees attribute
hancush May 30, 2019
825f30a
Download headshots to app static directory
hancush May 30, 2019
45add3b
Add prefix to headshot filenames for easy exclusion from gitignore
hancush May 30, 2019
f546b45
More person headshot fixes, handle sponsorships with no associated Pe…
hancush Jun 3, 2019
1d7c3b1
Fix object reference
hancush Jun 3, 2019
ad04071
Add link_html to Person, update console logging
hancush Jun 3, 2019
6d290a1
Consistent use of link_html
hancush Jun 3, 2019
1bc397c
Install django-proxy overrides from master
hancush Jun 10, 2019
43101d8
Get the tests passing
hancush Jun 10, 2019
a28d0c7
flakin' all over the world
hancush Jun 11, 2019
3747e55
Get the tests passing redux
hancush Jun 11, 2019
a598736
Upgrade pip
hancush Jun 11, 2019
2240208
Turn up test verbosity, flake8 pt. 2
hancush Jun 11, 2019
855bfdb
Uncomment cache directives
hancush Jun 11, 2019
0fdcf41
Subclass BillDocument, refactor and test convert_attachment_text
hancush Jun 11, 2019
16d4ce9
Refactor refresh_pic to use the ORM
hancush Jun 12, 2019
fb171a4
Stash inexplicably failing test of management command
hancush Jun 12, 2019
acf0052
Strip whitespace
hancush Jun 12, 2019
9849ea3
Make the fixtures independent of one another
hancush Jun 12, 2019
f4c75b0
Remove import_data
hancush Jun 12, 2019
c4ffeac
Refactor convert_rtf to use ORM
hancush Jun 12, 2019
f0735d5
take8
hancush Jun 12, 2019
00b08ef
Add use_template kwarg
hancush Jun 18, 2019
abab18f
Put the headshots in their own directory
hancush Jun 18, 2019
8c595a3
Collect static at the end of update_headshots
hancush Jun 18, 2019
d71478e
Update testing instructions
hancush Jun 18, 2019
99f681c
Format code blocks
hancush Jun 18, 2019
50141df
Yet more formatting
hancush Jun 18, 2019
fd66e3c
Add missing migration
hancush Jun 18, 2019
373a77b
OCD_CITY_COUNCIL_ID -> OCD_CITY_COUNCIL_NAME
hancush Jun 18, 2019
6c2a9bd
Update test settings
hancush Jun 18, 2019
8515994
Drop ambiguous last action date preparation
hancush Jun 19, 2019
4436df9
Move ordering into primary_sponsorship property, add comments
hancush Jun 19, 2019
74e7555
Override BillSponsorship.organization
hancush Jun 19, 2019
1cd01eb
Flake it up
hancush Jun 19, 2019
449ff74
Handle sponsorships of bills with no last action date
hancush Jun 19, 2019
5 changes: 4 additions & 1 deletion .gitignore
@@ -6,4 +6,7 @@ django-councilmatic.egg-info/*
.DS_STORE
build/*
.cache
.pytest_cache
.env
.pytest_cache
*#
*~
11 changes: 9 additions & 2 deletions .travis.yml
@@ -7,9 +7,16 @@ env:

addons:
postgresql: '9.4'
apt:
packages:
- postgresql-9.4-postgis-2.3

before_script:
- psql -U postgres -c "create extension postgis"

install:
- pip install --upgrade -r tests/test_requirements.txt
- pip install --upgrade pip
- pip install -r tests/requirements.txt --upgrade
- pip install -e .

sudo: required
@@ -18,4 +25,4 @@ group: deprecated-2017Q4

script:
- flake8 ./councilmatic_core/*.py
- pytest
- pytest -sv
40 changes: 32 additions & 8 deletions README.rst
@@ -24,17 +24,41 @@ Want to build your own Councilmatic? Check out our `Starter Template <https://gi

Running tests
----
Did you make changes to django-councilmatic? Before you make a pull request, run some tests. We test for style with `flake8 <http://flake8.pycqa.org/en/latest/>`_:
Did you make changes to django-councilmatic? Before you make a pull request, run some tests.

```bash
flake8 ./councilmatic_core/*.py
```
First, install the test requirements:

We test for functionality with a custom-made `TestCase`. Be sure to specify the owner of your psql database in the export command:
.. code-block:: bash

```bash
export db_user='yourusername' && python runtests.py
```
pip install -r tests/requirements.txt

We test for style with `flake8 <http://flake8.pycqa.org/en/latest/>`_:

.. code-block:: bash

flake8 ./councilmatic_core/*.py

We test for functionality with `pytest`:

.. code-block:: bash

pytest

If you made material changes to the Councilmatic models, refresh the test fixture from a local instance database. From your instance directory (assuming you've already installed :code:`django-councilmatic` with :code:`pip install -e /path/to/django-councilmatic`), install the test requirements:

.. code-block:: bash

pip install -r /path/to/django-councilmatic/tests/test_requirements.txt

Add :code:`fixture_magic` to your instance's :code:`INSTALLED_APPS` in :code:`settings.py`.

Run the management command to update the test fixture.

.. code-block:: bash

python manage.py make_fixtures

Run the tests and commit your updated fixture with your PR!

Team
----
1 change: 1 addition & 0 deletions councilmatic_core/__init__.py
@@ -0,0 +1 @@
default_app_config = 'councilmatic_core.apps.CouncilmaticConfig'
9 changes: 9 additions & 0 deletions councilmatic_core/apps.py
@@ -0,0 +1,9 @@
from django.apps import AppConfig


class CouncilmaticConfig(AppConfig):
name = 'councilmatic_core'
verbose_name = "Councilmatic"

def ready(self):
import councilmatic_core.signals.handlers # noqa
8 changes: 4 additions & 4 deletions councilmatic_core/feeds.py
@@ -4,7 +4,7 @@

from django.contrib.syndication.views import Feed
from django.utils.feedgenerator import Rss201rev2Feed
from django.core.urlresolvers import reverse, reverse_lazy
from django.urls import reverse, reverse_lazy
from django.conf import settings

from .models import Person, Bill, Organization, Event
@@ -120,7 +120,7 @@ def description(self, obj):
return "Recent sponsored bills from " + obj.name + "."

def items(self, person):
sponsored_bills = [s.bill for s in person.primary_sponsorships.order_by('-_bill__last_action_date')[:10]]
sponsored_bills = [s.bill for s in person.primary_sponsorships][:10]
recent_sponsored_bills = sponsored_bills[:self.NUM_RECENT_BILLS]
return recent_sponsored_bills

@@ -187,7 +187,7 @@ def item_link(self, action):
return reverse('bill_detail', args=(action.bill.slug,))

def item_pubdate(self, action):
return action.date
return action.date_dt

def description(self, obj):
return "Actions for committee %s" % obj.name
@@ -223,7 +223,7 @@ def item_link(self, action):
return reverse('bill_detail', args=(action.bill.slug,))

def item_pubdate(self, action):
return action.date
return action.date_dt

def description(self, obj):
return "Actions for bill %s" % obj.friendly_name
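
Feed `item_pubdate` hooks are expected to return a `datetime`, while Open Civic Data action dates are stored as strings, which is presumably why the feeds now read `date_dt`. A minimal sketch of the kind of conversion such a property might perform, assuming `YYYY-MM-DD` or `YYYY-MM-DDTHH:MM:SS` input; `parse_action_date` is a hypothetical helper, not the project's implementation:

```python
from datetime import datetime, timezone


def parse_action_date(value):
    """Hypothetical helper: turn an OCD-style date string into an aware datetime."""
    # Accept a full timestamp ("2019-06-19T10:30:00") or a date-only string ("2019-06-19").
    for fmt in ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d"):
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        # Assumption: naive values are treated as UTC; a real app would use its own time zone.
        return parsed.replace(tzinfo=timezone.utc)
    raise ValueError("Unrecognized action date: {!r}".format(value))
```

Raising on unrecognized input, rather than returning `None`, keeps malformed dates from silently reaching the feed.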
31 changes: 11 additions & 20 deletions councilmatic_core/haystack_indexes.py
@@ -1,25 +1,22 @@
from councilmatic_core.models import Bill
from haystack import indexes
from councilmatic_core.templatetags.extras import clean_html

# XXX: is it OK to link to Django settings in haystack_indexes.py ?
from django.conf import settings
import pytz
app_timezone = pytz.timezone(settings.TIME_ZONE)
from councilmatic_core.models import Bill
from councilmatic_core.templatetags.extras import clean_html


class BillIndex(indexes.SearchIndex):

text = indexes.CharField(document=True, use_template=True,
template_name="search/indexes/councilmatic_core/bill_text.txt")
text = indexes.CharField(document=True,
use_template=True,
template_name='search/indexes/councilmatic_core/bill_text.txt')
slug = indexes.CharField(model_attr='slug', indexed=False)
ocd_id = indexes.CharField(model_attr='ocd_id', indexed=False)
id = indexes.CharField(model_attr='id', indexed=False)
bill_type = indexes.CharField(faceted=True)
identifier = indexes.CharField(model_attr='identifier')
description = indexes.CharField(model_attr='description', boost=1.25)
source_url = indexes.CharField(model_attr='source_url', indexed=False)
source_note = indexes.CharField(model_attr='source_note')
abstract = indexes.CharField(model_attr='abstract', boost=1.25, default='')
description = indexes.CharField(model_attr='title', boost=1.25)
source_url = indexes.CharField(model_attr='sources__url', indexed=False)
source_note = indexes.CharField(model_attr='sources__note')
abstract = indexes.CharField(model_attr='abstracts__abstract', boost=1.25, default='')

friendly_name = indexes.CharField()
sort_name = indexes.CharField()
@@ -57,17 +54,11 @@ def prepare_controlling_body(self, obj):
def prepare_full_text(self, obj):
return clean_html(obj.full_text)

def prepare_last_action_date(self, obj):
from datetime import datetime, timedelta
if not obj.last_action_date:
return datetime.now().replace(tzinfo=app_timezone) - timedelta(days=36500)
return obj.last_action_date

def prepare_inferred_status(self, obj):
return obj.inferred_status

def prepare_legislative_session(self, obj):
return obj._legislative_session.identifier
return obj.legislative_session.identifier

def prepare_ocr_full_text(self, obj):
return clean_html(obj.ocr_full_text)
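
The `prepare_last_action_date` method dropped in this file substituted a date 36,500 days in the past when a bill had no last action date. A standalone sketch of that fallback, using `timezone.utc` in place of the app's `pytz` zone; the function name and the `now` parameter are hypothetical:

```python
from datetime import datetime, timedelta, timezone


def last_action_date_or_sentinel(last_action_date, now=None):
    """Return the date unchanged, or a far-past sentinel when it is missing."""
    if last_action_date:
        return last_action_date
    # Assumption: UTC stands in for settings.TIME_ZONE to keep the sketch dependency-free.
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=36500)
```

Passing `now` explicitly makes the sentinel deterministic in tests.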
63 changes: 26 additions & 37 deletions councilmatic_core/management/commands/convert_attachment_text.py
@@ -3,14 +3,14 @@
import logging.config
import sqlalchemy as sa
import requests
import textract
import tempfile
import itertools

from django.core.management.base import BaseCommand
from django.conf import settings
from django.db.models import Max
from django.db.models import Max, Q

from opencivicdata.legislative.models import BillDocumentLink
from councilmatic_core.models import BillDocument

logging.config.dictConfig(settings.LOGGING)
@@ -25,7 +25,7 @@

class Command(BaseCommand):
help = 'Converts bill attachments into plain text'

def add_arguments(self, parser):
parser.add_argument(
'--update_all',
@@ -38,41 +38,32 @@ def handle(self, *args, **options):
self.add_plain_text()

def get_document_url(self):
with engine.begin() as connection:
# Only apply this query to most recently updated (or created) bill documents.
max_updated = BillDocument.objects.all().aggregate(Max('updated_at'))['updated_at__max']

if max_updated is None or self.update_all:
query = '''
SELECT id, url
FROM councilmatic_core_billdocument
WHERE document_type='A'
AND full_text is null
AND lower(url) similar to '%(.doc|.docx|.pdf)'
ORDER BY updated_at DESC
'''
else:
query = '''
SELECT id, url
FROM councilmatic_core_billdocument
WHERE updated_at >= :max_updated
AND document_type='A'
AND full_text is null
AND lower(url) similar to '%(.doc|.docx|.pdf)'
ORDER BY updated_at DESC
'''
# Only apply this query to most recently updated (or created) bill documents.
max_updated = BillDocument.objects.all().aggregate(max_updated_at=Max('bill__updated_at'))['max_updated_at']

is_null = Q(document__councilmatic_document__full_text__isnull=True)
is_file = Q(url__iendswith='pdf') | Q(url__iendswith='doc') | Q(url__iendswith='docx')
after_max_update = Q(document__bill__updated_at__gt=max_updated)

result = connection.execution_options(stream_results=True).execute(sa.text(query), max_updated=max_updated)
if max_updated is None or self.update_all:
qs = BillDocumentLink.objects.filter(is_null & is_file)
else:
qs = BillDocumentLink.objects.filter(is_null & is_file & after_max_update)

yield from result
for item in qs:
yield item.url, item.document.id
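
The `Q` objects above combine a null-text check with a case-insensitive suffix match on the URL. The same predicate can be sketched in plain Python to make the intended matching explicit. This assumes the intent carries over from the replaced SQL, which matched `.doc`, `.docx`, and `.pdf`; `needs_conversion` and `CONVERTIBLE_SUFFIXES` are hypothetical names:

```python
# Suffixes taken from the replaced SQL pattern '%(.doc|.docx|.pdf)'.
CONVERTIBLE_SUFFIXES = ('.doc', '.docx', '.pdf')


def needs_conversion(url, full_text):
    """Hypothetical predicate: True when a document has no text yet and a convertible suffix."""
    # str.endswith accepts a tuple; lower() mirrors the case-insensitive iendswith lookup.
    return full_text is None and url.lower().endswith(CONVERTIBLE_SUFFIXES)
```

Unlike the ORM lookups, which match the bare strings `pdf` and `docx`, this sketch keeps the leading dot from the SQL pattern, so a URL ending in `notapdf` would not match.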

def convert_document_to_plaintext(self):
# textract is a heavy dependency. In order to test this code without
# installing it, import the library here.
import textract
Member Author
why is this import here?

Member
so you can run the tests w/o needing to install textract, which is a pita. https://textract.readthedocs.io/en/stable/installation.html / #193 (comment)

Member Author
please add a comment to that effect.


for document_data in self.get_document_url():
document_data = dict(document_data)
url = document_data['url']
document_id = document_data['id']
response = requests.get(url)
# Sometimes, Metro Legistar has a URL that returns a bad status code (e.g., 404 from http://metro.legistar1.com/metro/attachments/95d5007e-720b-4cdd-9494-c800392b9265.pdf).
# Sometimes, Metro Legistar has a URL that returns a bad status code (e.g., 404 from http://metro.legistar1.com/metro/attachments/95d5007e-720b-4cdd-9494-c800392b9265.pdf).
# Skip these documents.
if response.status_code != 200:
logger.error('Document URL {} returns {} - Could not get attachment text!'.format(url, response.status_code))
@@ -86,31 +77,29 @@ def convert_document_to_plaintext(self):
try:
plain_text = textract.process(tfp.name)
except textract.exceptions.ShellError as e:
logger.error('{} - Could not convert Document ID {}!'.format(e, document_id))
logger.error('{} - Could not convert Councilmatic Document ID {}!'.format(e, document_id))
continue

logger.info('Document ID {} - conversion complete'.format(document_id))
logger.info('Councilmatic Document ID {} - conversion complete'.format(document_id))

yield {'plain_text': plain_text.decode('utf-8'), 'id': document_id}
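
The deferred `import textract` raised in the review thread keeps the module importable when the heavy dependency is absent. A generic sketch of the pattern, using the standard-library `json` module as a stand-in for the optional dependency; `convert` is a hypothetical function:

```python
def convert(payload):
    """Hypothetical function that needs a heavy dependency only when called."""
    # Importing inside the function defers the cost (and any ImportError)
    # until the function is invoked, so tests can import this module freely.
    import json  # stand-in for a heavy optional dependency such as textract
    return json.dumps(payload)
```

The trade-off is that a missing dependency surfaces at call time rather than at import time, which is why a comment explaining the placement is worth having.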


def add_plain_text(self):
'''
Metro has over 2,000 attachments that should be converted into plain text.
Metro has over 2,000 attachments that should be converted into plain text.
When updating all documents with `--update_all`, this function ensures that the database updates only 20 documents per connection (mainly, to avoid unexpected memory consumption).
It fetches up to 20 elements from a generator object, runs the UPDATE query, and then fetches up to 20 more.

Inspired by: https://stackoverflow.com/questions/30510593/how-can-i-use-server-side-cursors-with-django-and-psycopg2/41088159#41088159

More often, this script updates just a handful of documents: so, the incremental, fetch-just-20 approach may prove unnecessary. Possible refactor?
'''

update_statement = '''
UPDATE councilmatic_core_billdocument as bill_docs
UPDATE councilmatic_core_billdocument AS bill_docs
SET full_text = :plain_text
WHERE bill_docs.id = :id
WHERE bill_docs.document_id = :id
'''

plaintexts = self.convert_document_to_plaintext()

while True:
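
The docstring above describes pulling up to 20 items at a time from a generator and running one update per batch. One common way to express that is `itertools.islice`. The sketch below illustrates the batching idea only, with a hypothetical `batched` helper, and is not the command's actual loop:

```python
import itertools


def batched(iterable, size=20):
    """Yield lists of up to `size` items until the iterable is exhausted."""
    iterator = iter(iterable)
    while True:
        # islice consumes at most `size` items without materializing the rest.
        batch = list(itertools.islice(iterator, size))
        if not batch:
            break
        yield batch
```

Because each batch is built lazily, only `size` converted documents are held in memory at once, which matches the memory concern the docstring mentions.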