Installation Guide

This guide will walk you through the process of installing the latest official release of django-calaccess-processed-data so that you can incorporate CAL-ACCESS data into your own Django project.

If you would rather install the raw source code or contribute as a developer, please refer to the “How to contribute” tutorial.

Warning

This library is intended to be plugged into a project created with the Django web framework. Before you can begin, you’ll need to have one up and running. If you don’t know how, check out the official Django documentation.


Installing the Django apps

The latest version of the application can be installed from the Python Package Index using pip.

$ pip install django-calaccess-processed-data

Like most Django applications, the app then needs to be added to the INSTALLED_APPS in your settings.py configuration file. You also need to include other Django apps it depends on:

INSTALLED_APPS = (
    # ... other apps up here ...
    'calaccess_raw',
    'calaccess_scraped',
    'calaccess_processed',
    'opencivicdata.core.apps.BaseConfig',
    'opencivicdata.elections.apps.BaseConfig',
)

A little more about these dependencies:

calaccess_raw
This app downloads and extracts the raw data files exported each night from the CAL-ACCESS database. The app then loads these files into your Django project’s database with minimal transformations. For more details, see the django-calaccess-raw-data section.
calaccess_scraped
This app scrapes the CAL-ACCESS website and loads additional data not included in the nightly exports. For more details, see the django-calaccess-scraped-data section.
opencivicdata.core
This app includes Django models and admin panels for the core data types of the Open Civic Data specification, including Person, Organization, Post and Membership.
opencivicdata.elections
This app includes Django models and admin panels for election-related data types that have been provisionally included in the Open Civic Data specification.
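
As a quick sanity check on the settings above, something like the following can report which of the required entries are missing. This is only a sketch: missing_apps and REQUIRED_APPS are hypothetical names used for illustration and are not part of django-calaccess-processed-data.

```python
# Hypothetical helper, not shipped with django-calaccess-processed-data.
# The entries mirror the INSTALLED_APPS listing in this guide.
REQUIRED_APPS = (
    'calaccess_raw',
    'calaccess_scraped',
    'calaccess_processed',
    'opencivicdata.core.apps.BaseConfig',
    'opencivicdata.elections.apps.BaseConfig',
)


def missing_apps(installed_apps, required=REQUIRED_APPS):
    """Return the required entries that are absent from installed_apps."""
    installed = set(installed_apps)
    # Preserve the order of REQUIRED_APPS so the report is easy to read.
    return [app for app in required if app not in installed]
```

From a Django shell (python manage.py shell), you could call missing_apps(settings.INSTALLED_APPS) after importing settings from django.conf. An empty list means every entry is present.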

Connecting to a local database

Also in the settings.py file, you will need to configure Django so it can connect to your database.

Note

Unlike a typical Django project, this application only supports PostgreSQL database backends. This is because we enlist specialized tools to load the immense amount of source data more quickly than Django typically allows. We haven’t developed those routines for SQLite and the other Django backends yet, but we might someday.

Before you begin, make sure you have a PostgreSQL server installed. If you don’t, now is the time to hit Google and figure out how. The official PostgreSQL documentation is another good place to start.

Once that’s handled, add a database configuration like this to your settings.py.

DATABASES = {
    'default': {
        'NAME': 'calaccess_processed',
        'ENGINE': 'django.db.backends.postgresql',
        'USER': 'your-username-here',
        'PASSWORD': 'your-password-here',
        'HOST': 'localhost',
        'PORT': '5432'
    }
}
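
If you would rather not hard-code credentials in settings.py, one option is to read them from environment variables. The sketch below assumes variable names such as CALACCESS_DB_USER; these names and the database_settings helper are made up for illustration and are not something the library looks for on its own.

```python
import os


def database_settings(environ=os.environ):
    """Build a DATABASES dict, reading connection details from the environment.

    The CALACCESS_DB_* variable names are hypothetical; pick whatever
    names suit your deployment.
    """
    return {
        'default': {
            'NAME': environ.get('CALACCESS_DB_NAME', 'calaccess_processed'),
            # On Django releases older than 1.9 this backend is only
            # available as 'django.db.backends.postgresql_psycopg2'.
            'ENGINE': 'django.db.backends.postgresql',
            'USER': environ.get('CALACCESS_DB_USER', ''),
            'PASSWORD': environ.get('CALACCESS_DB_PASSWORD', ''),
            'HOST': environ.get('CALACCESS_DB_HOST', 'localhost'),
            'PORT': environ.get('CALACCESS_DB_PORT', '5432'),
        }
    }


# In settings.py, assign the result to the DATABASES setting.
DATABASES = database_settings()
```

Passing the environment in as an argument keeps the function easy to exercise with a plain dict, without touching the real environment.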

Return to the command line and create a PostgreSQL database to store the data.

$ createdb calaccess_processed

Note

If you’d prefer to load the CAL-ACCESS data outside your default database, check out our guide to working with Django’s system for multiple databases.


Loading the data

Now you’re ready to create the database tables with Django using its manage.py utility belt.

$ python manage.py migrate

Once everything is set up, the updatecalaccessrawdata command will download the latest bulk data release from the Secretary of State’s website and load it into your local database.

$ python manage.py updatecalaccessrawdata

Warning

This will take an hour or more. Go grab some coffee.

Because the nightly raw export is incomplete, we have to scrape additional data from the CAL-ACCESS website. Use the scrapecalaccess command to kick off this process, either after updatecalaccessrawdata finishes or in a separate terminal window:

$ python manage.py scrapecalaccess

Once the raw CAL-ACCESS data is loaded and the scrape has finished, you can transform all this messy data and load it into a simpler structure with the processcalaccessdata command:

$ python manage.py processcalaccessdata
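
If you expect to repeat these steps, the commands in this section can be chained in a small script. The following is a minimal sketch, assuming it is run from the directory that contains manage.py. The build_invocations and run_pipeline names are hypothetical helpers, not part of the library, and the sketch runs the scrape after the raw data load rather than in a separate terminal.

```python
import subprocess
import sys

# The management commands from this guide, in the order they are introduced.
COMMANDS = (
    "migrate",
    "updatecalaccessrawdata",
    "scrapecalaccess",
    "processcalaccessdata",
)


def build_invocations(manage_py="manage.py", python=sys.executable):
    """Return one argv list per management command."""
    return [[python, manage_py, name] for name in COMMANDS]


def run_pipeline(manage_py="manage.py", python=sys.executable,
                 runner=subprocess.check_call):
    """Run each command in turn.

    With the default runner, subprocess.check_call raises
    CalledProcessError on a non-zero exit, so later steps are skipped
    if an earlier one fails.
    """
    for argv in build_invocations(manage_py, python):
        runner(argv)


# To run the full load, call run_pipeline() from a script or a Python shell.
```

Stopping on the first failure matters here because processcalaccessdata depends on both the raw load and the scrape having finished.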