Ballot Image Log Database

Jacob Coblentz and Shiro Kuriwaki

2019-01-23

Congrats on getting credentials! This document will explain how to connect to our database and how to use it effectively.

Accessing the Database Directly

The database is hosted through Amazon Web Services (AWS) RDS. RDS is a type of AWS product that is strictly for relational databases. We are also construcitng an AWS EC2 account that alsao links to the RDS. For now, as we transition, it is easier to connect to the RDS directly. To access, users need the following information:

Item Value
host url ballot.cppysy51d7px.us-east-1.rds.amazonaws.com
port to access 5432
name of datbase ballot
username [given]
password [given]

Accessing the Database from R

Common packages

Option 1: ODBC (versatile but takes setup costs)

Next, we will set up a connection object. You will need a username (using kuriwaki as one example) and password. R will prompt your for your password.

The ODBC interface is a connection to drivers that faciliates users to connect to any type of database (Postgres, Redshift, etc..). This setup is recommended by RStudio, but it requires you to have a PostgreSQL driver set up via ODBC (https://db.rstudio.com/best-practices/drivers/). If you do not, you can use the RpostgreSQL below package instead of odbc.

Accessing the Database

This databse currently has two tables - "vote" (each observation is a vote for an office), and "person" (each observation is a voter). Let’s explore the databse. We will use the dbplyr package that allows execution of dplyr verbs on a remote database.

To retrive a table from a database, we use the "tbl" command:

If using odbc, you can use dbplyr’s in_schema syntax

This returns an object that can be used much like a data frame. In this example below, we count the number of votes for each candidate in the 2016 Republican Presidential Primary.

This will give :

Tables can be combined for custom browsing:

Commands that compute the whole dataset take a lot of time to compute. For example, tally and count should be used sparingly. This takes some time to run.

Here, we see that Obama got more votes than Romney in 2012, Charleston:

Accessing the Database from the Terminal

If not via R, you can directly connect from the Terminal. For example, for the username kuriwaki,

you need postgresql program and its proper path to use this command.

Once connected, you can look at tables from the Terminal.

The \d command in Postgres will display the list of tables in the dataset.

And you can look up tables from SQL syntax. For example, this takes all columns from the vote table, limiting to just ten rows.

Accessing the Server

If you’ve given a username for the server

Item Value
host url ec2-35-171-26-70.compute-1.amazonaws.com
username [given]
password [given]

To get into the EC2 server, you need a separate user that has access to PEM file issued by AWS. Suppose your PEM file is located is called ec2aslett.pem. Then, moving to the directory where that pem file is located, you can ssh in by noting the EC2 server.

To login in the terminal with a PEM file:

ssh -i ec2aslett.pem kuriwaki@ec2-35-171-26-70.compute-1.amazonaws.com

Working without a PEM could work:

ssh kuriwaki@ec2-35-171-26-70.compute-1.amazonaws.com

This EC2 server also has a Rstudio Server wrapped around it (http://louisaslett.com/RStudio_AMI/). In this setup, the EC2 server leads directly to a RStudio IDE:

  1. Go to http://ec2-35-171-26-70.compute-1.amazonaws.com/
  2. Enter your EC2 username
  3. Enter your EC2 password