Guided Project: Preparing data for SQLite

Posted on Wed 08 July 2015 in Projects

Introduction to the data

In [ ]:
import pandas as pd
df = pd.read_csv("academy_awards.csv", encoding="ISO-8859-1")
df

Filtering the data

In [ ]:
df["Year"] = df["Year"].str[0:4]
df["Year"] = df["Year"].astype("int64")
later_than_2000 = df[df["Year"] > 2000]
award_categories = ["Actor -- Leading Role","Actor -- Supporting Role", "Actress -- Leading Role", "Actress -- Supporting Role"]
nominations = later_than_2000[later_than_2000["Category"].isin(award_categories)]

Cleaning up the Won? and Unnamed columns

In [ ]:
replacements = { "NO": 0, "YES": 1 }
nominations["Won?"] = nominations["Won?"].map(replacements)
nominations["Won"] = nominations["Won?"]
drop_cols = ["Won?","Unnamed: 5", "Unnamed: 6","Unnamed: 7", "Unnamed: 8", "Unnamed: 9", "Unnamed: 10"]
final_nominations = nominations.drop(drop_cols, axis=1)

Cleaning up the Additional Info column

In [ ]:
additional_info_one = final_nominations["Additional Info"].str.rstrip("'}")
additional_info_two = additional_info_one.str.split(" {'")
movie_names = additional_info_two.str[0]
characters = additional_info_two.str[1]
final_nominations["Movie"] = movie_names
final_nominations["Character"] = characters
final_nominations = final_nominations.drop("Additional Info", axis=1)
final_nominations

Exporting to SQLite

In [ ]:
import sqlite3
conn = sqlite3.connect("nominations.db")
final_nominations.to_sql("nominations", conn, index=False)

Verifying in SQL

In [ ]:
query_one = "pragma table_info(nominations);"
query_two = "select * from nominations limit 10;"
print(conn.execute(query_one).fetchall())
print(conn.execute(query_two).fetchall())
conn.close()