Database operations with python

The SQLAlchemy is a python library for efficient database access. It abstracts away many of the common differences between different SQL databases.

pandas has read_sql and DataFrame.to_sql functions which enable easy read/ write of SQL tables from SQLAlchemy connection.

import sqlalchemy as db
import pandas as pd

Connect with SQL database

# PyMySQL is a python connection driver for MySQL
!pip3 install PyMySQL

create_engine returns an instance of Engine, and it represents the core interface to the database, adapted through a dialect. On first time call of Engine.execute() or Engine.connect() Engine establishes a real connection to the database.

# The typical form of a database URL is: dialect+driver://username:password@host:port/database

engine = db.create_engine('mysql+pymysql://username:password@localhost:3306/mydb')
engine.connect()

For now our database has single table.

# Print the table names
engine.table_names()

['users']

Pandas DataFrame

Create a dataframe with simple data

columns = ['country', 'continent', 'area', 'population', 'gdp']

data = [('Afghanistan', 'Asia', 647500, 31056997, 700),
       ('India', 'Asia', 3287590, 1095351995, 2900)]

df = pd.DataFrame(data=data, columns=columns)
df

	country	continent	area	population	gdp
0	Afghanistan	Asia	647500	31056997	700
1	India	Asia	3287590	1095351995	2900

Creata table in SQL database

Write all records stored in dataframe to SQL database table named country

df.to_sql('country', con=engine, 
          index=False # Do not want DataFrame index as a table column
         )

We can check if table is created in database

engine.table_names()

['country', 'users']

Read SQL table

We can read the SQL database table conveniently with read_sql function. It returns the table in DataFrame object.

pd.read_sql("SELECT * FROM country", engine)

	country	continent	area	population	gdp
0	Afghanistan	Asia	647500	31056997	700
1	India	Asia	3287590	1095351995	2900

Append to SQL table

# create another dataframe with some more nation's entry

more_data = [('China', 'Asia', 9596960, 1313973713, 5000),
       ('United States', 'North America', 9631420, 298444215, 37800),
       ('Brazil', 'South America', 8511965, 188078227, 7600),
       ('South Africa', 'Africa', 1219912, 44187637, 10700),
       ('France', 'Europe', 547030,	60876136, 27600)]

df1 = pd.DataFrame(data=more_data, columns=columns)

Append the data in database table

df1.to_sql('country', con=engine, 
           index=False,
           if_exists='append' # the SQL table gets appended
           )

pd.read_sql("SELECT * FROM country", engine)

	country	continent	area	population	gdp
0	Afghanistan	Asia	647500	31056997	700
1	India	Asia	3287590	1095351995	2900
2	China	Asia	9596960	1313973713	5000
3	United States	North America	9631420	298444215	37800
4	Brazil	South America	8511965	188078227	7600
5	South Africa	Africa	1219912	44187637	10700
6	France	Europe	547030	60876136	27600

Delete records from SQL table

drop_query = """DELETE FROM country 
                WHERE country = 'China'
                """

Execute a SQL statement construct.

engine.execute(drop_query)

pd.read_sql("SELECT * FROM country", engine)

	country	continent	area	population	gdp
0	Afghanistan	Asia	647500	31056997	700
1	India	Asia	3287590	1095351995	2900
2	United States	North America	9631420	298444215	37800
3	Brazil	South America	8511965	188078227	7600
4	South Africa	Africa	1219912	44187637	10700
5	France	Europe	547030	60876136	27600

Drop table

You can completely drop the table.

drop_query = "DROP TABLE country"
engine.execute(drop_query);

engine.table_names()

['users']