The SQLAlchemy
is a python library for efficient database access. It abstracts away many of the common differences between different SQL databases.
pandas
has read_sql
and DataFrame.to_sql
functions which enable easy read/ write of SQL tables from SQLAlchemy connection.
import sqlalchemy as db
import pandas as pd
Connect with SQL database
# PyMySQL is a python connection driver for MySQL
!pip3 install PyMySQL
create_engine
returns an instance of Engine
, and it represents the core interface to the database, adapted through a dialect. On first time call of Engine.execute()
or Engine.connect()
Engine establishes a real connection to the database.
# The typical form of a database URL is: dialect+driver://username:password@host:port/database
engine = db.create_engine('mysql+pymysql://username:password@localhost:3306/mydb')
engine.connect()
For now our database has single table.
# Print the table names
engine.table_names()
['users']
Pandas DataFrame
Create a dataframe with simple data
columns = ['country', 'continent', 'area', 'population', 'gdp']
data = [('Afghanistan', 'Asia', 647500, 31056997, 700),
('India', 'Asia', 3287590, 1095351995, 2900)]
df = pd.DataFrame(data=data, columns=columns)
df
country | continent | area | population | gdp | |
---|---|---|---|---|---|
0 | Afghanistan | Asia | 647500 | 31056997 | 700 |
1 | India | Asia | 3287590 | 1095351995 | 2900 |
Creata table in SQL database
Write all records stored in dataframe to SQL database table named country
df.to_sql('country', con=engine,
index=False # Do not want DataFrame index as a table column
)
We can check if table is created in database
engine.table_names()
['country', 'users']
Read SQL table
We can read the SQL database table conveniently with read_sql
function. It returns the table in DataFrame
object.
pd.read_sql("SELECT * FROM country", engine)
country | continent | area | population | gdp | |
---|---|---|---|---|---|
0 | Afghanistan | Asia | 647500 | 31056997 | 700 |
1 | India | Asia | 3287590 | 1095351995 | 2900 |
Append to SQL table
# create another dataframe with some more nation's entry
more_data = [('China', 'Asia', 9596960, 1313973713, 5000),
('United States', 'North America', 9631420, 298444215, 37800),
('Brazil', 'South America', 8511965, 188078227, 7600),
('South Africa', 'Africa', 1219912, 44187637, 10700),
('France', 'Europe', 547030, 60876136, 27600)]
df1 = pd.DataFrame(data=more_data, columns=columns)
Append the data in database table
df1.to_sql('country', con=engine,
index=False,
if_exists='append' # the SQL table gets appended
)
pd.read_sql("SELECT * FROM country", engine)
country | continent | area | population | gdp | |
---|---|---|---|---|---|
0 | Afghanistan | Asia | 647500 | 31056997 | 700 |
1 | India | Asia | 3287590 | 1095351995 | 2900 |
2 | China | Asia | 9596960 | 1313973713 | 5000 |
3 | United States | North America | 9631420 | 298444215 | 37800 |
4 | Brazil | South America | 8511965 | 188078227 | 7600 |
5 | South Africa | Africa | 1219912 | 44187637 | 10700 |
6 | France | Europe | 547030 | 60876136 | 27600 |
Delete records from SQL table
drop_query = """DELETE FROM country
WHERE country = 'China'
"""
Execute a SQL statement construct.
engine.execute(drop_query)
pd.read_sql("SELECT * FROM country", engine)
country | continent | area | population | gdp | |
---|---|---|---|---|---|
0 | Afghanistan | Asia | 647500 | 31056997 | 700 |
1 | India | Asia | 3287590 | 1095351995 | 2900 |
2 | United States | North America | 9631420 | 298444215 | 37800 |
3 | Brazil | South America | 8511965 | 188078227 | 7600 |
4 | South Africa | Africa | 1219912 | 44187637 | 10700 |
5 | France | Europe | 547030 | 60876136 | 27600 |
Drop table
You can completely drop the table.
drop_query = "DROP TABLE country"
engine.execute(drop_query);
engine.table_names()
['users']