Here we will create a SQL table from external kaggle dataset of Nobel Laureates. It is available in csv format.

Connect with SQL database

import sqlalchemy as db
import pandas as pd
# Create an engine to the `test` database
# The typical form of a database URL is `dialect+driver://username:password@host:port/database`
engine = db.create_engine('mysql+pymysql://root:root@localhost:3306/test')
# Print the table names
engine.table_names()
['student_data', 'world']

Handy function for future use.

def ex_df(query):
    """Execute query and return result in pandas.DataFrame"""
    data = engine.execute(query).fetchall()
    cols = [x[0] for x in engine.execute(query).cursor.description]
    return pd.DataFrame(data=data, columns=cols)

Create table from external csv

Read csv file

df_nobel = pd.read_csv("./nobel_laureates/archive.csv")
df_nobel.head(2)
Year Category Prize Motivation Prize Share Laureate ID Laureate Type Full Name Birth Date Birth City Birth Country Sex Organization Name Organization City Organization Country Death Date Death City Death Country
0 1901 Chemistry The Nobel Prize in Chemistry 1901 "in recognition of the extraordinary services ... 1/1 160 Individual Jacobus Henricus van 't Hoff 1852-08-30 Rotterdam Netherlands Male Berlin University Berlin Germany 1911-03-01 Berlin Germany
1 1901 Literature The Nobel Prize in Literature 1901 "in special recognition of his poetic composit... 1/1 569 Individual Sully Prudhomme 1839-03-16 Paris France Male NaN NaN NaN 1907-09-07 Châtenay France

The dataset has column names with space. It would be difficult to refer such columns in sql. Hence remove those space and keep all columns in lowercase.

df_nobel.columns = [s.replace(' ', '_').lower() for s in df_nobel.columns.tolist()]

Create a SQL tabel with name nobel

df_nobel.to_sql('nobel', engine, index=False)
engine.table_names()
['nobel', 'student_data', 'world']

View top two records

SELECT * 
FROM   nobel 
LIMIT  2
year category prize motivation prize_share laureate_id laureate_type full_name birth_date birth_city birth_country sex organization_name organization_city organization_country death_date death_city death_country
0 1901 Chemistry The Nobel Prize in Chemistry 1901 "in recognition of the extraordinary services ... 1/1 160 Individual Jacobus Henricus van 't Hoff 1852-08-30 Rotterdam Netherlands Male Berlin University Berlin Germany 1911-03-01 Berlin Germany
1 1901 Literature The Nobel Prize in Literature 1901 "in special recognition of his poetic composit... 1/1 569 Individual Sully Prudhomme 1839-03-16 Paris France Male None None None 1907-09-07 Châtenay France

Change data type of date columns to DATE

ALTER TABLE nobel 
  MODIFY birth_date DATE;
  
ALTER TABLE nobel
  MODIFY death_date DATE;

Check nobel table columns and their details

DESCRIBE nobel
Field Type Null Key Default Extra
0 year year(4) YES None
1 category text YES None
2 prize text YES None
3 motivation text YES None
4 prize_share text YES None
5 laureate_id bigint(20) YES None
6 laureate_type text YES None
7 full_name text YES None
8 birth_date date YES None
9 birth_city text YES None
10 birth_country text YES None
11 sex text YES None
12 organization_name text YES None
13 organization_city text YES None
14 organization_country text YES None
15 death_date date YES None
16 death_city text YES None
17 death_country text YES None

View last two records of table.

SELECT * 
FROM   nobel 
ORDER  BY year DESC 
LIMIT  2 
year category prize motivation prize_share laureate_id laureate_type full_name birth_date birth_city birth_country sex organization_name organization_city organization_country death_date death_city death_country
0 2016 Chemistry The Nobel Prize in Chemistry 2016 "for the design and synthesis of molecular mac... 1/3 931 Individual Jean-Pierre Sauvage 1944-10-21 Paris France Male University of Strasbourg Strasbourg France None None None
1 2016 Chemistry The Nobel Prize in Chemistry 2016 "for the design and synthesis of molecular mac... 1/3 932 Individual Sir J. Fraser Stoddart 1942-05-24 Edinburgh United Kingdom Male Northwestern University Evanston, IL United States of America None None None

Who are the living nobel laureate

There are some records with early 19 with death_date as none. Seem like it is incomplete information as those records even does not have birth_date. So I will consider the laureate as living if they have birth entry but not death entry.

SELECT year, 
       full_name, 
       birth_date, 
       ROUND(DATEDIFF(NOW(), birth_date) / 365.25, 0) AS Age 
FROM   nobel 
WHERE  birth_date IS NOT NULL 
       AND death_date IS NULL 
ORDER  BY age 
year full_name birth_date Age
0 2001 A. Michael Spence 1943-00-00 None
1 2014 Malala Yousafzai 1997-07-12 22
2 2011 Tawakkol Karman 1979-02-07 40
3 2010 Konstantin Novoselov 1974-08-23 45
4 2011 Leymah Gbowee 1972-02-01 47
5 2011 Adam G. Riess 1969-12-16 49
6 2011 Adam G. Riess 1969-12-16 49
7 2011 Brian P. Schmidt 1967-02-24 52
8 2014 Stefan W. Hell 1962-12-23 56
9 2014 Stefan W. Hell 1962-12-23 56
10 2014 May-Britt Moser 1963-01-04 56
11 2001 Eric A. Cornell 1961-12-19 57
12 2012 Shinya Yamanaka 1962-09-04 57
13 2012 Shinya Yamanaka 1962-09-04 57
14 2014 Edvard I. Moser 1962-04-27 57
15 2006 Craig C. Mello 1960-10-19 58
16 2009 Carol W. Greider 1961-04-15 58
17 2009 Barack H. Obama 1961-08-04 58
18 2014 Eric Betzig 1960-01-13 59
19 2014 Hiroshi Amano 1960-09-11 59
20 1992 Rigoberta Menchú Tum 1959-01-09 60
21 2002 Koichi Tanaka 1959-08-03 60
22 2006 Andrew Z. Fire 1959-04-27 60
23 2010 Andre Geim 1958-10-21 60
24 2015 Takaaki Kajita 1959-03-09 60
25 2001 Wolfgang Ketterle 1957-10-21 61
26 2011 Bruce A. Beutler 1957-12-29 61
27 2011 Bruce A. Beutler 1957-12-29 61
28 2003 Roderick MacKinnon 1956-02-19 63
29 2003 Roderick MacKinnon 1956-02-19 63
... ... ... ... ...
293 1990 Harry M. Markowitz 1927-08-24 92
294 1994 George A. Olah 1927-05-22 92
295 2002 Vernon L. Smith 1927-01-01 92
296 2002 Sydney Brenner 1927-01-13 92
297 1975 Ben Roy Mottelson 1926-07-09 93
298 1980 Paul Berg 1926-06-30 93
299 1982 Aaron Klug 1926-08-11 93
300 2000 Paul Greengard 1925-12-11 93
301 2002 Masatoshi Koshiba 1926-09-19 93
302 1973 Leo Esaki 1925-03-12 94
303 2002 Jimmy Carter 1924-10-01 94
304 2005 Roy J. Glauber 1925-09-01 94
305 1974 Antony Hewish 1924-05-11 95
306 1977 Roger Guillemin 1924-01-11 95
307 1977 Philip Warren Anderson 1923-12-13 95
308 1981 Torsten N. Wiesel 1924-06-03 95
309 1987 Robert M. Solow 1924-08-23 95
310 1973 Henry A. Kissinger 1923-05-27 96
311 1986 Stanley Cohen 1922-11-17 96
312 1992 Rudolph A. Marcus 1923-07-21 96
313 2000 Arvid Carlsson 1923-01-25 96
314 1957 Chen Ning Yang 1922-09-22 97
315 1988 Leon M. Lederman 1922-07-15 97
316 1989 Hans G. Dehmelt 1922-09-09 97
317 1972 Kenneth J. Arrow 1921-08-23 98
318 1988 Jack Steinberger 1921-05-25 98
319 1981 Nicolaas Bloembergen 1920-03-11 99
320 1992 Edmond H. Fischer 1920-04-06 99
321 1997 Jens C. Skou 1918-10-08 100
322 1997 Paul D. Boyer 1918-07-31 101

323 rows × 4 columns

Data correction

Laureate A. Michael Spence has wrong birth date. Lets correct it.

UPDATE nobel 
SET    birth_date = '1943-11-07' 
WHERE  full_name = 'A. Michael Spence' 

Check if it is updated.

SELECT year, 
       full_name, 
       birth_date 
FROM   nobel 
WHERE  full_name = 'A. Michael Spence' 
year full_name birth_date
0 2001 A. Michael Spence 1943-11-07

Also some of the entries are repeated as some laureates got more than one nobel prize.

SELECT DISTINCT year, 
                full_name, 
                birth_date, 
                ROUND(DATEDIFF(NOW(), birth_date)/365.25, 0) AS Age
FROM   nobel 
WHERE  birth_date IS NOT NULL 
       AND death_date IS NULL 
ORDER  BY age DESC 
year full_name birth_date Age
0 1997 Paul D. Boyer 1918-07-31 101
1 1997 Jens C. Skou 1918-10-08 100
2 1981 Nicolaas Bloembergen 1920-03-11 99
3 1992 Edmond H. Fischer 1920-04-06 99
4 1972 Kenneth J. Arrow 1921-08-23 98
5 1988 Jack Steinberger 1921-05-25 98
6 1957 Chen Ning Yang 1922-09-22 97
7 1988 Leon M. Lederman 1922-07-15 97
8 1989 Hans G. Dehmelt 1922-09-09 97
9 1973 Henry A. Kissinger 1923-05-27 96
10 1986 Stanley Cohen 1922-11-17 96
11 1992 Rudolph A. Marcus 1923-07-21 96
12 2000 Arvid Carlsson 1923-01-25 96
13 1974 Antony Hewish 1924-05-11 95
14 1977 Philip Warren Anderson 1923-12-13 95
15 1977 Roger Guillemin 1924-01-11 95
16 1981 Torsten N. Wiesel 1924-06-03 95
17 1987 Robert M. Solow 1924-08-23 95
18 1973 Leo Esaki 1925-03-12 94
19 2002 Jimmy Carter 1924-10-01 94
20 2005 Roy J. Glauber 1925-09-01 94
21 1975 Ben Roy Mottelson 1926-07-09 93
22 1980 Paul Berg 1926-06-30 93
23 1982 Aaron Klug 1926-08-11 93
24 2000 Paul Greengard 1925-12-11 93
25 2002 Masatoshi Koshiba 1926-09-19 93
26 1957 Tsung-Dao (T.D.) Lee 1926-11-24 92
27 1967 Manfred Eigen 1927-05-09 92
28 1977 Andrew V. Schally 1926-11-30 92
29 1987 K. Alexander Müller 1927-04-20 92
... ... ... ... ...
260 2014 Kailash Satyarthi 1954-01-11 65
261 2014 Shuji Nakamura 1954-05-22 65
262 2012 Brian K. Kobilka 1955-05-30 64
263 2012 Mo Yan 1955-02-02 64
264 2003 Roderick MacKinnon 1956-02-19 63
265 2010 Liu Xiaobo 1955-12-28 63
266 2013 Thomas C. Südhof 1955-12-22 63
267 2001 Wolfgang Ketterle 1957-10-21 61
268 2011 Bruce A. Beutler 1957-12-29 61
269 1992 Rigoberta Menchú Tum 1959-01-09 60
270 2002 Koichi Tanaka 1959-08-03 60
271 2006 Andrew Z. Fire 1959-04-27 60
272 2010 Andre Geim 1958-10-21 60
273 2015 Takaaki Kajita 1959-03-09 60
274 2014 Eric Betzig 1960-01-13 59
275 2014 Hiroshi Amano 1960-09-11 59
276 2006 Craig C. Mello 1960-10-19 58
277 2009 Barack H. Obama 1961-08-04 58
278 2009 Carol W. Greider 1961-04-15 58
279 2001 Eric A. Cornell 1961-12-19 57
280 2012 Shinya Yamanaka 1962-09-04 57
281 2014 Edvard I. Moser 1962-04-27 57
282 2014 May-Britt Moser 1963-01-04 56
283 2014 Stefan W. Hell 1962-12-23 56
284 2011 Brian P. Schmidt 1967-02-24 52
285 2011 Adam G. Riess 1969-12-16 49
286 2011 Leymah Gbowee 1972-02-01 47
287 2010 Konstantin Novoselov 1974-08-23 45
288 2011 Tawakkol Karman 1979-02-07 40
289 2014 Malala Yousafzai 1997-07-12 22

290 rows × 4 columns

There are multiple laureates who are no more, but database is not updated. e.g. Paul D. Boyer died on 2 June 2018 wikipedia. Let’s update records for top few.

UPDATE nobel 
SET    death_date = CASE full_name 
                      WHEN 'Paul D. Boyer' THEN '2018-06-02' 
                      WHEN 'Jens C. Skou' THEN '2018-05-28' 
                      WHEN 'Nicolaas Bloembergen' THEN '2017-09-05' 
                      WHEN 'Kenneth J. Arrow' THEN '2017-02-21' 
                      WHEN 'Leon M. Lederman' THEN '2018-10-03' 
                      ELSE death_date 
                    END 
WHERE  full_name IN ( 'Paul D. Boyer', 'Jens C. Skou', 'Nicolaas Bloembergen', 
                      'Kenneth J. Arrow', 'Leon M. Lederman' )

Check if those values got updated

SELECT year, 
       full_name, 
       birth_date, 
       death_date 
FROM   nobel 
WHERE  full_name IN ( 'Paul D. Boyer', 'Jens C. Skou', 'Nicolaas Bloembergen', 
                      'Kenneth J. Arrow', 'Leon M. Lederman' ) 
year full_name birth_date death_date
0 1972 Kenneth J. Arrow 1921-08-23 2017-02-21
1 1981 Nicolaas Bloembergen 1920-03-11 2017-09-05
2 1988 Leon M. Lederman 1922-07-15 2018-10-03
3 1997 Paul D. Boyer 1918-07-31 2018-06-02
4 1997 Jens C. Skou 1918-10-08 2018-05-28
Ideally I should write an utility to scrap wikipedia page for laureate and find if he/she is alive and update the records accrodingly.

Indian Nobel laureates

SELECT full_name, 
       year, 
       category, 
       motivation 
FROM   nobel 
WHERE  birth_country = 'India' 
        OR organization_country = 'India' 
        OR death_country = 'India'
full_name year category motivation
0 Ronald Ross 1902 Medicine "for his work on malaria, by which he has show...
1 Rabindranath Tagore 1913 Literature "because of his profoundly sensitive, fresh an...
2 Sir Chandrasekhara Venkata Raman 1930 Physics "for his work on the scattering of light and f...
3 Har Gobind Khorana 1968 Medicine "for their interpretation of the genetic code ...
4 Mother Teresa 1979 Peace None
5 Amartya Sen 1998 Economics "for his contributions to welfare economics"
6 Venkatraman Ramakrishnan 2009 Chemistry "for studies of the structure and function of ...
7 Kailash Satyarthi 2014 Peace "for their struggle against the suppression of...

Knights of the realm. List the winners, year and category where the winner starts with Sir.

SELECT full_name, 
       year, 
       category 
FROM   nobel 
WHERE  full_name LIKE ( 'Sir%' ) 

References