MENU

Coursera | Using Databases with Python: 习题整合汇总

• May 10, 2020 • 学习

SQL我一直都不太懂,这次课好多题都是混过去的。想着还是记录一下,就当是做笔记了。

Week2:Counting Organizations

题目:

This application will read the mailbox data (mbox.txt) and count the number of email messages per organization (i.e. domain name of the email address) using a database with the following schema to maintain the counts.
CREATE TABLE Counts (org TEXT, count INTEGER)
When you have run the program on mbox.txt upload the resulting database file above for grading.
If you run the program multiple times in testing or with dfferent files, make sure to empty out the data before each run.

写一个计数程序,算是送分题。有一个小坑就是计数的是域名邮箱后缀而不是邮箱,所以@字符前面的用户名不要引入计数。我的代码时用split()函数,如果用正则应该会更简单一些。
答案:

import sqlite3

#创建该文件
conn = sqlite3.connect('text.sqlite')
cur = conn.cursor()

cur.execute('DROP TABLE IF EXISTS Counts')

cur.execute('''
CREATE TABLE Counts (org TEXT, count INTEGER)''')

fname = input('Enter file name: ')
if (len(fname) < 1): fname = 'mbox-short.txt'
fh = open(fname)
for line in fh:
    if not line.startswith('From: '): continue
    pieces = line.split()
    email = pieces[1]
    group = email.split('@')
    orgg = group[1]
    cur.execute('SELECT count FROM Counts WHERE org = ? ', (orgg,))
    row = cur.fetchone()
    if row is None:
        cur.execute('''INSERT INTO Counts (org, count)
                VALUES (?, 1)''', (orgg,))
    else:
        cur.execute('UPDATE Counts SET count = count + 1 WHERE org = ?',
                    (orgg,))
    conn.commit()

sqlstr = 'SELECT org, count FROM Counts ORDER BY count DESC LIMIT 10'

for row in cur.execute(sqlstr):
    print(str(row[0]), row[1])

cur.close()

Week3:Musical Track Database

题目:

This application will read an iTunes export file in XML and produce a properly normalized database with this structure:

CREATE TABLE Artist (
    id  INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    name    TEXT UNIQUE
);

CREATE TABLE Genre (
    id  INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    name    TEXT UNIQUE
);

CREATE TABLE Album (
    id  INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    artist_id  INTEGER,
    title   TEXT UNIQUE
);

CREATE TABLE Track (
    id  INTEGER NOT NULL PRIMARY KEY 
        AUTOINCREMENT UNIQUE,
    title TEXT  UNIQUE,
    album_id  INTEGER,
    genre_id  INTEGER,
    len INTEGER, rating INTEGER, count INTEGER
);

If you run the program multiple times in testing or with different files, make sure to empty out the data before each run.
You can use this code as a starting point for your application: http://www.py4e.com/code3/tracks.zip. The ZIP file contains the Library.xml file to be used for this assignment. You can export your own tracks from iTunes and create a database, but for the database that you turn in for this assignment, only use the Library.xml data that is provided.
To grade this assignment, the program will run a query like this on your uploaded database and look for the data it expects to see:

SELECT Track.title, Artist.name, Album.title, Genre.name 
    FROM Track JOIN Genre JOIN Album JOIN Artist 
    ON Track.genre_id = Genre.ID and Track.album_id = Album.id 
        AND Album.artist_id = Artist.id
    ORDER BY Artist.name LIMIT 3

The expected result of the modified query on your database is: (shown here as a simple HTML table with titles)

TrackArtistAlbumGenre
Chase the AceAC/DCWho Made WhoRock
D.T.AC/DCWho Made WhoRock
For Those About To Rock (We Salute You)AC/DCWho Made WhoRock

这道题就是执行一个多表查询,常规套路但是代码太长了,写得心态炸裂。最终我生成的表跟实例有一定区别,但是我觉得应该是没问题的,助教也给我打了满分。
答案:

import sqlite3
import xml.etree.ElementTree as ET
conn = sqlite3.connect('music.sqlite')
cur = conn.cursor()
cur.executescript('''
DROP TABLE IF EXISTS Artist;
DROP TABLE IF EXISTS Genre;
DROP TABLE IF EXISTS Album;
DROP TABLE IF EXISTS Track;''') 
#如果存在就先删除表
# 执行多条查询需要 executescript

cur.executescript('''CREATE TABLE Artist (
    id  INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    name    TEXT UNIQUE
);
CREATE TABLE Genre (
    id  INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    name    TEXT UNIQUE
);
CREATE TABLE Album (
    id  INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    artist_id  INTEGER,
    title   TEXT UNIQUE
);
CREATE TABLE Track (
    id  INTEGER NOT NULL PRIMARY KEY 
        AUTOINCREMENT UNIQUE,
    title TEXT  UNIQUE,
    album_id  INTEGER,
    genre_id  INTEGER,
    len INTEGER, rating INTEGER, count INTEGER
);''')  

# 执行多行时注意使用三个引号
file_name = 'Library.xml'
fh = open(file_name)
tree = ET.fromstring(fh.read())
dicts = tree.findall('./dict/dict/dict')

# 查看xml文件,分析目录树结构 需要在/dict/dict/dict内查找字段
# 定义查找函数
def lookup(pare, aim):
    flag = False
    for child in pare:
        if flag:
            return child.text
        if child.tag == 'key' and child.text == aim:
            flag = True
    return None

# 通过构建的数据库情况,需要查找的字段为
# Track ID   Name    Artist     Album   Play Count  Rating  Total Time
for entry in dicts:
    if(lookup(entry, 'Track ID') is None):  continue
    name = lookup(entry, 'Name')
    artist = lookup(entry, 'Artist')
    album = lookup(entry, 'Album')
    count = lookup(entry, 'Play Count')
    rating = lookup(entry, 'Rating')
    length = lookup(entry, 'Total Time')
    genre = lookup(entry, 'Genre')
    if name is None or artist is None or album is None or genre is None:
        continue
    print(name, artist, album, count, rating, length)
    # 根据建立数据库的顺序进行插入,并获取外键对应的id值
    # 注意对于唯一元素插入时使用ignore
    cur.execute('''INSERT OR IGNORE INTO Artist(name) VALUES(?)''', (artist,))
    cur.execute('SELECT id FROM Artist WHERE name = ?', (artist,))
    artist_id = cur.fetchone()[0]   # 查询返回的是数组
    cur.execute('INSERT OR IGNORE INTO Genre(name) VALUES(?)', (genre,))
    cur.execute('SELECT id FROM Genre WHERE name = ?',(genre,))
    genre_id = cur.fetchone()[0]
    cur.execute('INSERT OR IGNORE INTO Album(artist_id, title) VALUES(?, ?)',(artist_id, album))
    cur.execute('SELECT id FROM Album WHERE title = ?', (album,))
    album_id = cur.fetchone()[0]
    
    # 至于这里为什么要用replace原因不知
    cur.execute('''INSERT OR REPLACE INTO Track(title, album_id, genre_id, len, rating, count) 
    VALUES(?, ?, ?, ?, ?, ?)''',(name, album_id, genre_id, length, rating, count))
conn.commit()
cur.execute('''SELECT Track.title, Artist.name, Album.title, Genre.name
    FROM Track JOIN Genre JOIN Album JOIN Artist
    ON Track.genre_id = Genre.ID and Track.album_id = Album.id
        AND Album.artist_id = Artist.id
    ORDER BY Artist.name LIMIT 3''')
res = cur.fetchall()
for line in res:
    print(line)
cur.close()

Week4:Many Students in Many Courses

题目:

This application will read roster data in JSON format, parse the file, and then produce an SQLite database that contains a User, Course, and Member table and populate the tables from the data file.
You can base your solution on this code: http://www.py4e.com/code3/roster/roster.py - this code is incomplete as you need to modify the program to store the role column in the Member table to complete the assignment.
Each student gets their own file for the assignment. Download this file and save it as roster_data.json. Move the downloaded file into the same folder as your roster.py program.
Once you have made the necessary changes to the program and it has been run successfully reading the above JSON data, run the following SQL command:

SELECT hex(User.name || Course.title || Member.role ) AS X FROM 
    User JOIN Member JOIN Course 
    ON User.id = Member.user_id AND Member.course_id = Course.id
    ORDER BY X

Find the first row in the resulting record set and enter the long string that looks like 53656C696E613333.

这个是一个多对多实例,不算太难。主要还是集中在SQL代码的理解上,还有在Python对数据库的灵活运用。数据库题目有一个特点就是代码太长了,很多事后不看示例代码总会出现一些遗漏的情况,我还是太菜了。
答案:

import json
import sqlite3
conn = sqlite3.connect('rosterdb.sqlite')
cur = conn.cursor()
# 建立数据库
cur.executescript('''
DROP TABLE IF EXISTS User;
DROP TABLE IF EXISTS Member;
DROP TABLE IF EXISTS Course;
CREATE TABLE User (
    id     INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    name   TEXT UNIQUE
);
CREATE TABLE Course (
    id     INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    title  TEXT UNIQUE
);
CREATE TABLE Member (
    user_id     INTEGER,
    course_id   INTEGER,
    role        INTEGER,
    PRIMARY KEY (user_id, course_id)
)
''')
file_name = input('Enter filename: ')
if(len(file_name) < 1):
    file_name = 'roster_data.json'
fh = open(file_name).read()
js = json.loads(fh)
for record in js:
    name = record[0]
    title = record[1]
    role = record[2]
    cur.execute('INSERT OR IGNORE INTO User(name) VALUES(?)', (name,))
    cur.execute('SELECT id FROM User WHERE name = ?', (name,))
    user_id = cur.fetchone()[0]
    cur.execute('INSERT OR IGNORE INTO Course(title) VALUES(?)', (title,))
    cur.execute('SELECT id FROM Course WHERE title = ?', (title,))
    course_id = cur.fetchone()[0]
    cur.execute('INSERT OR REPLACE INTO Member(user_id, course_id, role) VALUES(?,?,?)', (user_id, course_id, role))
cur.execute('''SELECT hex(User.name || Course.title || Member.role ) AS X FROM 
    User JOIN Member JOIN Course 
    ON User.id = Member.user_id AND Member.course_id = Course.id
    ORDER BY X LIMIT 5''')
res = cur.fetchall()
for line in res:
    print(line)
conn.commit()
cur.close()
Archives QR Code
QR Code for this page
Tipping QR Code