A tutorial for SQLAlchemy

[TOC]

ORM Components

  • What is the Object Relational Mapper (ORM) in SQLAlchemy?

    The SQLAlchemy Object Relational Mapper presents a method of associating user-defined Python classes with database tables, and instances of those classes (objects) with rows in their corresponding tables. It includes a system that transparently synchronizes all changes in state between objects and their related rows, called a unit of work, as well as a system for expressing database queries in terms of the user defined classes and their defined relationships between each other.

  • The Object Relational Mapper and Expression Language

    The ORM is in contrast to the SQLAlchemy Expression Language, upon which the ORM is constructed. Whereas the SQL Expression Language presents a system of representing the primitive constructs of the relational database directly without opinion, the ORM presents a high level and abstracted pattern of usage, which itself is an example of applied usage of the Expression Language.

    While there is overlap among the usage patterns of the ORM and the Expression Language, the similarities are more superficial than they may at first appear. One approaches the structure and content of data from the perspective of a user-defined domain model which is transparently persisted and refreshed from its underlying storage model. The other approaches it from the perspective of literal schema and SQL expression representations which are explicitly composed into messages consumed individually by the database.

    A successful application may be constructed using the Object Relational Mapper exclusively. In advanced situations, an application constructed with the ORM may make occasional usage of the Expression Language directly in certain areas where specific database interactions are required.

Engine

An instance of Engine represents the core interface to the database, adapted through a dialect that handles the details of the database and DBAPI in use. When using the ORM, we typically don’t use the Engine directly once created; instead, it’s used behind the scenes by the ORM.

from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:', echo=True)

The first time a method like Engine.execute() or Engine.connect() is called, the Engine establishes a real DBAPI connection to the database, which is then used to emit the SQL.

Session Factory and Object

We’re now ready to start talking to the database. The ORM’s “handle” to the database is the Session. When we first set up the application, at the same level as our create_engine() statement, we define a Session class which will serve as a factory for new Session objects:

from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=engine) # a factory class for Session

Then, whenever you need to have a conversation with the database, you instantiate a Session:

session = Session()

The above Session is associated with our SQLite-enabled engine, but it hasn’t opened any connections yet. When it’s first used, it retrieves a connection from a pool of connections maintained by the Engine, and holds onto it until we commit all changes and/or close the session object.

Declarative Base

When using ORM frameworks, the configurational process starts by describing the database tables we’ll be dealing with, and then by defining user-defined classes which will be mapped to those tables.

In modern SQLAlchemy, these two tasks are usually performed together, using a system known as Declarative, which allows us to create classes that include directives to describe the actual database table they will be mapped to.

Classes mapped using the Declarative system are defined in terms of a base class which maintains a catalog of classes and tables relative to that base - this is known as the declarative base class. Our application will usually have just one instance of this base in a commonly imported module.

from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

Mapped Class

We can define any number of mapped classes inherit from declarative base class. For example, if we want interactive with database table users, we can define a mapped class as follows:

from sqlalchemy import Column, Integer, String

class User(Base):

	__tablename__ = 'users'
	id = Column(Integer, primary_key=True)
    username = Column(String)
    password = Column(String)

Within the mapped class, we define details about the database table to which we’ll be mapping, primarily the table name, and names and data types of columns, known as table metadata.

A class using Declarative at a minimum needs a __tablename__ attribute, and at least one Column which is part of a primary key.

When we declared our class, Declarative used a Python metaclass in order to perform additional activities once the class declaration was complete; Declarative replaces all the Column objects with special Python accessors known as descriptors; this is a process known as instrumentation. The “instrumented” mapped class will provide us with the means to refer to our table in a SQL context as well as to persist and load the values of columns from the database. You can check via type(User.username).

Instances of Mapped Class

An instance of mapped class can be used to represent a row in their corresponding database table. And there is a system that transparently synchronizes all changes in state between objects and their related rows, as well as a system for expressing database queries in terms of the user defined classes and their defined relationships between each other. All of those operations are running behind-scenes, we don’t care about them.

Table and Mapper Object

With our User class defined via the Declarative system, we have defined information about our table, known as table metadata. The object used by SQLAlchemy to represent this information for a specific table is called the Table object, and here Declarative has made one for us. We can see this object by User.__table__.

This Table object is made by Declarative via a Python metaclass to perform additional activities once the class declaration was complete. Within this phase, it then created a Table object according to our specifications, and associated it with the class by constructing a Mapper object. The Mapper object defines the correlation of mapped class attributes to database table columns. This object is a behind-the-scenes object we normally don’t need to deal with directly. We can see this object by User.__mapper__.

MetaData Object

MetaData object is a collection of Table objects and their associated schema constructs. When using Declarative, this object is available using the .metadata attribute of our declarative base class, i.e. Base.metadata.

MetaData object holds a collection of Table objects as well as an optional binding to an Engine or Connection. If bound, the Table objects in the collection and their columns may participate in implicit SQL execution. And the Table objects themselves are stored in the MetaData.tables dictionary. When using Declarative, the mapped class’s Table is automatically added into the Base.metadata collection. Thus the object User.__table__ is a member of the Base.metadata object.

The Base.metabase is a registry which includes the ability to emit a limited set of schema generation commands to the database. As our SQLite database does not actually have a users table present, we can use it to issue CREATE TABLE statements to the database for all tables that don’t yet exist:

Base.metadata.create_all(engine)

CRUD

Adding, Updating and Deleting Objects

Adding

Add objects to session:

ed_user = User(username='Ed Jones', password='edspassword')
session.add(ed_user)

At this point, we say that the instance is pending; no SQL has yet been issued and the object is not yet represented by a row in the database. The Session will issue the SQL to persist the object Ed Jones as soon as is needed, using a process known as a flush.

If we query the database for Ed Jones, all pending information will first be flushed, and the query is issued immediately thereafter. For example, when we query the Ed Jones, an instance is returned which is equivalent to that which we’ve added:

our_user = session.query(User).filter_by(username='ed').first() # the Ed Jones has been added first
our_user is ed_user  # True

In fact, the Session has identified that the row returned is the same row as one already represented within its internal map of objects, so we actually got back the identical instance as that which we just added.

We can add more objects at once:

jim = User(username='Jim', password='jim')
tom = User(username='Tom', password='Tom')
session.add_all([jim, tom])

Updating

We can modify the attributes of instances to update the corresponding rows in database tables:

jim.password = 'jim123'
tom.password = 'tom123'

Deleting

We can delete a instance from session:

session.delete(tom)

Commit

We tell the Session that we’d like to issue all remaining changes to the database and commit the transaction, which has been in progress throughout:

session.commit()

The connection resources referenced by the session are now returned to the connection pool. Subsequent operations with this session will occur in a new transaction, which will again re-acquire connection resources when first needed.

Rollback

Since the Session works within a transaction, we can roll back changes made too.

ed_user.username = 'foo'  # update object
fake_user = User(username='fakeuser', password='12345')
session.add(fake_user) # add object

# Querying the session, we can see that they’re flushed into the current transaction
session.query(User).filter(User.username.in_(['foo', 'fakeuser'])).all()

Rolling back, we can see that ed_user’s name is back to Ed Jones, and fake_user has been kicked out of the session:

session.rollback()

ed_user.username == 'Ed Jones'  # True
fake_user in session  # False

Querying

A Query object is created using the query() method on Session. This function takes a variable number of arguments, which can be any combination of classes and class-instrumented descriptors.

For example:

# The Query accepts mapped classes as arguments
for user in session.query(User).order_by(User.id):
	print(user.username)

# The Query also accepts ORM-instrumented descriptors as arguments
for username, password in session.query(User.username, User.password):
	print(username)

# The tuples returned by Query are named tuples
for row in session.query(User, User.username).all():
    print(row.User, row.username)

# You can control the names of individual column expressions using the label() construct
for row in session.query(User.username.label('name_label')).all():
    print(row.name_label)

# The SQL statements LIMIT and OFFSET via python expression
for u in session.query(User).order_by(User.id)[1:3]:
    print(u.username)

# Filtering results by filter_by() that using keyword arguments
for username in session.query(User.username).filter_by(username='Ed Jones'):
    print(username)
    
# Or filtering by filter(), which uses more flexible SQL expression language constructs
for username in session.query(User.username).filter(User.username=='Ed Jones'):
    print(username)

# The Query object is fully generative, meaning that most method calls return a new Query object upon which further criteria may be added. 
for user in session.query(User).filter(User.username=='Ed Jones').filter('edspassword'):
    print(user.username)

Common filter operators

# equals
query.filter(User.username == 'ed')
# not equals
query.filter(User.username != 'ed')
# LIKE
query.filter(User.username.like('%ed%'))
# case-insensitive LIKE
query.filter(User.username.ilike('%ed%'))
# IN
query.filter(User.username.in_(['ed', 'tom', 'jim']))
# NOT IN
query.filter(~User.username.in_(['ed', 'tom', 'jim']))
# IS NULL
query.filter(User.username.is_(None))
# IS NOT NULL
query.filter(User.username.isnot(None))
# AND
from sqlalchemy import and_
query.filter(and_(User.username == 'ed', User.password='123'))
query.filter(User.username == 'ed', User.password == '123')
query.filter(User.username == 'ed').filter(User.password == '123)
# OR
from sqlalchemy import or_
query.filter(or_(User.username == 'ed', User.password == '123'))
# MATCH
query.filter(User.username.match('wendy'))

Returning Lists and Scalars

A number of methods on Query immediately issue SQL and return a value containing loaded database results:

query = session.query(User)

# all() return a list
query.all()

# first() return the first element of the results list as a scalar
query.first()

# one() raises exceptions when multiple-rows or no-rows found, i.e. it expect exactly one object
query.one()

# one_or_none()
query.one_or_none()

# scalar() invokes the one() method, and upon success returns the first column of the row
query = session.query(User.id).filter_by(username='ed')
the_id = query.scalar()

Counting

Determine how many rows the SQL statement would return:

session.query(User).filter(User.username.like('%ed')).count()

For situations where the “thing to be counted” needs to be indicated specifically, we can specify the “count” function directly:

from sqlalchemy import func
session.query(func.count(User.username), User.username).group_by(User.username).all()

Raw SQL

from sqlalchemy import text

session.query(User).filter(text("id<:value and username=:name")).params(value=224, name='fred').order_by(User.id).one()

Relationship

Build Relational Mapped Class

Let’s consider how a second table, related to User, can be mapped and queried. Users in our system can store any number of email addresses associated with their username. This implies a basic one to many association from the users to a new table which stores email addresses, which we will call addresses. Using declarative, we define this table along with its mapped class, Address:

from sqlalchemy import ForeignKey
from sqlalchemy.orm import relationship

class User(Base):
	__tablename__ = 'users'
    
	id = Column(Integer, primary_key=True)
    username = Column(String)
    password = Column(String)
    addresses = relationship("Address", order_by=Address.id,
                            		    back_populates="user")

# addresses -> users : many -> one
class Address(Base):
	__talename__ = 'addresses'
    
    id = Column(Integer, primary_key=True)
    email = Column(String, nullable=False)
    user_id = Column(Integer, ForeignKey('users.id'))
    user = relationship("User", back_populates="addresses")
    

ForeignKey() indicates that values in Address.user_id column should be constrained to be values present in the named remote column users.id.

relationship() uses the foreign key relationships between the two tables to determine the nature of this linkage. In both relationship() directives, the parameter back_populates is assigned to refer to the complementary attribute names; by doing so, each relationship() can make intelligent decision about the same relationship as expressed in reverse; on one side, Address.user refers to a User instance, and on the other side, User.addresses refers to a list of Address instances.

The two complementing relationships Address.user and User.addresses are referred to as a bidirectional relationship, and is a key feature of the SQLAlchemy ORM.

Arguments to relationship() which concern the remote class can be specified using strings, assuming the Declarative system is in use. Once all mappings are complete, these strings are evaluated as Python expressions in order to produce the actual argument, in the above case the User class. The names which are allowed during this evaluation include, among other things, the names of all classes which have been created in terms of the declared base.

Create a new user instance:

jack = User(username='jack', password='123')
print(jack.address)  # an empty list

We are free to add Address objects on our User object:

jack.addresses = [
    Address(email='eo@12.com'),
    Address(email='78@google.com')
]

When using a bidirectional relationship, elements added in one direction automatically become visible in the other direction. This behavior occurs based on attribute on-change events and is evaluated in Python, without using any SQL:

jack.addresses[0].user  # the jack user

Let’s add and commit jack to the database. jack as well as the two Address members in the corresponding addresses collection are both added to the session at once, using a process known as cascading:

session.add(jack)
session.commit()

Querying with Joins

To construct a simple implicit join between User and Address, we can use filter() to equate their related columns together:

for user, address in session.query(User, Address).filter(User.id == Address.user_id).filter(Address.email.like('%google.com%')).all():
	print(user.username, address.email)

Or we can explicit join the two tables by using join():

for user in session.query(User).join(Address).\
		filter(Address.email.like('%google.com%')).all():
	print(user.username)

join() knows how to join between User and Address because there’s only one foreign key between them. If there were no foreign keys, or several, join() works better when one of the following forms are used:

query.join(Address, User.id==Address.user_id)    # explicit condition
query.join(User.addresses)                       # specify relationship from left to right
query.join(Address, User.addresses)              # same, with explicit target
query.join('addresses')                          # same, using a string

Using Aliases

When querying across multiple tables, if the same table needs to be referenced more than once, SQL typically requires that the table be aliased with another name, so that it can be distinguished against other occurrences of that table.

Below we join to the Address entity twice, to locate a user who has two distinct email addresses at the same time:

from sqlalchemy.orm import aliased
adalias1 = aliased(Address)
adalias2 = aliased(Address)
for username, email1, email2 in \
    session.query(User.username, adalias1.email, adalias2.email).\
    join(adalias1, User.addresses).\
    join(adalias2, User.addresses).\
    filter(adalias1.email=='jack@google.com').\
    filter(adalias2.email=='j25@yahoo.com'):
    print(username, email1, email2)

Using Sub-queries

The Query is suitable for generating statements which can be used as subqueries. Suppose we wanted to load User objects along with a count of how many Address records each user has:

from sqlalchemy.sql import func
# sub-query
stmt = session.query(Address.user_id, func.count('*').\
        label('address_count')).\
        group_by(Address.user_id).subquery()

for u, count in session.query(User, stmt.c.address_count).\
		outerjoin(stmt, User.id==stmt.c.user_id).order_by(User.id):
    print(u.username, count)

Above, we just selected a result that included a column from a subquery. What if we wanted our subquery to map to an entity ?

from sqlalchemy.orm import aliased

stmt = session.query(Address).filter(Address.email != 'j25@yahoo.com').subquery()
adalias = aliased(Address, stmt)
for user, address in session.query(User, adalias).join(adalias, User.addresses):
	print(user, address)

Using EXISTS

The EXISTS keyword in SQL is a boolean operator which returns True if the given expression contains any rows. It may be used in many scenarios in place of joins, and is also useful for locating rows which do not have a corresponding row in a related table.

from sqlalchemy.sql import exists

for username in session.query(User.username).\
		filter(exists().where(Address.user_id==User.id)):
	print(username)

Or by relationship operators any(), has(), contains() and so on:

for username in session.query(User.username).filter(User.addresses.any()):
	print(username)

# used for collections EXISTS testing
for username in session.query(User.username).\
		filter(User.addresses.any(Address.email.like('%google%'))):
	print(username)

# used for scalar EXISTS testing
for address in session.query(Address).\
       	filter(~Address.user.has(User.name=='jack')):
    print(address)

# used for one-to-many collections EXISTS testing
for user in session.query(User).\
		filter(User.addresses.contains(Address.email.like('%goolge.com%'))):
    print(user.username)

Eager Loading

Recall earlier that we illustrated a lazy loading operation, when we accessed the User.addresses collection of a User and SQL was emitted. If you want to reduce the number of queries (dramatically, in many cases), we can apply an eager load to the query operation. SQLAlchemy offers three types of eager loading, two of which are automatic, and a third which involves custom criterion.

Selectin Load

selectinload() emits a second SELECT statement that fully loads the collections associated with the results just loaded

from sqlalchemy.orm import selectinload

jack = session.query(User).\
                options(selectinload(User.addresses)).\
                filter_by(username='jack').one()

Joined Load

joinedload() emits a JOIN, by default a LEFT OUTER JOIN, so that the lead object as well as the related object or collection is loaded in one step

from sqlalchemy.orm import joinedload

jack = session.query(User).\
               options(joinedload(User.addresses)).\
               filter_by(username='jack').one()

selectinload() tends to be more appropriate for loading related collections while joinedload() tends to be better suited for many-to-one relationships, due to the fact that only one row is loaded for both the lead and the related object.

Explicit Join

A third style of eager loading is when we are constructing a JOIN explicitly in order to locate the primary rows, and would like to additionally apply the extra table to a related object or collection on the primary object. This feature is most typically useful for pre-loading the many-to-one object on a query that needs to filter on that same object.

from sqlalchemy.orm import contains_eager
jacks_addresses = session.query(Address).\
                         join(Address.user).\
                         filter(User.username=='jack').\
                         options(contains_eager(Address.user)).\
                         all()

Cascading Deleting

Let’s try to delete jack and see how that goes:

session.delete(jack)

How about Jack’s Address objects ? Are those objects deleted too? The answer is No. The user_id column of each address associated with jack was set to NULL, but the rows weren’t deleted. SQLAlchemy doesn’t assume that deletes cascade, you have to tell it to do so.

We will configure cascade options on the User.addresses relationship to change the behavior. The new version of User and Address classes as follow:

from sqlalchemy import ForeignKey
from sqlalchemy.orm import relationship

class User(Base):

	__tablename__ = 'users'
	id = Column(Integer, primary_key=True)
    username = Column(String)
    password = Column(String)
    addresses = relationship("Address", 
                             back_populates="user",
                             cascade="all, delete, delete-orphan")


class Address(Base):
	__talename__ = 'addresses'
    
    id = Column(Integer, primary_key=True)
    email = Column(String, nullable=False)
    user_id = Column(Integer, ForeignKey('users.id'))
    user = relationship("User", back_populates="addresses")

Deleting Jack will delete both Jack and the remaining Address associated with the user.

Many to Many

We’ll make our application a blog application, where users can write BlogPost items, which have Keyword items associated with them. For a plain many-to-many, we need to create an un-mapped Table construct to serve as the association table.

from sqlalchemy import Table, Text

# association table: an un-mapped Table object
post_keywords = Table('post_keywords', Base.metadata,
    Column('post_id', ForeignKey('posts.id'), primary_key=True),
    Column('keyword_id', ForeignKey('keywords.id'), primary_key=True)

The BlogPost and Keyword mapped classes are declared:

class User(Base):

	__tablename__ = 'users'
	id = Column(Integer, primary_key=True)
    username = Column(String)
    password = Column(String)
	
    posts = relationship('BlogPost', back_populates='author', lazy="dynamic")

class BlogPost(Base):
	__tablename__ = 'posts'
	
	id = Column(Integer, primary_key=True)
	user_id = Column(Integer, ForeignKey('users.id'))
	title = Column(String(255), nullable=False)
	body = Column(Text)
	
    author = relationship('User', back_populates='posts')
	keywords = relationship('Keyword',
							secondary=post_keywords,
							back_populates='posts')
    
    def __init__(self, title, body, author):
        self.title = title
        self.body = body
        self.author = author

class Keyword(Base):
	__tablename__ = 'keywords'
	
	id = Column(Integer, primary_key=True)
	keyword = Column(String(50), nullable=False, unique=True)
	
    posts = relationship('BlogPost',
						 secondary=post_keywords,
						 back_populates='keywords')

The defining feature of a many-to-many relationship is the secondarykeyword argument which references a Table object representing the association table. This table only contains columns which reference the two sides of the relationship; if it has any other columns, such as its own primary key, or foreign keys to other tables, SQLAlchemy requires a different usage pattern called the “association object”, described at Association Object.

Now we can post a new blog:

# get user
jack = session.query(User).filter_by(username='jack').one()
# new post
post = BlogPost("A new day", "hahahahhhhaha", jack)
# add keywords for the new post
post.keywords.append(Keyword("life"))
post.keywords.append(Keyword("happiness"))
# commit
session.add(post)
# query all posts with keyword "life"
session.query(BlogPost).\
		filter(BlogPost.keywords.any(keyword='life')).\
		all()
# posts belong to user jack
session.query(BlogPost).\
		filter(BlogPost.author == jack).\
		filter(BlogPost.keywords.any(keyword='life')).\
		all()
# or we can use dynamic to query
jack.posts.filter(BlogPost.keywords.any(keyword='life')).all()

The Basic Relationship Patterns

When creating a database, common sense dictates that we use separate tables for different types of entities. Some examples are: customers, orders, items etc… But we also need to have relationships between these tables. For instance, customers make orders, and orders contain items. These relationships need to be represented in the database. Also, when fetching data with SQL, we need to use certain types of JOIN queries to get what we need.

We will review the following basic relationship patterns:

  • One to Many
  • Many to One
  • One to One
  • Many to Many
  • Association Object
  • Adjacency List

When selecting data from multiple tables with relationships, we will be using the JOIN query. There are several types of JOIN’s, and we are going to learn about the the following:

  • Cross Joins
  • Natural Joins
  • Inner Joins
  • Left (Outer) Joins
  • Right (Outer) Joins

Relationship Patterns

One to Many

One customer may be associated to many orders.

import datetime

from sqlalchemy import Table, Column, Integer, String, DateTime, ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

# customer -> order: one -> many
class Customer(Base):
    __tablename__ = 'customers'
    
    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
    orders = relationship('Order')
   	
class Order(Base):
    __tablename__ = 'orders'
    
    id = Column(Integer, primary_key=True)
    date = Column(String(255))
    created_date = Column(DateTime, default=datetime.datetime.utcnow)
    amount = Column(Integer)    
    customer_id = Column(Integer, ForeignKey('customers.id'))

Many to One

Many orders might belong to one customer:

import datetime

from sqlalchemy import Table, Column, Integer, String, DateTime, ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

# order -> customer: many -> one
class Order(Base):
    __tablename__ = 'orders'
    
    id = Column(Integer, primary_key=True)
    date = Column(String(255))
    created_date = Column(DateTime, default=datetime.datetime.utcnow)
    amount = Column(Integer)    
    customer_id = Column(Integer, ForeignKey('customers.id'))
    customer = relationship('Customer')
    
class Customer(Base):
    __tablename__ = 'customers'
    
    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)

Both one-to-many and many-to-one patterns actually add a foreign key to the “many” table to refer the entity in the “one” table.

In addition, one-to-many pattern will add a relationship in the “one” mapped class to associate the “many” entities; many-to-one pattern will add a relationship in the “many” mapped class to associate the “one” entity.

We can also establish a bidirectional relationship for them. For the one-to-many pattern, add a relationship for “many” mapped class to refer the “one” entity:

# customer -> order: one -> many
class Customer(Base):
    __tablename__ = 'customers'
    
    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
    orders = relationship('Order', back_populates="customer")
   	
class Order(Base):
    __tablename__ = 'orders'
    
    id = Column(Integer, primary_key=True)
    date = Column(String(255))
    created_date = Column(DateTime, default=datetime.datetime.utcnow)
    amount = Column(Integer)    
    customer_id = Column(Integer, ForeignKey('customers.id'))
    customer = relationship('Customer', back_populates="orders")

For the many-to-one, add a relationship for “one” mapped class to refer the “many” entities:

# order -> customer: many -> one
class Order(Base):
    __tablename__ = 'orders'
    
    id = Column(Integer, primary_key=True)
    date = Column(String(255))
    created_date = Column(DateTime, default=datetime.datetime.utcnow)
    amount = Column(Integer)    
    customer_id = Column(Integer, ForeignKey('customers.id'))
    customer = relationship('Customer', back_populates="orders")
    
class Customer(Base):
    __tablename__ = 'customers'
    
    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
    orders = relationship('Order', back_populates="customer")

After establish a bidirectional relationship, you can find the many-to-one pattern is same with one-to-many.

How can we implement cascade deleting? Cascade deleting means if we delete a customer, then all of orders associated this customer will be deleted too. Because SQLAlchemy doesn’t assume that deletes cascade, you have to tell it to do so:

class Customer(Base):
    __tablename__ = 'customers'
    
    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
    # cascade delete
    orders = relationship('Order', cascade="all, delete, delete-orphan")
   	
class Order(Base):
    __tablename__ = 'orders'
    
    id = Column(Integer, primary_key=True)
    date = Column(String(255))
    created_date = Column(DateTime, default=datetime.datetime.utcnow)
    amount = Column(Integer)    
    customer_id = Column(Integer, ForeignKey('customers.id'))

One to One

One-to-one is essentially a bidirectional relationship with a scalar attribute on both sides. To achieve this, the uselist flag indicates the placement of a scalar attribute instead of a collection on the “many” side of the relationship.

One user has only one ID card:

class User(Base):
    __tablename__ = 'users'
    
    id = Column(Integer, primary_key=True)
    username = Column(String)
    idcard_id = Column(Integer, ForeignKey('idcards.id'))
    idcard = relationship('IDCard', back_populates="user")


class IDCard(Base):
	__tablename__ = 'idcards'
    
    id = Column(Integer, primary_key=True)
    code = Column(String(20), nullable=False)
    user = relationship("User", back_populates="idcard", uselist=False)

Many to Many

Many to Many adds an association table between two classes. The association table is indicated by the secondary argument to relationship(). Usually, the Table uses the MetaData object associated with the declarative base class, so that the ForeignKey directives can locate the remote tables with which to link. Note: the association table isn’t a mapped class.

One blog post may have several keywords, and one keyword may be assigned to different blog posts:

post_keywords_table = Table('post_keywords', Base.metadata,
    Column('post_id', Integer, ForeignKey('posts.id')),
    Column('keyword_id', Integer, ForeignKey('keywords.id'))
)

class Post(Base):
    __tablename__ = 'posts'
    
    id = Column(Integer, primary_key=True)
    title = Column(String(255))
    body = Column(Text)
    
    keywords = relationship('Keyword', secondary=post_keywords_table)
    
class Keyword(Base):
    __tablename__ = 'keywords'
    
    id = Column(Integer, primary_key=True)
    term = Column(String(255))

For a bidirectional relationship, both sides of the relationship contain a collection:

class Post(Base):
    __tablename__ = 'posts'
    
    id = Column(Integer, primary_key=True)
    title = Column(String(255))
    body = Column(Text)
    
    keywords = relationship('Keyword', 
                            secondary=post_keywords_table,
                            back_populates='posts')
    
class Keyword(Base):
    __tablename__ = 'keywords'
    
    id = Column(Integer, primary_key=True)
    term = Column(String(255))
    
    posts = relationship('Post',
                         secondary=post_keywords_table,
                         back_populates='keywords')

How about cascade deleting for the association table? A behavior which is unique to the secondary argument to relationship() is that the association Table which is specified here is automatically subject to INSERT and DELETE statements, as objects are added or removed from the collection. There is no need to delete from this table manually. For example:

# the row (post_id, keyword_id) will be deleted from the "secondary" table automatically
one_post.keywords.remove(one_keyword)

Association Object

The association object pattern is a variant on many-to-many: it’s used when your association table contains additional columns beyond those which are foreign keys to the left and right tables. Instead of using the secondary argument, you map a new class directly to the association table. The left side of the relationship references the association object via one-to-many, and the association class references the right side via many-to-one. Note: the association class is a mapped class.

One order may contain many items, and one items may be added into several orders, meanwhile each record in the order may include the number of items:

# Order -> OrderItemsAssociation: one -> many
class Order(Base):
    __tablename__ = 'orders'
    
    id = Column(Integer, primary_key=True)
    created_date = Column(DateTime, default=datetime.datetime.utcnow)
    amount = Column(Integer)
    
    items = relationship('Item', back_populates='order')

# OrderItemsAssociation -> Item: many -> one
class Item(Base):
    __tablename__ = 'items'
    
    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
    price = Column(Integer, nullable=False)
    
    orders = relationship('OrderItemsAssociation', back_populates='item')
    
class OrderItemsAssociation(Base):
    __tablename__ = 'order_items'
    
    order_id = Column(Integer, ForeignKey('orders.id'))
    item_id = Column(Integer, ForeignKey('items.id'))
    count = Column(Integer, nullable=False)  # the number of item
    
    order = relationship('Order', back_populates='items')
    item = relationship('Item', back_populates='orders')

Create an order and add some items:

apple = Item(name='apple', price=20)
pear = Item(name='pear', price=12)

order = Order()

order_apple = OrderItemsAssociation(count=2) # two apple
order_apple.item = apple
order.items.append(order_apple)

order_pear = OrderItemsAssociation(count=3) # three pear
order_pear.item = pear
order.items.append(order_pear)

Self-Referential/Adjacency List

The adjacency list pattern is a common relational pattern whereby a table contains a foreign key reference to itself. This is the most common way to represent hierarchical data in flat tables. Other methods include nested sets, sometimes called “modified preorder”, as well as materialized path.

Tree structure

Categories are organized as tree structure, i.e. parent category has many children categories and a child category belongs to a parent category:

class Category(Base):
	__tablename__ = 'categories'
    
    id = Column(Integer, primary_key=True)
    name = Column(String(255))
    parent_id = Column(Integer, ForeignKey('categories.id'))
    children = relationship('Category')  # one-to-many
    

The above relationship is one-to-many. To establish the relationship as many-to-one, an extra directive is added known as remote_side:

class Category(Base):
	__tablename__ = 'categories'
    
    id = Column(Integer, primary_key=True)
    name = Column(String(255))
    parent_id = Column(Integer, ForeignKey('categories.id'))
    parent = relationship('Category', remote_side=[id])  # many-to-one

As always, both directions can be combined into a bidirectional relationship using the backref() function:

class Category(Base):
	__tablename__ = 'categories'
    
    id = Column(Integer, primary_key=True)
    name = Column(String(255))
    parent_id = Column(Integer, ForeignKey('categories.id'))
    # one-to-many and many-to-one
    children = relationship('Category',
                            backref=backref('parent', remote_side=[id]))

Composite primary key

A sub-category of the adjacency list relationship is the rare case where a particular column is present on both the “local” and “remote” side of the join condition, like the following account_id column.

We know a file system organizes the folders as a tree. If our system is for multi-user share, an unique folder is determined by folder_id and account_id, and there must be existed an tree folders structure for each user.

from sqlalchemy.schema import ForeignKeyConstraint

class Folder(Base):
    __tablename__ = 'folder'
    # FOREIGN KEY(account_id, parent_id) REFERENCES folder (account_id, folder_id)
    __table_args__ = (
      ForeignKeyConstraint(
          ['account_id', 'parent_id'],
          ['folder.account_id', 'folder.folder_id']),
    )

    account_id = Column(Integer, primary_key=True)
    folder_id = Column(Integer, primary_key=True)
    parent_id = Column(Integer)
    name = Column(String)

    parent = relationship("Folder",
                        backref="children",
                        remote_side=[account_id, folder_id]
                  )

The account_id column references itself and parent_id references the folder_id column. As said before, the account_id is present on both the “local” and “remote” side of the join condition, to ensure the folders belong to the same user. Thus the primary key folder_id is used to construct the tree structure and the another account_id is used to track user authorship.

Querying

Query the “fruit/apple” category

from sqlalchemy.orm import aliased

ParentCategory = aliased(Category)
session.query(Category).\
		filter(Category.name == 'apple').\
		join(ParentCategory, Category.parent).\
		filter(ParentCategory.name == 'fruit').\
        all()

Or do the same thing

session.query(Category).\
		filter(Category.name == 'apple').\
		join(Category.parent, aliased=True).\
		filter(Category.name == 'fruit').\
        all()

Query the “food/fruit/apple”

session.query(Category).\
		filter(Category.name == 'apple').\
		join(Category.parent, aliased=True).\
		filter(Category.name == 'fruit').\
        join(Category.parent, aliased=True, from_joinpoint=True).\
        filter(Category.name == 'food').\
        all()

Eager loading

Eager loading of relationships occurs using joins or outerjoins from parent to child table during a normal query operation, such that the parent and its immediate child collection or reference can be populated from a single SQL statement, or a second statement for all immediate child collections. If you want to use eager loading with a self-referential relationship, SQLAlchemy needs to be told how many levels deep (join_depth) it should join and/or query; otherwise the eager load will not take place at all.

class Category(Base):
	__tablename__ = 'categories'
    
    id = Column(Integer, primary_key=True)
    name = Column(String(255))
    parent_id = Column(Integer, ForeignKey('categories.id'))
    # one-to-many and many-to-one
    children = relationship('Category',
                            lazy="joined",
                    		join_depth=2)

Foreign Key

The relationships are build upon the SQL’s foreign key constraints mechanism.

A foreign key in SQL is a table-level construct that constrains one or more columns in that table to only allow values that are present in a different set of columns, typically but not always located on a different table. We call the columns which are constrained the foreign key columns and the columns which they are constrained towards the referenced columns. The referenced columns almost always define the primary key for their owning table, though there are exceptions to this. The foreign key is the “joint” that connects together pairs of rows which have a relationship with each other, and SQLAlchemy assigns very deep importance to this concept in virtually every area of its operation. There are two ways to define foreign key constraints:

Single-column foreign keys

The single column foreign key is common, and at the column level is specified by constructing a ForeignKey object as an argument to a Column object:

class User(Base):
    __tablename__ = 'users'
    
    id = Column(Integer, primary_key=True)
    name = Column(String(255))
    
class Address(name):
    __tablename__ = 'addresses'
    
    id = Column(Integer, primary_key=True)
    data = Column(Text)
    # foreign key
    user_id = Column(Integer, ForeignKey('users.id'))

Table-clause foreign keys

Foreign keys may also be defined at the table level, using the ForeignKeyConstraint object. This object can describe a single- or multi-column foreign key. A multi-column foreign key is known as a composite foreign key, and almost always references a table that has a composite primary key.

from sqlalchemy.schema import ForeignKeyConstraint

class One(Base):
    __tablename__ = 'one'
    foo_id = Column(Integer, primary_key=True)
    bar_id = Column(Integer, primary_key=True)
    data = Column(Text)
    
class Many(Base):
    __tablename__ = 'many'
    __table_args__ = (
      ForeignKeyConstraint(
          ['foo_id', 'bar_id'],
          ['folder.foo_id', 'folder.bar_id']),
    )
    id = Column(Integer, primary_key=True)
    foo_id = Column(Integer, nullable=False)
    bar_id = Column(Integer, nullable=False)
    data = Column(Text)    

Note: the ForeignKeyConstraint is the only way to define a composite foreign key.

Cascading update and delete

Most databases support cascading of foreign key values, that is the when a parent row is updated the new value is placed in child rows, or when the parent row is deleted all corresponding child rows are set to null or deleted. In data definition language these are specified using phrases like “ON UPDATE CASCADE”, “ON DELETE CASCADE”, and “ON DELETE SET NULL”, corresponding to foreign key constraints.

The ForeignKey andForeignKeyConstraint objects support the generation of this clause via the onupdate and ondelete keyword arguments.

child = Table('child', meta,
    Column('id', Integer,
            ForeignKey('parent.id', onupdate="CASCADE", ondelete="CASCADE"),
            primary_key=True
    )
)

composite = Table('composite', meta,
    Column('id', Integer, primary_key=True),
    Column('rev_id', Integer),
    Column('note_id', Integer),
    ForeignKeyConstraint(
                ['rev_id', 'note_id'],
                ['revisions.id', 'revisions.note_id'],
                onupdate="CASCADE", ondelete="SET NULL"
    )
)

Note that these clauses require InnoDB tables when used with MySQL. They may also not be supported on other databases.

JOIN Tables

To retrieve data from a database that has relationships, we often need to use JOIN queries.

Xiao Wenbin
Xiao Wenbin
Natural Language Processing Engineer

My research interests include machine learning, information retrieval and natural language processing.

Related