A tutorial for SQLAlchemy
[TOC]
ORM Components
-
What is the Object Relational Mapper (ORM) in SQLAlchemy?
The SQLAlchemy Object Relational Mapper presents a method of associating user-defined Python classes with database tables, and instances of those classes (objects) with rows in their corresponding tables. It includes a system that transparently synchronizes all changes in state between objects and their related rows, called a unit of work, as well as a system for expressing database queries in terms of the user defined classes and their defined relationships between each other.
-
The Object Relational Mapper and Expression Language
The ORM is in contrast to the SQLAlchemy Expression Language, upon which the ORM is constructed. Whereas the SQL Expression Language presents a system of representing the primitive constructs of the relational database directly without opinion, the ORM presents a high level and abstracted pattern of usage, which itself is an example of applied usage of the Expression Language.
While there is overlap among the usage patterns of the ORM and the Expression Language, the similarities are more superficial than they may at first appear. One approaches the structure and content of data from the perspective of a user-defined domain model which is transparently persisted and refreshed from its underlying storage model. The other approaches it from the perspective of literal schema and SQL expression representations which are explicitly composed into messages consumed individually by the database.
A successful application may be constructed using the Object Relational Mapper exclusively. In advanced situations, an application constructed with the ORM may make occasional usage of the Expression Language directly in certain areas where specific database interactions are required.
Engine
An instance of Engine represents the core interface to the database, adapted through a dialect that handles the details of the database and DBAPI in use. When using the ORM, we typically don’t use the Engine directly once created; instead, it’s used behind the scenes by the ORM.
from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:', echo=True)
The first time a method like Engine.execute()
or Engine.connect()
is called, the Engine establishes a real DBAPI connection to the database, which is then used to emit the SQL.
Session Factory and Object
We’re now ready to start talking to the database. The ORM’s “handle” to the database is the Session
. When we first set up the application, at the same level as our create_engine()
statement, we define a Session
class which will serve as a factory for new Session
objects:
from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine) # a factory class for Session
Then, whenever you need to have a conversation with the database, you instantiate a Session
:
session = Session()
The above Session
is associated with our SQLite-enabled engine, but it hasn’t opened any connections yet. When it’s first used, it retrieves a connection from a pool of connections maintained by the Engine
, and holds onto it until we commit all changes and/or close the session object.
Declarative Base
When using ORM frameworks, the configurational process starts by describing the database tables we’ll be dealing with, and then by defining user-defined classes which will be mapped to those tables.
In modern SQLAlchemy, these two tasks are usually performed together, using a system known as Declarative, which allows us to create classes that include directives to describe the actual database table they will be mapped to.
Classes mapped using the Declarative system are defined in terms of a base class which maintains a catalog of classes and tables relative to that base - this is known as the declarative base class. Our application will usually have just one instance of this base in a commonly imported module.
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
Mapped Class
We can define any number of mapped classes inherit from declarative base class. For example, if we want interactive with database table users
, we can define a mapped class as follows:
from sqlalchemy import Column, Integer, String
class User(Base):
__tablename__ = 'users'
id = Column(Integer, primary_key=True)
username = Column(String)
password = Column(String)
Within the mapped class, we define details about the database table to which we’ll be mapping, primarily the table name, and names and data types of columns, known as table metadata.
A class using Declarative at a minimum needs a __tablename__
attribute, and at least one Column
which is part of a primary key.
When we declared our class, Declarative used a Python metaclass in order to perform additional activities once the class declaration was complete; Declarative replaces all the Column
objects with special Python accessors known as descriptors; this is a process known as instrumentation. The “instrumented” mapped class will provide us with the means to refer to our table in a SQL context as well as to persist and load the values of columns from the database. You can check via type(User.username)
.
Instances of Mapped Class
An instance of mapped class can be used to represent a row in their corresponding database table. And there is a system that transparently synchronizes all changes in state between objects and their related rows, as well as a system for expressing database queries in terms of the user defined classes and their defined relationships between each other. All of those operations are running behind-scenes, we don’t care about them.
Table and Mapper Object
With our User
class defined via the Declarative system, we have defined information about our table, known as table metadata. The object used by SQLAlchemy to represent this information for a specific table is called the Table
object, and here Declarative has made one for us. We can see this object by User.__table__
.
This Table
object is made by Declarative via a Python metaclass to perform additional activities once the class declaration was complete. Within this phase, it then created a Table
object according to our specifications, and associated it with the class by constructing a Mapper
object. The Mapper
object defines the correlation of mapped class attributes to database table columns. This object is a behind-the-scenes object we normally don’t need to deal with directly. We can see this object by User.__mapper__
.
MetaData Object
MetaData
object is a collection of Table
objects and their associated schema constructs. When using Declarative, this object is available using the .metadata
attribute of our declarative base class, i.e. Base.metadata
.
MetaData
object holds a collection of Table
objects as well as an optional binding to an Engine
or Connection
. If bound, the Table
objects in the collection and their columns may participate in implicit SQL execution. And the Table
objects themselves are stored in the MetaData.tables
dictionary. When using Declarative, the mapped class’s Table
is automatically added into the Base.metadata
collection. Thus the object User.__table__
is a member of the Base.metadata
object.
The Base.metabase
is a registry which includes the ability to emit a limited set of schema generation commands to the database. As our SQLite database does not actually have a users
table present, we can use it to issue CREATE TABLE statements to the database for all tables that don’t yet exist:
Base.metadata.create_all(engine)
CRUD
Adding, Updating and Deleting Objects
Adding
Add objects to session:
ed_user = User(username='Ed Jones', password='edspassword')
session.add(ed_user)
At this point, we say that the instance is pending; no SQL has yet been issued and the object is not yet represented by a row in the database. The Session will issue the SQL to persist the object Ed Jones
as soon as is needed, using a process known as a flush.
If we query the database for Ed Jones
, all pending information will first be flushed, and the query is issued immediately thereafter. For example, when we query the Ed Jones
, an instance is returned which is equivalent to that which we’ve added:
our_user = session.query(User).filter_by(username='ed').first() # the Ed Jones has been added first
our_user is ed_user # True
In fact, the Session has identified that the row returned is the same row as one already represented within its internal map of objects, so we actually got back the identical instance as that which we just added.
We can add more objects at once:
jim = User(username='Jim', password='jim')
tom = User(username='Tom', password='Tom')
session.add_all([jim, tom])
Updating
We can modify the attributes of instances to update the corresponding rows in database tables:
jim.password = 'jim123'
tom.password = 'tom123'
Deleting
We can delete a instance from session:
session.delete(tom)
Commit
We tell the Session that we’d like to issue all remaining changes to the database and commit the transaction, which has been in progress throughout:
session.commit()
The connection resources referenced by the session are now returned to the connection pool. Subsequent operations with this session will occur in a new transaction, which will again re-acquire connection resources when first needed.
Rollback
Since the Session works within a transaction, we can roll back changes made too.
ed_user.username = 'foo' # update object
fake_user = User(username='fakeuser', password='12345')
session.add(fake_user) # add object
# Querying the session, we can see that they’re flushed into the current transaction
session.query(User).filter(User.username.in_(['foo', 'fakeuser'])).all()
Rolling back, we can see that ed_user
’s name is back to Ed Jones
, and fake_user
has been kicked out of the session:
session.rollback()
ed_user.username == 'Ed Jones' # True
fake_user in session # False
Querying
A Query
object is created using the query()
method on Session
. This function takes a variable number of arguments, which can be any combination of classes and class-instrumented descriptors.
For example:
# The Query accepts mapped classes as arguments
for user in session.query(User).order_by(User.id):
print(user.username)
# The Query also accepts ORM-instrumented descriptors as arguments
for username, password in session.query(User.username, User.password):
print(username)
# The tuples returned by Query are named tuples
for row in session.query(User, User.username).all():
print(row.User, row.username)
# You can control the names of individual column expressions using the label() construct
for row in session.query(User.username.label('name_label')).all():
print(row.name_label)
# The SQL statements LIMIT and OFFSET via python expression
for u in session.query(User).order_by(User.id)[1:3]:
print(u.username)
# Filtering results by filter_by() that using keyword arguments
for username in session.query(User.username).filter_by(username='Ed Jones'):
print(username)
# Or filtering by filter(), which uses more flexible SQL expression language constructs
for username in session.query(User.username).filter(User.username=='Ed Jones'):
print(username)
# The Query object is fully generative, meaning that most method calls return a new Query object upon which further criteria may be added.
for user in session.query(User).filter(User.username=='Ed Jones').filter('edspassword'):
print(user.username)
Common filter operators
# equals
query.filter(User.username == 'ed')
# not equals
query.filter(User.username != 'ed')
# LIKE
query.filter(User.username.like('%ed%'))
# case-insensitive LIKE
query.filter(User.username.ilike('%ed%'))
# IN
query.filter(User.username.in_(['ed', 'tom', 'jim']))
# NOT IN
query.filter(~User.username.in_(['ed', 'tom', 'jim']))
# IS NULL
query.filter(User.username.is_(None))
# IS NOT NULL
query.filter(User.username.isnot(None))
# AND
from sqlalchemy import and_
query.filter(and_(User.username == 'ed', User.password='123'))
query.filter(User.username == 'ed', User.password == '123')
query.filter(User.username == 'ed').filter(User.password == '123)
# OR
from sqlalchemy import or_
query.filter(or_(User.username == 'ed', User.password == '123'))
# MATCH
query.filter(User.username.match('wendy'))
Returning Lists and Scalars
A number of methods on Query immediately issue SQL and return a value containing loaded database results:
query = session.query(User)
# all() return a list
query.all()
# first() return the first element of the results list as a scalar
query.first()
# one() raises exceptions when multiple-rows or no-rows found, i.e. it expect exactly one object
query.one()
# one_or_none()
query.one_or_none()
# scalar() invokes the one() method, and upon success returns the first column of the row
query = session.query(User.id).filter_by(username='ed')
the_id = query.scalar()
Counting
Determine how many rows the SQL statement would return:
session.query(User).filter(User.username.like('%ed')).count()
For situations where the “thing to be counted” needs to be indicated specifically, we can specify the “count” function directly:
from sqlalchemy import func
session.query(func.count(User.username), User.username).group_by(User.username).all()
Raw SQL
from sqlalchemy import text
session.query(User).filter(text("id<:value and username=:name")).params(value=224, name='fred').order_by(User.id).one()
Relationship
Build Relational Mapped Class
Let’s consider how a second table, related to User
, can be mapped and queried. Users in our system can store any number of email addresses associated with their username. This implies a basic one to many association from the users
to a new table which stores email addresses, which we will call addresses
. Using declarative, we define this table along with its mapped class, Address
:
from sqlalchemy import ForeignKey
from sqlalchemy.orm import relationship
class User(Base):
__tablename__ = 'users'
id = Column(Integer, primary_key=True)
username = Column(String)
password = Column(String)
addresses = relationship("Address", order_by=Address.id,
back_populates="user")
# addresses -> users : many -> one
class Address(Base):
__talename__ = 'addresses'
id = Column(Integer, primary_key=True)
email = Column(String, nullable=False)
user_id = Column(Integer, ForeignKey('users.id'))
user = relationship("User", back_populates="addresses")
ForeignKey()
indicates that values in Address.user_id
column should be constrained to be values present in the named remote column users.id
.
relationship()
uses the foreign key relationships between the two tables to determine the nature of this linkage. In both relationship()
directives, the parameter back_populates
is assigned to refer to the complementary attribute names; by doing so, each relationship()
can make intelligent decision about the same relationship as expressed in reverse; on one side, Address.user
refers to a User
instance, and on the other side, User.addresses
refers to a list of Address
instances.
The two complementing relationships Address.user
and User.addresses
are referred to as a bidirectional relationship, and is a key feature of the SQLAlchemy ORM.
Arguments to relationship()
which concern the remote class can be specified using strings, assuming the Declarative system is in use. Once all mappings are complete, these strings are evaluated as Python expressions in order to produce the actual argument, in the above case the User
class. The names which are allowed during this evaluation include, among other things, the names of all classes which have been created in terms of the declared base.
Working with Related Objects
Create a new user instance:
jack = User(username='jack', password='123')
print(jack.address) # an empty list
We are free to add Address
objects on our User
object:
jack.addresses = [
Address(email='eo@12.com'),
Address(email='78@google.com')
]
When using a bidirectional relationship, elements added in one direction automatically become visible in the other direction. This behavior occurs based on attribute on-change events and is evaluated in Python, without using any SQL:
jack.addresses[0].user # the jack user
Let’s add and commit jack
to the database. jack
as well as the two Address
members in the corresponding addresses
collection are both added to the session at once, using a process known as cascading:
session.add(jack)
session.commit()
Querying with Joins
To construct a simple implicit join between User
and Address
, we can use filter()
to equate their related columns together:
for user, address in session.query(User, Address).filter(User.id == Address.user_id).filter(Address.email.like('%google.com%')).all():
print(user.username, address.email)
Or we can explicit join the two tables by using join()
:
for user in session.query(User).join(Address).\
filter(Address.email.like('%google.com%')).all():
print(user.username)
join()
knows how to join between User
and Address
because there’s only one foreign key between them. If there were no foreign keys, or several, join()
works better when one of the following forms are used:
query.join(Address, User.id==Address.user_id) # explicit condition
query.join(User.addresses) # specify relationship from left to right
query.join(Address, User.addresses) # same, with explicit target
query.join('addresses') # same, using a string
Using Aliases
When querying across multiple tables, if the same table needs to be referenced more than once, SQL typically requires that the table be aliased with another name, so that it can be distinguished against other occurrences of that table.
Below we join to the Address
entity twice, to locate a user who has two distinct email addresses at the same time:
from sqlalchemy.orm import aliased
adalias1 = aliased(Address)
adalias2 = aliased(Address)
for username, email1, email2 in \
session.query(User.username, adalias1.email, adalias2.email).\
join(adalias1, User.addresses).\
join(adalias2, User.addresses).\
filter(adalias1.email=='jack@google.com').\
filter(adalias2.email=='j25@yahoo.com'):
print(username, email1, email2)
Using Sub-queries
The Query
is suitable for generating statements which can be used as subqueries. Suppose we wanted to load User objects along with a count of how many Address records each user has:
from sqlalchemy.sql import func
# sub-query
stmt = session.query(Address.user_id, func.count('*').\
label('address_count')).\
group_by(Address.user_id).subquery()
for u, count in session.query(User, stmt.c.address_count).\
outerjoin(stmt, User.id==stmt.c.user_id).order_by(User.id):
print(u.username, count)
Above, we just selected a result that included a column from a subquery. What if we wanted our subquery to map to an entity ?
from sqlalchemy.orm import aliased
stmt = session.query(Address).filter(Address.email != 'j25@yahoo.com').subquery()
adalias = aliased(Address, stmt)
for user, address in session.query(User, adalias).join(adalias, User.addresses):
print(user, address)
Using EXISTS
The EXISTS keyword in SQL is a boolean operator which returns True if the given expression contains any rows. It may be used in many scenarios in place of joins, and is also useful for locating rows which do not have a corresponding row in a related table.
from sqlalchemy.sql import exists
for username in session.query(User.username).\
filter(exists().where(Address.user_id==User.id)):
print(username)
Or by relationship
operators any()
, has()
, contains()
and so on:
for username in session.query(User.username).filter(User.addresses.any()):
print(username)
# used for collections EXISTS testing
for username in session.query(User.username).\
filter(User.addresses.any(Address.email.like('%google%'))):
print(username)
# used for scalar EXISTS testing
for address in session.query(Address).\
filter(~Address.user.has(User.name=='jack')):
print(address)
# used for one-to-many collections EXISTS testing
for user in session.query(User).\
filter(User.addresses.contains(Address.email.like('%goolge.com%'))):
print(user.username)
Eager Loading
Recall earlier that we illustrated a lazy loading operation, when we accessed the User.addresses
collection of a User
and SQL was emitted. If you want to reduce the number of queries (dramatically, in many cases), we can apply an eager load to the query operation. SQLAlchemy offers three types of eager loading, two of which are automatic, and a third which involves custom criterion.
Selectin Load
selectinload()
emits a second SELECT statement that fully loads the collections associated with the results just loaded
from sqlalchemy.orm import selectinload
jack = session.query(User).\
options(selectinload(User.addresses)).\
filter_by(username='jack').one()
Joined Load
joinedload()
emits a JOIN, by default a LEFT OUTER JOIN, so that the lead object as well as the related object or collection is loaded in one step
from sqlalchemy.orm import joinedload
jack = session.query(User).\
options(joinedload(User.addresses)).\
filter_by(username='jack').one()
selectinload()
tends to be more appropriate for loading related collections while joinedload()
tends to be better suited for many-to-one relationships, due to the fact that only one row is loaded for both the lead and the related object.
Explicit Join
A third style of eager loading is when we are constructing a JOIN explicitly in order to locate the primary rows, and would like to additionally apply the extra table to a related object or collection on the primary object. This feature is most typically useful for pre-loading the many-to-one object on a query that needs to filter on that same object.
from sqlalchemy.orm import contains_eager
jacks_addresses = session.query(Address).\
join(Address.user).\
filter(User.username=='jack').\
options(contains_eager(Address.user)).\
all()
Cascading Deleting
Let’s try to delete jack
and see how that goes:
session.delete(jack)
How about Jack’s Address
objects ? Are those objects deleted too? The answer is No. The user_id
column of each address associated with jack
was set to NULL, but the rows weren’t deleted. SQLAlchemy doesn’t assume that deletes cascade, you have to tell it to do so.
We will configure cascade options on the User.addresses
relationship to change the behavior. The new version of User
and Address
classes as follow:
from sqlalchemy import ForeignKey
from sqlalchemy.orm import relationship
class User(Base):
__tablename__ = 'users'
id = Column(Integer, primary_key=True)
username = Column(String)
password = Column(String)
addresses = relationship("Address",
back_populates="user",
cascade="all, delete, delete-orphan")
class Address(Base):
__talename__ = 'addresses'
id = Column(Integer, primary_key=True)
email = Column(String, nullable=False)
user_id = Column(Integer, ForeignKey('users.id'))
user = relationship("User", back_populates="addresses")
Deleting Jack
will delete both Jack and the remaining Address
associated with the user.
Many to Many
We’ll make our application a blog application, where users can write BlogPost
items, which have Keyword
items associated with them. For a plain many-to-many, we need to create an un-mapped Table
construct to serve as the association table.
from sqlalchemy import Table, Text
# association table: an un-mapped Table object
post_keywords = Table('post_keywords', Base.metadata,
Column('post_id', ForeignKey('posts.id'), primary_key=True),
Column('keyword_id', ForeignKey('keywords.id'), primary_key=True)
The BlogPost
and Keyword
mapped classes are declared:
class User(Base):
__tablename__ = 'users'
id = Column(Integer, primary_key=True)
username = Column(String)
password = Column(String)
posts = relationship('BlogPost', back_populates='author', lazy="dynamic")
class BlogPost(Base):
__tablename__ = 'posts'
id = Column(Integer, primary_key=True)
user_id = Column(Integer, ForeignKey('users.id'))
title = Column(String(255), nullable=False)
body = Column(Text)
author = relationship('User', back_populates='posts')
keywords = relationship('Keyword',
secondary=post_keywords,
back_populates='posts')
def __init__(self, title, body, author):
self.title = title
self.body = body
self.author = author
class Keyword(Base):
__tablename__ = 'keywords'
id = Column(Integer, primary_key=True)
keyword = Column(String(50), nullable=False, unique=True)
posts = relationship('BlogPost',
secondary=post_keywords,
back_populates='keywords')
The defining feature of a many-to-many relationship is the secondary
keyword argument which references a Table
object representing the association table. This table only contains columns which reference the two sides of the relationship; if it has any other columns, such as its own primary key, or foreign keys to other tables, SQLAlchemy requires a different usage pattern called the “association object”, described at Association Object.
Now we can post a new blog:
# get user
jack = session.query(User).filter_by(username='jack').one()
# new post
post = BlogPost("A new day", "hahahahhhhaha", jack)
# add keywords for the new post
post.keywords.append(Keyword("life"))
post.keywords.append(Keyword("happiness"))
# commit
session.add(post)
# query all posts with keyword "life"
session.query(BlogPost).\
filter(BlogPost.keywords.any(keyword='life')).\
all()
# posts belong to user jack
session.query(BlogPost).\
filter(BlogPost.author == jack).\
filter(BlogPost.keywords.any(keyword='life')).\
all()
# or we can use dynamic to query
jack.posts.filter(BlogPost.keywords.any(keyword='life')).all()
The Basic Relationship Patterns
When creating a database, common sense dictates that we use separate tables for different types of entities. Some examples are: customers
, orders
, items
etc… But we also need to have relationships between these tables. For instance, customers
make orders
, and orders
contain items
. These relationships need to be represented in the database. Also, when fetching data with SQL, we need to use certain types of JOIN queries to get what we need.
We will review the following basic relationship patterns:
- One to Many
- Many to One
- One to One
- Many to Many
- Association Object
- Adjacency List
When selecting data from multiple tables with relationships, we will be using the JOIN query. There are several types of JOIN’s, and we are going to learn about the the following:
- Cross Joins
- Natural Joins
- Inner Joins
- Left (Outer) Joins
- Right (Outer) Joins
Relationship Patterns
One to Many
One customer may be associated to many orders.
import datetime
from sqlalchemy import Table, Column, Integer, String, DateTime, ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
# customer -> order: one -> many
class Customer(Base):
__tablename__ = 'customers'
id = Column(Integer, primary_key=True)
name = Column(String(255), nullable=False)
orders = relationship('Order')
class Order(Base):
__tablename__ = 'orders'
id = Column(Integer, primary_key=True)
date = Column(String(255))
created_date = Column(DateTime, default=datetime.datetime.utcnow)
amount = Column(Integer)
customer_id = Column(Integer, ForeignKey('customers.id'))
Many to One
Many orders might belong to one customer:
import datetime
from sqlalchemy import Table, Column, Integer, String, DateTime, ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
# order -> customer: many -> one
class Order(Base):
__tablename__ = 'orders'
id = Column(Integer, primary_key=True)
date = Column(String(255))
created_date = Column(DateTime, default=datetime.datetime.utcnow)
amount = Column(Integer)
customer_id = Column(Integer, ForeignKey('customers.id'))
customer = relationship('Customer')
class Customer(Base):
__tablename__ = 'customers'
id = Column(Integer, primary_key=True)
name = Column(String(255), nullable=False)
Both one-to-many and many-to-one patterns actually add a foreign key to the “many” table to refer the entity in the “one” table.
In addition, one-to-many pattern will add a relationship
in the “one” mapped class to associate the “many” entities; many-to-one pattern will add a relationship
in the “many” mapped class to associate the “one” entity.
We can also establish a bidirectional relationship for them. For the one-to-many pattern, add a relationship
for “many” mapped class to refer the “one” entity:
# customer -> order: one -> many
class Customer(Base):
__tablename__ = 'customers'
id = Column(Integer, primary_key=True)
name = Column(String(255), nullable=False)
orders = relationship('Order', back_populates="customer")
class Order(Base):
__tablename__ = 'orders'
id = Column(Integer, primary_key=True)
date = Column(String(255))
created_date = Column(DateTime, default=datetime.datetime.utcnow)
amount = Column(Integer)
customer_id = Column(Integer, ForeignKey('customers.id'))
customer = relationship('Customer', back_populates="orders")
For the many-to-one, add a relationship
for “one” mapped class to refer the “many” entities:
# order -> customer: many -> one
class Order(Base):
__tablename__ = 'orders'
id = Column(Integer, primary_key=True)
date = Column(String(255))
created_date = Column(DateTime, default=datetime.datetime.utcnow)
amount = Column(Integer)
customer_id = Column(Integer, ForeignKey('customers.id'))
customer = relationship('Customer', back_populates="orders")
class Customer(Base):
__tablename__ = 'customers'
id = Column(Integer, primary_key=True)
name = Column(String(255), nullable=False)
orders = relationship('Order', back_populates="customer")
After establish a bidirectional relationship, you can find the many-to-one pattern is same with one-to-many.
How can we implement cascade deleting? Cascade deleting means if we delete a customer, then all of orders associated this customer will be deleted too. Because SQLAlchemy doesn’t assume that deletes cascade, you have to tell it to do so:
class Customer(Base):
__tablename__ = 'customers'
id = Column(Integer, primary_key=True)
name = Column(String(255), nullable=False)
# cascade delete
orders = relationship('Order', cascade="all, delete, delete-orphan")
class Order(Base):
__tablename__ = 'orders'
id = Column(Integer, primary_key=True)
date = Column(String(255))
created_date = Column(DateTime, default=datetime.datetime.utcnow)
amount = Column(Integer)
customer_id = Column(Integer, ForeignKey('customers.id'))
One to One
One-to-one is essentially a bidirectional relationship with a scalar attribute on both sides. To achieve this, the uselist
flag indicates the placement of a scalar attribute instead of a collection on the “many” side of the relationship.
One user has only one ID card:
class User(Base):
__tablename__ = 'users'
id = Column(Integer, primary_key=True)
username = Column(String)
idcard_id = Column(Integer, ForeignKey('idcards.id'))
idcard = relationship('IDCard', back_populates="user")
class IDCard(Base):
__tablename__ = 'idcards'
id = Column(Integer, primary_key=True)
code = Column(String(20), nullable=False)
user = relationship("User", back_populates="idcard", uselist=False)
Many to Many
Many to Many adds an association table between two classes. The association table is indicated by the secondary argument to relationship()
. Usually, the Table uses the MetaData
object associated with the declarative base class, so that the ForeignKey
directives can locate the remote tables with which to link. Note: the association table isn’t a mapped class.
One blog post may have several keywords, and one keyword may be assigned to different blog posts:
post_keywords_table = Table('post_keywords', Base.metadata,
Column('post_id', Integer, ForeignKey('posts.id')),
Column('keyword_id', Integer, ForeignKey('keywords.id'))
)
class Post(Base):
__tablename__ = 'posts'
id = Column(Integer, primary_key=True)
title = Column(String(255))
body = Column(Text)
keywords = relationship('Keyword', secondary=post_keywords_table)
class Keyword(Base):
__tablename__ = 'keywords'
id = Column(Integer, primary_key=True)
term = Column(String(255))
For a bidirectional relationship, both sides of the relationship contain a collection:
class Post(Base):
__tablename__ = 'posts'
id = Column(Integer, primary_key=True)
title = Column(String(255))
body = Column(Text)
keywords = relationship('Keyword',
secondary=post_keywords_table,
back_populates='posts')
class Keyword(Base):
__tablename__ = 'keywords'
id = Column(Integer, primary_key=True)
term = Column(String(255))
posts = relationship('Post',
secondary=post_keywords_table,
back_populates='keywords')
How about cascade deleting for the association table? A behavior which is unique to the secondary
argument to relationship()
is that the association Table
which is specified here is automatically subject to INSERT and DELETE statements, as objects are added or removed from the collection. There is no need to delete from this table manually. For example:
# the row (post_id, keyword_id) will be deleted from the "secondary" table automatically
one_post.keywords.remove(one_keyword)
Association Object
The association object pattern is a variant on many-to-many: it’s used when your association table contains additional columns beyond those which are foreign keys to the left and right tables. Instead of using the secondary
argument, you map a new class directly to the association table. The left side of the relationship references the association object via one-to-many, and the association class references the right side via many-to-one. Note: the association class is a mapped class.
One order may contain many items, and one items may be added into several orders, meanwhile each record in the order may include the number of items:
# Order -> OrderItemsAssociation: one -> many
class Order(Base):
__tablename__ = 'orders'
id = Column(Integer, primary_key=True)
created_date = Column(DateTime, default=datetime.datetime.utcnow)
amount = Column(Integer)
items = relationship('Item', back_populates='order')
# OrderItemsAssociation -> Item: many -> one
class Item(Base):
__tablename__ = 'items'
id = Column(Integer, primary_key=True)
name = Column(String(255), nullable=False)
price = Column(Integer, nullable=False)
orders = relationship('OrderItemsAssociation', back_populates='item')
class OrderItemsAssociation(Base):
__tablename__ = 'order_items'
order_id = Column(Integer, ForeignKey('orders.id'))
item_id = Column(Integer, ForeignKey('items.id'))
count = Column(Integer, nullable=False) # the number of item
order = relationship('Order', back_populates='items')
item = relationship('Item', back_populates='orders')
Create an order and add some items:
apple = Item(name='apple', price=20)
pear = Item(name='pear', price=12)
order = Order()
order_apple = OrderItemsAssociation(count=2) # two apple
order_apple.item = apple
order.items.append(order_apple)
order_pear = OrderItemsAssociation(count=3) # three pear
order_pear.item = pear
order.items.append(order_pear)
Self-Referential/Adjacency List
The adjacency list pattern is a common relational pattern whereby a table contains a foreign key reference to itself. This is the most common way to represent hierarchical data in flat tables. Other methods include nested sets, sometimes called “modified preorder”, as well as materialized path.
Tree structure
Categories are organized as tree structure, i.e. parent category has many children categories and a child category belongs to a parent category:
class Category(Base):
__tablename__ = 'categories'
id = Column(Integer, primary_key=True)
name = Column(String(255))
parent_id = Column(Integer, ForeignKey('categories.id'))
children = relationship('Category') # one-to-many
The above relationship is one-to-many. To establish the relationship as many-to-one, an extra directive is added known as remote_side
:
class Category(Base):
__tablename__ = 'categories'
id = Column(Integer, primary_key=True)
name = Column(String(255))
parent_id = Column(Integer, ForeignKey('categories.id'))
parent = relationship('Category', remote_side=[id]) # many-to-one
As always, both directions can be combined into a bidirectional relationship using the backref()
function:
class Category(Base):
__tablename__ = 'categories'
id = Column(Integer, primary_key=True)
name = Column(String(255))
parent_id = Column(Integer, ForeignKey('categories.id'))
# one-to-many and many-to-one
children = relationship('Category',
backref=backref('parent', remote_side=[id]))
Composite primary key
A sub-category of the adjacency list relationship is the rare case where a particular column is present on both the “local” and “remote” side of the join condition, like the following account_id
column.
We know a file system organizes the folders as a tree. If our system is for multi-user share, an unique folder is determined by folder_id
and account_id
, and there must be existed an tree folders structure for each user.
from sqlalchemy.schema import ForeignKeyConstraint
class Folder(Base):
__tablename__ = 'folder'
# FOREIGN KEY(account_id, parent_id) REFERENCES folder (account_id, folder_id)
__table_args__ = (
ForeignKeyConstraint(
['account_id', 'parent_id'],
['folder.account_id', 'folder.folder_id']),
)
account_id = Column(Integer, primary_key=True)
folder_id = Column(Integer, primary_key=True)
parent_id = Column(Integer)
name = Column(String)
parent = relationship("Folder",
backref="children",
remote_side=[account_id, folder_id]
)
The account_id
column references itself and parent_id
references the folder_id
column. As said before, the account_id
is present on both the “local” and “remote” side of the join condition, to ensure the folders belong to the same user. Thus the primary key folder_id
is used to construct the tree structure and the another account_id
is used to track user authorship.
Querying
Query the “fruit/apple” category
from sqlalchemy.orm import aliased
ParentCategory = aliased(Category)
session.query(Category).\
filter(Category.name == 'apple').\
join(ParentCategory, Category.parent).\
filter(ParentCategory.name == 'fruit').\
all()
Or do the same thing
session.query(Category).\
filter(Category.name == 'apple').\
join(Category.parent, aliased=True).\
filter(Category.name == 'fruit').\
all()
Query the “food/fruit/apple”
session.query(Category).\
filter(Category.name == 'apple').\
join(Category.parent, aliased=True).\
filter(Category.name == 'fruit').\
join(Category.parent, aliased=True, from_joinpoint=True).\
filter(Category.name == 'food').\
all()
Eager loading
Eager loading of relationships occurs using joins or outerjoins from parent to child table during a normal query operation, such that the parent and its immediate child collection or reference can be populated from a single SQL statement, or a second statement for all immediate child collections. If you want to use eager loading with a self-referential relationship, SQLAlchemy needs to be told how many levels deep (join_depth
) it should join and/or query; otherwise the eager load will not take place at all.
class Category(Base):
__tablename__ = 'categories'
id = Column(Integer, primary_key=True)
name = Column(String(255))
parent_id = Column(Integer, ForeignKey('categories.id'))
# one-to-many and many-to-one
children = relationship('Category',
lazy="joined",
join_depth=2)
Foreign Key
The relationships are build upon the SQL’s foreign key constraints mechanism.
A foreign key in SQL is a table-level construct that constrains one or more columns in that table to only allow values that are present in a different set of columns, typically but not always located on a different table. We call the columns which are constrained the foreign key columns and the columns which they are constrained towards the referenced columns. The referenced columns almost always define the primary key for their owning table, though there are exceptions to this. The foreign key is the “joint” that connects together pairs of rows which have a relationship with each other, and SQLAlchemy assigns very deep importance to this concept in virtually every area of its operation. There are two ways to define foreign key constraints:
Single-column foreign keys
The single column foreign key is common, and at the column level is specified by constructing a ForeignKey
object as an argument to a Column
object:
class User(Base):
__tablename__ = 'users'
id = Column(Integer, primary_key=True)
name = Column(String(255))
class Address(name):
__tablename__ = 'addresses'
id = Column(Integer, primary_key=True)
data = Column(Text)
# foreign key
user_id = Column(Integer, ForeignKey('users.id'))
Table-clause foreign keys
Foreign keys may also be defined at the table level, using the ForeignKeyConstraint
object. This object can describe a single- or multi-column foreign key. A multi-column foreign key is known as a composite foreign key, and almost always references a table that has a composite primary key.
from sqlalchemy.schema import ForeignKeyConstraint
class One(Base):
__tablename__ = 'one'
foo_id = Column(Integer, primary_key=True)
bar_id = Column(Integer, primary_key=True)
data = Column(Text)
class Many(Base):
__tablename__ = 'many'
__table_args__ = (
ForeignKeyConstraint(
['foo_id', 'bar_id'],
['folder.foo_id', 'folder.bar_id']),
)
id = Column(Integer, primary_key=True)
foo_id = Column(Integer, nullable=False)
bar_id = Column(Integer, nullable=False)
data = Column(Text)
Note: the ForeignKeyConstraint
is the only way to define a composite foreign key.
Cascading update and delete
Most databases support cascading of foreign key values, that is the when a parent row is updated the new value is placed in child rows, or when the parent row is deleted all corresponding child rows are set to null or deleted. In data definition language these are specified using phrases like “ON UPDATE CASCADE”, “ON DELETE CASCADE”, and “ON DELETE SET NULL”, corresponding to foreign key constraints.
The ForeignKey
andForeignKeyConstraint
objects support the generation of this clause via the onupdate
and ondelete
keyword arguments.
child = Table('child', meta,
Column('id', Integer,
ForeignKey('parent.id', onupdate="CASCADE", ondelete="CASCADE"),
primary_key=True
)
)
composite = Table('composite', meta,
Column('id', Integer, primary_key=True),
Column('rev_id', Integer),
Column('note_id', Integer),
ForeignKeyConstraint(
['rev_id', 'note_id'],
['revisions.id', 'revisions.note_id'],
onupdate="CASCADE", ondelete="SET NULL"
)
)
Note that these clauses require InnoDB
tables when used with MySQL. They may also not be supported on other databases.
JOIN Tables
To retrieve data from a database that has relationships, we often need to use JOIN queries.