Database Setup
Setting up the Structure
I’m using SQLAlchemy so I first created a new file called tables.py
in which I define the Database tables. Using the outlined structure this is relatively straight forward. I’ll just create a class for each table containing each of the mentioned fields. For some fields, constraints have to be introduced (e.g. adding a constraint for sentiment_score to be between 1 and 100). This can be done by using SQLAlchemy’s CheckConstraint
function, which allows SQL injection and filtering based on the given argument.
I also need to define the relationships to other tables when using foreign keys. I’ll use SQLAlchemy’s relationship function and delete orphans when a parent gets removed.
Lastly I provided a __repr__
function for each table to make it clear what the classes are made out of and therefore provide better readability.
Setting up the Database
Now that the structure is defined I will have to set up the PostgreSQL database so it can accessed in the main program. This means creating a db.py
file which uses the database credentials to create the engine and session factory which will later be used to access the database.
When creating the SessionLocal object it’s important to make sure to disable autocommit and autoflush. I disabled autocommit since I want explicit control over if and when changes are committed to the database and I disable autoflush because it can lead to performance issues when querying frequently (it also allows me to batch queries together and in general have more control over when data is written/synchronized to the database).
Next I’ll will create a deploy_db.py
script. This is optional and not strictly necessary, however I personally like the option to manually deploy the entire DB by just executing a script. deploy_db.py
checks if the database already exists and creates it if it doesn’t. My first approach was to connect to an established database and then check if pg_database (the database register) includes the specified database. However this did not feel pythonic at all and overall seemed like a bad practice since I had to inject SQL directly to go through with this approach. So I researched a bit and found there was actually a separate library called sqlalchemy_utils
which has database_exists
and create_database
functions which take care of this problem in a cleaner way. The deployment script also tries to create the tables defined in the SQLAlchemy models if they don’t already exist. This file will not be imported from any other one. However I will later include a check to ensure the database is present before running the main program. If it’s not, the program will exit and print an error to the user suggesting he runs the deploy_db.py
script.