+++
title = "Dsl Based Sql Libraries"
date = "2020-06-04T12:11:16+05:30"
featured_image="/img/dsl-based-sql-libraries.webp"
categories = ["Development"]
tags = [
"slick", "scala","rust","diesel","kotlin","jOOQ"
]
draft = false
+++
Note: This article is a work in progress
> Databases are the heart of a web application

When looking for relational database SQL libraries, there are various options to choose from:
1. Object-Relational Mappers (ORMs)
2. DSL-based
3. String-interpolation-based

Each approach has its pros and cons. ORMs are perhaps the most common and widely used, and their drawbacks are well documented and talked about: the object-relational impedance mismatch, lack of control over the generated SQL leading to performance problems, the N+1 queries problem, and the fact that the promise of not having to deal with SQL only shifts that burden to maintaining and configuring the ORM itself.
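
To see what the string-interpolation style (option 3 above) looks like, here is a minimal sketch, assuming the doobie library for Scala: queries are plain SQL strings, and interpolated values become bind parameters, so they are injection-safe.

```scala
import doobie._
import doobie.implicits._

// a hypothetical users table with id, name and age columns
def usersOlderThan(minAge: Int): ConnectionIO[List[(Int, String)]] =
  sql"select id, name from users where age > $minAge" // $minAge becomes a bind parameter
    .query[(Int, String)] // the row type is declared, and checked, at the call site
    .to[List]
```
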
I used Hibernate (an ORM) for my project Chatto and it worked out fine, mostly. However, while learning other programming languages like Scala and Rust, I came across the SQL libraries used in their ecosystems, and it really opened my eyes to the possibilities. In this article, I document my experience in learning these libraries.

What these libraries have in common is that they
* Offer a DSL to write SQL queries in the host programming language
* Offer code generation to generate table mappings from SQL schema files
* Most importantly, they offer the opportunity to write parts of a query separately, and then compose them together to create larger queries.

Let me explain that last point, because it really is what's potentially amazing about this approach, and it is offered by neither ORMs nor string-interpolation-based libraries.

In JPQL, we'd write a query like this -
```sql
select u.id, u.name, u.joinDate from User u where u.age > ?1
```
and then we need to repeat the select part again in another query -
```sql
select u.id, u.name, u.joinDate from User u where u.address = ?1
```
Not only is this not DRY; if we ever change the select clause and forget to update one of the queries, it leads to an error at runtime, not at compile time.
In a DSL-based library, we could isolate the select portion like this -

``` scala
// `Users` and `UsersRow` come from Slick's code generator
def selectUsers(q: Query[Users, UsersRow, Seq]) =
  q.map(u => (u.id, u.name, u.joinDate))
```
and then we could use that fragment everywhere we need it -
```scala
val users = selectUsers(Users.filter(_.age > 10))
val users2 = selectUsers(Users.filter(_.address === "Some Address"))
```
So we were able to factor the common portion out. If we need to update the select portion, we only need to update it in one place. If we forget to update any of the code depending on it, we'll get an error at compile time. So we enforced type safety at compile time as well. Great!

The benefits of this style are summarized best in this post I found on [Hacker News](https://news.ycombinator.com/item?id=14915156) -
> You can slice and dice a schema however you like, building up queries from other queries, none of which are run until you execute them. Basically you get semantically the same query plans as you'd get writing plain sql, but it's all type checked, sql injection safe, and compiled (queries generally are generated at compile time, not run time).

The above code snippets are from Slick, which is a Scala library, and I'll talk about it now.
### Slick (Scala)
Slick is a Functional Relational Mapper (FRM). Its aim is to let you write SQL queries as if you were manipulating Scala collections.
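
To make the collections analogy concrete, here is a sketch; `people` is an assumed in-memory list, and `Users` is a generated table with `id`, `name`, and `age` columns:

```scala
// plain Scala collections: runs in memory
val adults = people.filter(_.age > 18).map(p => (p.id, p.name))

// Slick: the same shape, but this only builds a SQL statement,
// roughly SELECT id, name FROM users WHERE age > 18
val adultsQuery = Users.filter(_.age > 18).map(u => (u.id, u.name))
```
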
Slick offers code generation (using an sbt plugin), query composition, and an asynchronous API (although the underlying JDBC library is still blocking).
Together with Flyway, it becomes really simple to get started. All we need to do is write the SQL schema migration files, run the Flyway migration task, run slick-codegen, and we can start writing SQL code in Scala. Slick puts the generated code in a `Tables.scala` file that must be imported into DAOs.
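
For illustration, a first migration might look like this - a sketch assuming PostgreSQL, with columns matching the snippets in this article:

```sql
-- V1__create_users.sql (hypothetical Flyway migration)
CREATE TABLE users (
    id        SERIAL PRIMARY KEY,
    name      VARCHAR(255) NOT NULL,
    age       INT          NOT NULL,
    address   VARCHAR(255) NOT NULL,
    join_date TIMESTAMP    NOT NULL DEFAULT now()
);
```
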
I found a really nice pattern for achieving query composition in Slick [here](https://www.becompany.ch/en/blog/2016/12/15/slick-dos-and-donts). A potential pitfall of query composition is sharing the internals too much, and this pattern helps avoid that.
What we do is constrain the query fragments inside an object nested within an outer DBIO class -
```scala
case class UserDTO(id: Int, name: String, joinDate: Instant)

class UserDBIO {
  ...
  object Query {
    // the reusable projection, applied to any query over Users
    def selectUsers(q: Query[Users, UsersRow, Seq]) =
      q.map(u => (u.id, u.name, u.joinDate).mapTo[UserDTO])
  }
}
```
and then inside the DBIO class, we write methods that return DBIOs -
``` scala
def getUserByAge(age: Int): DBIO[UserDTO] =
  Query.selectUsers(Users.filter(_.age > age)).result.head

def getUserByAddress(address: String): DBIO[UserDTO] =
  Query.selectUsers(Users.filter(_.address === address)).result.head
```
and then inside a service class, we can compose those DBIOs if we need -
```scala
class UserService @Inject() (dbio: UserDBIO, db: Database) {
  def isSamePerson(age: Int, address: String): Future[Boolean] = {
    val action = for {
      user1 <- dbio.getUserByAge(age)
      user2 <- dbio.getUserByAddress(address)
    } yield user1.id == user2.id
    db.run(action.transactionally)
  }
}
```
This is a very contrived example, but it does show how to compose two DBIOs into a single action that runs in a single transaction. Slick is asynchronous by default, so it returns a `Future` that will eventually contain the result, instead of returning the result itself.
### Diesel (Rust)
Diesel calls itself an ORM, which suggests it should be similar to Hibernate, but as we'll see, it's nothing like Hibernate and resembles Slick much more. Apparently, Diesel's design was influenced by Slick ([proof](https://news.ycombinator.com/item?id=14913517)).
Diesel wraps code generation and migrations in a single library. The setup phase is very similar to the Slick + Flyway method - write the SQL schema files, run the migrations, then run code generation. Diesel puts the generated code in a `schema.rs` file.
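
For reference, `schema.rs` consists of `table!` macro invocations describing each table. A sketch of what `diesel print-schema` might emit for the users table used below:

```rust
// schema.rs - a sketch of generated output, not the actual file
table! {
    users (id) {
        id -> Integer,
        name -> Text,
        created_at -> Timestamp,
    }
}
```
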
Query composition in Diesel works as follows -
```rust
mod query {
    use diesel::prelude::*;
    use diesel::sql_types::Text;
    use diesel::sql_types::Timestamp;
    use diesel::sqlite::Sqlite;

    /// <'a, B, T> where 'a = lifetime, B = backend, T = SQL data types
    type Query<'a, B, T> = crate::schema::users::BoxedQuery<'a, B, T>;

    pub fn _get_user_by_name<'a>(
        user_name: &'a str,
    ) -> Query<'a, Sqlite, (Text, Timestamp)> {
        use crate::schema::users::dsl::*;
        users
            .select((name, created_at))
            .filter(name.eq(user_name))
            .into_boxed()
    }
}
```
and then this fragment can be used anywhere in the same file -
``` rust
let user = query::_get_user_by_name(&nu.name).first::<models::UserDTO>(conn)?;
```
Since the query module is not public, it cannot be imported outside of the file it is defined in, and we achieve the same encapsulation behavior as we did in Slick.

Note that we need to write `into_boxed()`, unlike in Slick. This seems to be because Diesel encodes the structure of a query in its types, so a function that builds a query up dynamically has to erase that type information by boxing it, whereas Slick queries of the same shape already share a common type. At least, that's my understanding :)
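
To illustrate, here is a hypothetical sketch, reusing the generated schema from above: the two match arms below build queries with different concrete types, and `into_boxed()` erases both into the single `BoxedQuery` type -

```rust
use diesel::prelude::*;
use diesel::sqlite::Sqlite;

// without into_boxed(), the two match arms would have different
// concrete types and this function would not compile
fn users_query<'a>(by_name: Option<&'a str>) -> crate::schema::users::BoxedQuery<'a, Sqlite> {
    use crate::schema::users::dsl::*;
    match by_name {
        Some(n) => users.filter(name.eq(n)).into_boxed(),
        None => users.into_boxed(),
    }
}
```
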
You can find the full example [here](https://git.arcusiridis.com/nova/Actix-Demo/src/branch/master/src/actions/users.rs).
### jOOQ (Java, Kotlin, Scala)
jOOQ is a type-safe query-building library with a fluent API. It supports all three of Java, Kotlin, and Scala; however, for some variety, I'll be using Kotlin here.

Setup works similarly to Slick, with Flyway handling migration before code generation. The query API, however, is significantly different from Slick's. The design goal of jOOQ is to put SQL at the forefront, and it shows in the API. Composition in jOOQ works as follows -
``` kotlin
class UserService(private val dsl: DSLContext) {

    suspend fun getUser(): Flow<String> = dsl
        .select(Query.selectUsers)
        .from(USERS)
        .where(Query.complexClause)
        .fetch()
        .map { it.get(USERS.NAME) }
        .asFlow()

    private object Query {
        val selectUsers = listOf(USERS.NAME, USERS.JOIN_DATE)
        // much complex wow
        val complexClause: Condition = USERS.AGE.gt(20)
    }
}
```
We follow the same query-encapsulation pattern as earlier. We also make use of Kotlin's suspend functions and coroutine `Flow` to work asynchronously.
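
A hedged usage sketch - assuming a `DSLContext` named `dsl` has been wired up elsewhere via jOOQ's `DSL.using(...)` - collecting the flow prints each matching user's name:

```kotlin
import kotlinx.coroutines.flow.collect
import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    val service = UserService(dsl) // `dsl` is an assumed, pre-configured DSLContext
    service.getUser().collect { name -> println(name) }
}
```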