Dominik Richter | January 5, 2023 | 6 min read

Why MQL: An Extension of GraphQL

MQL is Mondoo’s own GraphQL-based query and policy language for exploring and testing infrastructure. Find out why and how we created it for platform and security engineers.


When we started building open source cnquery and cnspec, we knew we needed a lightweight query and policy language. We searched for an existing language that was: 

  • Great at searching and retrieving platform data
  • Capable of expressing assertions
  • Usable by platform and security experts 

Easier said than done!

We explored many stacks (including Python, OPA/Rego, and InSpec). We evaluated their capabilities and ran usability studies with many different security experts and platform engineers. Each stack failed to meet at least one of our three core requirements. We eventually concluded that an extended version of GraphQL would work best for platform engineers, security engineers, and developers alike.

Begin securing your infrastructure with cnquery.

Infrastructure searches for platform and security teams 

Why did we focus on platform and security experts?

We found that the people with the largest impact on infrastructure, and the deepest experience in it, are usually on platform and security teams. These professionals have grown more technical over the last decade. They've also picked up a healthy set of developer skills for automating infrastructure (with tools such as Kubernetes, Terraform, and Ansible).

However, platform and security engineers do not spend the majority of their days developing with C++, Java, and the like. They don't want to be bothered with the intricacies of programming languages and weird memory optimizations. For them, it's about getting the job done.

Unfortunately, almost all existing solutions are built for developers.

Our primary goal was to empower platform and security teams to create infrastructure searches and assertions. We wanted to give them something that gets the job done without throwing an overly complicated stack at them. Think: Python, not C++. Bash, not Java.

Why GraphQL?

If you want to write a security assertion, the first thing you need is data.

How do you get data?

Apart from abstractions in programming languages, the most well-known ways to retrieve data are SQL, GraphQL, and RESTful APIs. SQL is usually used directly on top of databases, while GraphQL and RESTful APIs are better known as service interfaces.

We didn't want to create a new way of accessing data, given that there were existing patterns available (which means a lower learning curve and less throwaway knowledge). Given this, you might be wondering:

Why not SQL?

A very important change occurred in the last decade: Platform resources became more and more connected. For example, today it's important not only to find an open port, but also to understand:

  • What process is running it
  • Which user started it
  • What service is responsible
  • From which package it was installed

With SQL, it's easy to retrieve individual rows from a database, but it's an entirely different endeavor to connect and chain them together. Remember all the fun JOIN statements you were cursing by the end of your first database lecture? If you try to answer questions like the list above, you'll find yourself in a deep tangle of JOINs. 

Ultimately, SQL queries over highly connected data are complex and hard to maintain. That's why so many graph-based solutions don't rely on traditional SQL.
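To make the JOIN problem concrete, here is a minimal Python sketch using the standard-library sqlite3 module. The schema (the ports, processes, users, and packages tables and all of their columns) is hypothetical and invented for illustration; it says nothing about how cnquery collects or stores data.

```python
import sqlite3

# Hypothetical schema and sample rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users     (uid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE packages  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE processes (pid INTEGER PRIMARY KEY, uid INTEGER, package_id INTEGER);
CREATE TABLE ports     (port INTEGER PRIMARY KEY, pid INTEGER);

INSERT INTO users     VALUES (0, 'root');
INSERT INTO packages  VALUES (1, 'openssh-server');
INSERT INTO processes VALUES (812, 0, 1);
INSERT INTO ports     VALUES (22, 812);
""")

# "Which process, user, and package are behind this port?"
# Every hop between resources costs one more JOIN.
query = """
SELECT ports.port, processes.pid, users.name, packages.name
FROM ports
JOIN processes ON processes.pid = ports.pid
JOIN users     ON users.uid     = processes.uid
JOIN packages  ON packages.id   = processes.package_id
WHERE ports.port = 22
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Even in this toy case, three relationships need three JOIN clauses, and each one has to spell out the matching key columns by hand.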

GraphQL for graph traversal

GraphQL is JSON on steroids. It's very easy to learn and understand. JSON is virtually everywhere, made popular through its use in RESTful APIs and JavaScript. Stepping from JSON to GraphQL is a no-brainer. 

GraphQL makes it easy to explore related data objects. You can explore connections between resources by adding just another line to the query. Because of its ability to chain resources and quickly select the fields that matter, it has become a powerful alternative to RESTful APIs in many services.

processes {   // list of processes on this system
  pid         // the process PID
  user {      // list the user that runs this process
     name     // grab the user's name...
     uid      // ... and UID
  }
}
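The idea behind this kind of query can be sketched in a few lines of Python. This is only an analogy, not how GraphQL or MQL is implemented: the select helper, the selection format, and the sample data are all hypothetical.

```python
# Toy data: each process links directly to the user that runs it.
processes = [
    {"pid": 1, "command": "/sbin/init",
     "user": {"name": "root", "uid": 0, "home": "/root"}},
    {"pid": 812, "command": "sshd",
     "user": {"name": "root", "uid": 0, "home": "/root"}},
]

def select(value, fields):
    """Keep only the requested fields, following nested selections."""
    if isinstance(value, list):
        return [select(item, fields) for item in value]
    result = {}
    for name, sub in fields.items():
        # sub is None for a plain field, or a nested selection
        # for a related object.
        result[name] = value[name] if sub is None else select(value[name], sub)
    return result

# Mirrors the shape of: processes { pid user { name uid } }
selection = {"pid": None, "user": {"name": None, "uid": None}}
result = select(processes, selection)
print(result)
```

Following one more relationship means adding one more nested entry to the selection, with no key columns to match up by hand.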

Why extend GraphQL?

I might have given you the impression that we found the perfect solution in GraphQL. While it was very close, it didn't quite meet the needs of our use cases. So we built MQL as an extension of GraphQL.

GraphQL extensions like MQL are common. Some graph databases, for example, use modified versions of GraphQL that are tailored to fit their specific environment.

These are some of the most exciting ways that MQL extends GraphQL:

  • Make filters universal
  • Select *
  • Assertions
  • Lightweight scripts

Let's explore each one.

Make filters universal

GraphQL usually hands filters to the backend. Filters are custom to resources. You specify them in parentheses right after the query field:

users(uid: 0) {
  name
}

This approach works well enough for simpler problems, but it often doesn't allow more powerful searches. Because we couldn't anticipate (and didn't want to limit) all the questions that our users might ask, we needed a better approach.

Enter MQL's filters:

users.where( uid > 0 && uid < 1000 ) {
  name
}

These where statements are available on all MQL resources and let you filter on any of a resource's fields. Conditions can include regular expressions and time checks:

accounts.where(
  email == /gmail\.com$/ &&
  lastLogin > time.today
)
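As a rough analogy in Python (not MQL's actual implementation), a universal filter is one generic function that accepts any predicate, instead of a fixed set of arguments defined per resource. The where helper, the sample accounts, and the fixed clock below are all hypothetical.

```python
import re
from datetime import datetime, timedelta

def where(items, predicate):
    """Generic filter: works on any list of records with any predicate."""
    return [item for item in items if predicate(item)]

# Fixed clock so the example is reproducible.
now = datetime(2023, 1, 5, 12, 0)
today = now.replace(hour=0, minute=0, second=0, microsecond=0)

accounts = [
    {"email": "alice@gmail.com", "lastLogin": now - timedelta(hours=1)},
    {"email": "bob@gmail.com", "lastLogin": now - timedelta(days=3)},
    {"email": "carol@example.org", "lastLogin": now - timedelta(hours=2)},
]

# Same idea as the query above: a regex check combined with a time check.
matches = where(
    accounts,
    lambda a: re.search(r"gmail\.com$", a["email"]) is not None
    and a["lastLogin"] > today,
)
print([a["email"] for a in matches])
```

Because the condition is an ordinary expression rather than a resource-specific argument, the same helper works unchanged for users, packages, or any other list.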

Select *

GraphQL requires you to name every field you want returned, and we understand why it generally doesn't support a wildcard. However, for our use case, it made a lot of sense to select * and explore data more freely. We often forget what fields are supported. Even with auto-complete in the CLI and editors, it's often easier to just grab all data and break it down later. In some cases, users also just want to collect everything they can about a resource.

For example, this query returns all fields in the users data structure:

users{*}

users: [
  0: {
    uid: 0
    gid: 0
    shell: "/bin/bash"
    home: "/root"
    name: "root"
    group: group name="root" gid=0
    sshkeys: []
    ...
  }

Assertions

As I mentioned at the beginning of this article, the design goals for MQL included both data extraction and policy as code. We wanted to make sure MQL could check for security and best practices across resources, but we didn't want it to be as complicated as so many other solutions aimed at platform and security engineers.

At the heart of every test are assertions: Does something behave the way you expect?

As a pure query language, GraphQL doesn't support assertions out of the box. This was a great opportunity to extend it and add checks for simple and complicated tests.

The simplest of tests boil down to boolean statements like:

user.name == "Alice"
tls.versions == ["tls1.3"]
uid > 1000 && uid < 9999

For lists, we made it even easier. In our past experience, most users constructed very complicated loops, which became almost impossible to maintain once chained. With MQL, it's much simpler:

// no user should have a uid smaller than 0
users.none( uid < 0 )

// a package whose name matches the regex /ssh/ should be installed
packages.contains( name == /ssh/ && installed )

// all listening ports should have a port number smaller than 1000
ports.listening.all( port < 1000 )
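For readers who think in loops, here is a minimal Python sketch of what these three list assertions express. It is an analogy only: the helper names none_of, contains, and all_of and the sample data are hypothetical, and the sketch does not claim to match MQL's behavior in edge cases such as empty lists.

```python
import re

def none_of(items, predicate):
    """True if no item satisfies the predicate."""
    return not any(predicate(item) for item in items)

def contains(items, predicate):
    """True if at least one item satisfies the predicate."""
    return any(predicate(item) for item in items)

def all_of(items, predicate):
    """True if every item satisfies the predicate."""
    return all(predicate(item) for item in items)

# Hypothetical sample data.
users = [{"name": "root", "uid": 0}, {"name": "alice", "uid": 1001}]
packages = [
    {"name": "openssh-server", "installed": True},
    {"name": "curl", "installed": True},
]
listening_ports = [{"port": 22}, {"port": 443}, {"port": 8080}]

no_negative_uids = none_of(users, lambda u: u["uid"] < 0)
ssh_installed = contains(
    packages, lambda p: re.search(r"ssh", p["name"]) is not None and p["installed"]
)
only_low_ports = all_of(listening_ports, lambda p: p["port"] < 1000)

print(no_negative_uids, ssh_installed, only_low_ports)
```

Each assertion collapses a hand-written loop into a single expression that returns true or false, which is what keeps chained checks readable.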

Lightweight scripts

To make MQL flexible, we added very simple scripting elements. They let you assign and use variables or combine values.

myFile = user.home + "/my.file"

MQL queries are pre-compiled and adhere to a schema. This allows us to find errors early and make sure types are used correctly. It also keeps execution fast, because the checks happen before a query runs.
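The benefit of checking a query against a schema before running it can be sketched in Python. This is a simplified illustration under stated assumptions, not MQL's compiler: the SCHEMA dictionary, the compile_filter function, and the supported operators are all hypothetical.

```python
import operator

# Hypothetical schema: resource name -> field name -> expected type.
SCHEMA = {"users": {"name": str, "uid": int, "home": str}}

OPERATORS = {"==": operator.eq, "<": operator.lt, ">": operator.gt}

def compile_filter(resource, field, op, value):
    """Validate a filter against the schema and return a ready-to-run check."""
    fields = SCHEMA.get(resource)
    if fields is None:
        raise ValueError(f"unknown resource: {resource}")
    if field not in fields:
        raise ValueError(f"unknown field: {resource}.{field}")
    if op not in OPERATORS:
        raise ValueError(f"unknown operator: {op}")
    if not isinstance(value, fields[field]):
        raise TypeError(
            f"{resource}.{field} expects {fields[field].__name__}, "
            f"got {type(value).__name__}"
        )
    compare = OPERATORS[op]
    # All validation is done; the returned function only compares values.
    return lambda record: compare(record[field], value)

is_system_user = compile_filter("users", "uid", "<", 1000)
print(is_system_user({"name": "root", "uid": 0, "home": "/root"}))

try:
    compile_filter("users", "uid", "==", "0")  # string compared to an int field
except TypeError as err:
    print("rejected before execution:", err)
```

The type mistake is reported when the filter is built, before any data is touched, and the compiled check does no validation work at run time.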

Additionally, MQL executes concurrently and in parallel by default. This means that complex queries are split into multiple streams with separate I/O requirements, and those streams then execute concurrently.
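As a rough illustration of why this helps, the following Python sketch runs several independent, I/O-bound lookups at the same time using the standard-library concurrent.futures module. It is not MQL's execution engine: the fetch function, the resource names, and the delays are hypothetical stand-ins for real I/O.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(name, delay):
    """Stand-in for an I/O-bound lookup such as reading a file or calling an API."""
    time.sleep(delay)
    return name, f"{name} data"

# Three independent streams, each waiting on its own I/O.
sources = {"packages": 0.2, "users": 0.2, "ports": 0.2}

start = time.perf_counter()
with ThreadPoolExecutor() as pool:
    # Submit every lookup up front so the waits overlap instead of adding up.
    futures = [pool.submit(fetch, name, delay) for name, delay in sources.items()]
    results = dict(future.result() for future in futures)
elapsed = time.perf_counter() - start

print(sorted(results), f"{elapsed:.2f}s")
```

Run one after another, the three lookups would take the sum of their delays; run concurrently, the total is closer to the longest single delay.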

An open standard 

MQL opens new ways for platform and security engineers to express their queries and assertions. It's built as a flexible framework that is not tied to any one tool or target technology.

MQL is extensible, allowing new resources and fields to be added at any time. Given its high-performance nature, it is ideal as an embedded engine for even better automation.

We have seen the benefits MQL brings to both modern and established platform teams and environments. It powers our full-stack security solution and finds configuration and security issues across your infrastructure.

We hope you enjoy it as well! Keep scanning and discover new things about your fleet. If you have questions or ideas, we'd love to see you in our community!

Dominik Richter

Dom is a founder, coder, and hacker and one of the creators of Mondoo. He helped shape the DevOps and security space with projects like InSpec and Dev-Sec.io. Dom worked in security and automation at companies like Google, Chef, and Deutsche Telekom. Beyond his work, he loves to dive deep into hacker and nerd culture, science and the mind, and making colorful pasta from scratch.
