# Getting Started

*[back to index](./README.md)*

`rq` lets you apply [Rego](https://www.openpolicyagent.org/docs/latest/policy-language/) transformations on data in various formats conveniently in a CLI environment.

Installation instructions can be found in [the README](../README.md#installation-instructions).

This guide assumes a basic level of familiarity with Rego. Some Rego resources include:
* [Open Policy Agent - Policy Language](https://www.openpolicyagent.org/docs/latest/policy-language/)
* [Styra Academy - OPA Policy Authoring](https://academy.styra.com/courses/opa-rego) (free, requires account creation)
* [OPA Guidebook - Chap 3. Rego](https://sangkeon.github.io/opaguide/chap3/rego.html)

By default, `rq` reads from standard input and writes to standard output, using JSON for both.

```plain
$ cat sample1.json
{
    "fruits": ["apple", "pear", "pineapple", "banana"],
    "vegtables": ["celery", "carrot", "bell pepper", "onion"]
}
$ rq < sample1.json
{
	"fruits": [
		"apple",
		"pear",
		"pineapple",
		"banana"
	],
	"vegtables": [
		"celery",
		"carrot",
		"bell pepper",
		"onion"
	]
}
```

You can also use `rq` to convert between formats using the `-i/--input-format` and `-o/--output-format` flags.

```plain
$ rq -o yaml < sample1.json > sample1.yaml
$ cat sample1.yaml
fruits:
    - apple
    - pear
    - pineapple
    - banana
vegtables:
    - celery
    - carrot
    - bell pepper
    - onion
$ rq -i yaml -o json < sample1.yaml
{
	"fruits": [
		"apple",
		"pear",
		"pineapple",
		"banana"
	],
	"vegtables": [
		"celery",
		"carrot",
		"bell pepper",
		"onion"
	]
}
```

Of course, you can apply a Rego query to the input. Here we use the query `input.fruits` to extract just the `fruits` field from the input document.

```plain
$ rq input.fruits < sample1.json
[
	"apple",
	"pear",
	"pineapple",
	"banana"
]
```

This can be combined with changing the format:

```plain
$ rq -o yaml input.fruits < sample1.json
- apple
- pear
- pineapple
- banana
```

There is also a raw output mode (`-o raw`, `--raw`, or `-R`), which can be useful when other shell tools need to consume `rq`'s output:

```plain
$ rq -o raw input.fruits < sample1.json
apple
pear
pineapple
banana
```

It is possible to work with multiple data files using the `--data` flag:

```plain
$ rq --data sample2.json --data sample1.json 'data'
{
	"sample1": {
		"fruits": [
			"apple",
			"pear",
			"pineapple",
			"banana"
		],
		"vegtables": [
			"celery",
			"carrot",
			"bell pepper",
			"onion"
		]
	},
	"sample2": [
		"this",
		"file",
		"is",
		"an",
		"array"
	]
}
```

You can override where the data is loaded within the `data` package:

```plain
$ rq --data 'rego.path=alternate_path:sample1.json' 'data'
{
	"alternate_path": {
		"fruits": [
			"apple",
			"pear",
			"pineapple",
			"banana"
		],
		"vegtables": [
			"celery",
			"carrot",
			"bell pepper",
			"onion"
		]
	}
}
```

The argument given to `--data` uses the [DataSpec format](./dataspec.md), a bespoke format designed for `rq` to allow specifying options concisely on a per-file basis.

Just like with the main input document, you can use `--data` to load files in a variety of formats such as CSV:

```plain
$ rq --data csv.headers=true:./area.csv --data csv.headers=true:./population.csv 'data'
{
	"area": [
		{
			"country": "United States",
			"land area": 3531905
		},
		{
			"country": "Mexico",
			"land area": 761610
		},
		{
			"country": "France",
			"land area": 248573
		},
		{
			"country": "Japan",
			"land area": 145937
		}
	],
	"population": [
		{
			"country": "United States",
			"population": 331893745
		},
		{
			"country": "Mexico",
			"population": 126014024
		},
		{
			"country": "France",
			"population": 67897000
		},
		{
			"country": "Japan",
			"population": 125927902
		}
	]
}
```

(Area and population data shown is sourced from Wikipedia, retrieved 2022-08-19. Area is listed in square miles.)

Some options like `csv.headers` have shorthand flags as well. For example, `csv.headers=true` can also be set via `-H/--csv-headers`, which makes it the default for the input and all `--data` flags unless it is specifically overridden:

```plain
$ rq -H --data ./area.csv --data ./population.csv 'data'
{
	"area": [
		{
			"country": "United States",
			"land area": 3531905
		},
		{
			"country": "Mexico",
			"land area": 761610
		},
		{
			"country": "France",
			"land area": 248573
		},
		{
			"country": "Japan",
			"land area": 145937
		}
	],
	"population": [
		{
			"country": "United States",
			"population": 331893745
		},
		{
			"country": "Mexico",
			"population": 126014024
		},
		{
			"country": "France",
			"population": 67897000
		},
		{
			"country": "Japan",
			"population": 125927902
		}
	]
}
$ rq -H --data csv.headers=false:./area.csv --data ./population.csv 'data'
{
	"area": [
		[
			"country",
			"land area"
		],
		[
			"United States",
			3531905
		],
		[
			"Mexico",
			761610
		],
		[
			"France",
			248573
		],
		[
			"Japan",
			145937
		]
	],
	"population": [
		{
			"country": "United States",
			"population": 331893745
		},
		{
			"country": "Mexico",
			"population": 126014024
		},
		{
			"country": "France",
			"population": 67897000
		},
		{
			"country": "Japan",
			"population": 125927902
		}
	]
}
```
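For readers who think more readily in Python, the difference between the two CSV modes can be sketched as follows. This is an illustration only, not how `rq` is implemented: `AREA_CSV` is a hypothetical stand-in for `area.csv` reconstructed from the output above, and `to_number` and `load_csv` are made-up helper names. The explicit number conversion is an assumption that mimics the numeric values seen in `rq`'s output.

```python
import csv
import io

# Hypothetical stand-in for area.csv, reconstructed from the output shown above.
AREA_CSV = """country,land area
United States,3531905
Mexico,761610
France,248573
Japan,145937
"""


def to_number(text):
    """Convert digit-only strings to int; leave everything else as a string."""
    return int(text) if text.isdigit() else text


def load_csv(text, headers):
    """Return a list of objects when headers is True, else a list of rows."""
    rows = [[to_number(cell) for cell in row] for row in csv.reader(io.StringIO(text))]
    if not headers:
        return rows
    # The first row supplies the field names for every following record.
    names, *records = rows
    return [dict(zip(names, record)) for record in records]


with_headers = load_csv(AREA_CSV, headers=True)
without_headers = load_csv(AREA_CSV, headers=False)
print(with_headers[0])     # {'country': 'United States', 'land area': 3531905}
print(without_headers[0])  # ['country', 'land area']
```

With headers enabled, each row becomes an object keyed by column name. Without them, the header row is just the first array in the result.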

Let's see if we can use what we have learned so far to find which country has the highest population density. First, we'll need to join the population and area data. Here we use an object comprehension to find every pair `a, p` where `a` is an object from the `area.csv` file and `p` is an object from the `population.csv` file, such that `a` and `p` have the same `country` field. We combine each pair using `object.union()` and index the result by country name.

```plain
$ rq -H --data ./area.csv --data ./population.csv '{a.country: object.union(a, p) | a := data.area[_]; p:= data.population[_]; a.country == p.country}'
{
	"France": {
		"country": "France",
		"land area": 248573,
		"population": 67897000
	},
	"Japan": {
		"country": "Japan",
		"land area": 145937,
		"population": 125927902
	},
	"Mexico": {
		"country": "Mexico",
		"land area": 761610,
		"population": 126014024
	},
	"United States": {
		"country": "United States",
		"land area": 3531905,
		"population": 331893745
	}
}
```
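As a rough analogy, the object comprehension above behaves like a nested-loop join. The Python sketch below mirrors it using the values from the output; the dictionary merge `{**a, **p}` stands in for `object.union(a, p)`, where keys from the second object win on conflict. It is meant only to clarify the semantics of the query.

```python
# Values copied from the CSV output shown earlier in this guide.
area = [
    {"country": "United States", "land area": 3531905},
    {"country": "Mexico", "land area": 761610},
    {"country": "France", "land area": 248573},
    {"country": "Japan", "land area": 145937},
]
population = [
    {"country": "United States", "population": 331893745},
    {"country": "Mexico", "population": 126014024},
    {"country": "France", "population": 67897000},
    {"country": "Japan", "population": 125927902},
]

# Every pair (a, p) with a matching country is merged and keyed by country name,
# like `a := data.area[_]; p := data.population[_]; a.country == p.country`.
merged = {
    a["country"]: {**a, **p}
    for a in area
    for p in population
    if a["country"] == p["country"]
}

print(merged["Japan"])
```

Like the Rego query, this considers every combination of rows, which is fine for small inputs such as these.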

Let's add in the density (people/sq mile) calculation as well. We can switch to `object.union_n()` to also merge in an object containing the calculated density:

```plain
$ rq -H --data ./area.csv --data ./population.csv '{a.country: object.union_n([a, p, {"density": p.population/a["land area"]}]) | a := data.area[_]; p:= data.population[_]; a.country == p.country}'
{
	"France": {
		"country": "France",
		"density": 273.14712378255079997,
		"land area": 248573,
		"population": 67897000
	},
	"Japan": {
		"country": "Japan",
		"density": 862.8922206157451503,
		"land area": 145937,
		"population": 125927902
	},
	"Mexico": {
		"country": "Mexico",
		"density": 165.45741783852627986,
		"land area": 761610,
		"population": 126014024
	},
	"United States": {
		"country": "United States",
		"density": 93.97017898273028295,
		"land area": 3531905,
		"population": 331893745
	}
}
```
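The density arithmetic can be checked independently. The Python sketch below recomputes population divided by land area from the merged values above (the `country` field is omitted for brevity). Python floats carry fewer digits than the output above, so values are formatted to two decimal places for display.

```python
# Values copied from the merged output above; "country" is omitted for brevity.
merged = {
    "France": {"land area": 248573, "population": 67897000},
    "Japan": {"land area": 145937, "population": 125927902},
    "Mexico": {"land area": 761610, "population": 126014024},
    "United States": {"land area": 3531905, "population": 331893745},
}

# Equivalent of merging in {"density": p.population / a["land area"]}.
with_density = {
    country: {**obj, "density": obj["population"] / obj["land area"]}
    for country, obj in merged.items()
}

for country, obj in with_density.items():
    print(f"{country}: {obj['density']:.2f} people per square mile")
```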

Finally, we can select only the country name corresponding to the object with the highest density:

```plain
$ rq -H --data ./area.csv --data ./population.csv 'merged := {a.country: object.union_n([a, p, {"density": p.population/a["land area"]}]) | a := data.area[_]; p:= data.population[_]; a.country == p.country}; {m.country | m := merged[_]; m.density >= max([m2.density | m2 := merged[_]])}'
[
	true,
	[
		"Japan"
	]
]
```

Because we have two separate Rego expressions separated by a `;`, `rq` outputs the result of both expressions. The first expression, the assignment to `merged`, is defined, so its result is simply `true`. The second expression yields the set of all countries that have the highest population density.
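The selection step can be sketched in Python as a set comprehension that keeps every country whose density is at least the maximum. The density values are copied from the output above. Because the result is a set, a tie for first place would produce more than one country.

```python
# Density values copied from the rq output above (people per square mile).
density = {
    "France": 273.14712378255079997,
    "Japan": 862.8922206157451503,
    "Mexico": 165.45741783852627986,
    "United States": 93.97017898273028295,
}

# Mirrors `m.density >= max([m2.density | m2 := merged[_]])`.
highest = max(density.values())
most_dense = {country for country, value in density.items() if value >= highest}

print(most_dense)  # {'Japan'}
```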

If we wanted to get only `"Japan"`, without the result of the first expression, we could either pipe the output of the previous command to another `rq` invocation (`... | rq 'input[1]'`), or we could save the value of `merged` to a temporary file and use that as an input to a second `rq` call. Finally, we could also use a script to convert `merged` into a proper helper rule. To demonstrate `rq script`, we'll take this last approach.

```plain
$ cat analysis.rego
# rq: csv-headers true
# rq: data-paths ./area.csv
# rq: data-paths ./population.csv

merged[country] = obj {
	aobj := data.area[_]
	pobj := data.population[_]
	aobj.country == pobj.country
	country := aobj.country

	obj := {
		"land area": aobj["land area"],
		"population": pobj["population"],
		"density": pobj.population / aobj["land area"]
	}
}

max_density := max({o.density | o := merged[_]})

most_dense[country] {
	obj := merged[country]
	obj.density >= max_density
}
$ rq script ./analysis.rego
{
	"max_density": 862.8922206157451503,
	"merged": {
		"France": {
			"density": 273.14712378255079997,
			"land area": 248573,
			"population": 67897000
		},
		"Japan": {
			"density": 862.8922206157451503,
			"land area": 145937,
			"population": 125927902
		},
		"Mexico": {
			"density": 165.45741783852627986,
			"land area": 761610,
			"population": 126014024
		},
		"United States": {
			"density": 93.97017898273028295,
			"land area": 3531905,
			"population": 331893745
		}
	},
	"most_dense": [
		"Japan"
	]
}
```

If we're just interested in the most densely populated country (or countries), we could pipe the result of this script into another `rq` command:

```plain
$ rq script ./analysis.rego | rq --raw 'input.most_dense'
Japan
```

Or we could modify the `analysis.rego` script to include the directive `# rq: query data.script.most_dense`:

```plain
$ head -n 4 analysis.rego
# rq: csv-headers true
# rq: data-paths ./area.csv
# rq: data-paths ./population.csv
# rq: query data.script.most_dense
$ rq script ./analysis.rego
[
	"Japan"
]
```

