Flatline user manual¶
S-expressions vs. JSON¶
Flatline expressions in this manual use its lisp-like syntax, based on symbolic expressions or sexps. When sending them to BigML via our API, you can also use their JSON representation, which is trivially obtained by using JSON lists for each paranthesised sexp. For instance:
(if (< (f "a") 3) 0 4) => ["if", ["<", ["f", "a"], 3], 0, 4]
Constant numbers, symbols, booleans and strings, using Java/Clojure syntax are valid expressions.
1258 2.349 this-is-a-symbol "a string" true false
While running over an input dataset, Flatline keeps track of the
(zero-based) number of the input row that’s being used, which can be
accessed with the function
row-number, which takes no arguments:
(row-number) => current input row (0-based)
A typical use of this function is to generate a unique identifier for each row. The row number will start at 0 unless you skip some rows of the input dataset, and increase by one on each new row (unless you’re specifying a input row step when generating a dataset).
Input field values are accessed using the
(field <field-designator> [<shift>]) <field-designator> := field id | field name | column_number <shift> := integer expression
<field-designator> can be either the identifier, name or
column number of the desired field, and the optional
integer, defaulting to 0) denotes the offset with respect to the current
So, for instance, these sexps denote field values extracted from the current row:
(field 0) (field 0 0) (field "000004") (field "a field name" 0)
(field "000001" -2)
denotes the value of the cell corresponding to a field with identifier “000001” two rows before the current one. Positive shift values denote rows after the current one.
(field "a field" 3) (field "another field" 2)
For convenience, and since
field is probably going to be your most
often user operator, it can be abbreviated to
(f "000001" -2) (f 3 1) (f "a field" 23)
We also provide a predicate,
missing?, that will tell you whether
the value of the field for the given row (taking into account the shift,
if any) is a missing token:
(missing? <field-designator> [<shift>])
(missing? "species") (missing? "000001" -2) (missing? 3 1) (missing? "a field" 23)
will all yield boolean values. For backwards compatibility,
is an alias for
Randomized field values¶
There are two Flatline functions that will let you generate a random value in the domain of a given field, given its designator:
(random-value <field-designator>) (weighted-random-value <field-designator>)
(random-value "age") (weighted-random-value "000001") (weighted-random-value 3)
Both functions generate a value with the constrain that it belongs to
the domain of the given field, but while
random-value uses a uniform
probability of the field’s range of values,
uses de distribution of the field values (as computed in its histogram)
as the probability measure for the random generator.
These two functions work for numeric, categorical and text fields, with generated values satisfying:
- For numeric fields, generated values are in the interval
[(minimum <fid>), (maximum <fid>)]
- For categorical fields, generated values belong to the set
- For text fields, we generate terms in the field’s tag cloud (generated values correspond to single terms in the cloud).
- Datetime parent fields are not supported, since they don’t have a defined distribution: you can use any of their numeric children for generating values following their distributions.
A common use of these functions is replacing missing values with random data, which in Flatline you could write as, say:
(if (missing? "00000") (random-value "000000") (f "000000"))
We provide a shortcut for those common operations with the functions
(ensure-value <fdes>) := (if (missing? <fdes>) (random-value <fdes>) (field <fdes>)) (ensure-weighted-value <fdes>) := (if (missing? <fdes>) (weighted-random-value <fdes>) (field <fdes>))
We them, our example above can be simply written as:
or, if you want that the generated random values follow the same distribution as the field “000000”:
Normalized field values¶
For numeric fields, it’s often useful to normalize their values to a
standard interval (usually [0, 1]). To that end, you can use the
normalize, which takes as arguments the
designator for the field you want to normalize and, optionally, the two
bounds of the resulting interval:
(normalize <id> [<from> <to>]) => (+ from (* (- to from) (/ (- (f id) (minimum id)) (- (maximum id) (minimum id)))))
(normalize "000001") ;; = (normalize "000001" 0 1) (normalize "width" -1 1) (normalize "length" 8 23)
As shown in the formula above,
normalize linearly maps the minimum
value of the field to
from (0 by default) and the maximum value to
to (1 by default).
Besides this linear normalization, it’s also common to standardize numeric data values by mapping them to a gaussian, according to the equation:
x[i] -> (x[i] - mean(x)) / standard_deviation(x)
or, in flatline terms:
(/ (- (f <id>) (mean <id>)) (standard-deviation <id>))
This normalization function is called the Z score, and we provide it as
(z-score "000034") (z-score "a numeric field") (z-score 23)
normalize, the field used must have a numeric optype.
Vectorized categorical or text fields¶
It may be useful to convert categorical or text fields to numeric values
for models which accept only numeric data as input. This can be
accomplished with the Flatline primitive
(vectorize <field-designator> [<max-fields>])
For categorical fields, the output is a binary indicator vector. In
other words, it is a list of numeric fields, one per possible
categorical value, and for each instance, the numeric field
corresponding to the category of that instance will have a value of
1, whereas the remaining numeric fields will have a value of
For text fields, the output is a list of numeric fields, each corresponding to a term in the field’s tag cloud. The value of each field is the number of times that term appears in that instance.
A numeric expression or literal can be passed as an optional second argument to limit the number of generated fields to the n most frequent categories or text terms.
Field descriptors contain lots of properties with metadata about the
field, including its summary. These propeties (when they’re atomic) can
be accessed via
(field-prop <type> <field-descriptor> <property-name> ...) <type> := string | numeric | boolean
For instance, you can access the name for field “00023” via:
(field-prop string "00023" name)
or the value of the nested property missing_count inside the summary with:
(field-prop numeric "00023" summary missing_count)
We provide several shortcuts for concrete summary properties, to save you typing:
(maximum <field-designator>) (mean <field-designator>) (median <field-designator>) (minimum <field-designator>) (missing-count <field-designator>) (population <field-designator>) (sum <field-designator>) (sum-squares <field-designator>) (standard-deviation <field-designator>) (variance <field-designator>) (preferred? <field-designator>) (category-count <field-designator> <category>) (bin-center <field-designator> <bin-number>) (bin-count <field-designator> <bin-number>)
As you can see, the category and count accessors take an additional parameter designating either the category (a string or order number) and the bin (a 0-based integer index) you refer to:
(category-count "species" "Iris-versicolor") (category-count "species" (f "000004")) (bin-count "age" (f "bin-selector")) (bin-center "000003" 3) (bin-center (field "field-selector") 4)
Discretization of numeric fields¶
A simple way to discretize a numeric field is to assign a label to each of a finite set of segments, defined by a sequence of upper bounds. For instance:
(let (v (f "age")) (cond (< v 2) "baby" (< v 10) "child" (< v 20) "teenager" "adult"))
Flatline provides a shortcut for the above expression via its
(segment-label "000000" "baby" 2 "child" 10 "teenager" 20 "adult")
As you can see, the first argument is the field designator (as usual, a name, column number or identifier), followed by alternating labels and upper bounds. More generally:
(segment-label <fdes> <l1> <b1> ... <ln-1> <bn-1> <ln>) <l1> ... <ln> strings, <b1> ... <bn-1> numbers => (cond (< (f <fdes>) <b1>) <l1> (< (f <fdes>) <b2>) <l2> ... (< (f <fdes>) <bn-1>) <ln-1> <ln>)
The alternating labels and bounds must be constant strings and numbers. If you want to use segments of equal length between the minimum and maximum value of the field, you can omit the upper bounds and give simply the list of labels, e.g.
(segment-label 0 "1st fourth" "2nd fourth" "3rd fourth" "4th fourth")
which would be equivalent to:
(let (max (maximum 0) min (minimum 0) step (/ (- max min) 4)) (segment-label 0 "1st fourth" (+ min step) "2nd fourth" (+ min step step) "3rd fourth" (+ min step step step) "4th fourth"))
or, in general:
(segment-label <fdes> <l1> ... <ln>) with <l1> ... <ln> strings => (let (min (minimum <fdes>) step (- (maximum <fdes>) min) shift (- (f <fdes>) min)) (cond (< shift step) <l1> (< shift (* 2 step)) <l2> ... (< shift (* (- n 1) step)) <ln-1> <ln>))
Items and itemsets¶
A common operation on fields of optype items is to check whether they
contain a list of items. That can be used, for instance, to filter the
rows of a dataset that satisfy a given association rule, but calling
contains-items? with the list of items in the antecedent and
consequent of the desired rule.
(contains-items? <field-designator> <item_0> ... <item_n>) ;; with <item_i> of type string for i in [0, n]
contains-items primitive takes as first argument the descriptor
of the field we want to check (which must have optype items), followed
by the one or more items we want to check, which must all have type
string. For instance, the predicate:
(contains-items? "000000" "blue" "green" "darkblue")
will filter the rows whose first column satisfies the association rule
blue, green -> darkblue.
It is also possible to check whether an items field contains only the
given list of items (in any order), using
works exactly as
contains-items? except for the fact that it’s
(equals-to-items? <field-designator> <item_0> ... <item_n>) ;; with <item_i> of type string for i in [0, n]
Field population, percentiles &co for numeric fields¶
Besides direct readings from the field summaries, there exist other derived statistical properties available as Flatline functions. In particular, these are the ones related to population percentiles and their distribution for numeric fields:
(percentile <field-designator> <fraction>) ;; fraction in [0.0, 1.0] (within-percentiles? <field-designator> <lower> <upper>) (population-fraction <field-designator> <sexp>) (percentile-label <field-designator> <label-0> ... <label-n>)
The first one,
percentile, gives you the value that a numeric field
must have in order to be in the given population fraction. Thus, you
could use, for instance, the following predicate in a filter to remove
(<= (percentile "age" 0.5) (f "age") (percentile "age" 0.95))
We provide syntactic sugar for the above expression via
(within-percentiles? "age" 0.5 0.95)
Related to percentile is
population-fraction, which, given a field
identifier and a value, computes the number of instances of this field
whose value is less than the given one. As with the case of
percentile, the designated field must be numeric.
percentile-label computes the percentile the input value
belongs to and generates the label you provided. For instance, this
(percentile-label "000023" "1st" "2nd" "3rd" "4th")
will generate the label “1st” if the value of the field 000023 is in the first population “quartile” (since we’re providing 4 labels, we use 4 segments), “2nd” to the second, etc. The sexp above is equivalent to:
(cond (within-percentiles? "000023" 0 0.25) "1st" (within-percentiles? "000023" 0.25 0.5) "2nd" (within-percentiles? "000023" 0.5 0.75) "3rd" "4th")
and, as you see, it easily generalizes to any number of labels: if you had provided 5 labels we’d be computing “quintiles”; had them been 10, the labels would correspond to “deciles,” and so forth. As with all functions in this section, the target field must be numeric.
Note that we’re using scare quotes around quartile, quintiles, etc.
above. That’s because
percentile-label will assign to each value the
label of the lowest percentile it belongs to, and therefore, it won’t
really discretize your variable by exact quantiles: if the population is
skewed around a value, so it’ll be the resulting labels’ population.
Strings and regular expressions¶
Coercion and substrings¶
Any value can be coerced to a string using the
str operator, which
will also concatenate the corresponding strings when called with more
than one argument:
(str <sexp0> ...)
(str 1 "hello " (field "a")) ;; => "1hello <value of field a>" (str "value_" (+ 3 4) "/" (name "000001")) ;; => "value_7/a"
It is also possible to take a substring of a string value using the
(subs <string> <start> [<end>]) <start> in [0 (length <string>)) <end> in (0 (length <string>)]
It returns the substring of
<string> beginning at start inclusive,
and ending at end (defaults to length of string), exclusive.
The number of characters in a string value is given by
(length "abc") => 3 (length "") => 0
Note that the length of a missing value is a missing value, not zero.
levenshtein computes, as an integer, the distance
between two given string values:
(levenshtein <str-sexp0> <str-sexp1>)
Arbitrary arguments are allowed, provided they’re strings:
(levenshtein (f 0) "a random string") (if (< (levenshtein (f 0) "bluething") 5) "bluething" (f 0))
You can also compute the number of times a word appears in a given
string by means of the
occurrences function. It takes an input
string and the term to look for as mandatory parameters, and,
optionally, a language code, and a boolean controlling case sensitivity:
(occurrences <string> <term> [<case-insensitive?> <lang>]) <case-insensitive?> := true | false (defaults to false) <lang> := "en" | "es" | "ca" | "nl" | "none" (defaults to "none")
By default, terms matching is case sensitive and exact. The optional third argument is a boolean flag to turn on case insensitivity. Finally, if you provide a fourth constant argument specifying one of the known languages (English, Spanish, Catalan or Dutch), words are compared using their stems (e.g., in English, “day” and “days” will be considered the same term).
(occurrences "howdy woman, howdy" "howdy") => 2 (occurrences "howdy woman" "Man" true) => 0 (occurrences "howdy man" "Man" true) => 1 (occurrences "hola, Holas" "hola" true "es") => 2
There are several hashing functions available:
sha256. These functions act on the stream of bytes of their input
string and return a string representing the bytes that the cryptographic
digest they name produces, in their hexadecimal representation:
(md5 <string>) => string of length 32 (sha1 <string>) => string of length 40 (sha256 <string>) => string of length 64
(md5 "a text") => "b229386ec4627869d2c71b7df3c9600a" (sha1 "a text") => "7081f2babbafff16b4bae16282859c844baa14ef" (sha256 "") => "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
As shown, the returned strings use charaters in
represent the values of the output bytes: md5 produces 16 bytes (128
bits), sha-1 produces 20 bytes (160 bits) and sha-256 produces 32 bytes
Regular expression matching¶
matches? function takes a regular expression as a string and a
form evaluating to a string and returns a boolean telling you if the
latter matches the former.
(matches? <string> <regex-string>) => boolean <regex-string> := a string form representing a regular expression <string> := a string expression to be tested against the regexp
Regular expressions follow the Perl and Java syntax and extensions (see for instance this summary), including flags modifiers such as “(?i)” to turn on case-insensitive mode.
For instance, to check if the field “name” contains the word “Hal” anywhere, you could use:
(matches? (field "name") ".*\\sHal\\s.*") (matches? (field "name") "(?i).*\\shal\\s.*")
where the second form performs case-insensitive pattern matching.
It’s possible to use non-constant string values for the regular
expression, but take into account that any special character in the
string will be treated as such when it’s converted to a regular
expression. If what you want is to match literally the contents of a
(re-quote <string>) => regexp that matches <string> literally
and then you can write things like:
(if (matches? (f "result") (re-quote (f "target"))) "GOOD" "MISS")
and you can use the string concatenation operator
str to construct
regular expressions strings out of smaller pieces:
(matches? (f "name") (str "^" (re-quote (f "salutation")) "\\s *$"))
Regular expression search and replace¶
Given a string expression, you can substitute matches of a given regexp
by a given replacement string using
(replace <string> <regexp> <replacement>) (replace-first <string> <regexp> <replacement>)
(replace "Target string ss" "\\Ws" "S") => "TargetStringSs"
The replacement is literal, except that “$1”, “$2”, etc. in the replacement string are substituted with the string that matched the corresponding parenthesized group in the pattern. If you want them to appear literally in the replacement string, just use “\$1” and the like.
(replace "Almost Pig Latin" "\\b(\\w)(\\w+)\\b" "$2$1ay") => "lmostAay igPay atinLay"
replace replaces all occurrences of the regular expression,
replace-first stops after the first match:
(replace-first "swap first two words" "(\\w+)(\\s+)(\\w+)" "$3$2$1") => "first swap two words"
Flatline provides a primitive function,
language, that tries to
detect the language of a given string value. It returns the ISO 639 code
of the detected language, as a string.
(language <string>) => <ISO 639 string code>
(language "this is an English phrase") => "en"
Note that language detectors will do in general a very poor job for short texts, and that we currently limit the set of detected languages to those used in BigML’s text analysis facility (English, Spanish, Catalan or Dutch as of this writing, represented as “en”, “es”, “ca” and “nl”, respectively.)
Relational operators and equality¶
You can compare numeric and datetime values with any of the relational
>=, which can be applied to two
or more arguments and always result in a boolean value. For example:
(< (field 0) (field 1)) (<= (field 0 -1) (field 0) (field 0 1)) (> (field "date") "07-14-1969") (>= 23 (f "000004" -2))
The equality (
=) and inequality (
!=) operators can be applied to
operators of any kind:
(= "Dante" (field "Author")) (= 1300 (field "Year")) (= (field "Year" -2) (field "Year" -1) (field "Year")) (!= (field "00033" -1) (field "00033" 1))
Comparing numerical values can be tricky, especially when they’re the
result of mathematical operations, but Flatline makes an effort to be
sensible and considers things like 1 and 1.0 equal (for numeric values,
it actually uses Clojure’s
== operator); but of course it cannot fix
rounding errors or the like for you!
The basic logical connectives
not, acting on
boolean values, are available, with their usual meanings.
(and (= 3 (field 1)) (= "meh" (f "a")) (< (f "pregnancies") 5)) (not true)
For additional convenience,
or can be applied to lists
(and (list <sexp0> ... <sexpn>)) := (and <sexp0> ... <sexpn>) (or (list <sexp0> ... <sexpn>)) := (or <sexp0> ... <sexpn>)
The usual arithmetical operators
/ taking any number of arguments (or zero, for``+
are available. Of course their operands must evaluate to a numeric
value; otherwise, the result will be nil, representing a missing value.
When not coerced, the result of the
/ operator has type
If needed, you can transform it to an integer via the coercion function
integer or use instead the integer division operator
You can coerce arbitrary values to explicit numeric types. When the input sexp is a string (or a category name), we try to parse it as a number and afterwards perform a pure numerical coercion if needed. Boolean values are mapped to 0 (false) and 1 (true).
(integer <sexp>) (real <sexp>)
If the input value cannot be coerced to a number the result is a missing value.
We provide a host of mathematical functions:
(max <x0> ... <xn>) (min <x0> ... <xn>) (abs <x>) ;; Absolute value (mod <n> <m>) ;; Modulus (div <n> <m>) ;; Integer division (quotient) (sqrt <x>) (pow <x> <n>) (square <x>) ;; (* <x> <x>) (ln <x>) ;; Natural logarithm (log <x>) ;; Base-2 logarithm (log10 <x>) ;; Base-10 logarithm (exp <x>) ;; Exponential (ceil <x>) (floor <x>) (round <x>) (cos <x>) ;; <x> := radians (sin <x>) ;; <x> := radians (tan <x>) ;; <x> := radians (to-radians <x>) ;; <x> := degrees (to-degrees <x>) ;; <x> := radians (acos <x>) (asin <x>) (atan <x>) (cosh <x>) (sinh <x>) (tanh <x>)
As well as two primitives for generating random numbers:
(rand) ;; a random double in [0, 1) (rand-int <n>) ;; a random integer in [0, n) or (n, 0]
Currently there’s no way of specifying the seed used for random number generation, but it’s coming shortly to a selected data generation language very near to you.
It’s also possible to compute the slope, intercept and Pearson coeffient of the linear regression of a set of points given as a list of alternating x and y coordinates:
(linear-regression <x0> <y0> <x1> <y1> ... <xn> <yn>) => (<slope> <intercept> <pearson>) ;; 3 double values
(linear-regression 1 1 2 2 3 3 4 4) => (1.0 0 1.0) (linear-regression 2.0 3.1 2.3 3.3 24.3 45.2) => (1.89 -0.87 0.9999)
chi-square-p-value computes the p-value of a Chi-square
distribution with the given number of degrees of freedom and a given cut
(chi-square-p-value <d> <x>) ;; => <p-value>, with <d> integer <x> a number
Thus, the value
x passes the Chi-square test if the value returned
(chi-square-p-value d x) is less than or equal to
instance, the expression:
(<= (chi-square-p-value 2 (field "000000")) 0.05)
will compute a boolean that tells you whether the field “000000” passes a Chi-square test for two degrees of freedom with significance level 0.05.
Flatline provides some functions to work with fuzzy logic features, fields or values. Fuzzy logic is a form of many-valued logic in which the truth values of variables may be any real number between 0 and 1 inclusive. It is employed to handle the concept of partial truth, where the truth value may range between completely true and completely false.
Check this link to know more about fuzy logic: - Wikipedia: Fuzzy logic
Triangular norms (t-norms) and conorms (t-cnorms are operations which generalize the logical conjunction and logical disjunction to fuzzy logic.
You can find more information about t-norms and t-conorms in the following links:
All the norms are computed from two numeric values that can be specified either by referencing a numeric input field (giving its name or id) or by any valid flatline numeric expression. For instance:
(tnorm-min "field21" "field4") (tnorm-min "000002" "000001") (tnorm-min 0.70 0.24) (tnorm-min "000002" 0.1) (tnorm-min 0.2 (field 1))
Numeric values used by these norms must be between 0 and 1. As they are fuzzy logic values it doesn’t make sense having values outside this range. If you pass a field to the norms with more than 80% of its values outside this range, an exception will be raised. When some sparse out-of-range values are found during calculations, the generated field will contain a missing value for this specific row.
If your input fields are out of range, consider normalizing or truncating your fields, before passing them to the fuzzy logic norms:
(max 0 (min 1 (field "000001"))) ;; Truncating field (normalize "000001") ;; Normalizing field
You could then write expressions like these:
(tnorm-min (normalize "000001") (normalize "000002")) (tnorm-min (max 0 (min 1 (field "000001"))) (max 0 (min 1 (field "000002"))))
As members of the family of fuzzy logics, t-norm fuzzy logics primarily aim at generalizing classical two-valued logic by admitting intermediary truth values between 1 (truth) and 0 (falsity) representing degrees of truth of propositions. In fuzzy logic, continuous t-norms are often found playing the role of conjunctive connectives. We provide the following basic t-norms. All of them need 2 parameters.
(tnorm-min <f1> <f2>) ;; Minimum t-norm. Also called the Gödel t-norm. (tnorm-product <f1> <f2>) ;; Product t-norm. The ordinary product of real numbers. (tnorm-lukasiewicz <f1> <f2>) ;; Łukasiewicz t-norm. (tnorm-drastic <f1> <f2>) ;; Drastic t-norm (tnorm-nilpotent-min <f1> <f2>) ;; Nilpotent minimum t-norm
T-conorms (also called S-norms) are dual to t-norms under the order-reversing operation which assigns 1 – x to x on [0, 1]. T-conorms are used to represent logical disjunction in fuzzy logic and union in fuzzy set theory. We provide the following basic t-conorms. All of them need 2 parameters.
(tconorm-max <f1> <f2>) ;; Maximum t-norm. Dual to the minimum t-norm, is the smallest t-conorm. (tconorm-probabilistic <f1> <f2>) ;; Probabilistic t-norm. It's dual to the product t-norm. (tconorm-bounded <f1> <f2>) ;; Bounded t-norm. It'ss dual to the Łukasiewicz t-norm. (tconorm-drastic <f1> <f2>) ;; Drastic t-conorm. It's dual to the drastic t-norm. (tconorm-nilpotent-max <f1> <f2>) ;; Nilpotent maximum t-conorm. It's dual to the nilpotent minumum. (tconorm-einstein-sum <f1> <f2>) ;; Einstein t-conorm. It's a dual to one of the Hamacher t-norms.
We provide the following parametrized t-norms. All of them need 3
parameters, the two fields were t-norms will be applied, and the
p that provides a way to vary the gain on the function
so that it can be very restrictive or very permissive. This parameter
must be a real number.
(tnorm-schweizer-sklar <f1> <f2>) ;; Parameter p in the range [-∞, ∞] (tnorm-hamacher <f1> <f2>) ;; Parameter p in the range [0, ∞] (tnorm-frank <f1> <f2>) ;; Parameter p in the range [0, ∞] (tnorm-yager <f1> <f2>) ;; Parameter p in the range [0, ∞] (tnorm-aczel-alsina <f1> <f2>) ;; Parameter p in the range [0, ∞] (tnorm-dombi <f1> <f2>) ;; Parameter p in the range [0, ∞] (tnorm-sugeno-weber <f1> <f2>) ;; Parameter p in the range [-1, ∞]
Dates and times¶
A numerical field can be interpreted as an epoch, that is, the number of milliseconds since 1970. Flatline provides the following functions to expand an epoch to its date-time components:
(epoch-year <n>) (epoch-month <n>) (epoch-week <n>) (epoch-day <n>) (epoch-weekday <n>) (epoch-hour <n>) (epoch-minute <n>) (epoch-second <n>) (epoch-millisecond <n>) (epoch-fields <n>) => (list (epoch-year <n>) (epoch-month <n>) (epoch-day <n>) (epoch-weekday <n>) (epoch-hour <n>) (epoch-minute <n>) (epoch-second <n>) (epoch-millisecond <n>)) <n> ::= numerical value
(epoch-fields (f "milliseconds")) (epoch-year (* 1000 (f "seconds")))
The epoch functions also accept negative integers, which represent dates prior to 1970.
The day of the week (given by
epoch-weekday) is a number from 1
(Monday) to 7 (Sunday).
The week within the year, given by
epoch-week, is a number between
1 and 52. Note that it is not included in the oputput of
Since epochs are just integers, date arithmetic can be performed at that level by simply using Flatline’s arithmetic operations.
As a convenience, if a field of type
datetime is used in an
arithmetic operation, it’s automatically converted to an epoch (i.e., an
integer value) for you. For instance, the two following expressions for
computing the number of seconds since 1970 are equivalent:
(/ (f "a-datetime-string") 1000) (/ (epoch (f "a-datetime-string")) 1000)
Conversely, string values representing dates can be transformed to a
numerical epoch by using the
epoch coercion function:
(epoch <str>) (epoch <str> <format>)
If you don’t specify a datetime format for parsing, we try a long list of available formats in sequence, which is less efficient than if you provide the format explicitly. Datetime format specifiers follow the well known *JodaTime* specification for datetime patterns.
(epoch-fields (epoch "1969-14-07T06:00:12")) => [1969 14 07 06 00 12 0] (epoch-hour (epoch "11~22~30" "hh~mm~ss")) => 11
The datetime formate pattern letters are:
Symbol Meaning Presentation Examples ------ ------- ------------ ------- G era text AD C century of era (>=0) number 20 Y year of era (>=0) year 1996 x weekyear year 1996 w week of weekyear number 27 e day of week number 2 E day of week text Tuesday; Tue y year year 1996 D day of year number 189 M month of year month July; Jul; 07 d day of month number 10 a halfday of day text PM K hour of halfday (0~11) number 0 h clockhour of halfday (1~12) number 12 H hour of day (0~23) number 0 k clockhour of day (1~24) number 24 m minute of hour number 30 s second of minute number 55 S fraction of second number 978 z time zone text Pacific Standard Time; PST Z time zone offset/id zone -0800; -08:00; America/Los_Angeles ' escape for text delimiter '' single quote literal '
The count of pattern letters determine the format, according to the following rules:
- Text: If the number of pattern letters is 4 or more, the full form is used; otherwise a short or abbreviated form is used if available. Thus, “EEEE” might output “Monday” whereas “E” might output “Mon” (the short form of Monday).
- Number: The minimum number of digits. Shorter numbers are zero-padded to this amount. Thus, “HH” might output “09” whereas “H” might output “9” (for the hour-of-day of 9 in the morning).
- Year: Numeric presentation for year and weekyear fields are handled
specially. For example, if the count of
yis 2, the year will be displayed as the zero-based year of the century, which is two digits.
- Month: 3 or over, use text, otherwise use number. Thus, “MM” might output “03” whereas “MMM” might output “Mar” (the short form of March) and “MMMM” might output “March”.
Zoutputs offset without a colon,
ZZoutputs the offset with a colon,
ZZZor more outputs the zone id.
- Zone names: Time zone names (
z) cannot be parsed.
Any characters in the pattern that are not in the ranges of
['A'..'Z'] will be treated as quoted text. For
instance, characters like
appear in the resulting time text even they are not embraced within
You can define lexically scoped variables using the
(let <bindings> <body>) <bindings> := (<varname0> <val0> ... <varnamen> <valn>) <body> := <expression with varname0 ... varnamen>
The binding values are evaluated sequentially and can then be referenced in the body of the let expression by their names:
(let (x (+ (window "a" -10 10)) a (/ (* x 3) 4.34) y (if (< a 10) "Good" "Bad")) (list x (str (f 10) "-" y) a y))
As shown in the example above, value expressions can use any identifier previously defined in the same list:
(let (x 43 x (+ x 1) y (+ x 1)) (list x y)) => (44 45)
Finally, let expressions can nested and they can appear wherever a Flatline expression is valid:
(list (let (z (f 0)) (* 2 (* z z) (log z))) (let (pi 3.141592653589793 r (f "radius")) (* 4 pi r r)))
if operator can be applied to a boolean conditional to yield one
of a couple of values, with the “else” clause being optional:
(if <cond> <then> [<else>]) <cond> := boolean value
You can use arbitrary expressions for
<else>, with the only restriction that
<cond> must be a boolean
value. If not provided,
<else> defaults to a “nil” value that
denotes a missing token.
(if (< (field "age") 18) "non-adult" "adult") (if (= "oh" (field "000000")) "OH") (if (> (field "000001") (mean "000001")) "above average" (if (< (field "000001") (mean "000001")) "below average" "mediocre"))
Flatline won’t let you give
<else> different types.
Another caveat is that in Flatline boolean expressions can have 3
false and missing. If the
if expression is a missing value, the whole expression will
evaluate to a missing value. That means that, for instance:
(if (< (f 0) 3) 0 1)
will evaluate to null (and not to 1) when the field 0 has a missing
value. That’s because the
<else> branch is not even evaluated.
(if (< (f 0) 3) 0 (if (missing? 0) 2 1))
will again evaluate to null when the field 0 is missing: it will not
evaluate to 2, because the
<else> branch is never reached. If you
need to test for a missing value, the test must always come first:
(if (missing? 0) 2 (if (< (f 0) 3) 0 1))
We also provide the
cond operator, which allows a more compact
representation of a chain of nested
(cond <cond0> <then0> <cond1> <then1> ... ... <default>) := (if <cond0> <then0> (if <cond1> <then1> ... <default>))
Conditions are checked in order, and the first one that matches provides
the value of the
cond expression. If none of the conditions is met,
the expression evaluates to
<default> or nil (missing token) if it’s
(cond (> (f "000001") (mean "000001")) "above average" (= (f "000001") (mean "000001")) "below average" "mediocre") (cond (or (= "a" (f 0)) (= "a+" (f 0))) 1 (or (= "b" (f 0)) (= "b+" (f 0))) 0 (or (= "c" (f 0)) (= "c+" (f 0))) -1) (cond (< (f "age") 2) "baby" (and (<= 2 (f "age") 10) (= "F" (f "sex"))) "girl" (and (<= 2 (f "age") 10) (= "M" (f "sex"))) "boy" (< 10 (f "age") 20) "teenager" "adult")
The same caveat with
if regarding missing values applies to
cond: if any of the conditions evaluates to a missing value, the
whole expression evaluates to a missing value. Therefore, checks using
missing? must always come first:
;;; CORRECT (cond (missing? "age") 0 (< (f "age") 10) 1 2) ;;; INCORRECT (the missing? test is never reached) (cond (< (f "age")) 1 (missing? "age") 0 2)
It’s possible to create a list of values using the
(list <sexp-0> ...)
with the values any valid Flatline expression, e.g.:
(list (field "age") (field "weight" -1) (population "age")) (list 1.23 (if (< (field "age") 10) "child" "adult") (field 3))
and we also provide the classical
cons to create a list from its
head and tail, which can in turn be accessed via
(cons <head> <tail>) <tail> := list (head <list>) (tail <list>)
(head (cons x lst)) ==> x (tail (cons x lst)) ==> lst
It is also possible to access the nth element of a list using its 0-based position index:
(nth <list> <pos>) <pos> := positive integer
When the given position is out of bounds, the expression evaluates to nil (a missing token).
There are operators to take and drop the first n elements of a list:
(drop <list> <n>) (take <list> <n>) <n> := positive integer
(take 3 (list 1 2 "3" 4)) ;; => (1 2 "3") (drop 3 (list 1 2 "3" 4)) ;; => (4)
Taking more elements than a list contains returns the full list, and droping more elements that the list length evaluates to a missing value.
It is also possible to take a slice in a list in a semi-open range
[from, to) with the function
slice, that can be defined in
(slice <lst> <from> <to>) := (take (- <to> <from>) (drop <from> <lst>))
Given a list value, you can count its elements, obtain their median, mode and, when its values are numeric, compute the maximum minimum and average:
(count <list>) ;; (count (list (f 1) (f 2))) => 2 (mode <list>) ;; (mode (list a b b c b a c c c)) => "c" (max <list>) ;; (max (list -1 2 -2 0.38)) => 2 (min <list>) ;; (min (list -1.3 2 1)) => -1.3 (avg <list>) ;; (avg (list -1 -2 1 2 0.8 -0.8)) => 0 (list-median <list>) ;; (list-median (list -1 -2 1 2 0.8 -0.8) => 1
And, as we have mentioned, the arithmetic operators
/ are, like
min, overloaded to distribute over
the elements of a numeric list:
(+ (list x y ...)) := (+ x y ...) (- (list x y ...)) := (- x y ...) (* (list x y ...)) := (* x y ...) (/ (list x y ...)) := (/ x y ...)
sort (in ascending lexicographical order)
(reverse <list>) (sort <list>)
(reverse (list "a" 0 2 "b")) => ("b" 2 0 "a") (sort (list 1 3 -1 2)) => (-1 2 1 3) (sort (list "a" "b" "aa")) => ("a" "aa" "b")
Finally, you can check whether a value appears in a list using the
(in <x> (<x0> <x1> ... <xn>))
which evaluates to
true if any of the
(in 3 (1 2 3 2)) => true (in "abc" (1 2 3)) => false (in (f "size") ("X" "XXL"))
Maps and filters¶
It’s also possible to apply an expression template (a Flatline
expression with one free variable, marked as
_) to each element of a
list, yielding a list of results, using the
(map <fn> (list <a0> <a1> ... <an>)) := (list (call <fn> <a0>) (call <fn> <a1>) ... (call <fn> <an>)) <fn> := expression template
An expression template is any valid Flatline expression that uses
as a placeholder:
(< _ 3) (+ (f "000001" _) 3) (< -18 _ (f 3))
and when you
call a template with an argument, a new expression is
generated by the simple device of substituting the argument for
the template. For instance:
(map (* 2 _) (list (f 0 -1) (f 0) (f 0 1)))
((* 2 (f 0 1)) (* 2 (f 0)) (* 2 (f 0 1)))
A second common list transformation is
filter, which allows you to
apply a predicate to each element of a list and retain only those values
that satisfy it:
(filter <fn> (list <a0> ... <an>)) := [ai | (call <fn> <ai>) is true]
(+ (filter (< _ 3) (fields "a" "b" "c")))
will add those values of the fields with names a, b and c whose values are less than three.
Currently, maps and filter are implemented as macro expansions (for
simplicity, and also for performance) and their second argument must
therefore be a
window (see below) form. If
needed, future versions of Flatline will provide slow real functions.
Field lists and windows¶
(Almost) all fields¶
We provide several primitives for creating lists of field values. The
first one is
all, which specifies that all input fields should be
copied, without any modification. For cases where you want to copy all
but a few fields, there’s
all-but, which takes as argument
designators of those fields not to include in the list:
(all) := (list (f 0) ... (f <field-count>)) (all-but <fd0> ... <fdn>) := (list (f i0) ... (f in)) | i0...in not in fd0...fdn
fields, which lets you select a list of fields from
the current input row:
(fields <field-designator> ... <field-designator-n>) := (list (f <field-designator>) .. <field-designator-n>)
fields, fields can be designated, as usual,
with either their identifier, name or
(all-but "id" "000023") (fields "000003" 3 "a field" "another" "0002a3b-3")
Sometimes one needs to fill-in missing values in one pass: an easy way
for that is provided by the function
all-with-defaults, that copies
all input rows, but replacing missing values with given ones:
(all-with-defaults <field-designator-0> <field-value-0> <field-designator-1> <field-value-1> ... <field-designator-n> <field-value-n>)
The list of designator/value pairs does not need to be exhaustive or ordered, and again the designator can be a field id, name, or column number:
(all-with-defaults "species" "Iris-versicolor" "petal-width" 2.8 "000002" 0)
It is also possible to provide a default for all missing numeric fields
in a row at once, using
(all-with-numeric-default <value>) <value> := "mean" | "median" | "minimum" | "maximum" | <number>
As shown, we can specify that missing numeric fields be filled with their mean, median, minimum or maximum values (as read from their respective field descriptors) or with any concrete numeric value. For example:
(all-with-numeric-default "median") (all-with-numeric-default 0)
A word of caution: for the case of concrete values, the given number is
cast to the datatype of the target field, i.e., it’ll be mapped to value
range of the given field (for instance, if you give a default value of
128 and a field of type
int8 is missing, it’ll receive the value
In addition to horizontally selecting different fields in the same row,
we can keep the field fixed and select vertical windows of its value,
window and related operators. They’re just syntactic sugar
over the shifted field accessors we’ve already seen:
(window <field-designator> <start> <end>) := (list (f <field-designator> <start>) (f <field-designator> <start + 1>) ... (f <field-designator> <end>))
So, for instance, the window:
(window "000001" -1 2)
denotes the list of values:
(list (f "000001" -1) (f "000001" 0) (f "000001" 1) (f "000001" 2))
As shown, both start and end must be integers, and the values corresponding to their shifts are included in the resulting list.
It’s possible to apply arithmetic operators,
any window. For instance, you could compute the average of the last 3
values of a field as:
(/ (+ (window "Temp" -2 0) 3))
Or convert all the values to Fahrenheit degrees and select those below 99.9 with:
(filter (< _ 99.9) (map (+ 32 (* 1.8 _)) (window "Temp" -2 0)))
In addition to the plain
window generator, we provide some other
convenience window primitives computing, respectively, the average
value, median and of the values in a window, their sum and the
sequence of their differences:
(window-median <field-designator> <start> <end>) := (list-median (window <field-designator> <start> <end>)) (window-mean <field-designator> <start> <end>) := (avg (window <field-designator> <start> <end>)) (window-mode <field-designator> <start> <end>) := (mode (window <field-designator> <start> <end>)) (window-sum <field-designator> <start> <end>) := (+ (window <field-designator> <start> <end>)) (diff-window <fdes> <start> <end>) := (list (- (f <fdes> <start>) (f <fdes> (- <start> 1))) (- (f <fdes> (- <start> 1)) (f <fdes> (- <start> 2))) ... (- (f <fdes> (- <end> 1)) (f <fdes> <end>)))
These window generator forms can also be combined with
map and all the other window operators.
Conditional window limits¶
There are scenarios in which you might be interested in forming a window
whose width depends on some condition. For instance, say you want to
compute the average of a temperature for the last four minutes in a
dataset with aperiodic entries:
cond-window to the rescue:
(let (now (f "epoch")) (avg (cond-window "temperature" (< (- (f "epoch") now) 240))))
As you see in this example,
cond-window takes a field designator and
a predicate; the latter is applied sequentially to the current and
future rows (up to a standard maximum value), and a list of the values
of the requested fields for the rows satisfying the predicate is
(cond-window <fdes> <sexp>) := (list (f <fdesc> 0) ... (f <fdesc> n)) | for [0..n] (<sexp>)
Note that, as mentioned,
<sexp> is a Flatline expression computed
with the corresponding (future) full row as input.