
Quick start guide
Martin Westgate
2024-11-06
Source:vignettes/quick_start_guide.Rmd
quick_start_guide.Rmd
Ecological Metadata Language (EML) is a “comprehensive vocabulary and
a readable XML markup syntax for documenting research data” (Jones et al. 2019).
Supplying metadata in this format is a requirement of several
data-sharing institutions, notably including the Global Biodiversity
Information Facility (GBIF) and its
partner nodes. While the EML package already
exists to support building and manipulating EML within the workspace,
there is no comparable system for writing these documents in a more
human-readable format, such as Rmarkdown or Quarto. delma
provides this capability.
The default method for using delma
is to first generate
a boilerplate metadata statement; edit it in your chosen IDE; then
render it to EML:
# create a boilerplate statement
use_metadata("my_metadata_statement.Rmd")
# edit this document in IDE before calling:
render_metadata(input = "my_metadata_statement.Rmd",
output_file = "metadata.xml")
EML is quite stringent in the types of data that are allowed, as well
as their order and placement in the hierarchy. Getting these points
right can be challenging, and we suggest using
check_metadata()
in combination with the EML schema documentation to
get it right.
Formatting your metadata statement
There are several points that are unusual about Rmarkdown documents
formatted using delma
, which we discuss below
Document structure
Header levels in markdown-formatted metadata statements determine the nested structure of the resulting xml. For example, the markdown file might contain this text:
# EML
## Dataset
### Title
My title
Which would parse to this in xml:
<EML>
<dataset>
<title>My title</title>
</dataset>
<EML>
Attributes
Attributes can be added to a particular EML tag by including them in
a list
within a code block, the label
of which
is used by delma
to link tags to their attributes. To add
attributes to the userId
field, for example, you would add
the following code under the ## userId
heading:
The include: false
tag is added so that this content
isn’t displayed when the document is knit.
Setting a unique ID
Every EML document must open with the tag eml
, and the
attributes of that element must contain a unique
identifier in the packageId
field, as well as a link to the
system within which that key is unique. A logical example might be a
DOI:
```{r}
#| label: 'eml'
#| include: false
list(packageId = "https://doi.org/10.32614/CRAN.package.galah",
system = "https://doi.org")
```
A valid alternative might be a GitHub release:
```{r}
#| label: 'eml'
#| include: false
list(packageId = "https://github.com/AtlasOfLivingAustralia/galah-R/releases/download/v2.1.0/galah_2.1.0.tar.gz",
system = "https://github.com")
```
Note that the eml
tag is unusual in delma
in that it is added automatically if not supplied. Where this occurs,
all tag levels are also incremented by one to account for this
change.
Dynamic content
delma
will call rmarkdown::render()
internally whenever read_md()
or
render_metadata()
is used, meaning that it is possible to
add dynamic content to your metadata statements. The boilerplate
statement that ships with delma uses this feature to automatically
populate the Title
and Pubdate
fields from the
YAML
section, for example:
```{r}
#| echo: false
#| results: 'asis'
# NOTE: This is set using the yaml above; do not edit by hand
cat(rmarkdown::metadata$date)
```
You could also implement dynamic content using inline code:
Reading, writing, and format conversion
Internally, delma
uses the lightparser
package to convert markdown files to tibble
s, and the xml2 package to
convert list
s to xml and write them to file. Between these
two packages, we have written functions to convert between tibble and
list versions of EML-formatted markdown.
Under the hood, render_metadata()
calls
read_md()
, which does a few things:
- calls
rmarkdown::render()
on your chosen file, meaning any code blocks or inline code is executed properly. - appends code blocks whose
label
matches an EML tag to theattributes
of that EML tag; this allows quite complex attribute addition without affecting rendered text. - ‘cleans’ the imported tibble to a small number of required columns.
This tibble is then rendered to EML using write_eml()
.
The inverse operation is accomplished by calling read_eml()
followed by write_md()
.