Split a plain text file into toots
split_to_toots.Rd
This function takes a plain text file (such as a Quarto blog post), splits it into toots, does some cleaning, and returns an object containing a data frame and some intermediate products.
Usage
split_to_toots(
  x,
  fragmentsToSkip = getOption("quartodon_fragmentsToSkip", 1),
  tootSeparator = getOption("quartodon_tootSeparator", "^-----\\s*$"),
  preprocess = getOption("quartodon_preprocess", list(c("^#.*", ""), c("`", ""))),
  imgRegex = getOption("quartodon_imgRegex",
    "^!\\[([^\\]]*)\\]\\(([^\\)]*)\\)\\{?([^}]*)\\}?$"),
  imgAltRegex = getOption("quartodon_imgAltRegex", "fig-alt=\"([^\"]*)\""),
  urlRegex = getOption("quartodon_urlRegex",
    "(?!\\!)\\[([^\\]]*)\\]\\(([^\\)]*)\\)\\{?([^}]*)\\}?"),
  cleanWhitespace = getOption("quartodon_urlRegex", TRUE)
)
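To see what the default imgRegex and imgAltRegex patterns from the usage above capture, they can be tried on a single line with base R. This is only an illustration: the Markdown line is made up, and how split_to_toots() uses the captured groups internally is not shown here.

```r
# Default patterns, copied from the usage above
imgRegex <- "^!\\[([^\\]]*)\\]\\(([^\\)]*)\\)\\{?([^}]*)\\}?$"
imgAltRegex <- "fig-alt=\"([^\"]*)\""

# Hypothetical image line (not taken from the package's example post)
line <- "![A caption](img/plot.png){fig-alt=\"A scatter plot\"}"

# regexec() returns the full match followed by each capturing group
imgParts <- regmatches(line, regexec(imgRegex, line, perl = TRUE))[[1]]
imgCaption    <- imgParts[2]  # text between the square brackets
imgPath       <- imgParts[3]  # text between the parentheses
imgAttributes <- imgParts[4]  # text between the curly braces

# The alt text is then looked up inside the attribute string
altParts <- regmatches(
  imgAttributes,
  regexec(imgAltRegex, imgAttributes, perl = TRUE)
)[[1]]
altText <- altParts[2]

print(imgPath)
print(altText)
```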
Arguments
- x
The plain text file as a character vector.
- fragmentsToSkip
The number of fragments to skip when reading the text file (Quarto post, R Markdown file, etc.). By default, the first fragment is skipped, i.e. the lines preceding the first line that matches the toot separator specified in tootSeparator (by default, a line of five dashes: -----).
- tootSeparator
The regular expression that marks where the file is split into toots. It is matched against every line (i.e. every element of the character vector).
- preprocess
A list of 2-element vectors specifying the preprocessing to perform on each extracted toot. The two elements of each vector are passed as the first two arguments (pattern and replacement) to gsub(), with the toot as the third argument and perl = TRUE.
- imgRegex
The regular expression used to find images. It should have one capturing group that extracts the path to the image.
- imgAltRegex
The regular expression used to find the images' alt text; it should have one capturing group that extracts the alt text.
- urlRegex
The regular expression used to find hyperlinks. It should have one capturing group that extracts the title (not the URL).
- cleanWhitespace
Whether to clean white space. If TRUE, all newline characters (\n) are stripped from the beginning and end of each toot, and all sequences of more than two newline characters are replaced with exactly two newline characters.
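The way fragmentsToSkip, tootSeparator, preprocess and cleanWhitespace interact can be sketched in base R. This is a minimal illustration of the behaviour described above, not the package's actual implementation. The input text is made up, and applying the preprocessing line by line before collapsing each toot is an assumption.

```r
# Hypothetical input: a front-matter fragment followed by two toots
x <- c(
  "---", "title: Example", "---",
  "-----",
  "# A heading", "", "First `toot`.", "", "", "", "Still the first toot.",
  "-----",
  "Second toot.", ""
)

# Defaults as listed in the usage
tootSeparator   <- "^-----\\s*$"
fragmentsToSkip <- 1
preprocess      <- list(c("^#.*", ""), c("`", ""))

# Every line matching the separator starts a new fragment
isSeparator <- grepl(tootSeparator, x, perl = TRUE)
fragmentId  <- cumsum(isSeparator) + 1

# Group the non-separator lines by fragment; explicit factor levels keep
# empty fragments so that the fragment count stays correct
fragments <- unname(split(
  x[!isSeparator],
  factor(fragmentId[!isSeparator], levels = seq_len(max(fragmentId)))
))

# Skip the leading fragment(s)
fragments <- fragments[seq_along(fragments) > fragmentsToSkip]

toots <- vapply(fragments, function(lines) {
  # Assumption: each rule is applied to the lines of the toot via gsub()
  for (rule in preprocess) {
    lines <- gsub(rule[1], rule[2], lines, perl = TRUE)
  }
  toot <- paste(lines, collapse = "\n")
  # cleanWhitespace: strip leading and trailing newlines ...
  toot <- gsub("^\n+|\n+$", "", toot, perl = TRUE)
  # ... and reduce runs of more than two newlines to exactly two
  gsub("\n{3,}", "\n\n", toot, perl = TRUE)
}, character(1))

print(toots)
```

In this sketch the front matter ends up in the first fragment and is dropped, the heading line and the backticks are removed by the two preprocessing rules, and the surplus blank lines inside the first toot are reduced to a single blank line.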
Examples
### Get example post directory
examplePostDir <-
  system.file("example-post",
              package = "quartodon");
### Get an example text (see the intro vignette)
exampleText <-
  readLines(
    file.path(examplePostDir, "quartodon.Rmd"),
    encoding = "UTF-8"
  );
### Extract the toots
extractedToots <- split_to_toots(
  exampleText
);
### Look at the text of the first extracted toot:
cat(extractedToots$df$toots[1]);
#> This thread explains the {quartodon} R 📦 (see https://quartodon.opens.science).
#>
#> The #rstats quartodon 📦 allows you to post a Mastodon thread from a plain text file (e.g., a blog post from a Quarto, {blogdown}, or {distill} website, another Quarto or R Markdown file, or just a plain text file).
#>
#> This effectively allows you to post blog posts to Mastodon in a thread of toots 📑➡️🪄➡️🐘🐘🐘
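Each default in the Usage section is read with getOption(), so a session-wide default can presumably be set with options() instead of being passed on every call. The sketch below only demonstrates the base R getOption() fallback mechanism. The alternative separator is a made-up value, and split_to_toots() itself is not called.

```r
# Set a hypothetical session-wide separator (a line of five equals signs),
# keeping the previous value so it can be restored afterwards
oldOpts <- options(quartodon_tootSeparator = "^=====\\s*$")

# While the option is set, getOption() returns it and ignores the fallback
sepFromOption <- getOption("quartodon_tootSeparator", "^-----\\s*$")

# Restore the previous state of the option
options(oldOpts)

# If the option is unset, the second argument is used as the default
sepFallback <- getOption("quartodon_tootSeparator", "^-----\\s*$")

print(sepFromOption)
```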