CADSbib. An annotated bibliography for corpus and discourse research

CADSbib is an annotated bibliography of corpus-based studies in discourse analysis. The initial version was collected between 2021 and 2025 and contains over 500 entries.

In contrast to general bibliographic databases, CADSbib is manually curated for research relevant to corpus work on discourse. This is particularly useful because the term discourse is used so broadly that relevant work can be difficult to find.

The main contribution of CADSbib is the detailed annotation of each reference, covering metadata, software, target language and corpus methods. This makes the bibliography usable for a wide range of purposes.

CADSbib supports trend analysis (e.g., tracking the rise of keyness methods), geographic filtering (e.g., finding work from underrepresented regions), and teaching (e.g., building method-specific reading lists).

This bibliography is the basis for the survey on the development of CADS that was conducted as part of my PhD work. A publication using an updated version is currently in preparation.

Download CADSbib

Find the full CSV table on OSF.

To access the references in a citation-friendly manner, I also provide a bib file on OSF. Please note: the bib file was auto-generated from free-text input and contains some errors. If you see something that needs correcting, please let me know.

Structure

The references are annotated with the following types of information:

Metadata

| Category | Explanation |
|---|---|
| year | year of publication |
| author_1, author_2, … | individual author names |
| citation | remaining citation information: title, venue, page numbers etc. |
| affiliation_Author_1 | institution where the first author is based |
| affiliation_Country | country where the first author is based |
| affiliation_author_1_continent | continent where the first author is based |
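
The metadata columns can be used for geographic filtering. The following is a minimal sketch only, assuming the CSV is comma-delimited and uses the column names listed above as its header. The sample rows, the spelling of the continent values and the helper `filter_by_continent` are hypothetical and do not come from CADSbib.

```python
import csv
import io

# Hypothetical rows for illustration only; these are not real CADSbib entries.
SAMPLE_CSV = """year,author_1,affiliation_Country,affiliation_author_1_continent
2015,Author A,Brazil,South America
2018,Author B,Germany,Europe
2020,Author C,Nigeria,Africa
2021,Author D,Brazil,South America
"""


def filter_by_continent(rows, continent):
    """Return the rows whose first author is based on the given continent."""
    wanted = continent.strip().lower()
    return [
        row for row in rows
        if row["affiliation_author_1_continent"].strip().lower() == wanted
    ]


# Read the sample table into a list of dicts keyed by column name.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
south_america = filter_by_continent(rows, "South America")
for row in south_america:
    print(row["year"], row["author_1"], row["affiliation_Country"])
```

To run this on the downloaded table, replace `io.StringIO(SAMPLE_CSV)` with an open file handle and check how the continent values are actually spelled in the file.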

Corpus and tools

| Category | Explanation |
|---|---|
| corpus_target | corpus name for general-purpose corpora; custom for corpora created by the researcher(s) themselves |
| domain | domain(s) that the corpus texts are from (e.g. media, politics) |
| Language_target | language(s) of the corpus; separated by commas |
| Software | corpus software used in the study; custom if the authors used programming scripts, with exceptions for widespread libraries |
| Software_2 | additional software; separated by commas |
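
Because cells such as Language_target and Software_2 can hold several comma-separated values, they need to be split before counting. This is a rough sketch under that assumption; the example cells and the helper `split_multi` are hypothetical.

```python
from collections import Counter


def split_multi(value):
    """Split a comma-separated cell into a list of trimmed, non-empty values."""
    return [part.strip() for part in value.split(",") if part.strip()]


# Hypothetical Language_target cells for illustration only.
language_cells = ["English", "English, German", "Spanish,English", ""]

# Count every language once per study it appears in.
language_counts = Counter(
    language for cell in language_cells for language in split_multi(cell)
)
print(language_counts.most_common())
```

The same helper should work for any other multi-value column, provided that individual values never contain a comma themselves.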

Methods

| Category | Explanation |
|---|---|
| frequency_dispersion | y if the study uses frequency-based methods that are not covered by any other column |
| keywords | y if the study uses keyness analysis (including at levels other than the word) |
| collocations | y if the study uses collocations |
| n-grams | y if the study uses n-grams |
| KWIC | y if the study uses concordance analysis and/or shows concordance examples |
| semtags | y if the study uses semantic tagging |
| corpus_methods_other | free-text field for studies using other corpus methods; methods are separated by commas |
| cooc_measure | association statistic used in keyness or collocation analysis; only specified for a sample of articles |
| cooccurrence_aggregation | how the keywords or collocates were organised; only specified for a sample of articles |
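
The y flags in the method columns lend themselves to simple trend analysis, for example the share of studies per year that use keyness. The sketch below assumes that an empty cell means the method was not used, which the column descriptions suggest but do not state. The sample rows and the helper `method_share_by_year` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows for illustration only; these are not real CADSbib entries.
SAMPLE_CSV = """year,keywords,collocations
2010,,y
2010,y,y
2020,y,
2020,y,y
"""


def method_share_by_year(rows, method):
    """Return {year: share of studies marked 'y' in the given method column}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        year = int(row["year"])
        totals[year] += 1
        # Assumption: anything other than 'y' counts as "method not used".
        if row[method].strip().lower() == "y":
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
keyness_trend = method_share_by_year(rows, "keywords")
for year, share in keyness_trend.items():
    print(year, f"{share:.0%}")
```

Reporting shares rather than raw counts keeps the trend readable even when the number of entries per year differs a lot.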

Get involved

I aim to update CADSbib on a semi-regular basis. To make this possible, I will need support from the CADS community: Get in touch via Bluesky or email to contribute to CADSbib with references, annotations or corrections.



