CADSbib. An annotated bibliography for corpus and discourse research
CADSbib is an annotated bibliography of corpus-based studies in discourse analysis. The initial version was compiled between 2021 and 2025 and contains over 500 entries.
In contrast to general bibliographic databases, CADSbib is manually curated for research relevant to corpus work on discourse. This is particularly useful considering the broad usage of the term discourse, which can make it difficult to find relevant work.
The main contribution of CADSbib is the detailed annotation of each reference, covering metadata, software, target language and corpus methods. This enables applications across a wide range of areas.
CADSbib supports trend analysis (e.g., tracking the rise of keyness methods), geographic filtering (e.g., finding work from underrepresented regions), and teaching (e.g., building method-specific reading lists).
This bibliography is the basis for the survey on the development of CADS that was conducted as part of my PhD work. A publication using an updated version is currently in preparation.
Download CADSbib
Find the full csv table on OSF.
To access the references in a citation-friendly manner, I also provide a bib file on OSF. Please note: the bib file was auto-generated from free-text input and contains some errors. If you see something that needs correcting, please let me know.
Structure
The references are annotated with the following types of information:
Metadata
| Category | Explanation |
|---|---|
| year | year of publication |
| author_1, author_2, … | individual author names |
| citation | remaining citation information: title, venue, page numbers etc. |
| affiliation_Author_1 | institution where the first author is based |
| affiliation_Country | country where the first author is based |
| affiliation_author_1_continent | continent where the first author is based |
Corpus and tools
| Category | Explanation |
|---|---|
| corpus_target | Corpus name for general-purpose corpora. custom for corpora created by the researcher(s) themselves |
| domain | Domain(s) that the corpus texts are from (e.g. media, politics) |
| Language_target | language(s) of the corpus; separated by commas |
| Software | corpus software used in the study. custom if the authors used programming scripts, with exceptions for widespread libraries |
| Software_2 | additional software; separated by commas |
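Because Language_target and Software_2 can hold several comma-separated values in one cell, they usually need to be split before counting. The following is a minimal sketch of one way to do this with Python's standard library. The sample rows are invented for illustration, and the helper names `split_values` and `count_languages` are hypothetical; only the column names come from the table above.

```python
import csv
import io
from collections import Counter

# Invented sample rows for illustration only; the real table on OSF
# has many more columns and entries.
SAMPLE_CSV = """year,Language_target,Software
2019,English,AntConc
2020,"English, German",Sketch Engine
2021,"German, Spanish",custom
"""


def split_values(cell):
    """Split a comma-separated cell into stripped, non-empty values."""
    return [part.strip() for part in cell.split(",") if part.strip()]


def count_languages(rows):
    """Count how many entries mention each target language."""
    counts = Counter()
    for row in rows:
        counts.update(split_values(row["Language_target"]))
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
language_counts = count_languages(rows)
print(language_counts.most_common())
```

To run this on the downloaded table, the `io.StringIO(SAMPLE_CSV)` part could be replaced with an opened file handle for the CSV from OSF.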
Methods
| Category | Explanation |
|---|---|
| frequency_dispersion | y if the study uses frequency-based methods that are not covered by any other columns |
| keywords | y if the study uses keyness analysis (including at levels other than the word). |
| collocations | y if the study uses collocations. |
| n-grams | y if the study uses n-grams. |
| KWIC | y if the study uses concordance analysis and/or shows concordance examples. |
| semtags | y if the study uses semantic tagging |
| corpus_methods_other | free-text form for studies using other corpus methods. Methods are separated by commas |
| cooc_measure | association statistic used in keyness or collocation analysis. Only specified for a sample of articles. |
| cooccurrence_aggregation | How the keywords or collocates were organised. Only specified for a sample of articles. |
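The annotation scheme above lends itself to simple scripted queries. Below is a rough sketch of how trend analysis, geographic filtering and a method-specific reading list might look using only Python's standard library. It rests on several assumptions: the sample rows are invented, an unused method is assumed to be an empty cell, the continent labels are assumed to be plain English names, and the helpers `uses`, `share_by_year` and `filter_by_continent` are hypothetical names introduced here.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows for illustration only. Assumption: a method that is
# not used is left empty, and continents are written as plain English names.
SAMPLE_CSV = """year,author_1,affiliation_author_1_continent,keywords,collocations
2018,Author A,Europe,,y
2018,Author B,Asia,y,y
2022,Author C,Africa,y,
2022,Author D,Europe,y,y
"""


def uses(row, method):
    """Return True if the method column is marked with y."""
    return row[method].strip().lower() == "y"


def share_by_year(rows, method):
    """Share of entries per publication year that use the given method."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        year = int(row["year"])
        totals[year] += 1
        if uses(row, method):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


def filter_by_continent(rows, continent):
    """Keep entries whose first author is based on the given continent."""
    return [
        row for row in rows
        if row["affiliation_author_1_continent"].strip() == continent
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Trend analysis: proportion of studies per year using keyness analysis.
keyness_trend = share_by_year(rows, "keywords")

# Geographic filtering: entries with a first author based in Africa.
africa_entries = filter_by_continent(rows, "Africa")

# Teaching: first authors of all studies that use collocations.
reading_list = [row["author_1"] for row in rows if uses(row, "collocations")]

print(keyness_trend)
print([row["author_1"] for row in africa_entries])
print(reading_list)
```

Computing a share per year rather than a raw count keeps the trend comparable across years with different numbers of entries.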
Get involved
I aim to update CADSbib on a semi-regular basis. To make this possible, I will need support from the CADS community: Get in touch via Bluesky or email to contribute to CADSbib with references, annotations or corrections.