C1 Expression Very Formal 8 min read

The corpus consisted of

Research methodology and reporting expression

Literally: The body consisted of

In 15 Seconds

  • Formal way to describe a research dataset of texts.
  • Used in academic papers, theses, and technical reports.
  • Specifically refers to language data, not people or objects.
  • Always uses the preposition 'of' and usually past tense.

Meaning

This phrase is the academic way of saying 'Here is the specific pile of data I studied.' It acts as a boundary for a research project, telling your audience exactly which texts, recordings, or documents provided the evidence for your conclusions. It carries a heavy 'expert' vibe, suggesting that your collection was deliberate, organized, and scientifically sound.

Key Examples

Thesis defense

The corpus consisted of five hundred political speeches delivered between 2010 and 2020.

Tech company report

Our training corpus consisted of anonymized user comments from the last six months.

Linguistics lecture

The corpus consisted of transcribed conversations from local coffee shops.

🌍

Cultural Background

In Western academia, 'delimitation' is a sign of honesty. By defining your corpus, you are admitting your research has limits, which is highly respected. Modern researchers often use 'born-digital' corpora consisting of tweets or blog posts, a shift that has changed the traditional view of what a 'body of work' looks like. In European legal history, the 'Corpus Juris Civilis', the codification of Roman law compiled under Emperor Justinian, became a foundation of the civil-law tradition. Using 'corpus' today still carries that weight of authority and tradition. The 'Brown Corpus', compiled at Brown University in the 1960s, is widely regarded as the first major electronic corpus of American English, and it helped establish the conventions for describing a corpus in linguistic papers.

🎯

The 'Of' Rule

Always double-check your preposition. 'Consisted OF' is for a list of items. If you use 'IN', you are talking about an abstract definition.

⚠️

Avoid Passive Voice

Never say 'The corpus was consisted of.' It is always 'The corpus consisted of.' This is a common error for learners of all levels.

What It Means

Imagine you are trying to prove that people on TikTok use the word slay differently than people on Instagram. You can’t just say 'I saw some videos.' To be a real researcher, you need a 'bucket' of data. That bucket is your corpus. When you say the corpus consisted of, you are formally introducing that bucket to your audience. It is like an artist showing you their palette before they start painting; it defines the limits of what is possible in the study. This phrase is the gold standard for transparency in linguistics, data science, and literature reviews. It tells the reader, 'I didn't just cherry-pick examples; I looked at this specific, finite collection.'

At its heart, the corpus consisted of is a delimiting expression. The word corpus comes from the Latin word for 'body.' So, you are essentially describing the 'body' of your work. It implies a sense of completeness and intentionality. If you say 'I read some books,' it sounds like a hobby. If you say the corpus consisted of twenty-four 19th-century novels, you sound like someone who is about to get a PhD. It carries a vibe of 'controlled observation.' You aren't just looking at the world at large; you've built a small digital wall around a specific set of information so you can analyze it without getting distracted by the rest of the internet. It is the verbal equivalent of putting on a lab coat before you start typing.

How To Use It

You will almost always see this followed by a number and a description. The structure is usually: The corpus consisted of + [Quantity] + [Type of Material] + [Source/Timeframe]. For example, The corpus consisted of 500 emails sent within a corporate environment between 2010 and 2015. Notice how specific that is? You can't be vague here. If you are vague, the phrase loses its power. You should use it in the 'Methodology' section of a paper or the 'Data' section of a technical report. It is a past-tense phrase because, by the time you are writing about it, the collection process is usually finished. You are looking back at the 'body' you built. Pro tip: treat it like a recipe list—clear, concise, and measurable. Just don't try to use it to describe your laundry pile, unless you're doing a very weird sociological study on socks.
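For readers who draft methodology sections with scripts, the pattern above can be sketched as a small template. This is only an illustration, not a standard tool: the function name describe_corpus and its three parameters are hypothetical, chosen to mirror the [Quantity] + [Type of Material] + [Source/Timeframe] slots.

```python
def describe_corpus(quantity: str, material: str, source: str) -> str:
    """Fill the pattern: The corpus consisted of + quantity + material + source/timeframe.

    All three slots are required, because a vague description
    weakens the phrase.
    """
    parts = [quantity.strip(), material.strip(), source.strip()]
    if not all(parts):
        raise ValueError("quantity, material and source must all be non-empty")
    return "The corpus consisted of " + " ".join(parts) + "."


# Rebuild the example sentence from the paragraph above.
sentence = describe_corpus(
    "500",
    "emails",
    "sent within a corporate environment between 2010 and 2015",
)
print(sentence)
```

Leaving any slot empty raises an error, which matches the advice that the description should always be specific and measurable.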

Formality & Register

This phrase is dressed in a three-piece suit. It is high-level academic English (C1/C2). You will find it in peer-reviewed journals, university lectures, and high-end data science whitepapers. You will almost never hear it in a coffee shop unless two linguistics professors are arguing about their latest research. It sits at the peak of the formality mountain. Because it is so formal, using it in a casual text message would make you sound like a robot or a very confused time traveler. However, in the world of AI and Big Data, this register is becoming more common in professional tech settings. When developers talk about training a new large language model, they use this language to explain what data the AI 'ate' during its training phase.

Real-Life Examples

You’ll spot this in the wild on sites like JSTOR or Google Scholar. A linguist might write: The corpus consisted of transcribed interviews from suburban teenagers in London. A historian might say: The corpus consisted of every surviving letter written by Civil War soldiers from Virginia. In the tech world, a blog post might explain: The corpus used to train our sentiment analysis tool consisted of 10 million one-star Yelp reviews. Even in high-end journalism, like a deep-dive investigative piece in The New York Times, you might see it used to describe a massive leak of documents: The corpus consisted of over 2.5 million encrypted files. It’s the phrase people use when the 'pile of stuff' is too big or too important to just call it 'some papers.'

When To Use It

Use this when you are presenting the results of an investigation where you analyzed a specific set of texts or data. It is perfect for a thesis, a capstone project, or a formal business proposal that involves market research. If you’ve spent weeks scraping data from Reddit to see how people talk about cryptocurrency, this is your phrase. It’s also great for literary analysis—if you’re comparing every poem Emily Dickinson ever wrote, the corpus is the correct term for her collected works. It tells your professor or your boss that you have a methodology. It transforms 'I looked at stuff' into 'I conducted an analysis on a curated dataset.' It’s the ultimate 'trust me, I’m an expert' signal.

When NOT To Use It

Do not use this for people. If you interviewed 10 people, you have a sample, not a corpus. A corpus is specifically for 'texts' (which can include recorded speech, but the focus is on the language data). Don't use it for physical objects either—you wouldn't say the corpus consisted of twelve types of rock. That’s a collection or a set. Also, avoid it in casual settings. If your friend asks what you’ve been reading lately, saying the corpus consisted of three mystery novels and a cookbook will definitely result in some weird looks and possibly a decrease in social invitations. It’s too heavy for everyday life. It’s like using a microscope to look at a slice of pizza—technically possible, but totally unnecessary.

Common Mistakes

The biggest trap is the preposition. People often try to say consisted from or consisted in. Neither is correct in this context. It is always consisted of. Another mistake is using the word corpora (the plural) when you only have one collection. Stick to the corpus for a single set. Some people also confuse consisted of with comprised. While similar, comprised doesn't need the of. So, ✗ The corpus comprised of... is a very common error. You should say ✓ The corpus comprised... or ✓ The corpus consisted of... Lastly, make sure consisted is in the past tense if the study is done. The present tense, The corpus consists of, is fine if you are still building the corpus, but most of the time you want the past tense.

Common Variations

If the corpus consisted of feels a bit too stiff, you have options. The dataset included is very common in data science and feels slightly more modern. The sample was composed of is better if you are talking about a mix of things. For a very high-level academic feel, you might use The archival material comprised. If you want to sound more active, you could say We analyzed a collection of... In more 'tech-bro' environments, you’ll often hear The training set was made up of. However, none of these quite capture the specific 'linguistic/textual' weight of corpus. If you are talking about words and language, corpus remains the king of the hill. It’s the OG term for a big pile of words.

Real Conversations

Professor: So, tell me about the scope of your linguistic study on 1920s jazz lyrics.

Student: Well, the corpus consisted of over 400 recorded songs and their published sheet music.

Professor: Excellent. And did you include regional variations?

Student: Yes, the corpus consisted of recordings from both New Orleans and Chicago labels.

Data Scientist: How did we train the new chatbot to handle customer complaints?

Lead Dev: The corpus consisted of three years of anonymized chat logs from the support portal.

Data Scientist: Was it enough data?

Lead Dev: Since the corpus consisted of nearly a million entries, the accuracy is quite high.

Quick FAQ

Is a corpus just a fancy word for a library? Not quite. A library is a place; a corpus is a specific selection of texts used for a single purpose or study.

Can a corpus be made of videos? Yes, but usually, it refers to the transcripts or the audio data from those videos.

Is corpora the plural? Yes, it is! If you are comparing two different sets of data, you are looking at corpora.

Is it only for English? No way! You can have a corpus of any language, from Ancient Greek to Klingon.

Does it have to be digital? Traditionally, no: corpora used to be piles of paper. But nowadays, 99.9% of the time, a corpus is a digital database. It’s much easier to search for 'slay' with a computer than with a highlighter.

Usage Notes

This phrase is strictly for academic, technical, or high-level professional writing. Its biggest 'gotcha' is using it for people (it's for texts only!) or using the wrong preposition. Always stick to 'of' and keep it in the past tense for completed research.

💬

Academic Branding

Using this phrase in a job interview for a data-related role can make you sound very professional and well-educated.

Examples

#1 Thesis defense

The corpus consisted of five hundred political speeches delivered between 2010 and 2020.

Defining the scope of academic research.

#2 Tech company report

Our training corpus consisted of anonymized user comments from the last six months.

Used in AI development contexts.

#3 Linguistics lecture

The corpus consisted of transcribed conversations from local coffee shops.

Specifying the source of spoken language data.

#4 Academic writing mistake (Common Mistake)

✗ The corpus consisted from 100 books. → ✓ The corpus consisted of 100 books.

Common preposition error.

#5 Social media data science post

For this sentiment analysis, the corpus consisted of 50,000 viral tweets.

Modern application of research terminology.

#6 Literature review

The corpus consisted of every editorial published by the newspaper in 1945.

Defining a historical text collection.

#7 Internal team meeting

The initial corpus consisted of mostly Wikipedia articles, but we added Reddit later.

Describing the growth of a dataset.

#8 Humorous academic observation

During finals week, the student's corpus consisted of three energy drink cans and a half-finished essay.

Playful use of a very formal term in a messy situation.

#9 Mistake in word choice (Common Mistake)

✗ The corpus consisted of 200 participants. → ✓ The sample consisted of 200 participants.

A corpus is for texts, a sample is for people.

#10 Discussing a failed project

Sadly, the corpus consisted of too much biased data to be useful.

Expressing disappointment in research quality.

Test Yourself

Complete the sentence with the correct preposition.

The research corpus consisted ____ 500 hours of audio recordings.

Correct answer: of

'Consist of' is the standard phrase for describing the components of a corpus.

Which sentence is appropriate for a formal academic paper?

Correct answer: The corpus consisted of 19th-century personal correspondence.

This uses the correct register, grammar, and preposition.

Match the term to its definition.

Correct answer:

  • Corpus: A systematic collection of texts
  • Consisted of: Made up of specific parts
  • Consisted in: To have as an essential feature
  • Tokens: Individual words in a corpus

Understanding these distinctions is key for C1 mastery.

Complete the dialogue with the most professional phrase.

Professor: 'How did you select your data?' Student: 'Well, _________ every article published in the journal last year.'

Correct answer: the corpus consisted of

This is the most professional way to answer in an academic setting.

Visual Learning Aids

Consist of vs. Consist in

Consist of (Parts)

  • Books: The corpus consisted of books.
  • Data: The set consisted of data.

Consist in (Essence)

  • Virtue: Wisdom consists in virtue.
  • Truth: Beauty consists in truth.

Frequently Asked Questions

Can I use 'corpus' for a group of people? Technically, no. A corpus is a collection of texts or data. If you are talking about people, use 'The sample consisted of' or 'The group was composed of.'

Can I use the present tense 'consists of'? Yes, if the corpus still exists and is available for others to use right now. Use the past tense 'consisted' if you are describing what you did in a past study.

What is the plural of 'corpus'? The plural is 'corpora.' For example: 'Both corpora consisted of newspaper articles.'

Is it too formal for a blog post? Yes, unless it's a very technical or academic blog. For a general audience, 'The data I used included...' is much better.

What is the difference between 'consisted of' and 'comprised'? 'Consisted of' is slightly more traditional and always takes 'of.' 'Comprised' is more modern and does not take 'of' in the active voice.

Can a corpus consist of images or videos? Yes, in modern research (like computer vision), a corpus can consist of images or videos, though 'dataset' is more common in those fields.

Why not just say 'The corpus was...'? 'Consisted of' is more precise. It tells the reader exactly what the components were, whereas 'was' is too general.

Is it used in both British and American English? Yes, it is standard in both British and American academic English.

Can I use it in a business meeting? Only if you are presenting formal research or data analysis. In a regular meeting, it might sound too 'stuffy.'

Should I write 'consists of' or 'consisted of'? Usually 'consisted of' because you are describing the research you have already performed.

Related Phrases

🔄

The dataset comprised

synonym

The collection of data was made up of...

🔗

The sample was composed of

similar

The group of participants or items was made of...

🔗

Drawn from

builds on

Taken from a specific source.

🔗

Consisted in

contrast

To have as an essential feature.
