C1 Expression Very Formal 7 min read

The dataset comprised

Research methodology and reporting expression

Literally: The data set was made up of

In 15 Seconds

  • Formal way to list components of a research study or data pool.
  • Implies the list provided is complete and all-inclusive.
  • Used in reports, academic papers, and technical presentations.
  • Signals high-level English proficiency and professional authority.

Meaning

This phrase is a high-level way to describe the specific ingredients of a large collection of information. It essentially says 'the data set was made up of these specific things' and implies that the list provided is the complete and total count of what was studied.

Key Examples

1

Writing a lab report

The dataset comprised results from 50 separate chemical reactions.

The data set consisted of results from 50 separate chemical reactions.

2

AI research blog post

Our training dataset comprised text from Reddit, Wikipedia, and public news archives.

Our training data set consisted of text from Reddit, Wikipedia, and public news archives.

3

Job interview on Zoom

In my last project, the dataset comprised over 10,000 customer feedback forms.

In my last project, the data set was made up of over 10,000 customer feedback forms.

🌍

Cultural Background

In academic writing, precision in language is seen as a proxy for precision in research. Using 'comprised' correctly signals that the researcher is detail-oriented and respects traditional scholarly standards.

In legal English, 'comprise' is used to define the exact limits of a property or a body of evidence. Misusing it can lead to ambiguity about whether a list is exhaustive or merely illustrative.

With the rise of Big Data, 'dataset' has become a buzzword. Using 'comprised' in a presentation adds a layer of 'data maturity' and professionalism to the speaker's persona.

While both British and American writers use the phrase, British academic style guides (like Oxford) tend to be even more protective of the 'no-of' rule for 'comprise' than some American counterparts.

🎯

The 'Whole' Rule

Always start with the big thing (the dataset) when using 'comprised.'

⚠️

The 'Of' Trap

If you write 'of' after 'comprised,' delete it immediately in formal writing.

What It Means

Imagine you are a detective who just found a mysterious box in an attic. You open it up and see three old photos, two silver spoons, and a dusty map. If you were writing a formal report for the police, you wouldn't just say "I found some stuff." You would say, "The contents comprised three photos, two spoons, and one map." In the world of tech and research, the dataset comprised does exactly this for information. It acts like a professional inventory list. When you use this phrase, you aren’t just listing items; you are telling your audience that this list is the "whole story." It gives the impression that you’ve counted everything and nothing else was included. It’s the difference between saying "I have some friends" and "My friend group comprises three people from high school and two from work."

How To Use It

Using this phrase is like putting on a tuxedo for a presentation. You place the "whole" (the dataset) at the start, followed by comprised, and then list the "parts." It is a very active, strong verb. It’s important to remember that in this specific structure, the dataset is the actor. It is doing the comprising. Think of it like a giant hug: the dataset is the arms, and the pieces of data are the things being hugged. You will most often find this in the "Methods" or "Results" section of a paper. It’s great for introducing demographics, like saying "The dataset comprised 500 teenagers from New York." It feels very precise and helps your reader visualize the scale of your work immediately. Just don't use it to describe your lunch—unless your lunch is being analyzed for a nutrition study!

Formality & Register

This phrase lives in the penthouse of formality. It is firmly situated in the formal to very formal register. You will see it in medical journals, AI training documentation, and high-level business analytics reports. You will almost never hear a native speaker say this while hanging out at a Starbucks or texting about their weekend plans. If you said, "My weekend dataset comprised two movies and a pizza," your friends might think you’ve spent too much time looking at Excel spreadsheets. However, if you are writing a LinkedIn post about a project you finished or a formal email to a professor, this is exactly the kind of vocabulary that makes you sound like an expert. It shows you have a high level of English control (C1/C2 level).

Real-Life Examples

Let’s look at where this phrase actually pops up in the wild. If you’re reading a tech blog about how ChatGPT was trained, you might see: "The training dataset comprised billions of words from books and websites." If you are a medical student reading about a vaccine trial, you’ll see: "The dataset comprised 40,000 participants across three continents." Even in the world of gaming, developers might release a patch note saying: "The update dataset comprised 12 new textures and 4 audio fixes." It’s the language of the "pro." It’s also very common in news reporting when journalists are talking about big surveys or census results. It’s basically the "suit and tie" of the data world.

When To Use It

Use the dataset comprised when you are presenting a complete list of what you studied or analyzed. It’s perfect for the opening of a data summary. For example, if you are reviewing every TikTok video about a certain trend, you could say: "The dataset comprised 200 videos posted between Monday and Friday." It’s also useful when you want to emphasize the diversity of your data. Using it makes you sound objective and scientific. If you are applying for a job in data science or marketing, using this in your portfolio will definitely catch an interviewer’s eye. It tells them, "I know how to report my findings like a professional."

When NOT To Use It

Don't use this if you are only listing *some* of the parts. If your data has 100 items but you only want to talk about 10 of them, use "included" instead. Comprised implies the whole list. Also, avoid this in casual texting or chatty emails. Telling your mom that "The grocery dataset comprised eggs and milk" is a bit much. It’s also a bit weird to use it for abstract feelings. You wouldn't say "My happiness dataset comprised sunshine and coffee." Keep it for actual, countable information. If there’s no data involved, comprised usually feels out of place. It’s a tool for builders and researchers, not for poets.

Common Mistakes

The absolute biggest mistake people make—even native speakers!—is saying "comprised of."

✗ The dataset was comprised of three groups. → ✓ The dataset comprised three groups.
✗ The dataset comprises of several variables. → ✓ The dataset comprises several variables.

Strictly speaking, the whole *comprises* the parts. Think of it this way: the word "comprise" already means "is composed of." So if you say "comprised of," you are essentially saying "is composed of of," which sounds like a glitch in the matrix. Another mistake is using it for the future. We almost always use this in the past tense (comprised) or present tense (comprises) because we are describing data that already exists. Don't say "The dataset will comprise..." unless you are 100% sure what the final result will be!

Common Variations

If comprised feels a bit too stiff, you have a few other options. Consisted of is the most common neutral alternative. It’s safe, friendly, and everyone uses it. Included is best if you aren't listing everything. For a more modern, tech-focused vibe, you might say "The data pool featured..." or "The sample set contained...". In very academic writing, you might see "was composed of." Some people also use "encompassed," which sounds a bit more grand and large-scale. If you're writing a quick Instagram caption about your year in review, you might just say "My 2023 was made up of..." instead.

Real Conversations

Speaker A: How did the user testing go for the new app interface?

Speaker B: It was great. The dataset comprised 50 first-time users and 50 experienced users.

Speaker A: Did we get a good mix of ages?

Speaker B: Yes, the sample comprised people from ages 18 to 65.

Speaker A: Perfect, that's exactly what the client asked for.

Speaker B: Exactly. I'll have the full report ready before our Zoom call tomorrow.

Quick FAQ

Is comprised better than included?
It depends. If you are listing every single part, comprised is more accurate and professional. If you are only mentioning a few highlights, included is the right choice.

Can I use it for people?
Yes! You can say "The team comprised five engineers." It makes the team sound like a structured unit.

Is it okay to use "comprised of"?
In casual speech, most people won't care. But in a C1 exam or a formal paper, you should avoid the "of" to keep the grammar purists happy. It’s one of those small details that marks you as a truly advanced speaker.

Usage Notes

Use this phrase for formal reporting of data. Avoid adding 'of' after 'comprised' in high-stakes writing. Always ensure the list you provide is complete if you choose this specific verb.

💬

Academic Credibility

Using this phrase correctly in a university essay can actually improve your perceived authority on the subject.

Examples

#1 Writing a lab report

The dataset comprised results from 50 separate chemical reactions.

The data set consisted of results from 50 separate chemical reactions.

A classic academic use describing a completed experiment.

#2 AI research blog post

Our training dataset comprised text from Reddit, Wikipedia, and public news archives.

Our training data set consisted of text from Reddit, Wikipedia, and public news archives.

Shows the modern use of the phrase in tech and machine learning.

#3 Job interview on Zoom

In my last project, the dataset comprised over 10,000 customer feedback forms.

In my last project, the data set was made up of over 10,000 customer feedback forms.

Using this in an interview makes you sound very organized and technical.

#4 Instagram caption for a photographer's portfolio

The series comprised 12 black-and-white portraits captured in London.

The series consisted of 12 black-and-white portraits captured in London.

A slightly more artistic but still formal application.

#5 Common grammar error (Common Mistake)

✗ The dataset was comprised of fifty participants. → ✓ The dataset comprised fifty participants.

The data set consisted of fifty participants.

This highlights the 'no-of' rule in formal English.

#6 Explaining a hobby project

My personal finance dataset comprised every receipt I saved over the last year.

My personal finance data set was made up of every receipt I saved over the last year.

Shows that 'dataset' can apply to personal life if the context is analytical.

#7 University lecture

The initial dataset comprised primarily undergraduate students.

The initial data set consisted primarily of undergraduate students.

Standard way to describe human participants in research.

#8 Humorous office chat

My daily motivation dataset comprised 80% caffeine and 20% sheer panic.

My daily motivation data set was made up of 80% caffeine and 20% panic.

Using formal language for a joke adds a nice layer of irony.

#9 Emotional reflection

The archive dataset comprised the final letters written by soldiers during the war.

The archive data set consisted of the final letters written by soldiers during the war.

Shows how the phrase can be used in history or social sciences.

#10 Incorrect partial listing (Common Mistake)

✗ The dataset comprised a few examples like A and B. → ✓ The dataset included examples like A and B.

The data set included examples like A and B.

Don't use comprised if you aren't listing everything.

Test Yourself

Choose the most grammatically correct sentence for a formal research paper.

Which of these sentences follows traditional formal rules?

Correct answer: b

In formal English, 'comprise' is an active verb meaning 'to consist of.' Option A is a common error, and Option C reverses the whole-to-part relationship.

Complete the sentence using the correct form of 'comprise' or 'compose'.

The final research dataset _________ (past tense) three years of clinical observations.

Correct answer: comprised

Since the dataset (the whole) is the subject, 'comprised' is the correct active verb.

Match the phrase to the appropriate register.

Where would you most likely see: 'The dataset comprised...'

Correct answer: b

This is a highly formal academic phrase used for technical reporting.

Fill in the missing line in this professional dialogue.

Manager: 'What exactly is in the new marketing database?' Analyst: '_________________________________'

Correct answer: a

Option A is the most professional and grammatically accurate response.


Visual Learning Aids

Comprise vs. Compose

Comprise (the whole is the subject)
The whole comprises the parts.
Compose (the parts are the subject)
The parts compose the whole. In the passive: the whole is composed of the parts.

Frequently Asked Questions

In strict formal writing, yes. In casual speech, it is very common and usually accepted, but C1 learners should avoid it to show mastery.

Yes, e.g., 'The committee comprised ten members.' It works for any collective group.

'Comprise' usually means you are listing everything. 'Include' means you are listing some things.

No, you can say 'The dataset comprises...' in the present tense, but research papers usually use the past tense 'comprised.'

Journalism is less formal than academic writing, and the 'rule' is slowly changing in common usage.

Technically yes, but it's usually used for a collection of multiple items.

Yes, 'consisted of' is more common in general English, while 'comprised' is more common in high-level academic English.

It is pronounced like a 'z' (/z/), not an 's'.

Yes, if you are describing a project you worked on, it sounds very professional.

It is 'comprise.' There is no word 'comprose.' You might be thinking of 'compose.'

Related Phrases

  • Consisted of (similar): To be made up of.
  • Was composed of (similar): To be made up of.
  • Included (similar): To contain as part of a whole.
  • Encompassed (similar): To surround and have or hold within.
  • Constituted (contrast): To be the parts that make up the whole.
