Extraction and Analysis of Clinical Documents

This course is part of the UBC Micro-certificate in Natural Language Processing to Improve Patient Care. The program consists of three courses that can be taken individually or combined into the Micro-certificate.

Clinical notes, reports and assessments contain valuable insights but are often difficult to use at scale. This course helps you apply NLP techniques to classify, extract and summarize information from clinical documents to support research, reporting and patient care.

You build practical skills for working with real-world text, including text segmentation, information extraction, classification and summarization. You also learn how LLMs support clinical questions and assist with documentation. The course is designed for healthcare professionals and researchers seeking to use text data more effectively in clinical settings.

By the end of this course, you will be able to:

  • Apply text classification techniques to label clinical documents
  • Summarize healthcare documents using NLP methods
  • Extract relevant information using LLM-based tools
  • Use question-answering techniques for clinical document analysis
  • Evaluate NLP use cases in healthcare contexts
  • Assess the performance of NLP models for document extraction and analysis

For those who are new to NLP, we recommend starting with Overview of NLP and Large Language Models.

Course activities include videos, short quizzes, practical lab assignments and instructor-moderated discussions. Optional weekly office hours provide a space to review methods and troubleshoot challenges with clinical text.

Course outline

Module 1: NLP Workflows

This module focuses on real-world NLP applications in cancer-related clinical documentation, particularly around classification tasks, segmentation and privacy.

Module 2: Advanced Document Classification

This module focuses on text segmentation, binary/multi-class classification tasks and fine-tuning LLMs, with an emphasis on evaluation, optimization and real-world deployment of classifiers.

Module 3: Information Extraction and Clinical Question Answering

This module dives into structured and unstructured data extraction, followed by introductory QA techniques.

Module 4: Summarization, Decision Support & LLM Limitations

This module covers summarization, speech-to-text, hallucination risks, integrating NLP into decision-making, and critical evaluation of LLM performance in clinical settings.

How am I assessed?

You are assessed through quizzes, discussion posts and hands-on lab assignments. Multiple-choice quizzes confirm your understanding of lecture content. Discussion posts evaluate your critical thinking and engagement with weekly topics, with feedback from the instructor or TAs. Lab assignments, graded with scoring rubrics, assess your ability to apply techniques and explain your results.

A minimum grade of 70% is required to pass.

Expected effort

Expect to spend five to seven hours per week on readings, videos, quizzes, lab assignments and optional office hours.

Technology requirements

  • an email account
  • a computer, laptop or tablet, using Windows, macOS or Linux
  • the latest version of a web browser (or previous major version release)
  • a Google account to access Google Drive
  • a reliable internet connection

For virtual office hours, you’ll also need:

  • a video camera and microphone

One day before the start of your course, we’ll email you step-by-step instructions for accessing your course.

Course format

This course is 100% online and instructor supported with weekly instructor office hours. Course work is done independently and at your own pace within deadlines set by your instructor. Log in anytime to your course to access the lessons as they become available.

Available sessions

There are no upcoming sessions currently scheduled for this course.

How can we help?

We’re here to answer your questions, discuss learning options and provide insights, recommendations and referrals.  
