Workshop 4: Biomedical Text Mining and Natural Language Processing

04 Jul 2017
13:10-17:00

Presentation slides

Workshop Report

Participants were given the link to all resources and scripts.

  1. Data source: Physionet.org (Deidentified Medical Text); allow 5-7 days for your account to be approved
  2. Workshop slides

Part 0: Setting up R and RStudio

Part 1: Introduction to information retrieval, information extraction, and text mining

  • regular expressions — ways of extracting data from an unstructured dataset
  • natural language processing — creating language models
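
As a rough illustration of the regular-expression bullet above, the sketch below pulls blood pressure readings out of free-text notes using base R only. The note snippets, the pattern, and the names (notes, bp_pattern, extract_bp) are made up for this example and are not taken from the workshop scripts or from the Physionet data.

```r
# Made-up note snippets for illustration only (not real patient text).
notes <- c(
  "Pt admitted with BP 120/80 mmHg, HR 72.",
  "BP 135/90 on arrival; denies chest pain.",
  "No vitals recorded."
)

# Assumed pattern: the letters "BP", whitespace, then systolic/diastolic digits.
bp_pattern <- "BP[[:space:]]+([0-9]{2,3})/([0-9]{2,3})"

# regexec() finds the full match plus capture groups for each note;
# regmatches() returns them as a list (character(0) when nothing matched).
matches <- regmatches(notes, regexec(bp_pattern, notes))

# Turn one match into c(systolic, diastolic), or NAs when there was no match.
extract_bp <- function(m) {
  if (length(m) == 0) return(c(NA_integer_, NA_integer_))
  as.integer(m[2:3])
}

bp <- t(vapply(matches, extract_bp, integer(2)))
colnames(bp) <- c("systolic", "diastolic")

# One row per note; notes without a reading keep NA in both columns.
bp_df <- data.frame(note_id = seq_along(notes), bp)
print(bp_df)
```

Real clinical text is far messier than this (for example "b/p", "BP: 120 / 80"), so a pattern like this one would normally need to be widened and checked against a sample of the actual notes.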

Part 2: Machine learning-based approaches for text mining

  • more complex algorithms to obtain insight from a corpus
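
To give a flavour of what a machine learning-based approach can look like, here is a minimal sketch of a bag-of-words Naive Bayes text classifier written in base R. It is one possible example, not necessarily the method used in the workshop; the training sentences, the labels ("cardiac", "ortho"), and all function names are hypothetical.

```r
# Made-up training sentences with hypothetical labels (illustration only).
train_text <- c(
  "patient reports chest pain and shortness of breath",
  "chest pain radiating to left arm",
  "fracture of the left wrist after a fall",
  "wrist swelling and pain after fall"
)
train_label <- c("cardiac", "cardiac", "ortho", "ortho")

# Lower-case a string and split it into alphabetic tokens.
tokenize <- function(x) {
  words <- strsplit(tolower(x), "[^a-z]+")[[1]]
  words[nzchar(words)]
}

vocab <- sort(unique(unlist(lapply(train_text, tokenize))))

# Count how often each vocabulary word occurs in one document.
# Words outside the vocabulary become NA in the factor and are dropped.
count_terms <- function(x) {
  as.integer(table(factor(tokenize(x), levels = vocab)))
}

# Document-term matrix: one row per document, one column per word.
dtm <- t(vapply(train_text, count_terms, integer(length(vocab)),
                USE.NAMES = FALSE))
colnames(dtm) <- vocab

classes <- sort(unique(train_label))

# Class priors and word likelihoods with add-one (Laplace) smoothing.
log_prior <- log(table(train_label)[classes] / length(train_label))
class_counts <- rowsum(dtm, group = train_label)[classes, , drop = FALSE]
log_likelihood <- log((class_counts + 1) /
                      (rowSums(class_counts) + length(vocab)))

# Score a new text under each class and return the best-scoring label.
predict_class <- function(x) {
  counts <- count_terms(x)
  scores <- as.vector(log_prior) + as.vector(log_likelihood %*% counts)
  classes[which.max(scores)]
}

predict_class("pain in the chest")         # "cardiac"
predict_class("fall with wrist fracture")  # "ortho"
```

With only four training sentences this is a toy; on a real corpus one would hold out test data and would probably reach for an established text-mining package rather than hand-rolled code.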

Post-conference opportunities

  1. The surgical database at PGH (mostly unstructured), for possible analytics in partnership with the Department of Surgery
  2. Problem-oriented approach to learning R (starting with a clinical question and then using R to get the answers)
  3. University-based inter-disciplinary collaboration around #1

 

Workshop Requirements

– Laptop

– Latest R and RStudio installed

 

Participants

– Familiarity with the R language (a basic level should be enough). Any one of the following introductions will do:

  1. https://www.datacamp.com/courses/free-introduction-to-r (easy) OR
  2. http://tryr.codeschool.com/ (easy) OR
  3. http://swirlstats.com/students.html (easy-moderate)