Medical Text Mining



The British Medical Journal Group (BMJ Group) has a wide and varied content set, including a suite of medical journals, online learning materials, best practice guidelines, clinical evidence summaries, a doc-2-doc online forum, and a portfolio system for doctors. There is an emerging need to aggregate across these content types, providing a unified tagging and linking system, so that related content can easily be retrieved across the group. The main use cases include an improved search and browse capability, and the (semi-)automatic construction of "specialty portals", which may be medical in nature (e.g. diabetes) or non-medical (e.g. NHS reform). This poses a challenge to standard Pattern Analysis algorithms, due in part to the highly technical nature of the documents. Prior work has mainly focussed on tools that automatically index against a medical ontology (such as MetaMap and UMLS), but this approach has drawbacks in terms of computational resources, lack of user control, and restriction to medical-only concepts. A hybrid approach based on statistical and semantic methods appears to have some merit, and may be the way forward. The presentation will focus on preliminary work on the two approaches, and discuss some specific technical issues that have arisen along the way. This is based on joint work with Jonathon Peterson, Chris Wroe, and Rob Challen.