Zhang Maiwen

MSc in Computer Science
Computing Lab, University of Oxford

Email:



Final Project (Awarded an "A")
(News Retrieval & Natural Language Processing)
At the end of my master's studies, I built a news background summarization system that runs as a web server. Given an input query, the system retrieves the relevant news documents from a collection of 330,000 files and, based on the retrieved documents, summarizes the news background related to the query.

Software screenshots:


Creating multi-document background
summaries for breaking news


[Download Thesis]

By Maiwen Zhang

Supervisor: Dr. Stephen Clark



Abstract
Automatic text summarization is an active field of research in both the Information
Retrieval (IR) and the Natural Language Processing (NLP) communities since it
provides an efficient way to access very large repositories of data. This dissertation
aims to combine the processes of information retrieval, clustering and
extractive multi-document summarization so as to produce background
summaries for a user's query (a piece of breaking news), based on a newswire
collection of 330,000 documents. With a single-event clustering method and an
event-based summarization model introduced in the dissertation, the system
successfully produced chronologically ordered summaries that provided good
background information for the user's input query. The summarizer captured the
important sentences and minimized internal redundancy. Moreover, the extracted
sentences were organized in their natural order.
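
As a rough illustration of the pipeline the abstract describes, the sketch below retrieves documents for a query, picks query-relevant sentences while skipping near-duplicates, and lists the result in date order. It is not the thesis implementation: the TF-IDF cosine ranking, the Jaccard overlap threshold, the `Doc` record and all function names are assumptions chosen for the example, and the single-event clustering step is left out entirely.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Doc:
    date: str  # ISO date string such as "2005-01-10" (hypothetical field)
    text: str


def tokens(text):
    """Lowercase word tokens; no stemming or stop-word removal."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_model(texts):
    """Return a function mapping a text to a sparse TF-IDF vector (a dict)."""
    n = len(texts)
    df = Counter(t for text in texts for t in set(tokens(text)))
    idf = {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}

    def vec(text):
        # Terms never seen in `texts` get weight 0.
        return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens(text)).items()}

    return vec


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query, docs, k=3):
    """Rank documents by TF-IDF cosine similarity to the query; keep the top k."""
    vec = tfidf_model([d.text for d in docs])
    q = vec(query)
    return sorted(docs, key=lambda d: cosine(q, vec(d.text)), reverse=True)[:k]


def summarize(query, docs, max_sentences=3, max_overlap=0.5):
    """Greedy extractive summary: take the sentences most similar to the query,
    skip any whose word overlap with an already chosen sentence exceeds
    max_overlap, then order the result by (document date, sentence position)."""
    candidates = []
    for d in docs:
        for pos, sent in enumerate(re.split(r"(?<=[.!?])\s+", d.text.strip())):
            if sent:
                candidates.append((d.date, pos, sent))

    vec = tfidf_model([s for _, _, s in candidates])
    q = vec(query)
    chosen = []
    for cand in sorted(candidates, key=lambda c: cosine(q, vec(c[2])), reverse=True):
        if len(chosen) == max_sentences:
            break
        words = set(tokens(cand[2]))
        if all(jaccard(words, set(tokens(c[2]))) <= max_overlap for c in chosen):
            chosen.append(cand)

    chosen.sort(key=lambda c: (c[0], c[1]))
    return [(date, sent) for date, _, sent in chosen]
```

A call such as `summarize(query, retrieve(query, docs))` returns a list of `(date, sentence)` pairs. Sorting on the date first and the in-document position second is one simple way to get a chronological listing that also keeps sentences from the same document in their original order.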
Zhang Maiwen © 2005