Developing Eventscraper for Ugenda: How to keep a web scraper functional after a DOM change
Issue Date
2016-08-25
Language
en
Abstract
The goal of this thesis was to explore techniques for developing a web scraper
that can still scrape web pages after their DOM has been altered. The thesis
discusses modern applications of web scraping and reviews the literature on
existing web scraping approaches. A prototype web scraper,
Eventscraper, was developed for the purpose of evaluating the performance of
several web scraping techniques. This research proposes a new technique to
handle DOM changes: path distance search. An experiment comparing its
performance with that of existing techniques turned out to be infeasible.
However, a hypothesis on its performance has been
formed, based on a detailed analysis of its behaviour. This research concludes
with several suggestions for future research.
Faculty
Faculteit der Sociale Wetenschappen