Developing Eventscraper for Ugenda: How to keep a web scraper functional after a DOM change

Issue Date
2016-08-25
Language
English
Abstract
The goal of this thesis was to explore techniques for building a web scraper that keeps working after the DOM of the pages it scrapes has been altered. The thesis discusses modern applications of web scraping and reviews the literature on existing web scraping approaches. A prototype web scraper, Eventscraper, was developed to evaluate the performance of several web scraping techniques. This research proposes a new technique for handling DOM changes: path distance search. An experiment comparing the performance of path distance search with existing techniques turned out to be infeasible. However, a hypothesis about its performance was formed based on a detailed analysis of its behaviour. The thesis concludes with several suggestions for future research.
Faculty
Faculty of Social Sciences (Faculteit der Sociale Wetenschappen)