Developing Eventscraper for Ugenda: How to keep a web scraper functional after a DOM change
dc.contributor.advisor | Vuurpijl, L.G. | |
dc.contributor.advisor | Grootjen, F.A. | |
dc.contributor.author | Ackermans, G.F.M.J. | |
dc.date.issued | 2016-08-25 | |
dc.description.abstract | The goal of this thesis was to explore techniques that can be used to develop a web scraper that is still able to scrape web pages after their DOM has been altered. In this thesis, the modern applications of web scraping are discussed, as well as literature on existing web scraping approaches. A prototype web scraper, Eventscraper, was developed for the purpose of evaluating the performance of several web scraping techniques. This research proposes a new technique to handle DOM changes: path distance search. It turned out to be infeasible to conduct an experiment to compare the performance of path distance search with existing techniques. However, a hypothesis on its performance has been formed, based on a detailed analysis of its behaviour. This research concludes with several suggestions for future research. | en_US |
dc.embargo.lift | 3000-12-31 | |
dc.identifier.uri | http://theses.ubn.ru.nl/handle/123456789/5254 | |
dc.language.iso | en | en_US |
dc.thesis.faculty | Faculteit der Sociale Wetenschappen | en_US |
dc.thesis.specialisation | Bachelor Artificial Intelligence | en_US |
dc.thesis.studyprogramme | Artificial Intelligence | en_US |
dc.thesis.type | Bachelor | en_US |
dc.title | Developing Eventscraper for Ugenda: How to keep a web scraper functional after a DOM change | en_US |
Files
Original bundle
- Name: Ackermans, G._BSc_Thesis_2016.pdf
- Size: 2.54 MB
- Format: Adobe Portable Document Format
- Description: Thesis text