UNIVERSITY OF HERTFORDSHIRE
COMPUTER SCIENCE RESEARCH COLLOQUIUM

presents

"Parsing Romanian Addresses with Ruby: Agile Hill Climbing"

Dr. Peter Lane (University of Hertfordshire)
W. S. (Bill) Metcalf (Geo Strategies, Ltd.)

3 December 2014 (Wednesday)
1 pm - 2 pm
Hatfield, College Lane Campus
Seminar Room D102

Everyone is Welcome to Attend
Refreshments will be available

Abstract:

Address parsing is the process of assigning the terms within a customer address to fields, such as the street name, locality or county. Parsing is a vital step in uniquely identifying an address so that further information, including postcodes or geographical information, can be associated with it.

In this talk, we will cover the design and implementation of a system which parses Romanian addresses. Examples will demonstrate the complexity of the problem, which is due in part to the lack of any standard address format, and in part to the intricacies of the typical Romanian address.

The current implementation uses a form of hill-climbing algorithm, with a three-tiered arrangement of experts. Development was completed in JRuby, a JVM-based implementation of the Ruby language. Apart from its easy-on-the-eye, compact syntax, Ruby has excellent support for testing and meta-programming. Ruby's flexibility was needed to adapt the implementation in an agile way, as knowledge of how to parse Romanian addresses developed during the project's lifetime.

Results will be presented based on real-world use by Geo Strategies, who process large volumes of Romanian addresses each month. We will discuss overall accuracy, false positive rates, and processing speed, as well as look at future directions for improvement.

---------------------------------------------------
Hertfordshire Computer Science Research Colloquium
http://cs-colloq.stca.herts.ac.uk