Workshop 2 – Web Crawling
Presentation Type
Workshop
Abstract
Open source tools for web scraping
Web scraping, or web mining, involves interacting with distributed files and information systems through abstract interfaces, where the analyst has little direct control over the underlying computer hardware or services.
Programming practices that support web scraping include:
- Language-independent transfer protocols (e.g., HTTP)
- Self-describing markup and data-interchange formats (HTML, XML, JSON)
- Application programming interfaces (APIs) through which data providers allow systematic queries of their data repositories
- Text mining via pattern matching (regular expressions)
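The four practices above can be illustrated with a minimal Python sketch that uses only the standard library; it is not the workshop's own material, which may rely on other open-source packages. The API URL `https://api.example.org/search`, its `q` and `page` parameters, the helper names, and the sample HTML/JSON strings are all hypothetical. `fetch_text` shows an HTTP request but is not called, so the example runs offline.

```python
import json
import re
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def build_query_url(base, **params):
    """Encode keyword arguments as a query string for a (hypothetical) API."""
    return base + "?" + urllib.parse.urlencode(params)


def fetch_text(url, timeout=10):
    """Download a resource over HTTP(S) and decode it as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Simple pattern for e-mail-like strings; illustrative, not RFC-complete.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

if __name__ == "__main__":
    # Hypothetical sample data standing in for downloaded content.
    sample_html = ('<p>Contact <a href="mailto:info@example.org">us</a> '
                   'or see <a href="/schedule">the schedule</a>.</p>')
    sample_json = '{"results": [{"title": "Workshop 2", "room": "250 A/C"}]}'

    # HTML: walk the markup and pull out link targets.
    parser = LinkCollector()
    parser.feed(sample_html)
    print(parser.links)  # ['mailto:info@example.org', '/schedule']

    # JSON: structured records parse directly into dicts and lists.
    record = json.loads(sample_json)["results"][0]
    print(record["title"])  # Workshop 2

    # Regular expressions: extract text matching a pattern.
    print(EMAIL_RE.findall(sample_html))  # ['info@example.org']

    # API: build a systematic query URL (not sent here).
    print(build_query_url("https://api.example.org/search", q="web crawling", page=2))
```

A structure-aware parser is used for HTML and JSON because regular expressions are brittle on nested markup; pattern matching is better reserved for free text once the relevant content has been extracted.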
This workshop will cover open-source tools available to assist with these practices, with an emphasis on libraries that can be used from either Python or R.
Start Date
2-10-2020 1:00 PM
End Date
2-10-2020 5:00 PM
Location
Dakota Room 250 A/C