Workshop 2 – Web Crawling

Presenter Information/ Coauthors Information

Peter Claussen, South Dakota State University

Presentation Type

Workshop

Abstract

Open source tools for web scraping

Web scraping, or web mining, involves interacting with distributed files and information systems through abstract interfaces, where the analyst has little direct control over the computer hardware or services.
Programming practices that support web scraping include:
- Language-independent file transfer protocols (e.g., HTTP)
- Self-documenting document structuring languages (HTML, XML, JSON)
- Application programming interfaces (APIs) through which data providers allow systematic queries to data repositories
- Text mining via pattern matching (regular expressions)

This workshop will cover open-source tools available to assist with these practices, with an emphasis on libraries that can be used from either Python or R.
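
As a rough illustration of the practices listed in the abstract, the following Python sketch parses an HTML fragment, decodes a JSON payload, and applies a regular expression, using only the standard library. It is not taken from the workshop materials: the sample page, the JSON body, the station name, and the helper names (`LinkCollector`, `extract_links`, `extract_dates`, `mean_temperature`) are all hypothetical, and the HTTP step is replaced by inline strings so the example runs without network access.

```python
import json
import re
from html.parser import HTMLParser

# Hypothetical page content. In practice this text would come from an HTTP
# request, for example urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_HTML = """
<html><body>
  <h1>Station report</h1>
  <a href="/data/2020-02-10.json">Daily data</a>
  <a href="/about.html">About</a>
  <p>Recorded on 2020-02-10 at 13:00.</p>
</body></html>
"""

# Hypothetical JSON body, standing in for a response from a data provider's API.
SAMPLE_JSON = '{"station": "Brookings", "readings": [{"temp_c": -3.5}, {"temp_c": -1.0}]}'


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return all link targets found in the HTML text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    return parser.links


def extract_dates(text):
    """Return every ISO-style date (YYYY-MM-DD) matched by a regular expression."""
    return re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)


def mean_temperature(json_text):
    """Decode the JSON payload and average its temp_c readings."""
    payload = json.loads(json_text)
    temps = [reading["temp_c"] for reading in payload["readings"]]
    return sum(temps) / len(temps)


if __name__ == "__main__":
    print(extract_links(SAMPLE_HTML))
    print(extract_dates(SAMPLE_HTML))
    print(mean_temperature(SAMPLE_JSON))
```

Note that the regular expression matches the date both in the link target and in the paragraph text, which is one reason structured parsing is usually preferred over pattern matching when the markup is available.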

Start Date

2-10-2020 1:00 PM

End Date

2-10-2020 5:00 PM


Location

Dakota Room 250 A/C
