Data Diggers: The Art of Interacting and Collecting Data from the Web

Summer 2024 Complexity Global School

This course covers the practical skills of web scraping and data interaction, so that you can uncover, collect, and analyze web data effectively. Many people spend hours clicking, typing, and copying and pasting to perform repetitive tasks, unaware that the machine they are using could do the job in seconds if given the right instructions. Through hands-on projects and real-world applications, we will learn how to navigate complex websites, extract valuable information, and understand the ethical considerations involved in data collection. This course is ideal for those interested in using web data to drive decisions and advance research.

Class Recording

You can access the full recording at this link.

Materials

The full GitHub repository can be accessed here: link. It provides Jupyter Notebooks and instructions for setting up a Conda environment with Python libraries such as Selenium and beautifulsoup4. The README includes everything you need to set up your environment and follow along: installation instructions, requirements, and a Selenium tutorial.
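Once the environment is set up, a quick check like the one below can confirm that the two libraries named above are importable. This is a minimal sketch and not part of the course repository: the function name `check_modules` is hypothetical, and it assumes only that Selenium is imported as `selenium` and beautifulsoup4 as `bs4`.

```python
from importlib.util import find_spec

# Import names, not package names: beautifulsoup4 is imported as "bs4".
REQUIRED_MODULES = ("selenium", "bs4")


def check_modules(names=REQUIRED_MODULES):
    """Return a dict mapping each top-level import name to True if it is installed."""
    # find_spec returns None when a top-level module cannot be located.
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If either library is reported as missing, revisit the installation instructions in the README before opening the notebooks.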

The notebooks can also be found at the following links:

Additional Resources



Ignacio Sarmiento Barbieri

Universidad de Los Andes