Week 10
Cleaning and Data Analysis in Python

SOCI 269

Sakeef M. Karim
Amherst College

AN INTRODUCTION TO QUANTITATIVE SOCIOLOGY—CULTURE AND POWER

Module III Begins
April 1st

An Update

“Midterm” Assignment Deadline

Your “midterm” assignments are now due by 8:00 PM on Tuesday, April 8th.

Instructions Are Live

Assignment instructions are, of course, available online.

An Introduction to Python

Installing Python Locally

Download the Anaconda Distribution.

You can, of course, also download Python directly from its main website.
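Once Python is installed (through Anaconda or python.org), a quick sanity check is to confirm which interpreter is running and whether the packages used later in this module can be found. The sketch below uses only the standard library. The package list is an assumption based on the packages installed further down in these slides, and `check_packages` is a hypothetical helper written for illustration.

```python
import importlib.util
import sys

# Packages used later in this module (assumed list; edit as needed).
PACKAGES = ["pandas", "seaborn", "matplotlib"]


def check_packages(names):
    """Hypothetical helper: map each top-level package name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Report the interpreter version and where it lives on disk.
    print("Python", sys.version.split()[0], "at", sys.executable)
    for name, found in check_packages(PACKAGES).items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A "missing" entry usually means the package has not been installed into the environment that this particular interpreter belongs to.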

Launching Python in R

We can use the reticulate package as a portal to Python in R.

library(reticulate)

# Create a new Anaconda (conda) environment called "soci269":

conda_create("soci269")

# Moving forward, to use the conda environment created above, simply run: 

# use_condaenv("NAME OF ENV GOES HERE")

# use_condaenv("soci269")

# Add pandas, seaborn and matplotlib to your new Anaconda (conda) environment:

conda_install("soci269", c("pandas", "seaborn", "matplotlib"))

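# GENERATING PLOTS VIA SEABORN --------------------------------------------

# Point reticulate to the new environment before importing any Python modules:

use_condaenv("soci269")

sns <- import("seaborn")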

plt <- import("matplotlib.pyplot")

# Let's generate a simple plot via seaborn (this uses the palmerpenguins R package):

sns$set_theme()

sns$scatterplot(x = "bill_depth_mm",
                y = "bill_length_mm",
                hue = "species",
                data = palmerpenguins::penguins)

plt$show()
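For comparison, here is a sketch of the same plot written directly in Python, roughly as it might look in a Colab notebook. Instead of the R package's data, it assumes seaborn's own example loader, `sns.load_dataset("penguins")`, which downloads a copy of the Palmer penguins data (with the same column names) and therefore needs an internet connection. The function name `plot_bills` is hypothetical.

```python
import matplotlib.pyplot as plt
import seaborn as sns


def plot_bills(df):
    """Scatter bill depth against bill length, coloured by species; returns the Axes."""
    sns.set_theme()  # seaborn's default theme, as in the R version above
    ax = sns.scatterplot(
        data=df,
        x="bill_depth_mm",
        y="bill_length_mm",
        hue="species",
    )
    return ax


if __name__ == "__main__":
    # Downloads seaborn's copy of the Palmer penguins data (needs internet access).
    penguins = sns.load_dataset("penguins")
    plot_bills(penguins)
    plt.show()
```

In Python, the `$` accessor from reticulate becomes ordinary dot notation (`sns.scatterplot`), and keyword arguments are written with `=` in the same way.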

Positron May Be the Future

Not Recommended—Yet

Download the Positron IDE.

Warning: Positron is still in its infancy.

Using Jupyter Notebooks

Interactive .ipynb Files

Jupyter Notebooks are living, interactive (.ipynb) documents. They allow users to craft a narrative, edit and execute lines of Python or R code in real time, and generate a wide range of outputs.

Using Colab

We’ll be using .ipynb files in Google Colaboratory (Colab) for Module III.


Note

The rest of today’s session will take place in Colab!

Polars in Python
April 3rd

Presentation Order

Let’s settle on the presentation order in real time.

A Note on Presentation Instructions

Formal guidelines will be uploaded in the coming days.

Back to Jupyter

Launch Colab

Note

The rest of today’s session will, once again, take place in Colab.

Enjoy the Weekend
