Using Python and Web-scraping to dynamically search multiple job-board websites. Part Two: Extracting desired information.

In Part One, the parameters for the program were set by the user: the type of job they were looking for, where they were searching, and which websites they wanted to search for those jobs. In this part, the program will be expanded to extract that desired information from the chosen websites. Although several websites are currently housed within the program, only two will be demonstrated: Part Two looks at Monster.com, and Part Three will look at Glassdoor.com.

A quick recap of where the main() function is so far:

def…
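As a rough sketch of the kind of extraction step Part Two builds toward for a site like Monster.com, the following assumes the requests and BeautifulSoup libraries; the URL pattern and CSS selectors are placeholders, not the ones used in the actual program.

```python
import requests
from bs4 import BeautifulSoup

def scrape_job_board(job_title, location):
    """Illustrative only: fetch one results page and pull out job cards.

    The URL and CSS selectors below are placeholders and will not match
    Monster.com's real markup.
    """
    url = "https://www.monster.com/jobs/search/"
    params = {"q": job_title, "where": location}
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")
    jobs = []
    for card in soup.select("section.card-content"):  # placeholder selector
        title = card.select_one("h2.title")
        company = card.select_one("div.company")
        if title and company:
            jobs.append({"title": title.get_text(strip=True),
                         "company": company.get_text(strip=True)})
    return jobs
```

In the real program, each supported website would need its own version of this step, keyed to that site's particular markup.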

Using Python and Web-scraping to dynamically search multiple job websites. Part 1: Setting up the search parameters.

Finding a job can be difficult. Adding to the difficulty is the plethora of job-hunting websites, each with its own niche and service. The abundance of choices can overwhelm, and an aggregating program can solve the problem of deciding which website to search under which conditions. Here I present a three-part series outlining the creation of a job-hunting application that can web-scrape several prominent job-search websites for a user-defined job type and location.

Our goal is to create a program in which a user can do the following (a minimal sketch of gathering these inputs appears after the list):

  • Enter the type of job they are looking for
  • Enter where they are looking…
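For a sense of what that parameter-gathering step can look like, here is a minimal sketch; the prompts and the list of available sites are illustrative, not the program's exact wording.

```python
def get_search_parameters():
    """Collect the job title, location, and target sites from the user."""
    job_title = input("What type of job are you looking for? ").strip()
    location = input("Where are you looking? ").strip()

    available_sites = ["Monster", "Glassdoor"]  # illustrative subset
    print("Available sites:", ", ".join(available_sites))
    chosen = input("Which sites should be searched (comma-separated)? ")
    sites = [s.strip() for s in chosen.split(",") if s.strip() in available_sites]

    return job_title, location, sites
```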

Hands-on Tutorials

The maddening adventures of extracting millions of tweets.

Twitter’s API is free to use and is, overall, a very useful resource for data analysis. Extracting tweets for one, or several, users takes little more work than loading an R package. Yet if you are interested in harvesting millions of tweets from tens of thousands of users, you will have to sacrifice some additional tears. Here, the general strategy for doing just that will be outlined so that you don’t also lose your collective minds.

The goal is to outline the progressive steps taken to produce a robust and functioning script to extract millions of timelines from a…
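The article itself leans on an R package for the heavy lifting. Purely to illustrate the same pattern of paged timeline requests, here is a comparable sketch in Python using tweepy; the bearer token and user ID are placeholders.

```python
import tweepy

# Placeholder credentials: a real bearer token is required.
client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN", wait_on_rate_limit=True)

def fetch_timeline(user_id, per_page=100, max_pages=32):
    """Page through one user's recent tweets, newest first."""
    tweets = []
    paginator = tweepy.Paginator(
        client.get_users_tweets,
        id=user_id,
        max_results=per_page,
        limit=max_pages,
    )
    for page in paginator:
        if page.data:
            tweets.extend(page.data)
    return tweets

# Repeating this over tens of thousands of user IDs is where the real pain
# (rate limits, retries, checkpointing) comes in.
```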


In the previous series, Philadelphia was examined in terms of poverty, emphasizing the relationship between demographics and poverty. Now, let’s look at how the level, or depth, of poverty can characterize poverty within Philadelphia. Once that level has been characterized, an attempt will be made to balance the quantity of those in poverty against the severity of poverty within a given zip code. The hope is to build a model that can capture the areas of the city that may require additional resources and assistance.

Poverty levels are based on a percentage below the poverty line on the federal…
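As a toy illustration of separating the quantity of poverty from its severity using fractions of the federal poverty line, the sketch below uses made-up per-zip-code counts; the column names and numbers are not the article's data.

```python
import pandas as pd

# Hypothetical per-zip-code counts; real figures would come from Census tables.
df = pd.DataFrame({
    "zip":           ["19104", "19121", "19139"],
    "population":    [50000, 32000, 41000],
    "below_50_fpl":  [9000, 7500, 6200],     # below 50% of the federal poverty line
    "below_100_fpl": [17000, 13000, 11500],  # below 100% (includes the rows above)
})

# "Quantity" of poverty: share of residents below the poverty line.
df["poverty_rate"] = df["below_100_fpl"] / df["population"]

# "Severity" of poverty: share of the poor who are in deep poverty (<50% FPL).
df["deep_poverty_share"] = df["below_50_fpl"] / df["below_100_fpl"]

print(df[["zip", "poverty_rate", "deep_poverty_share"]])
```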


How Poverty, Education, and the Workforce can help us understand the health of a Great City: a series.

Philadelphia is a diverse city of 1.59 million inhabitants spread across 142.7 square miles. Of those 1.59 million inhabitants, approximately 500,000 live in poverty. Today, the dynamics of poverty within Philadelphia will be investigated using R and maps. Poverty will be explored at the zip-code level as per the 2010 Census. Several issues will be examined, including poverty, education, and the workforce throughout Philadelphia, to gain a snapshot of the city’s health. For each, the investigation will rely on zip-code-centric, color-coded maps.

Here, raw counts of poverty, per-capita counts of poverty, disparity-counts of poverty…
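The series does its mapping in R. As a rough Python analogue of a zip-code-centric, color-coded map, the sketch below assumes a zip-code boundary file and a poverty table keyed by zip code; the file names and columns are placeholders.

```python
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder inputs: a Philadelphia zip-code boundary file and a poverty table.
zips = gpd.read_file("phila_zipcodes.shp")   # assumed to contain a "zip" column
poverty = pd.read_csv("poverty_by_zip.csv")  # assumed columns: zip, poverty_count

merged = zips.merge(poverty, on="zip", how="left")

# Choropleth: darker zip codes have higher raw counts of poverty.
ax = merged.plot(column="poverty_count", cmap="Reds", legend=True, figsize=(8, 8))
ax.set_axis_off()
ax.set_title("Poverty counts by Philadelphia zip code (illustrative)")
plt.show()
```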


In Depth Analysis

A look at COVID-19's effects on the Probability of Death given one’s Age and Residence.

Summary: In this Part One of three, a time series is analyzed using Bayesian probabilities in an effort to build models quantifying the probability of death in the nursing-home and general populations. It was found that the probabilities were not independent, and so they were calculated accordingly, as conditional probabilities. Furthermore, several distinct differences were noted between the nursing-home and general populations in terms of the probability of death and the probability of belonging to a given age group. Following the calculation of the final probability of one’s condition given one’s location and age, the direct comparison at a yearly and age level could finally be…
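To make the conditional-probability point concrete, here is a small worked example with made-up counts: because death and residence are not independent, the probability of death has to be computed within each group rather than read off the overall rate.

```python
# Hypothetical counts, illustrative only (not the article's data).
counts = {
    ("nursing_home", "died"):     600,
    ("nursing_home", "survived"): 9_400,
    ("general",      "died"):     1_400,
    ("general",      "survived"): 988_600,
}

def p_death_given(location):
    """P(death | location), computed within the group."""
    died = counts[(location, "died")]
    survived = counts[(location, "survived")]
    return died / (died + survived)

total = sum(counts.values())
p_death_overall = (counts[("nursing_home", "died")] + counts[("general", "died")]) / total

print(f"P(death | nursing home) = {p_death_given('nursing_home'):.4f}")  # 0.0600
print(f"P(death | general)      = {p_death_given('general'):.4f}")       # 0.0014
print(f"P(death)                = {p_death_overall:.4f}")                # 0.0020
# If death and residence were independent, each conditional probability
# would match the overall P(death); here they clearly do not.
```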


Organizations, especially governmental bodies, love PDFs. To the masses, they are easy to read, with nice, clean formatting that is easy on the eyes. To the data scientist, they can be nightmares to import. For example, take a look at this PDF:

What a 105-page nightmare that would be! R can import a PDF with a single line of code, but this PDF was clearly not designed with data scientists in mind.
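The article works in R, where a package such as pdftools pulls an entire PDF in with a single call. For a comparable sketch in Python, assuming the pdfplumber package and a placeholder file name:

```python
import pdfplumber

# Placeholder file name standing in for the 105-page report referenced above.
with pdfplumber.open("report.pdf") as pdf:
    pages = [page.extract_text() or "" for page in pdf.pages]

# Each element of `pages` is one page's text crammed into a single string;
# the maze of extractions and re-arrangements starts from here.
print(len(pages), "pages read;", len(pages[0].splitlines()), "lines on page 1")
```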

Extracting this data for analysis and manipulation is going to be a maze of extractions, re-arrangements, and ultimately many extra-curricular relaxation techniques.

The good news is, I like doing this! So here…

Justin Cocco

Hi! I’m mostly here.
