FAQ

This document contains a list of questions we’ve heard developers ask about how the system works, how to write a scraper, or anything else related to our data engineering efforts. Our intention is for this document to be updated frequently and be a living resource of common questions and their answers.

In the code snippets below you will see references to a few variables (d, engine, df). These are:

d: an instance of a scraper
engine: a SQLAlchemy engine, most often the SQLite-based dev engine
df: a clean/normalized DataFrame that is the output of the normalize method

Location_id sql error

How to diagnose this problem: when calling d.put(engine, df) you will see an error that looks like this:

IntegrityError: (sqlite3.IntegrityError) NOT NULL constraint failed: covid_observations.location_id

There are two possible cases for handling locations: using a location_name column with the state or county name, or using a location column with FIPS codes.

`location_name` column

If you have a location_name column, chances are you have a misspelled county name or a row that isn’t a county (All or Total are common culprits).

How to fix this problem: Try the following method: d.find_unknown_location_id(engine, df)

It will return rows of your DataFrame for which we do not recognize the county name

You can compare this list against the list of counties for that state, which you can obtain via:

import pandas as pd

# Load every known location, then keep only those in the scraper's state
locs = pd.read_sql("select * from locations", engine)
state_locs = locs.loc[locs["state_fips"] == d.state_fips, :]

Most often, the fix is to correct the spelling or capitalization of a county name (to match what is in state_locs) or to delete the offending rows if they are obviously not counties.
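The comparison described above can be sketched roughly as follows. This is only an illustration: the helper name suggest_location_fixes and the sample names are hypothetical, and in practice the two lists would come from the rows returned by find_unknown_location_id and from whichever column of state_locs holds the county names.

```python
def suggest_location_fixes(unknown_names, known_names, aggregates=("all", "total")):
    """Hypothetical helper: map each unrecognized name to a suggested action."""
    # Index known names by their lowercase, whitespace-stripped form
    lookup = {name.strip().lower(): name for name in known_names}
    suggestions = {}
    for raw in unknown_names:
        key = raw.strip().lower()
        if key in aggregates:
            suggestions[raw] = "drop"  # aggregate row, not a county
        elif key in lookup:
            suggestions[raw] = lookup[key]  # capitalization/whitespace fix
        else:
            suggestions[raw] = None  # likely a misspelling; review by hand
    return suggestions


# Illustrative data only
known = ["Alameda", "Los Angeles", "San Diego"]
unknown = ["alameda", "LOS ANGELES ", "Total", "Sna Diego"]
fixes = suggest_location_fixes(unknown, known)
```

Names that map to None are probably genuine misspellings and need to be corrected by hand against state_locs.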

`location` column

If instead you have a location column, check that each value in the location column maps to a known location for that state.

You can use the state_locs DataFrame from the code snippet above to see all known locations for the state
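One possible way to run that check is sketched below. It assumes that state_locs exposes the FIPS codes in a column named location (an assumption, not something confirmed here), and it uses small made-up DataFrames in place of the real df and state_locs.

```python
import pandas as pd

# Illustrative data only: in practice df comes from normalize() and
# state_locs from the locations query shown earlier
state_locs = pd.DataFrame({"location": [6001, 6037, 6073]})
df = pd.DataFrame({"location": [6001, 6037, 6999], "value": [10, 20, 30]})

# Rows whose location code is not a known location for the state
unknown_rows = df.loc[~df["location"].isin(state_locs["location"]), :]
```

Any rows left in unknown_rows carry location codes that would fail the location_id lookup.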

variable_id sql error

How to diagnose this problem: when calling d.put(engine, df) you will see an error that looks like this:

IntegrityError: (sqlite3.IntegrityError) NOT NULL constraint failed: covid_observations.variable_id

How to fix this problem: Try the following method: d.find_unknown_variable_id(engine, df)

It will return rows of your DataFrame for which we do not recognize the variable (recall that a variable_id is defined by a triplet ("category", "measurement", "unit") – the CMU columns)

The most common fixes for this problem are:

Fix spelling on one of the CMU columns
Change recorded value of CMU columns to match a value in the file can_tools/bootstrap_data/covid_variables.csv
If it is an entirely new type of variable, you may need to add a row to the can_tools/bootstrap_data/covid_variables.csv file and try to .put again
- If you are adding a brand new value for any of category, measurement, unit you also need to add the corresponding value to one of can_tools/bootstrap_data/covid_{categories,measurements,units}.csv
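To see which CMU triplets are missing before editing any CSV file, a check along these lines may help. It is a sketch under stated assumptions: known_vars is a made-up stand-in for the contents of can_tools/bootstrap_data/covid_variables.csv, which is assumed to have category, measurement, and unit columns, and the data rows are invented for illustration.

```python
import pandas as pd

CMU = ["category", "measurement", "unit"]

# Illustrative stand-in for the rows of covid_variables.csv (assumed columns)
known_vars = pd.DataFrame(
    {
        "category": ["cases", "deaths"],
        "measurement": ["cumulative", "cumulative"],
        "unit": ["people", "people"],
    }
)
df = pd.DataFrame(
    {
        "category": ["cases", "deaths", "cases"],
        "measurement": ["cumulative", "cumulative", "cumulative"],
        "unit": ["people", "people", "peopel"],  # typo in the last row
        "value": [100, 5, 7],
    }
)

# A left merge with indicator=True marks rows that have no matching CMU triplet
merged = df.merge(
    known_vars[CMU].drop_duplicates(), on=CMU, how="left", indicator=True
)
unknown_cmu = merged.loc[merged["_merge"] == "left_only", CMU].drop_duplicates()
```

Each row of unknown_cmu is a triplet that either needs a spelling fix in df or a new row in the variables file.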

demographic_id sql error

TODO
