Saturday, May 02, 2020

Corrections


When I wrote my last post, I tried to find the best way to determine the 'fourteen-day downward trajectory' as stipulated in the White House and CDC guidelines. Some sites showed a seven-day rolling average, and others a two-week rolling average. At the time, I decided to split the difference and show a ten-day rolling average.
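For anyone who wants to compare the window sizes themselves, here is a minimal Python sketch of a rolling average. It assumes a trailing window (each point averages that day and the days before it); I don't know whether the sites mentioned above use a trailing or a centered window. The function name and the sample counts are made up for illustration and are not the data behind my graphs.

```python
def rolling_average(daily_counts, window):
    """Trailing mean: each value averages the current day and the window-1 days before it."""
    averages = []
    for i in range(window - 1, len(daily_counts)):
        chunk = daily_counts[i - window + 1 : i + 1]
        averages.append(sum(chunk) / window)
    return averages


# Made-up daily new-case counts, for illustration only.
sample = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140]

seven = rolling_average(sample, 7)        # seven-day rolling average
ten = rolling_average(sample, 10)         # ten-day rolling average
fourteen = rolling_average(sample, 14)    # two-week rolling average
```

A longer window smooths out more day-to-day noise but reacts more slowly to a real change in direction, which is why the choice of window matters when judging a 'downward trajectory'.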

I went back to the Georgia Department of Public Health site today and found that there are some fun new graphs online – including a 'seven-day rolling average' and a 'fourteen-day window'. Keep in mind, the screenshot was taken today, May 2nd, but Georgia started its 're-opening' on April 24th. Anyway, their trend line looked way different from mine. I am using data archived at Wikipedia, and I can corroborate that data with an archive at USAFacts – the two track very closely. So I plotted out the DPH data and found some interesting discrepancies:
  • their dataset starts with two cases on February 2nd, though the DPH announced their first case on March 2nd;
  • there are two records for March 29th (two entries of 713 and 378, versus one entry of 1,091), which lowers the rolling average substantially in that time frame;
  • by April 1st, the DPH data shows 10,498 cases; Wikipedia and USAFacts show 4,748 and 4,744 respectively;
  • by April 24th, the date the state began to re-open, the DPH data shows 25,913 cases; Wikipedia and USAFacts show 22,491 and 22,023 respectively;
  • for May 1st, the DPH shows 27,709 (and only 32 new cases), and the others show 27,492 and 26,851.
So I made a graph of the DPH dataset, and my seven-day average looks exactly like the one on the web site – that is, my methodology matches theirs.
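The split March 29th record from the list above can be illustrated with a short Python sketch. It assumes the average is taken over rows rather than over calendar dates, which is only one possible way a split record could pull the average down; I don't know how the DPH actually computes theirs. Only the 713 and 378 figures come from the DPH data; the neighboring rows and the function name are made up for illustration.

```python
def merge_by_date(records):
    """Sum the counts of all records that share a date, returning one row per date."""
    totals = {}
    for date, count in records:
        totals[date] = totals.get(date, 0) + count
    # ISO-formatted date strings sort chronologically.
    return sorted(totals.items())


records = [
    ("2020-03-28", 500),   # made-up neighbor, for illustration only
    ("2020-03-29", 713),   # first March 29th entry in the DPH data
    ("2020-03-29", 378),   # second March 29th entry in the DPH data
    ("2020-03-30", 600),   # made-up neighbor, for illustration only
]

merged = merge_by_date(records)

# Averaging over four rows instead of three dates gives a lower number.
mean_over_rows = sum(count for _, count in records) / len(records)
mean_over_dates = sum(count for _, count in merged) / len(merged)
```

Merging first gives a single March 29th row of 1,091, matching the single entry in the other datasets.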

Here is the full Wikipedia dataset with a seven-day rolling average; the light blue line is the Georgia DPH dataset. You can see how the DPH dataset increases the count before mid-April, and decreases the count after, which accounts for the decreasing trend line for late April on the DPH web site. The rolling average for Wikipedia looks nothing like the one published on the DPH web site (the DPH line seems unnaturally smooth, too):


The DPH data is in blue; the DPH seven-day average in red. The Wikipedia data is in purple; the Wikipedia seven-day average is in yellow. It sure looks like the data was adjusted to create a particular look for that trend line. FYI – both Wikipedia and USAFacts cite Georgia's DPH as their source.

Oddly, as I waited for today's data from Portugal, I was shocked to see the number decrease from the previous day: from 25,351 on May 1st, to 25,190 on May 2nd. The Portugal DGS announced that there had been duplicate records, double-counted, going back to April 25th, and has corrected the data for the last week. Comparing the new data with a seven-day rolling average, versus Georgia, it looks like this (same trend line as above, but no February; also compare the ten-day graphs from my last post):



I am not a statistician, but it seems clear that Portugal has been on a consistent downward trajectory since about April 10th; Georgia's situation is less clear. Just for giggles, I sent an email to the DPH to see if they would correct their data. Anyway, I will continue to track the two trend lines and see where they go.
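As a small worked example of why the Portugal correction stood out: daily new cases are just the difference between consecutive cumulative totals, so a downward correction shows up as a negative day. This is a minimal Python sketch using the two Portugal totals quoted above; the function name is made up for illustration.

```python
def daily_new_cases(cumulative):
    """Difference between each day's cumulative total and the previous day's."""
    return [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]


# Portugal cumulative totals for May 1st and May 2nd.
portugal = [25351, 25190]
change = daily_new_cases(portugal)  # [-161]
```

A negative value like this is a sign that earlier records were revised, not that cases disappeared, so it should be handled before it is fed into a rolling average.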

cases: 3,386,519 global • 1,142,416 USA • 25,190 Portugal
deaths: 239,448 global • 66,154 USA • 1,023 Portugal

Here's a great article from CityLab looking at the quality of the data being created during this pandemic.

A third dataset archived at The Atlantic's COVID Tracking Project also lines up with the Wikipedia and USAFacts datasets.

Another cautionary review of the numbers is at New York Magazine, highlighting premature action by both Georgia and Texas (which re-opened stores, restaurants, and theaters on May 1st, despite rising case numbers):
But that reopening is already starting, which means the conditions that have produced those elegant (and encouraging) curves are ending. According to one estimate, Georgia’s reopening, which began this week, has already produced a thousand new cases in a span of 24 hours and is expected to double deaths by August.
