Tao Yang *
Citation Information : International Journal of Advanced Network, Monitoring and Controls. Volume 5, Issue 4, Pages 58-65, DOI: https://doi.org/10.21307/ijanmc-2020-038
License : (CC-BY-NC-ND 4.0)
Published Online: 11-January-2021
The 2019 novel coronavirus disease (COVID-19) has caused immeasurable losses and had a huge impact on the world. To protect human health, Centres for Disease Control (CDC) in many countries are actively collecting data and working on virus prevention and control. Releasing epidemic data in real time, together with analysis and prediction, is a very effective way to combat the epidemic. Based on Jupyter Notebook, this paper presents a visual analysis process for COVID-19 epidemic data, with specific analysis and implementation. It then roughly estimates when the epidemic will converge using sigmoid fitting. The sigmoid fit tends to underestimate the curve, so actual values tend to exceed the sigmoid estimate. The proposed data visualization analysis method can effectively display the status of the COVID-19 epidemic, and we hope it helps to control and reduce the impact of the epidemic.
The 2019 novel coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness. Reports from a growing number of patients indicate that person-to-person spread is occurring. At this time, it is unclear how easily or sustainably the virus spreads between people.
Therefore, it is very important to visually analyse the COVID-19 Epidemic Situation, which helps to control the impact of the COVID-19 epidemic and reduce losses.
The dataset contains daily information on the number of affected cases, deaths and recoveries from the 2019 novel coronavirus. It is time series data, so the number of cases on any given day is cumulative. The data is available from 22 Jan, 2020. The latest data can be downloaded from the Johns Hopkins University GitHub repository: https://github.com/CSSEGISandData/COVID-19. Data can also be obtained from various Centres for Disease Control [2-6].
The data folder contains the previously posted dashboard case reports from Jan 21 to Feb 14, 2020 for the coronavirus COVID-19 (formerly known as 2019-nCoV). We refer to the data provided in the new folder, entitled “csse_covid_19_data”. Daily case reports are added to this new folder. The previously uploaded data from Jan 21 to Feb 14, 2020 is also included in the new folder; it has been cleaned and re-formatted to address inconsistencies in time zone and update frequency that arose during the transition from manual to automated updates (which took place on Feb 1, 2020). The new folder now includes one case report per day, taken at the same time of day, which has been the standard since Feb 14, 2020. This is the data we load for visualization analysis.
The main file in this dataset is covid_19_data.csv; its columns are described below.
⟡ Sno - Serial number
⟡ ObservationDate - Date of the observation in MM/DD/YYYY. ObservationDate and Last Update are converted to datetime, since they are initially read as the object type.
⟡ Province/State - Province or state of the observation (Could be empty when missing)
⟡ Country/Region - Country of observation
⟡ Last Update - Time in UTC at which the row is updated for the given province or country. (Not standardised and so please clean before using it)
⟡ Confirmed - Cumulative number of confirmed cases till that date
⟡ Deaths - Cumulative number of deaths till that date
⟡ Recovered - Cumulative number of recovered cases till that date
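The column descriptions above can be turned into a small loading step. The following is a minimal sketch using pandas, not the paper's own code: the helper name `load_covid_data` and the three-row inline sample are hypothetical, and the sample values are illustrative rather than taken from the real file.

```python
import io

import pandas as pd

# Hypothetical three-row sample in the layout described above.
# The values are illustrative only, not real dataset rows.
SAMPLE_CSV = """Sno,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
1,01/22/2020,Anhui,Mainland China,1/22/2020 17:00,1.0,0.0,0.0
2,01/22/2020,,Japan,1/22/2020 17:00,2.0,0.0,0.0
3,01/23/2020,Anhui,Mainland China,1/23/2020 17:00,9.0,0.0,0.0
"""


def load_covid_data(source):
    """Read the CSV and convert the two date columns from object to datetime."""
    df = pd.read_csv(source)
    # ObservationDate is documented as MM/DD/YYYY, so parse it strictly.
    df["ObservationDate"] = pd.to_datetime(df["ObservationDate"], format="%m/%d/%Y")
    # Last Update is not standardised; values that cannot be parsed become NaT.
    df["Last Update"] = pd.to_datetime(df["Last Update"], errors="coerce")
    # Province/State may be missing; use an empty string instead of NaN.
    df["Province/State"] = df["Province/State"].fillna("")
    return df


df = load_covid_data(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
```

For the real file, the same function could be called with the path to covid_19_data.csv in place of the in-memory sample.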
For data visualization, we mainly use the Python-based tools Jupyter Notebook and plotly. Jupyter Notebook is an open-source web application for creating and sharing documents that contain live code, equations, visualizations and narrative text. Its uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more. plotly is heavily used in this notebook so that figures, maps and other output can be viewed interactively. As a side effect, it may take a little more time to initialize the Python environment and load the kernel. We then download the data from the Internet and load it.
Worldwide, the confirmed cases curve looks like exponential growth, and the number has increased very rapidly, especially recently. However, daily new confirmed cases stopped increasing from April 4, and the trend has remained flat since then, as shown in Figure 1.
Moreover, when we view the growth on a log scale, we can see that the growth rate of confirmed cases increased slightly between the beginning and the end of March. Despite the lockdown policies in Europe and the US, the number is still increasing rapidly, as shown in Figure 2.
On a log scale, the fatalities curve looks like the confirmed curve shifted downward, which means the mortality rate is almost constant. The mortality rate stayed at about 3% at first, but it increased gradually and exceeded 7% at the end of April. Europe and the US have recently been more seriously affected by the coronavirus, and the mortality rate is high in these regions, as shown in Figure 3. A possible reason is that when too many people are infected, a country cannot provide enough medical treatment.
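The mortality rate discussed here is presumably the ratio of cumulative deaths to cumulative confirmed cases. On a log scale, log(deaths) = log(confirmed) + log(rate), so a constant vertical gap between the two curves corresponds to a constant rate. A minimal sketch of the calculation follows; the column names and numbers are hypothetical and chosen only to mirror the 3% to 7% range mentioned above.

```python
import pandas as pd

# Hypothetical cumulative worldwide totals; the numbers are illustrative only.
world = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-04-01", "2020-04-30"]),
    "confirmed": [100_000, 1_000_000, 3_000_000],
    "deaths": [3_000, 50_000, 210_000],
})


def add_mortality_rate(df):
    """Add mortality_rate (%) = cumulative deaths / cumulative confirmed * 100."""
    out = df.copy()
    # Mask zero confirmed counts so the division yields NaN instead of inf.
    confirmed = out["confirmed"].where(out["confirmed"] > 0)
    out["mortality_rate"] = out["deaths"] / confirmed * 100
    return out


world = add_mortality_rate(world)
print(world[["date", "mortality_rate"]])
```

Because both columns are cumulative, this ratio lags behind the true fatality rate of recent cases, which is one more reason to read it with care.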
There are 187 countries in the dataset. How are confirmed cases distributed by country? It is difficult to view all countries at once, so we check the top countries, as shown in Figure 4.
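Since the counts are cumulative and some countries are split into provinces or states, one way to rank countries is to take the rows for the latest observation date, sum over provinces, and keep the largest totals. The sketch below assumes the column names of covid_19_data.csv; the helper `top_countries` and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows in the covid_19_data.csv layout; values are illustrative.
df = pd.DataFrame({
    "ObservationDate": pd.to_datetime(["2020-04-29"] * 4 + ["2020-04-30"] * 4),
    "Province/State": ["New York", "New Jersey", "", "Hubei"] * 2,
    "Country/Region": ["US", "US", "Italy", "Mainland China"] * 2,
    "Confirmed": [300, 110, 200, 68, 310, 120, 205, 68],
})


def top_countries(df, n=10):
    """Return the n countries with the most confirmed cases on the latest date."""
    # Counts are cumulative, so only the most recent date is needed.
    latest = df[df["ObservationDate"] == df["ObservationDate"].max()]
    # Sum provinces/states into one total per country.
    totals = latest.groupby("Country/Region")["Confirmed"].sum()
    return totals.nlargest(n)


top = top_countries(df, n=2)
print(top)
```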
The US, Italy and Spain now have more confirmed cases than China, and many European countries appear near the top. Korea also ranks relatively high despite its population, because Korea tests aggressively.
Let’s check the growth of these major countries by date.
As we can see, the coronavirus hit China first, but its spread there slowed down in March, which is good news. The bad news is that a second wave reached Europe (Italy, Spain, Germany, France, UK) in March. Worse still, a third wave has now reached the US, where the growth rate is much faster than in China or even Europe. The main spread in the US started in the middle of March, and it is faster than in Italy. The US now appears to be in the most serious situation in terms of both total number and speed of spread. The confirmed cases for the top 30 countries are shown in Figure 5.
In terms of fatalities, Europe and the US are now in a serious situation, as shown in Figure 6. Many countries now have more fatalities than China, including the US, Italy, Spain, France, UK, Iran, Belgium, Germany, Brazil and the Netherlands. The spread in the US is the fastest, and the US has ranked first in fatalities since Apr 10th.
Now let’s look at the mortality rate by country, as shown in Figure 7.
Italy is in the most serious situation, with a mortality rate over 10% as of 2020/3/28. The countries with the highest mortality rates come from all over the world: Iran and Iraq from the Middle East, the Philippines and Indonesia from tropical areas, and Spain, the Netherlands, France and the UK from Europe. This shows that the coronavirus is truly a worldwide pandemic.
The countries with low mortality rates are shown in Figure 8.
By investigating the differences between the high and low mortality countries, we might be able to identify the factors that lead to death.
Note that a country’s mortality rate may be low because the country does not report or measure fatalities properly.
Let’s look at the number of confirmed cases on a map. Again, Europe, the US, the Middle East (Turkey, Iran) and Asia (China, Korea) appear in red, as shown in Figure 9.
On the mortality rate map, Europe (especially Italy) is high, and so is the Middle East (Iran, Iraq). In tropical areas, it is unclear why the Philippines and Indonesia are high while other countries (Malaysia, Thailand, Vietnam, as well as Australia) are low. In Asia, Korea’s mortality rate is lower than that of China or Japan, presumably because Korea carries out a large number of tests [9-10].
From the mortality rate map, the mortality rate seems to be especially high in Europe compared with the US or Asia.
Why does the mortality rate differ among countries? What hints are hidden in this map? Is there a reason why the mortality rate is especially high in Europe and the US? One interesting hypothesis involves BCG vaccination.
Let’s look at the daily new cases trend, as shown in Figure 12.
From Figure 12 we find:
⟡ China passed its peak on Feb 14, and new confirmed cases are now suppressed.
⟡ The spread in Europe and the US started in mid-March, after China slowed down.
⟡ The effect of the lockdown policies in Europe (Italy, Spain, Germany, France) is now visible in the figure: the number of new cases was no longer increasing rapidly at the end of March.
⟡ The US currently has the fastest spread, with more than 30k new confirmed cases per day at its peak. Daily new confirmed cases started to decrease from April 4 or April 10.
⟡ After that, there is a weekly pattern in which confirmed cases drop on Mondays. This is probably because people do not (or cannot) get medical care on Sundays, so the reported number is low on Sunday or Monday.
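Because the dataset stores cumulative counts, daily new cases have to be derived as the difference between consecutive days within each country. The sketch below shows one way to do this with pandas; the column names, the helper `add_daily_new` and the sample numbers are hypothetical. A weekday column is added because it is the natural grouping key for inspecting the weekly pattern noted above.

```python
import pandas as pd

# Hypothetical cumulative counts per country; values are illustrative only.
cum = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-03"] * 2),
    "country": ["US"] * 3 + ["Italy"] * 3,
    "confirmed": [100, 130, 170, 50, 60, 65],
})


def add_daily_new(df):
    """Derive daily new cases from cumulative counts, separately per country."""
    out = df.sort_values(["country", "date"]).copy()
    # diff() within each country; the first day of each country has no
    # previous value and therefore stays NaN.
    out["new_confirmed"] = out.groupby("country")["confirmed"].diff()
    # Weekday name, useful for checking weekly reporting effects.
    out["weekday"] = out["date"].dt.day_name()
    return out


daily = add_daily_new(cum)
print(daily)
```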
As we can see, at the end of March the spread is fastest in the US. Let’s look in detail at what is happening there. Within the US, New York and its neighbour New Jersey dominate the spread and are in a serious situation. New York has over 50k confirmed cases, while other states have fewer than about 5k, as shown in Figure 13.
The mortality rate in New York does not seem high, at around 2% for now, as shown in Figure 14.
All states in the US have been affected since the middle of March, and cases are now growing exponentially. In New York, fewer than 1k cases were confirmed by March 16, but more than 50k were confirmed by March 30, a 50-fold increase in 2 weeks. The confirmed cases by state in the US are shown in Figure 15.
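To put the New York figures in perspective, a 50-fold increase in 14 days can be converted into a doubling time and a daily growth factor, assuming constant exponential growth over the period. The sketch below uses the rounded bounds quoted above (1k and 50k) as inputs, so the result is only a rough estimate; the function names are hypothetical.

```python
import math


def doubling_time(start_cases, end_cases, days):
    """Doubling time in days, assuming constant exponential growth."""
    # Continuous daily growth rate r solves end = start * exp(r * days).
    rate = math.log(end_cases / start_cases) / days
    return math.log(2) / rate


def daily_growth_factor(start_cases, end_cases, days):
    """Average factor by which the count is multiplied each day."""
    return (end_cases / start_cases) ** (1 / days)


# Rounded bounds from the text: about 1k on March 16, about 50k on March 30.
t2 = doubling_time(1_000, 50_000, 14)
g = daily_growth_factor(1_000, 50_000, 14)
print(f"doubling time: {t2:.1f} days, daily growth factor: {g:.2f}")
```

Under these assumptions the count doubles roughly every 2.5 days, which corresponds to growth of about 32% per day.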
In Europe, the Northern and Eastern areas are in a relatively better situation than the Western and Southern areas. The maps of European countries with confirmed cases are shown in Figure 16 and Figure 17.
Italy, Spain, Germany, France and the UK are in an especially serious situation. The number of confirmed cases is now increasing rapidly in Russia (as of May 1), so Russia is potentially in a very dangerous situation.
Daily new cases in Europe are shown in Figure 18.
In Asia, China and Iran have many confirmed cases, followed by South Korea and Turkey. Asian countries with confirmed cases are shown in Figure 19.
The coronavirus hit Asia at an early stage; how is the situation now?
China and Korea are already in a decreasing phase. Unlike in China or Korea, daily new confirmed cases kept increasing in March and April, especially in Iran and Japan. However, the numbers have now started to decrease in these countries as well, as shown in Figure 20.
Of course, everyone is wondering when the coronavirus epidemic will converge. Let’s estimate it roughly using sigmoid fitting.
Sigmoid fitting with all latest data
If we trust the curve above, the growth in confirmed cases is slowing down, and it will converge around the beginning of May in most countries. It might take until the beginning of June in the US.
Let’s validate the fit by excluding the last 7 days of data, as shown in Figure 22.
We notice that the sigmoid fit tends to underestimate the curve: actual values tend to be higher than the sigmoid estimate.
Therefore, sigmoid fitting results should be interpreted with care; the actual situation is likely to be worse than the previous figure, which was fitted with all data, suggests.
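The fitting and validation procedure described above can be sketched as follows. The paper does not state the exact functional form or optimizer, so this example assumes a three-parameter logistic curve fitted with `scipy.optimize.curve_fit`; the function names, starting guesses and synthetic data are all hypothetical. The synthetic series follows a logistic curve exactly, so the hold-out prediction is accurate here. Real case counts do not follow the curve this closely, which is why the underestimation reported above can occur.

```python
import numpy as np
from scipy.optimize import curve_fit


def sigmoid(t, K, r, t0):
    """Logistic curve with plateau K, growth rate r and inflection day t0."""
    return K / (1.0 + np.exp(-r * (t - t0)))


def fit_sigmoid(days, cases):
    """Fit the logistic curve to cumulative cases and return (K, r, t0)."""
    # Rough starting guesses (an assumption): plateau near the latest count,
    # moderate growth rate, inflection near the middle of the series.
    p0 = [cases.max(), 0.1, days.mean()]
    params, _ = curve_fit(sigmoid, days, cases, p0=p0, maxfev=10000)
    return params


# Synthetic cumulative series that follows a logistic curve exactly.
days = np.arange(60, dtype=float)
true_cases = sigmoid(days, 100_000, 0.2, 30)

# Fit with all data, as in the first sigmoid figure.
params_all = fit_sigmoid(days, true_cases)

# Validation: fit without the last 7 days, then predict those days.
params_holdout = fit_sigmoid(days[:-7], true_cases[:-7])
pred_last7 = sigmoid(days[-7:], *params_holdout)

# Negative values mean the hold-out fit underestimates the actual counts.
error = pred_last7 - true_cases[-7:]
print("fitted plateau K:", round(params_all[0]))
print("max abs hold-out error:", np.abs(error).max())
```

On real data, the sign of `error` over the held-out days indicates whether the fit underestimates or overestimates the actual counts.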
Based on data available on May 6, this paper presented a visualization of the COVID-19 epidemic situation, including the worldwide trend, country-level growth, and so on. It then roughly estimated when the epidemic will converge using sigmoid fitting. The model’s estimates and predictions closely match reported confirmed cases. Therefore, the proposed data visualization analysis method can effectively display the status of the COVID-19 epidemic, and we hope it helps to control and reduce the impact of the epidemic.
The next steps include applying the method to COVID-19 death data for smaller regions, such as provinces. The visualization analysis method could also be used to evaluate population mortality and the spread of other diseases.
The authors wish to thank Corochann, who is an Engineer at Preferred Networks in Tokyo. This work was supported in part by the Ph.D. Research Initiation Fund of Nanchang Institute of Science and Technology with the Project (No. NGRCZX-18-01) and supported by Robot and Intelligent System Research Centre of Artificial Intelligence College of Nanchang Institute of Science and Technology (No.NGYZY-20-005).