Covid-19 Insights
Aims Grid
Purpose
To unlock the impact of the Covid-19 to UK government for decision support and automate them to reduced manual timeĀ spent in data gathering.
Stakeholders
- UK Government
- NGOs
- Data & Analytics Team
End Result
An automated dashboard providing quick & latestĀ insights in order to support data driven decision making
Success Criteria
- Dashboard(s) uncovering Covid-19 insights with latest data available
- UK government able to take better decisions & save 10% more lives
Data reference:
Our world in data: https://ourworldindata.org/covid-deaths
Data Cleansing & Transformation (SQL)
To create the necessary data model for doing analysis and fulfilling the business needs defined in the user stories the following tables were extracted using SQL.
The data source (covid) were provided in .csv format by the owid github repo, and were connected in the data model in a later step of the process.
Below are the SQL statements for cleansing and transforming necessary data.
Use Covid;
Select Location, date, total_cases, new_cases, total_deaths, population
From Covid..CovidDeaths
Where continent is not null
order by 1,2
-- Total Cases vs Total Deaths
-- Shows mortality rate from Covid-19 in the United Kingdom
Select Location, date, total_cases,total_deaths, (total_deaths/total_cases)*100 as DeathPercentage
From Covid..CovidDeaths
Where location like '%kingdom%'
and continent is not null
order by 1,2 DESC
-- Total Cases vs Population
-- Shows what percentage of population infected with Covid
Select Location, date, Population, total_cases, (total_cases/population)*100 as PercentPopulationInfected
From Covid..CovidDeaths
Where location like '%kingdom%'
order by 1,2 DESC
-- Countries with Highest Infection Rate compared to Population
Select Location, Population, MAX(total_cases) as HighestInfectionCount, Max((total_cases/population))*100 as PercentPopulationInfected
From Covid..CovidDeaths
Where location like '%kingdom%'
Group by Location, Population
order by PercentPopulationInfected desc
-- Countries with Highest Death Count per Population
Select Location, MAX(cast(Total_deaths as int)) as TotalDeathCount
From Covid..CovidDeaths
--Where location like '%kingdom%'
Where continent is not null
Group by Location
order by TotalDeathCount desc
-- BREAKING THINGS DOWN BY CONTINENT
-- Showing contintents with the highest death count per population
Select continent, MAX(cast(Total_deaths as int)) as TotalDeathCount
From Covid..CovidDeaths
--Where location like '%kingdom%'
Where continent is not null
Group by continent
order by TotalDeathCount desc
-- GLOBAL NUMBERS
Select SUM(new_cases) as total_cases, SUM(cast(new_deaths as int)) as total_deaths, SUM(cast(new_deaths as int))/SUM(New_Cases)*100 as DeathPercentage
From Covid..CovidDeaths
--Where location like '%states%'
where continent is not null
--Group By date
order by 1,2
-- Total Population vs Vaccinations
-- Shows Percentage of Population that has recieved at least one Covid Vaccine
Select dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations
, SUM(CONVERT(int,vac.new_vaccinations)) OVER (Partition by dea.Location Order by dea.location, dea.Date) as RollingPeopleVaccinated
--, (RollingPeopleVaccinated/population)*100 Can't perform this yet
From Covid..CovidDeaths dea
Join Covid..CovidVaccinations vac
On dea.location = vac.location
and dea.date = vac.date
where dea.continent is not null
order by 2,3
-- Using CTE to perform Calculation on Partition By in previous query
With PopvsVac (Continent, Location, Date, Population, New_Vaccinations, RollingPeopleVaccinated)
as
(
Select dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations
, SUM(CONVERT(int,vac.new_vaccinations)) OVER (Partition by dea.Location Order by dea.location, dea.Date) as RollingPeopleVaccinated
--, (RollingPeopleVaccinated/population)*100
From Covid..CovidDeaths dea
Join Covid..CovidVaccinations vac
On dea.location = vac.location
and dea.date = vac.date
where dea.continent is not null
--order by 2,3
)
Select *, (RollingPeopleVaccinated/Population)*100
From PopvsVac
-- Using Temp Table to perform Calculation on Partition By in previous query
DROP Table if exists #PercentPopulationVaccinated
Create Table #PercentPopulationVaccinated
(
Continent nvarchar(255),
Location nvarchar(255),
Date datetime,
Population numeric,
New_vaccinations numeric,
RollingPeopleVaccinated numeric
)
Insert into #PercentPopulationVaccinated
Select dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations
, SUM(CONVERT(int,vac.new_vaccinations)) OVER (Partition by dea.Location Order by dea.location, dea.Date) as RollingPeopleVaccinated
--, (RollingPeopleVaccinated/population)*100
From Covid..CovidDeaths dea
Join Covid..CovidVaccinations vac
On dea.location = vac.location
and dea.date = vac.date
--where dea.continent is not null
--order by 2,3
Select *, (RollingPeopleVaccinated/Population)*100 as PercentPopulationVaccinated
From #PercentPopulationVaccinated
-- Creating View to store data for later visualizations
Create View PercentPopulationVaccinated as
Select dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations
, SUM(CONVERT(int,vac.new_vaccinations)) OVER (Partition by dea.Location Order by dea.location, dea.Date) as RollingPeopleVaccinated
--, (RollingPeopleVaccinated/population)*100
From Covid..CovidDeaths dea
Join Covid..CovidVaccinations vac
On dea.location = vac.location
and dea.date = vac.date
where dea.continent is not null
Data Cleansing & Transformation (Power Query)
The data (covid) from the SQL server was exported to Power BI for further cleaning and transformation.
Below are the Power Query for cleansing and transforming necessary data.
Market Table:
-- Removing blank data rows in market zone column
= Table.SelectRows(sales_markets, each ([zone] <> ""))
Transactions Table:
-- Removing null data rows of -1 and 0 in transactions column
= Table.SelectRows(sales_transactions, each ([sales_amount] <> -1 and [sales_amount] <> 0))
-- Adding new column to convert sales amount in USD to INR
= Table.AddColumn(#"Filtered Rows", "n_sales_amount", each if [currency] = "USD" or [currency] = "USD#(cr)" then [sales_amount]*75 else [sales_amount])
Data Model
Below is a screenshot of the data model after cleansed and prepared tables were read into Power BI.
This data model also shows star schema of the dimension data from the customer, calendar, products and transactions connected to the facts table.
Star Schema
Covid Dashboard
The finished covid dashboard with one page works as a dashboard and overview.