SQL for Data Analysis

Analysis is the process of finding patterns, trends, and relationships in data sets. Data analysis can shape business processes, improve decision making and drive business growth.

DESCRIPTION


The database used was imported from a CSV file to MySQL Workbench.

Certain statements, keywords, clauses and functions will be used.

GOAL


Answer the questions posed in the exercise to find patterns, trends or obtain relevant information.

RESULT


It was possible to obtain the consulted records requested by the exercise.

Statement Description
CREATE DATABASE Used to create a new SQL database.
CREATE VIEW A view is a virtual table based on the result-set of an SQL statement that contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database.
Keyword Description
AS Renames a column or table with an alias.
ASC Sorts the result set in ascending order.
DESC Sorts the result set in descending order.
DISTINCT Selects only distinct (different) values.
LIMIT Specifies the number of records to return in the result set.
UNION Combines the result set of two or more SELECT clauses (only distinct values).
USE Uses the named database as the default (current) database for subsequent statements.
Clause Description
SELECT Queries the database according to the criteria set by the operator and returns the rows/columns that match those criteria.
FROM Used to select some records from a table.
GROUP BY Organizes the retrieved data by grouping them by columns.
ORDER BY Sorts the records in the returned result-set.
HAVING Relates to GROUP BY. When this clause is applied, it selects and returns only those rows having the specified conditions TRUE.
Function Description
COUNT() Counts values in the query results and returns the total number of records.

Olympic Games Athltes Database Records

Creating and using the database:

Database

The following query shows a table records of some Olympic Games medalists from 1900 to 2016:

Table


Query output: IDs are related to the athletes' names, so they don't count as Primary Key.

AthletesRecords

The exercise proposes to obtain:

  1. Number of Olympic Games that have been held


  2. Query output: a total of 33 Games have been held (some Summer and Winter editions are held in the same year, then they count as one).


  3. List of all the Olympic Games held so far


  4. Query output: the result shows the list starting from the 1900 Summer Games to the 2016 Summer Games.


  5. Total number of different nations that participated in each Olympic Games


  6. Query output: the result shows the number of participating nations per event.


  7. Games in which the highest and the lowest number of nations participated


  8. Query output: the highest number of nations that participated were 39 at the 2016 Summer Games, while the lowest was 1 at the 1904 Summer Games.


  9. Nation that has participated in most of the Olympic Games
    • The total number of Olympic Games held is 33, as determined by the query:
    • Therefore, the nation that is close to that number of participations must be found. The following query shows the total participations of each nation:
    • A View will be created to see the content of the nations' total participations (this view doesn't belong to the database, it will be used as a subquery).
    • The View was used inside the HAVING clause to get the result.


    • Query output: the nation with the most participations including Summer and Winter editions is Norway with 24.

You can visit the GitHub repository for more information on the documents used in the project.