Excel is a powerful tool for data cleaning, which involves dealing with "dirty data", data that requires preparation before analysis. Raw data often contains numerous errors that can impact accuracy, leading to incorrect predictions and negative business outcomes.
To enhance the quality of company information, consider the following strategies:
The following example demonstrates the cleanup of a fictitious database using formulas or methods to correct and modify specific requirements.
Identify errors that can impact companies' strategies and the development of new products or services, ultimately leading to inaccurate sales forecasts.
By employing the provided formulas and following the outlined steps in the exercise, it was possible to clean the entire table. This resulted in presenting the data in a more professional and readable format.
Raw database:
Formula | Description |
---|---|
CLEAN() | Removes all non-printable characters from the text. |
TRIM() | Removes all blank spaces from the text except the individual spaces between words. |
LEFT() | Returns the specified number of characters from the beginning of a text string (used to create employee's IDs). |
UPPER() | Converts a text string to uppercase letters. |
PROPER() | Converts a text string to upper or lower case, depending on the case; the first letter of each word is capitalized and the remaining letters are lowercase. |
IF() | Returns a value if a condition is true and other if it is false (used to rename boolean values). |
In Microsoft Excel 2019, the default decimal separator is a comma, not a period. Therefore, if you want to do a mathematical operation, it will return an error.
Steps to remove currency format:
Steps to replace decimal point with a comma:
In this exercise, the start date doesn't specify any type of format and also contains tabulation.
Steps to fix date format:
Steps to identify duplicate values and how to remove them:
The last step of the exercise indicates that empty cells should be removed because they represent null values.
Steps to identify empty cells and how to remove them:
Final result of the cleaning process:
You can visit the GitHub repository for more information on the documents used in the project.