The introduction to Business Analytics covered the fundamental processes at the start and end of a data analysis project. The topics discussed were exploratory procedures such as selecting, filtering, and sorting data; formulas for mathematical computation; and pivot tables for report generation.
Exploratory data analysis (EDA) should be the first step in a data analysis project. One of its fundamental processes is checking the distribution of each feature column to determine all possible values and how frequently each appears in the dataset. In my experience, rows are removed when more than 50% of their values are null, particularly when techniques like imputation are not ideal, as is often the case with categorical columns. Filtering is also used to understand the dataset further, such as identifying customers with total transactions of Php 20,000 and above. Lastly, sorting arranges records in ascending or descending order based on a particular column, such as the amount paid by the customer on a given day.
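The four exploratory steps described here can be sketched in plain Python. This is only a minimal illustration: the records, the field names (`customer`, `store`, `amount_paid`), and the helper `null_fraction` are hypothetical, and the 50% threshold is applied per row as an assumption about how the rule is meant.

```python
from collections import Counter

# Hypothetical transaction records; None marks a missing value.
records = [
    {"customer": "A", "store": "North", "amount_paid": 12500.0},
    {"customer": "B", "store": "South", "amount_paid": 8000.0},
    {"customer": "A", "store": "North", "amount_paid": 9000.0},
    {"customer": None, "store": None, "amount_paid": 500.0},
    {"customer": "C", "store": "South", "amount_paid": 21000.0},
]


def null_fraction(row):
    """Return the share of fields in a row that are missing."""
    return sum(value is None for value in row.values()) / len(row)


# 1. Distribution check: how often each store value appears (None included).
store_counts = Counter(row["store"] for row in records)

# 2. Null handling: keep only rows where at most 50% of the fields are missing.
clean = [row for row in records if null_fraction(row) <= 0.5]

# 3. Filtering: customers whose total transactions reach Php 20,000.
totals = {}
for row in clean:
    totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount_paid"]
big_spenders = sorted(name for name, total in totals.items() if total >= 20_000)

# 4. Sorting: records in descending order of the amount paid.
by_amount = sorted(clean, key=lambda row: row["amount_paid"], reverse=True)

print(store_counts)
print(big_spenders)
print([row["amount_paid"] for row in by_amount])
```

In a spreadsheet the same steps would be done with filters and sort controls; the sketch only makes the logic behind them explicit.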
After exploration, formulas are used to add columns and transform the data into the format the project needs. For example, a project may need a total column that has to be derived from the unit price, quantity, and tax columns. Formulas such as PRODUCT() and SUM() can be applied to derive it. Other formulas for numerical problems include min(), max(), mean(), and sqrt(). String formulas such as len(), index(), and concatenate() are used to explore string data, while logical formulas such as AND(), OR(), and IF() are used to evaluate conditions and act on their truth values.
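As a rough sketch of the derived total column, the Python below mirrors the spreadsheet formulas with standard-library equivalents. It assumes that tax is stored as an amount per line item, so the total is the product of unit price and quantity plus tax; if tax were a rate, the formula would differ. The sample items, the `label` and `size` columns, and the 200 threshold are hypothetical.

```python
import math
import statistics

# Hypothetical line items; "tax" is assumed to be an amount, not a rate.
line_items = [
    {"product": "Notebook", "unit_price": 50.0, "quantity": 4, "tax": 24.0},
    {"product": "Pen", "unit_price": 10.0, "quantity": 10, "tax": 12.0},
]

for item in line_items:
    # Like PRODUCT(unit_price, quantity)
    subtotal = math.prod([item["unit_price"], item["quantity"]])
    # Like SUM(subtotal, tax)
    item["total"] = sum([subtotal, item["tax"]])
    # Like CONCATENATE(product, " x", quantity)
    item["label"] = item["product"] + " x" + str(item["quantity"])
    # Like IF(AND(total >= 200, quantity > 1), "large", "small")
    is_large = item["total"] >= 200 and item["quantity"] > 1
    item["size"] = "large" if is_large else "small"

totals = [item["total"] for item in line_items]

# Numerical summaries comparable to max(), min(), and mean().
highest = max(totals)
lowest = min(totals)
average = statistics.mean(totals)

print(totals, highest, lowest, average)
print([item["label"] for item in line_items])
```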
At the end of a data analysis project, an aggregate table or pivot table is produced for report generation. Pivot tables summarize data by dimension, such as customer, store, or product. The summarized measures can include units sold, transactions, distinct products, and revenue. Pivot tables help track performance, monitor customer-product relationships, and build data visualizations.
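To make the idea of summarizing by dimension concrete, here is a small sketch of what a pivot table computes, written in plain Python. The sales rows, the column names, and the `pivot` helper are hypothetical; a spreadsheet or a data library would produce the same summary with far less code.

```python
from collections import defaultdict

# Hypothetical sales rows, one per transaction.
sales = [
    {"customer": "A", "store": "North", "product": "Pen", "units": 3, "revenue": 30.0},
    {"customer": "A", "store": "North", "product": "Notebook", "units": 1, "revenue": 50.0},
    {"customer": "B", "store": "South", "product": "Pen", "units": 5, "revenue": 50.0},
    {"customer": "A", "store": "South", "product": "Pen", "units": 2, "revenue": 20.0},
]


def pivot(rows, dimension):
    """Summarize rows by one dimension, as a pivot table would."""
    cells = defaultdict(
        lambda: {"units": 0, "transactions": 0, "products": set(), "revenue": 0.0}
    )
    for row in rows:
        cell = cells[row[dimension]]
        cell["units"] += row["units"]
        cell["transactions"] += 1
        cell["products"].add(row["product"])
        cell["revenue"] += row["revenue"]
    # Replace the set of products with a distinct count for reporting.
    return {
        key: {
            "units": cell["units"],
            "transactions": cell["transactions"],
            "distinct_products": len(cell["products"]),
            "revenue": cell["revenue"],
        }
        for key, cell in cells.items()
    }


by_customer = pivot(sales, "customer")
by_store = pivot(sales, "store")

print(by_customer)
print(by_store)
```

Passing a different dimension name, such as `"product"`, reuses the same rows to answer a different reporting question, which is the main convenience a pivot table offers.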
Reviewing the purpose behind these tools and processes is crucial for anyone in the analytics field because they lay the foundation for the more advanced functions and models flooding the market today. Overlooking how they work may lead to misuse and compromise the results of an analytics project, producing unreliable insights. Hence, mastering these essential tools and processes empowers a data scientist to work with data and extract insights from it properly.