The first step is to extract data from multiple heterogeneous sources and applications, making it available for further processing. At extraction time it is not yet known exactly which data the end user will need, so more data than necessary is extracted and the filtering is performed later. However, some sources, such as operational systems, allow some transformations even during the extraction phase.
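As a rough illustration of the extraction step, the sketch below pulls every row from two unlike sources and stages them together without filtering. The CSV export, the `orders` table, and all function names are assumptions made up for this example; a real pipeline would read from its own files, APIs, or operational databases.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export from some application.
CSV_EXPORT = "order_id,customer,amount\n1,alice,10.50\n2,bob,7.25\n"


def extract_from_csv(text):
    """Return every row of the CSV export as a dict, unfiltered."""
    return list(csv.DictReader(io.StringIO(text)))


def make_demo_source():
    """Build a hypothetical operational database in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(3, "carol", 12.0), (4, "alice", 3.0)],
    )
    return conn


def extract_from_db(conn):
    """Return every row of the `orders` table as a dict, unfiltered."""
    cur = conn.execute("SELECT order_id, customer, amount FROM orders")
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def extract_all():
    """Stage rows from both sources; filtering is left to a later step."""
    conn = make_demo_source()
    try:
        return extract_from_csv(CSV_EXPORT) + extract_from_db(conn)
    finally:
        conn.close()


staged = extract_all()
```

Note that the CSV rows arrive as strings while the database rows carry native types. Reconciling such differences is deliberately left out here, since extraction only makes the data available.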
ETL (Extract, Transform, Load)
ETL is the core process for building and working with a data warehouse. From pulling data out of multiple sources to storing it in the final data warehouse in integrated form, ETL handles every movement and processing step between source and destination. The ETL sequence also includes cleaning the data after extraction.
Next is the transformation step, which applies rules to the extracted data to reshape it as required. In simple terms, it is the application of various queries and functions to the current database to fetch the required, ordered set of records with no redundancy.
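A minimal sketch of what such transformation rules might look like is shown below. The three rules used here (normalising types, dropping duplicate `order_id` values, and sorting) and the field names are assumptions chosen for illustration, not a prescribed rule set.

```python
def transform(rows):
    """Apply example rules: normalise types, drop duplicates, sort by order_id."""
    seen = set()
    cleaned = []
    for row in rows:
        record = {
            "order_id": int(row["order_id"]),
            "customer": str(row["customer"]).strip().lower(),
            "amount": round(float(row["amount"]), 2),
        }
        # Keep only the first record seen for each order_id (no redundancy).
        if record["order_id"] in seen:
            continue
        seen.add(record["order_id"])
        cleaned.append(record)
    # Return the records in a defined order.
    return sorted(cleaned, key=lambda r: r["order_id"])


# Hypothetical raw input with mixed types, stray whitespace, and a duplicate.
raw_rows = [
    {"order_id": "2", "customer": " Bob ", "amount": "7.25"},
    {"order_id": 1, "customer": "alice", "amount": 10.5},
    {"order_id": "2", "customer": "bob", "amount": "7.25"},
]
transformed = transform(raw_rows)
```

Keeping the first occurrence of a duplicate is only one possible policy; a real pipeline might prefer the most recent record or merge the two.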
The last step is loading the cleansed and transformed data into the data warehouse for further analysis. Although light transformations can be made during loading if required, it is advisable to complete them before the loading process begins.
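The loading step could be sketched as follows, with an in-memory SQLite database standing in for the data warehouse. The `fact_orders` table, its columns, and the `load` function are hypothetical, and `INSERT OR REPLACE` is SQLite-specific syntax; other warehouses offer their own upsert or bulk-load mechanisms.

```python
import sqlite3


def load(conn, records):
    """Write already-transformed records to the target table in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "CREATE TABLE IF NOT EXISTS fact_orders ("
            "order_id INTEGER PRIMARY KEY, "
            "customer TEXT NOT NULL, "
            "amount REAL NOT NULL)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO fact_orders (order_id, customer, amount) "
            "VALUES (:order_id, :customer, :amount)",
            records,
        )
    return conn.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]


# Hypothetical records that have already been cleansed and transformed.
clean_records = [
    {"order_id": 1, "customer": "alice", "amount": 10.5},
    {"order_id": 2, "customer": "bob", "amount": 7.25},
]
warehouse = sqlite3.connect(":memory:")
row_count = load(warehouse, clean_records)
```

Because the records are keyed on `order_id`, running the same load twice leaves the table unchanged, which makes a failed job safe to retry. No transformation happens inside `load`, in line with the advice to finish transformations before loading.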