Data integration is the set of procedures, techniques, and technologies used to design and build processes that extract, restructure, move, and load data into operational or analytic data stores, either in real time or in batch mode.
Metadata is the “data” about the data: the business and technical definitions that give the data its meaning.
A major function of data integration is to integrate disparate data into a single view of information.
ETL Data integration
ETL (extract, transform, and load) is the process of collecting transactional data from multiple sources, conforming it, and loading it into databases used for reporting and analytics.
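The extract, transform, and load stages can be sketched as three small functions; this is a minimal illustration, assuming a hypothetical `orders.csv` source file with `order_id`, `customer`, and `amount` columns and a SQLite target for the conformed reporting table.

```python
import csv
import sqlite3

def extract(path):
    """Extract: read transactional rows from a flat-file source."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: conform source fields to the reporting schema
    (trim and standardize names, cast amounts to numbers)."""
    return [
        (row["order_id"], row["customer"].strip().upper(), float(row["amount"]))
        for row in rows
    ]

def load(records, conn):
    """Load: bulk-insert the conformed records into the analytic store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (order_id TEXT, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()
```

Real ETL tools add scheduling, error handling, and metadata capture around these same three stages, but the shape of the pipeline is the same.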
Most of the cost and maintenance of complex data integration processing occurs in the bulk data movement space. ETL has experienced explosive growth in both frequency and volume over the past 15 years. In the mid-1990s, pushing 30GB to 40GB of data on a monthly basis was considered a large effort; by the twenty-first century, moving a terabyte of data daily was a common requirement. In addition to standard flat-file and relational data formats, data integration environments now need to handle XML and unstructured data. With these new formats, along with the exponential growth of transactional data, multi-terabyte data integration processing environments are not unusual.
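Handling XML alongside flat files usually means parsing each source into a common record shape before the transform stage. A minimal sketch, assuming a hypothetical `<orders><order id="..."><customer/><amount/></order></orders>` layout:

```python
import xml.etree.ElementTree as ET

def extract_xml(xml_text):
    """Parse XML order records into the same dict shape a flat-file
    reader would produce, so downstream transforms stay source-agnostic."""
    root = ET.fromstring(xml_text)
    return [
        {
            "order_id": order.get("id"),
            "customer": order.findtext("customer"),
            "amount": order.findtext("amount"),
        }
        for order in root.iter("order")
    ]
```

Normalizing every source into one record shape at extract time is what lets a single set of transform and load routines serve flat-file, relational, and XML inputs alike.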