Understanding Data Platform Architecture


Data is a critical part of every business, and handling it well is just as critical. Without set protocols for collecting, consolidating, and using your data, your business can suffer in the long run. A well-defined data platform architecture can save you a lot of future hassle.

Today, we try to understand the basic setup of such data platforms.


Data Platform Architecture - Basics



The main components of a data management platform (DMP) are as follows:


The Data Collection Layer

The data collection layer is divided into 2 parts:

Client-side – this part is responsible for collecting the data and sending it to the server-side data collector. There are a number of ways this can be done, for example with a JavaScript tracker, an SDK, or other libraries.

A JavaScript tracker and impression pixel may also set off piggyback pixels to sync cookies with third-party platforms.

Server-side – provides the endpoints responsible for:

  • Receiving the data from the client-side libraries – these endpoints are typically very lightweight and are used only for logging the data or pushing it to the queue(s) for the next layer to process.
  • Syncing cookies with third-party platforms and building cookie matching tables that are used later during the audience export stage (see below).
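As a rough illustration of how lightweight the receiving endpoint can be, here is a minimal Python sketch. The `collect` function, the `event_queue` object, and the status codes are assumptions made for this example; a real collector would sit behind an HTTP server and push to a proper message broker rather than an in-process queue.

```python
import json
import queue
import time

# Hypothetical in-process queue standing in for a real message broker.
event_queue = queue.Queue()


def collect(raw_body: str, client_ip: str) -> int:
    """Accept one tracker payload, stamp it, and enqueue it.

    Returns an HTTP-style status code: 204 on success, 400 on bad input.
    """
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400
    if not isinstance(payload, dict):
        return 400

    # Keep the endpoint lightweight: attach server-side context and
    # hand the event off to the next layer without further processing.
    event_queue.put({
        "received_at": time.time(),
        "client_ip": client_ip,
        "payload": payload,
    })
    return 204
```

The endpoint does no normalization or enrichment itself; that work is left to whatever consumes the queue.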


The Data Normalization and Enrichment Layer

Once the data has been captured from the data collection endpoint, the DMP normalizes and enriches the data.

The data normalization and enrichment process can include some or all of the following actions:

  • Deleting redundant or useless data.
  • Transforming the source’s data schema to the DMP’s data schema.
  • Enriching the data with additional data points, such as geolocation and OS/browser attributes.
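The three actions above could be sketched in Python as follows. The field names in `FIELD_MAP` and the token lists are made up for this example, and the user-agent matching is deliberately naive; a production DMP would more likely rely on a dedicated user-agent parser and a geolocation database.

```python
# Hypothetical mapping from one source's field names to the DMP's schema.
FIELD_MAP = {
    "uid": "user_id",
    "evt": "event_type",
    "ts": "timestamp",
    "ua": "user_agent",
    "ref": "referrer",
}

# Order matters: more specific tokens must come before generic ones.
OS_TOKENS = [
    ("Windows", "Windows"), ("Android", "Android"), ("iPhone", "iOS"),
    ("iPad", "iOS"), ("Mac OS X", "macOS"), ("Linux", "Linux"),
]
BROWSER_TOKENS = [
    ("Edg/", "Edge"), ("Firefox/", "Firefox"),
    ("Chrome/", "Chrome"), ("Safari/", "Safari"),
]


def normalize(raw: dict) -> dict:
    """Rename known fields to the DMP schema; drop unknown or empty ones."""
    return {
        FIELD_MAP[key]: value
        for key, value in raw.items()
        if key in FIELD_MAP and value not in (None, "")
    }


def first_match(text: str, tokens: list) -> str:
    """Return the label of the first token found in text, else 'unknown'."""
    return next((label for token, label in tokens if token in text), "unknown")


def enrich(event: dict) -> dict:
    """Add coarse OS and browser attributes derived from the user agent."""
    ua = event.get("user_agent", "")
    enriched = dict(event)  # leave the input untouched
    enriched["os"] = first_match(ua, OS_TOKENS)
    enriched["browser"] = first_match(ua, BROWSER_TOKENS)
    return enriched
```

Calling `enrich(normalize(raw_event))` on each queued event gives the storage layer records that all share one schema.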


The Data Storage, Merging, and Profile Building Layer

The next step is to store and merge the newly collected data with existing data and create user profiles.

Profile building is an essential part of the whole data-collection process, as it is responsible for transforming the collected data into events and profiles, which are the cornerstones of audience segmentation (the next stage).

A user profile could contain several identifiers, such as cookies or device identifiers, as well as persistent identifiers that are pseudonymized – e.g. hashed usernames or email addresses.

Another important part of the profile-building stage is the matching of data sets using common identifiers – e.g. matching an email address from a CRM system with an email address from a marketing-automation platform.

A profile consists of user attributes (e.g. home location, age group, gender, etc.) as well as events (e.g. page view, form filled in, transaction, etc.). The latter is typically a separate collection or table in the database.
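To make the matching step more concrete, here is a minimal Python sketch that merges records from two sources into one profile per hashed email address. The row layout (`email`, `identifiers`, `attributes`) and the function names are assumptions for this example, and events are left out for brevity since they would typically live in a separate table.

```python
import hashlib


def hash_email(email: str) -> str:
    """Pseudonymize an email address: trim, lowercase, then SHA-256."""
    cleaned = email.strip().lower()
    return hashlib.sha256(cleaned.encode("utf-8")).hexdigest()


def build_profiles(crm_rows: list, marketing_rows: list) -> dict:
    """Merge rows from both sources into profiles keyed by hashed email."""
    profiles = {}
    for source, rows in (("crm", crm_rows), ("marketing", marketing_rows)):
        for row in rows:
            key = hash_email(row["email"])
            profile = profiles.setdefault(
                key,
                {"identifiers": set(), "attributes": {}, "sources": set()},
            )
            profile["sources"].add(source)
            profile["identifiers"].update(row.get("identifiers", []))
            # Later sources overwrite earlier ones on conflicting attributes.
            profile["attributes"].update(row.get("attributes", {}))
    return profiles
```

Normalizing the address before hashing matters: without it, `Ann@Example.com` and `ann@example.com` would hash to different keys and end up as two separate profiles.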


The Data Analysis and Segmentation Layer

The core functionality of a DMP is analyzing the data and creating segments (i.e. audiences).

An audience segment is useful to advertisers and marketers (and publishers) because it allows them to cut through the mass of data available to them and break it down into digestible pieces of information about potential customers, site visitors or app users.

With good audience segmentation, advertisers can buy display ads targeted at a group of Internet users and publishers can analyze site visitors and then sell inventory at a higher price to media buyers whose target segments match the publisher’s.
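A segment can be thought of as a rule evaluated against every profile. The Python sketch below assumes a very simple rule shape (exact attribute matches plus a minimum count of one event type); the `Profile` class and both functions are illustrative, and real DMPs usually support far richer rule languages.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    user_id: str
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)  # event type names


def in_segment(profile: Profile, required_attributes: dict,
               required_event: str, min_count: int = 1) -> bool:
    """True if all attributes match and the event occurred often enough."""
    attributes_match = all(
        profile.attributes.get(name) == value
        for name, value in required_attributes.items()
    )
    return attributes_match and profile.events.count(required_event) >= min_count


def build_segment(profiles: list, required_attributes: dict,
                  required_event: str, min_count: int = 1) -> set:
    """Return the user IDs of every profile that satisfies the rule."""
    return {
        p.user_id
        for p in profiles
        if in_segment(p, required_attributes, required_event, min_count)
    }
```

For example, a "returning shoppers aged 25-34" segment might be expressed as `build_segment(profiles, {"age_group": "25-34"}, "transaction", min_count=2)`.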


Audience Export

Audience export is a component that periodically exports segments to third-party platforms, for example demand-side platforms (DSPs), in order to allow advertisers to use them in campaign targeting.
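The export step relies on the cookie matching tables built during data collection, because the third-party platform knows users by its own IDs rather than the DMP's. A minimal Python sketch, assuming the matching table is a dictionary keyed by `(dmp_user_id, partner)`; the function name and table shape are hypothetical:

```python
def export_segment(segment_user_ids: set, cookie_match_table: dict,
                   partner: str) -> list:
    """Translate DMP user IDs into the partner's IDs for one segment.

    Users without a synced cookie for this partner are skipped, since the
    partner would have no way to recognize them.
    """
    partner_ids = []
    for user_id in sorted(segment_user_ids):  # sorted for stable output
        partner_id = cookie_match_table.get((user_id, partner))
        if partner_id is not None:
            partner_ids.append(partner_id)
    return partner_ids
```

A scheduler would call this periodically per segment and partner, then deliver the resulting list through whatever upload mechanism the partner supports.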


User Interface

This is pretty self-explanatory – you need to give the users a UI to create segments, configure data sources, analyze and visualize the data, as well as provide them with the ability to configure the audience exports to third-party platforms.


Application Programming Interfaces (APIs)

APIs can be divided into the following categories:

  • Platform API used to create, modify, and delete objects such as users, segments etc. – basically for whatever task the user is able to do via the UI in the platform.
  • Reporting API used to run reports on the data. Due to the sheer amount of data, some of the reports may need to be scheduled for offline processing and made available for download once generated.
  • Audience API that allows client libraries to query in real-time whether a given visitor belongs to the audience or not.
  • Data ingestion API used for importing the segments or other data from third-party platforms. Again, as the data volume may be large, this can happen through an Amazon S3 bucket or file upload that is queued by your DMP for offline processing.
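Of these, the Audience API has the tightest latency requirements, so membership is usually precomputed rather than evaluated per request. The Python sketch below shows one possible in-memory shape for such a lookup; the `AudienceIndex` class and its methods are assumptions for this example, and a real deployment would more likely back this with a low-latency key-value store.

```python
from collections import defaultdict


class AudienceIndex:
    """Precomputed map from user ID to the segments that user belongs to."""

    def __init__(self):
        self._segments_by_user = defaultdict(set)

    def load_segment(self, segment_id: str, user_ids: set) -> None:
        """Register every user in an already computed segment."""
        for user_id in user_ids:
            self._segments_by_user[user_id].add(segment_id)

    def is_member(self, user_id: str, segment_id: str) -> bool:
        """Answer a membership query with a single dictionary lookup."""
        # .get() avoids creating empty entries for unknown visitors.
        return segment_id in self._segments_by_user.get(user_id, ())
```

The index would be refreshed each time the segmentation layer recomputes a segment, so the query path never touches the raw event data.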


This is, of course, a simplified example, and the actual components and architecture may become more complex as you add features and integrations.

