Amazon S3 - How to format and send data to MadKudu via a bucket

Would you like to use product usage or CRM data from a source MadKudu does not currently integrate with? No worries, we can easily set up a transfer through Amazon S3, from Amazon Redshift or from flat files (JSON or CSV). MadKudu's preferred approach is to pull data from your S3 bucket, where the data is formatted as described below and to which MadKudu has access through an IAM role.

Note:

Depending on the volume of data to transfer, the transfer may take from a few hours to a few weeks (>500M records), given the rate limit on transfers from Amazon S3 to MadKudu. We recommend sending only the events necessary to configure MadKudu; otherwise your implementation will be delayed until we receive the full history of data. Please refer to this documentation or consult our implementation team at success@madkudu.com to understand which events are relevant to configuring your scoring.

Pre-requisites

  • You have access to an AWS account to create/manage an S3 bucket

How to format your data

MadKudu works with 3 types of objects:

  • Event: what are users doing?

  • Contact: who is the user? (coming soon)

  • Account: which accounts do my users belong to? (coming soon)

Person level events

To send behavioral data (product usage, web activity, marketing activity...), create a file named event with the following attributes (with headers included):

Attribute | Format | Example | Description
event_key (required) | String | "abc123" | A unique key identifying the event. If you do not have one, we suggest creating a combination of event_text + contact_key + event_timestamp
event_text (required) | String | “signup”, “login”, “invited a friend” | The action taken by the user.
event_timestamp (required) | Unix time | “1436172703” | The time at which the event happened
contact_key (required) | String | "paul@madkudu.com" | The email address of the user who performed the action
event_* (optional) | String or Numeric | | properties describing the event (e.g. event_url for the url of visited page, event_form_title for the title of form submitted...)

Example in JSON format

{"event_key": "abcd1234", "event_text":"signed up", "event_timestamp":1234567890, "contact_key":"paul@madkudu.com"}
{"event_key": "abcd2345", "event_text":"visit web page", "event_timestamp":1234567890, "contact_key":"paul@madkudu.com", "event_url":"http://www.domain.com/pricing"}
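As an illustration of the attributes above, here is a minimal Python sketch that builds one event record and derives event_key from event_text + contact_key + event_timestamp. The function name make_event and the "|" separator are assumptions made for this example, not part of MadKudu's specification.

```python
import json


def make_event(event_text, contact_key, event_timestamp, **properties):
    """Build one event record with the four required attributes.

    The event_key joins event_text, contact_key and event_timestamp;
    the "|" separator is an illustrative choice.
    """
    event = {
        "event_key": f"{event_text}|{contact_key}|{event_timestamp}",
        "event_text": event_text,
        "event_timestamp": int(event_timestamp),
        "contact_key": contact_key,
    }
    # Optional custom properties follow the event_* naming from the table.
    for name, value in properties.items():
        if not name.startswith("event_"):
            raise ValueError(f"custom property {name!r} should start with 'event_'")
        event[name] = value
    return event


event = make_event(
    "visit web page",
    "paul@madkudu.com",
    1436172703,
    event_url="http://www.domain.com/pricing",
)
line = json.dumps(event)  # one JSON object, ready to be written as one line
print(line)
```

Deriving the key from the three fields keeps it stable across re-exports, so the same event always receives the same event_key.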

If you plan on sending event data from 2 or more sources, all event streams should be in the same file.

If you plan to have MadKudu pull your S3 data on a recurring basis, all custom properties columns (event_*) must be communicated prior to setting up the recurring pull.

If you send events, please note that MadKudu needs to receive individual events, not aggregations.

Meaning MadKudu needs to receive this:

Event key | Event text | Event timestamps | Email
100 | Email click | 1/6/2023 0:00:00 | john@madkudu.com
101 | Email click | 1/6/2023 0:00:00 | john@madkudu.com
103 | Email click | 1/8/2023 0:00:00 | john@madkudu.com
104 | Email click | 1/8/2023 0:00:00 | john@madkudu.com
105 | Email click | 1/8/2023 0:00:00 | john@madkudu.com

Instead of this:

Event key | Event text | Event timestamps | Email | Count
100 | Number of email clicks | 1/6/2023 0:00:00 | john@madkudu.com | 2
101 | Number of email clicks | 1/8/2023 0:00:00 | john@madkudu.com | 3

Account level events

If you are sending account-level events (de-anonymized website visits, 3rd-party intent, etc.), the same event format applies. To attach events to the respective account, MadKudu uses the domain. The events file needs to contain a ‘fake’ email address of the form anonymous@domain.com as contact_key. See details here.

Attribute | Format | Example | Description
event_key (required) | String | "abc123" | A unique key identifying the event. If you do not have one, we suggest creating a combination of event_text + contact_key + event_timestamp
event_text (required) | String | “signup”, “login”, “invited a friend” | The action taken by the user.
event_timestamp (required) | Unix time | “1436172703” | The time at which the event happened
contact_key (required) | String | "anonymous@madkudu.com" | The unique identifier of the visitor who showed intent. To create an email, you can prepend 'anonymous@' to each domain.
event_* (optional) | String or Numeric | | properties describing the event (e.g. event_url for the url of visited page, event_form_title for the title of form submitted...)

Example in JSON format

{"event_key": "abcd1234", "event_text":"signed up", "event_timestamp":1234567890, "contact_key":"anonymous@madkudu.com"}
{"event_key": "abcd2345", "event_text":"visit web page", "event_timestamp":1234567890, "contact_key":"anonymous@madkudu.com", "event_url":"http://www.domain.com/pricing"}
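The placeholder contact_key for account-level events can be derived from the domain, as in the sketch below. The helper name account_contact_key and the light normalization (trimming, lowercasing, dropping a leading "www.") are assumptions for this example; only the 'anonymous@' prefix comes from the description above.

```python
def account_contact_key(domain):
    """Return the placeholder email used as contact_key for an account-level event."""
    # Normalization is an assumption of this sketch, not a documented rule.
    cleaned = domain.strip().lower()
    if cleaned.startswith("www."):
        cleaned = cleaned[len("www."):]
    if not cleaned or "@" in cleaned or "/" in cleaned:
        raise ValueError(f"expected a bare domain, got {domain!r}")
    return f"anonymous@{cleaned}"


print(account_contact_key("madkudu.com"))  # anonymous@madkudu.com
```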

Points of attention

All files should have a header. The bracket { } and single quote ' characters are not supported in field values. Make sure to remove any of these from your data before creating your files.

How to format the files

MadKudu currently supports two file formats:

  • Newline-delimited JSON (preferred)

  • CSV

Newline-delimited JSON

Our preferred format for upload is newline-delimited JSON, which is more standardized and less error-prone than CSV.

In this format, the different records are separated by the newline \n character. Each line is a valid JSON object:

{"event_text":"signed up", "event_timestamp":1234567890, "contact_key":"paul@madkudu.com"}
{"event_text":"added a friend", "event_timestamp":1234567890, "contact_key":"paul@madkudu.com", "some_other_event_field":"some_value"}

Escape any double quote " in your data with a \ (e.g. replace " with \").

Incorrect

{"event_text":"signed up", "event_timestamp":1234567890, "contact_key":"abc1234", "key": "val"ue"}

Correct

{"event_text":"signed up", "event_timestamp":1234567890, "contact_key":"abc1234", "key": "val\"ue"}
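If the file is generated with a JSON library rather than by string concatenation, this escaping happens automatically. The sketch below uses Python's standard json module; the function name write_ndjson is hypothetical and chosen for this example.

```python
import io
import json


def write_ndjson(records, stream):
    """Write each record as one JSON object per line."""
    for record in records:
        # json.dumps escapes embedded double quotes and line breaks,
        # so every record stays on a single line.
        stream.write(json.dumps(record) + "\n")


records = [
    {"event_text": "signed up", "event_timestamp": 1234567890,
     "contact_key": "abc1234", "key": 'val"ue'},
]
buffer = io.StringIO()
write_ndjson(records, buffer)
print(buffer.getvalue(), end="")
```

For a real export, the in-memory buffer would be replaced by a file opened in text mode with UTF-8 encoding.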

CSV

We also support the .csv format, with the recommended format:

  • column names (header) in the first line

  • separator: ~ → separate the values with ~ (ex: abc~def~). Please do not use , or - as they easily create parsing issues

  • delimiter: "  → this adds quotes around the values (abc -> "abc")

  • line separator: line-break \n

Points of attention

  • Delimit your values with " "

  • Remove all line break characters (for example \n) from your fields.

  • Make sure the number of fields is the same for each line.

  • Escape each " character in your data by adding a second " character in front of it (see here for details)

Incorrect

Values are not delimited by "

abc,cde,efg

Correct

"abc","cde","efg"

Incorrect

The " before the e is not escaped. A second " should be added in front of it.

"abc","cd"e","efg"

Correct

"abc","cd""e","efg"

Use UTF-8 encoding to avoid issues with special characters in the files.
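The recommended CSV settings map onto the options of Python's standard csv module, as sketched below. The helper names clean and write_csv are hypothetical, and replacing line breaks with a space is one possible way to remove them.

```python
import csv
import io


def clean(value):
    """Remove line breaks from a field (here replaced by a space)."""
    return str(value).replace("\r", " ").replace("\n", " ")


def write_csv(header, rows, stream):
    writer = csv.writer(
        stream,
        delimiter="~",          # field separator
        quotechar='"',          # character wrapped around every value
        quoting=csv.QUOTE_ALL,  # quote all fields, not only those that need it
        doublequote=True,       # an embedded " is written as ""
        lineterminator="\n",    # line separator
    )
    writer.writerow(header)
    for row in rows:
        if len(row) != len(header):
            raise ValueError("every line must have the same number of fields")
        writer.writerow([clean(value) for value in row])


out = io.StringIO()
write_csv(["a", "b", "c"], [["abc", 'cd"e', "ef\ng"]], out)
print(out.getvalue(), end="")
```

When writing to disk instead of an in-memory buffer, opening the file with newline="" and encoding="utf-8" keeps the line separator and encoding as described above.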

Data validation

Newline-delimited JSON and CSV files are relatively easy to corrupt (for example with unescaped " or , characters in the data).

We will validate the data on our side and warn you of any corruption issues, but it helps a lot if you follow the format requested above.
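A quick local check before uploading can catch the most common corruption issues. The sketch below is one possible pre-flight check for a newline-delimited JSON event file, not MadKudu's own validation; the function name and the list of required fields (taken from the attribute table) are assumptions of this example.

```python
import json

REQUIRED = ("event_key", "event_text", "event_timestamp", "contact_key")


def validate_ndjson_lines(lines):
    """Return a list of (line_number, problem) for lines that look corrupted."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append((number, f"invalid JSON: {error.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        missing = [name for name in REQUIRED if name not in record]
        if missing:
            problems.append((number, "missing " + ", ".join(missing)))
    return problems


good = ('{"event_key": "abcd1234", "event_text": "signed up", '
        '"event_timestamp": 1234567890, "contact_key": "paul@madkudu.com"}')
bad_quote = ('{"event_key": "k", "event_text": "signed up", '
             '"event_timestamp": 1, "contact_key": "abc1234", "key": "val"ue"}')
no_key = '{"event_text": "signed up", "event_timestamp": 1234567890}'

for number, problem in validate_ndjson_lines([good, bad_quote, no_key]):
    print(number, problem)
```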

Compression

Please note that the maximum size for a single JSON object is 4 MB.

To speed up the data upload, we highly recommend that you compress your files with GZIP before uploading them to S3.

You can name your files whatever you want (we recommend event, contact and account). However, please make sure to add the correct extension for your file format:

  • .json.gz for compressed JSON (recommended)

  • .json for uncompressed JSON

  • .csv.gz for compressed CSV

  • .csv for uncompressed CSV

Whichever format you choose, if you plan on having MadKudu pull your S3 data on a recurring basis, the file format has to remain the same.
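The sketch below writes a gzip-compressed newline-delimited JSON file with Python's standard gzip module and checks the per-object size limit while doing so. The function name is hypothetical, and interpreting 4 MB as 4 × 1024 × 1024 bytes is an assumption of this example.

```python
import gzip
import json
import os
import tempfile

MAX_OBJECT_BYTES = 4 * 1024 * 1024  # 4 MB per JSON object (binary MB assumed)


def write_compressed_ndjson(records, path):
    """Write records as gzip-compressed newline-delimited JSON."""
    if not path.endswith(".json.gz"):
        raise ValueError("compressed JSON files should end with .json.gz")
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for record in records:
            line = json.dumps(record)
            if len(line.encode("utf-8")) > MAX_OBJECT_BYTES:
                raise ValueError("a single JSON object exceeds 4 MB")
            handle.write(line + "\n")


# Write to a temporary directory so the example leaves no files behind.
path = os.path.join(tempfile.mkdtemp(), "event.json.gz")
write_compressed_ndjson(
    [{"event_key": "abcd1234", "event_text": "signed up",
      "event_timestamp": 1234567890, "contact_key": "paul@madkudu.com"}],
    path,
)
print(os.path.basename(path))  # event.json.gz
```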

How to store your file

We recommend that the files you want to share with MadKudu are in a dedicated folder and that you create an IAM policy and role for MadKudu to access these files.

You will also need to set up a recurring push of your data to this folder for MadKudu to score fresh data. This is done by creating distinct files, as described below.

File naming

In the S3 bucket, please upload data into separate folders by object and by date:

{object}/{year}/{month}/{day} where the objects are

  • event

  • contact

  • account

  • opportunity

MadKudu pulls files on the date given by the folder name. For example, files in a folder containing /2020/11/20/ will be pulled on November 20th, 2020.

If you use the S3 API, simply “prefix” your destination file name. For example, uploading to "contact/2020/11/20/name_of_file.csv" will add a file named name_of_file.csv to the contact folder.

Please use this recommended file naming and storing system in the bucket for MadKudu to be able to automatically pull any new file.

s3://bucket_name/object/year/month/day/name_of_file.csv
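The destination key can be built programmatically from the object type and the date. In the sketch below, the function name s3_key is hypothetical, and zero-padding the month and day is an assumption, since the /2020/11/20/ example does not show how single-digit values are written.

```python
from datetime import date

OBJECTS = ("event", "contact", "account", "opportunity")


def s3_key(obj, day, file_name):
    """Build the {object}/{year}/{month}/{day}/{file} key for an upload."""
    if obj not in OBJECTS:
        raise ValueError(f"unknown object {obj!r}")
    # Zero-padded month and day (assumption, see lead-in).
    return f"{obj}/{day:%Y/%m/%d}/{file_name}"


key = s3_key("contact", date(2020, 11, 20), "name_of_file.csv")
print(key)  # contact/2020/11/20/name_of_file.csv
```

The resulting string is what you would pass as the object key to your S3 client or upload tool, together with your bucket name.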

Compression

To speed up file transfer, you can compress files locally before transferring them to Amazon S3. If you want to compress your files, please use the GZIP compression method and use .gz or .gzip as your file extension (we currently don’t support other methods or other extensions).

Frequency: setting up a recurring push of data to MadKudu

We pull from your S3 bucket once a day at 00:01 UTC (just after midnight). We therefore recommend uploading new files beforehand, for example an hour earlier, at 11pm UTC.

When uploading new files, please use the recommended naming convention described in File naming.

If you plan on having MadKudu pull your S3 data on a recurring basis, the file folder and the file naming have to remain the same.
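Taken together, the daily 00:01 UTC pull and the date-named folders suggest that a file uploaded at 11pm UTC belongs in the folder of the following day. The sketch below computes that folder under this reading of the schedule, which you may want to confirm with MadKudu; the function names are hypothetical, and now must be a timezone-aware datetime.

```python
from datetime import datetime, time, timedelta, timezone

PULL_TIME = time(0, 1)  # 00:01 UTC, the daily pull time


def next_pull(now):
    """Return the UTC datetime of the next daily pull after `now`."""
    now = now.astimezone(timezone.utc)
    candidate = datetime.combine(now.date(), PULL_TIME, tzinfo=timezone.utc)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def folder_for_upload(obj, now):
    """Date folder whose name matches the next pull date."""
    return f"{obj}/{next_pull(now):%Y/%m/%d}"


upload_time = datetime(2020, 11, 19, 23, 0, tzinfo=timezone.utc)
print(folder_for_upload("event", upload_time))  # event/2020/11/20
```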


FAQ

I'm having an issue with S3 / I don't know how to use S3

Please open a ticket here and we will be happy to assist you.

Your file format doesn’t work for me. What do I do?

If you’re having any issues with the file format, please open a ticket here and we’ll be happy to help.

How often is the data refreshed?

As soon as you drop data into the S3 bucket, expect results to be updated in the MadKudu platform within 6 hours.

What would happen if I send the same event more than once - will it appear twice in MadKudu?

Our system will deduplicate the events based on contact_key / event_text / timestamp. If you send the same event twice, only one will be kept:

  • If sent in two separate batches, only the most recent will be kept.

  • If sent in the same data batch, the first one in the file will be kept.
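To preview which events would survive within a single batch, the same-batch rule can be reproduced locally. The sketch below only illustrates the behavior described above and is not MadKudu's implementation; the function name is hypothetical, and the event_timestamp field is assumed to be the timestamp used for comparison.

```python
def dedupe_batch(events):
    """Keep the first event seen for each (contact_key, event_text, event_timestamp)."""
    seen = set()
    kept = []
    for event in events:
        identity = (event["contact_key"], event["event_text"], event["event_timestamp"])
        if identity not in seen:
            seen.add(identity)
            kept.append(event)
    return kept


batch = [
    {"event_key": "a1", "event_text": "login", "event_timestamp": 1436172703,
     "contact_key": "paul@madkudu.com"},
    {"event_key": "a2", "event_text": "login", "event_timestamp": 1436172703,
     "contact_key": "paul@madkudu.com"},
    {"event_key": "a3", "event_text": "signup", "event_timestamp": 1436172703,
     "contact_key": "paul@madkudu.com"},
]
print([event["event_key"] for event in dedupe_batch(batch)])  # ['a1', 'a3']
```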