How to integrate Amazon Redshift with MadKudu
If your data is hosted in Amazon Redshift, you will need to:
1. Export the data to an S3 bucket with this format.
2. Grant MadKudu access to the bucket following these instructions.
Once that's done, we'll be able to start pulling your data.
Note: If you're looking to export data for a Segment Warehouse in Redshift, please follow these instructions instead.
Exporting your Redshift data to S3
With Redshift, you can UNLOAD data directly to a S3 bucket. For the file format, please follow the same spec as our flat file import.
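If your tables don't already use the expected column names, one convenient approach is to create a view in Redshift that renames your columns, and then UNLOAD from that view. The sketch below is only an illustration: the source table and column names (app_events, event_name, occurred_at, user_email, plan_name, device_type) are hypothetical placeholders for your own schema.

-- Hypothetical mapping view: rename your columns to the headers expected in the export
CREATE VIEW madkudu_event_export AS
SELECT
  event_name  AS event_text,        -- name of the event
  occurred_at AS event_timestamp,   -- when the event happened
  user_email  AS contact_key,       -- identifier of the contact who performed it
  plan_name   AS event_attribute_1, -- any additional attributes you want to send
  device_type AS event_attribute_2
FROM app_events;

The UNLOAD commands below can then simply select from madkudu_event_export (or an equivalent view per file type) instead of repeating the renaming in each query.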
Event
Once you've mapped the data into the expected format, you can export it to S3 with a command like this
UNLOAD ("SELECT event_text, event_timestamp, contact_key, event_attribute_1, event_attribute_2 FROM [your_table]")
TO 's3://madkudu-data-in-[your_org_id]/event/event.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;
Where:
SELECT → modify the column list to match your column header names
CREDENTIALS → will be provided by MadKudu in the app > Integrations > S3 (if using our S3 bucket; otherwise, enter the credentials of your own bucket)

Please use EXACTLY the following options, as this tends to create fewer file corruption issues:

HEADER → includes the header row
DELIMITER '~' → delimits values with a ~ (abc~efg~...)
ADDQUOTES → adds quotes around values (abc → "abc")
ESCAPE → escapes special characters inside values so they don't break parsing
MANIFEST → writes a manifest file listing all the output files, so they can be read back together
GZIP → compresses the result
ALLOWOVERWRITE → replaces the existing files with the new ones instead of keeping all files.
Recommended when the files are big, or when you send the whole table every day (increments included) rather than just the increments.
Not recommended if you want to keep a history, especially when you only send the increments (if a pull fails at some point, the missed file is not lost); in that case, use a naming pattern for the files (e.g. the date) instead, as shown in the sketch after the note below.
PARALLEL ON → writes multiple files at a time to go faster
NOTE: with these settings, every time you execute this command the existing files will be overwritten.
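If you prefer to keep a history of files instead of overwriting them (for example because you only send the daily increments), a common alternative is to drop ALLOWOVERWRITE, put the date in the file prefix, and filter the query to the new rows. The sketch below is only an illustration: the [YYYY-MM-DD] placeholder and the WHERE clause on event_timestamp are assumptions about your setup, not part of the required format.

-- Sketch: daily incremental export to a dated prefix (no ALLOWOVERWRITE)
UNLOAD ('SELECT event_text, event_timestamp, contact_key, event_attribute_1, event_attribute_2 FROM [your_table] WHERE event_timestamp >= DATEADD(day, -1, GETDATE())')
TO 's3://madkudu-data-in-[your_org_id]/event/event_[YYYY-MM-DD]_'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
PARALLEL ON
;

Without ALLOWOVERWRITE, Redshift refuses to overwrite existing files, which is what you want here since each run writes to a new dated prefix.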
Contact
UNLOAD ("SELECT contact_key, email, attribute_1, attribute_2 FROM [your_table]")
TO 's3://madkudu-data-in-[your_org_id]/identify/identify.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;
Account
UNLOAD ("SELECT contact_key, account_key, account_attribute_1, account_attribute_1 FROM [your_table]")
TO 's3://madkudu-data-in-[your_org_id]/group/group.csv'
CREDENTIALS 'aws_access_key_id=[your_access_key];aws_secret_access_key=[your_secret_key]'
HEADER
DELIMITER '~'
ADDQUOTES
ESCAPE
MANIFEST
GZIP
ALLOWOVERWRITE
PARALLEL ON
;
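As a quick sanity check (not required by MadKudu), you can confirm what each UNLOAD wrote by querying Redshift's STL_UNLOAD_LOG system table in the same session, right after the command completes:

-- List the files written by the most recent UNLOAD in this session
SELECT path, line_count, transfer_size
FROM stl_unload_log
WHERE query = pg_last_query_id()
ORDER BY path;

Each row corresponds to one file written to S3, with its row count and size.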