Listed below are the steps needed to create an AWS Data Pipeline. Following each command is the output from its successful execution.
1) Verify or create the required IAM roles (needed for CLI or API use only).
IAM roles determine what actions a pipeline can perform and what resources it can access. They likewise determine what actions applications running on a pipeline resource, such as an EC2 instance, can perform and what resources they can access.
Through the CLI or the AWS Data Pipeline console, create the following roles:
DataPipelineDefaultRole - grants AWS Data Pipeline access to your AWS resources
DataPipelineDefaultResourceRole - grants applications on your EC2 instances access to your AWS resources [5]
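A minimal sketch of creating such a role from the CLI, assuming AWS credentials are configured. The trust policy below is the standard one letting the Data Pipeline service assume the role; the actual `aws iam create-role` call is shown as a comment since it requires an AWS account:

```shell
# Write a trust policy allowing the Data Pipeline service to assume the role.
cat > dp-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "datapipeline.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF

# Sanity-check that the policy file is valid JSON.
python3 -m json.tool dp-trust.json > /dev/null && echo "trust policy OK"

# With credentials configured, the role would then be created with:
#   aws iam create-role --role-name DataPipelineDefaultRole \
#       --assume-role-policy-document file://dp-trust.json
```

An analogous trust policy (with `ec2.amazonaws.com` as the principal) would be used for DataPipelineDefaultResourceRole.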
2) Create an S3 bucket for the data pipeline to use.
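This step is a single CLI call. The bucket name and region below are placeholders (bucket names must be globally unique), and since `aws s3 mb` needs live credentials, the command is only echoed here:

```shell
# Placeholder bucket name; replace with your own globally unique name.
BUCKET="my-datapipeline-bucket-example"

# With AWS credentials configured, the bucket would be created with:
echo "aws s3 mb s3://$BUCKET --region us-east-1"
```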
This pipeline is used to copy input data to an output file.
The first part defines the schedule of when to run the pipeline activity. One can enter a start and end date/time and how often to run it.
The second part defines the data node type of the input and the file path used to locate the input data file. Data node objects can be of type DynamoDBDataNode, MySqlDataNode, RedshiftDataNode, S3DataNode, or SqlDataNode.
The third part defines the data node type of the output and the file path used to locate the output file. A data node is a representation of your business data, such as the path to a data file.
The fourth part defines the EC2 resource (instance) to use. Note that one must reference the IAM roles created previously in order to use them with data pipelines. Also note that the securityGroups and keyPair defined for creating instances must be indicated. If desired, the "terminateAfter" field may be used to cause the instance to terminate after the specified time limit.
Last in this file, the activity is defined: what type of activity it is and what it runs on. [4]
Listed below are the contents of the file.
{
"objects": …show more content…
[root@localhost cit668]# aws datapipeline create-pipeline --name MyPipeline --unique-id token
{
    "pipelineId": "df-0429833V5HEPTPXI8HP"
}
6) Define the pipeline, using the file created previously in step 4.
aws datapipeline put-pipeline-definition --pipeline-id df-0429833V5HEPTPXI8HP --pipeline-definition file://createPipeline.json
The results will look similar to those below. If there are errors in the definition file (the .json file), they will be listed here. Some of them are only warnings, but others may be critical.
[root@localhost cit668]# aws datapipeline put-pipeline-definition --pipeline-id df-0429833V5HEPTPXI8HP --pipeline-definition file://createPipeline.json
{
"validationErrors": [], "errored": false, "validationWarnings": [ { "id": "MyCopyActivity", "warnings":