Thread
Handling massive data has become a difficult yet key part of Machine Learning.
Data Orchestration comes to the rescue: it helps you manage huge volumes of data and the complex Data Pipelines built around them.
A Thread on Data Orchestration 🧵
Before understanding Data Orchestration, it's good to get an idea of what a Data Pipeline is.
A Data Pipeline moves data from one location to another.
It can be from one database to another, or to any other destination.
1/8
Below is a simple example of a Data Pipeline that moves data from a Source to a Destination.
This pipeline includes some data processing steps in between.
2/8
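To make the Source → processing → Destination idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the sample rows, the function names (extract, transform, load, run_pipeline) and the in-memory list standing in for a real destination are all hypothetical.

```python
def extract() -> list[dict]:
    # Hypothetical source: in practice this could be a database query or an API call.
    return [
        {"name": " alice ", "amount": "10.5"},
        {"name": "Bob", "amount": "n/a"},
        {"name": "carol", "amount": "7"},
    ]


def transform(rows: list[dict]) -> list[dict]:
    # Processing step: tidy up names and drop rows whose amount is not a number.
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows that cannot be parsed
        cleaned.append({"name": row["name"].strip().title(), "amount": amount})
    return cleaned


def load(rows: list[dict], destination: list) -> int:
    # Hypothetical destination: a plain list standing in for a table or a file.
    destination.extend(rows)
    return len(rows)


def run_pipeline(destination: list) -> int:
    # Source -> processing -> Destination, in that order.
    return load(transform(extract()), destination)


destination: list[dict] = []
loaded = run_pipeline(destination)
print(f"Loaded {loaded} rows: {destination}")
```

The row with an unparseable amount is dropped in the processing step, so only the clean rows reach the destination.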
Now, where does Data Orchestration fit in?
In real-world business use cases, the data is huge, so managing and running all these complex data pipelines by hand becomes a tedious task.
That's where Orchestration comes to the rescue.
3/8
Data orchestration is the process of coordinating and organizing data from various sources, such as databases and APIs, to enable efficient data flow and management.
It also includes automating, scheduling, and monitoring these complex data pipelines.
4/8
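As a rough illustration of the coordinating and monitoring parts, here is a small Python sketch that runs tasks in dependency order and records a status for each one. It is a toy under stated assumptions, not how any real orchestration tool works: run_flow, the task names and the status strings are hypothetical, and scheduling is left out entirely.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_flow(tasks: dict, dependencies: dict) -> dict:
    """Run each task after its upstream tasks and record a status per task."""
    statuses = {}
    # static_order() yields every task after the tasks it depends on.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(name, ())
        if any(statuses.get(dep) != "success" for dep in upstream):
            statuses[name] = "skipped"  # an upstream task did not succeed
            continue
        try:
            tasks[name]()
            statuses[name] = "success"
        except Exception:
            statuses[name] = "failed"
    return statuses


def failing_transform() -> None:
    raise RuntimeError("bad input")  # simulated failure


# Hypothetical flow: extract -> transform -> load
dependencies = {"extract": [], "transform": ["extract"], "load": ["transform"]}

log: list[str] = []
ok_tasks = {name: (lambda n=name: log.append(n)) for name in dependencies}
ok_statuses = run_flow(ok_tasks, dependencies)

# Same flow, but the transform step now fails.
bad_tasks = dict(ok_tasks, transform=failing_transform)
bad_statuses = run_flow(bad_tasks, dependencies)

print(ok_statuses)
print(bad_statuses)
```

The returned status dictionary is the monitoring part in miniature: when a step fails, the steps downstream of it are marked as skipped instead of being run on bad data.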
There are a lot of Data Orchestration tools available.
Here's one such platform, Kestra, that offers all these capabilities and more.
Kestra is an open-source orchestration and scheduling platform that helps you create, run, schedule, and monitor complex pipelines.
5/8
Some key features of Kestra:
1. Completely open source
2. Plugins that help you work with different databases
3. Cloud native and scalable
4. Support for building complex pipelines
5. A clean UI that lets you create, run, and monitor all your workflows in real time
6/8
Also, I have created an introductory video on Data Orchestration.
Make sure to check it out here:
youtu.be/6g2ihKr8Ztk
That's it. Hope you enjoyed reading this thread.
Follow @Sumanth_077 if you aren't following yet for Python, Data Science, and Machine Learning related content and opportunities.
Also, leave a like and retweet the first tweet to help this reach more people :)
Mentions
Shubham Saboo @Saboo_Shubham_ · Dec 28, 2022
Great thread Sumanth!!