πFlow is an easy-to-use, powerful big data pipeline system.
Easy to use
Highly extensible
High performance
Powerful
Install external packages
mvn install:install-file -Dfile=/../piflow/piflow-bundle/lib/spark-xml_2.11-0.4.2.jar -DgroupId=com.databricks -DartifactId=spark-xml_2.11 -Dversion=0.4.2 -Dpackaging=jar
mvn install:install-file -Dfile=/../piflow/piflow-bundle/lib/java_memcached-release_2.6.6.jar -DgroupId=com.memcached -DartifactId=java_memcached-release -Dversion=2.6.6 -Dpackaging=jar
mvn install:install-file -Dfile=/../piflow/piflow-bundle/lib/ojdbc6-11.2.0.3.jar -DgroupId=oracle -DartifactId=ojdbc6 -Dversion=11.2.0.3 -Dpackaging=jar
mvn install:install-file -Dfile=/../piflow/piflow-bundle/lib/edtftpj.jar -DgroupId=ftpClient -DartifactId=edtftp -Dversion=1.0.0 -Dpackaging=jar
mvn clean package -Dmaven.test.skip=true
[INFO] Replacing original artifact with shaded artifact.
[INFO] Reactor Summary:
[INFO]
[INFO] piflow-project ..................................... SUCCESS [ 4.369 s]
[INFO] piflow-core ........................................ SUCCESS [01:23 min]
[INFO] piflow-configure ................................... SUCCESS [ 12.418 s]
[INFO] piflow-bundle ...................................... SUCCESS [02:15 min]
[INFO] piflow-server ...................................... SUCCESS [02:05 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 06:01 min
[INFO] Finished at: 2020-05-21T15:22:58+08:00
[INFO] Final Memory: 118M/691M
[INFO] ------------------------------------------------------------------------
Run PiFlow Server in IntelliJ:
Download PiFlow: git clone https://github.com/cas-bigdatalab/piflow.git
Import PiFlow into IntelliJ
Edit the configuration file config.properties
Build the PiFlow jar
Run HttpService
Test HttpService
Run PiFlow from a release:
Download the latest PiFlow release that fits your needs:
https://github.com/cas-bigdatalab/piflow/releases/download/v1.0/piflow-server-v1.0.tar.gz
Extract piflow-server-v1.0.tar.gz:
tar -zxvf piflow-server-v1.0.tar.gz
Edit the configuration file config.properties
Start, stop, and restart PiFlow Server, or check its status:
start.sh, stop.sh, restart.sh, status.sh
Test PiFlow Server
vim /etc/profile
export PIFLOW_HOME=/yourPiflowPath
export PATH=$PATH:$PIFLOW_HOME/bin
Run the following commands:
piflow flow start example/mockDataFlow.json
piflow flow stop appID
piflow flow info appID
piflow flow log appID
piflow flowGroup start example/mockDataGroup.json
piflow flowGroup stop groupId
piflow flowGroup info groupId
How to configure config.properties
#spark and yarn config
spark.master=yarn
spark.deploy.mode=cluster
#hdfs default file system
fs.defaultFS=hdfs://10.0.86.191:9000
#yarn resourcemanager.hostname
yarn.resourcemanager.hostname=10.0.86.191
#if you want to use hive, set hive metastore uris
#hive.metastore.uris=thrift://10.0.88.71:9083
#show data in log, set 0 if you do not want to show data in logs
data.show=10
#server port
server.port=8002
#h2db port
h2.port=50002
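Before starting the server it can help to sanity-check the file. The following is a minimal sketch in Python, not part of PiFlow: it assumes the file uses only plain `key=value` lines and `#` comments, as in the sample above, and the helper names `parse_properties` and `check_ports` are hypothetical.

```python
# Hypothetical helpers for checking a config.properties file.
# Assumption: only "key=value" lines and "#" comment lines are used.

def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw_line!r}")
        settings[key.strip()] = value.strip()
    return settings


def check_ports(settings, keys=("server.port", "h2.port")):
    """Return the keys that are missing or are not valid TCP port numbers."""
    problems = []
    for key in keys:
        value = settings.get(key, "")
        is_number = value.isascii() and value.isdigit()
        if not is_number or not 0 < int(value) < 65536:
            problems.append(key)
    return problems


# A shortened copy of the sample configuration shown above.
SAMPLE = """\
#spark and yarn config
spark.master=yarn
spark.deploy.mode=cluster
#hive.metastore.uris=thrift://10.0.88.71:9083
server.port=8002
h2.port=50002
"""
```

For a real file, pass its contents, for example `check_ports(parse_properties(open("config.properties").read()))`; an empty list means both ports look valid.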
Flow JSON (see the sample pipelines in the piflow-bin/example folder)
{
  "flow": {
    "name": "MockData",
    "executorMemory": "1g",
    "executorNumber": "1",
    "uuid": "8a80d63f720cdd2301723b7461d92600",
    "paths": [
      {
        "inport": "",
        "from": "MockData",
        "to": "ShowData",
        "outport": ""
      }
    ],
    "executorCores": "1",
    "driverMemory": "1g",
    "stops": [
      {
        "name": "MockData",
        "bundle": "cn.piflow.bundle.common.MockData",
        "uuid": "8a80d63f720cdd2301723b7461d92604",
        "properties": {
          "schema": "title:String, author:String, age:Int",
          "count": "10"
        },
        "customizedProperties": {}
      },
      {
        "name": "ShowData",
        "bundle": "cn.piflow.bundle.external.ShowData",
        "uuid": "8a80d63f720cdd2301723b7461d92602",
        "properties": {
          "showNumber": "5"
        },
        "customizedProperties": {}
      }
    ]
  }
}
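In the sample above, each entry in `paths` connects two stops through its `from` and `to` fields, and those values match the `name` of a stop. A quick consistency check along those lines can be sketched in Python. This is an illustration only: the helper `check_flow` is hypothetical, and the rule that `from`/`to` must name a stop is inferred from the sample rather than taken from a PiFlow specification.

```python
import json


def check_flow(flow_doc):
    """Return a list of path endpoints that do not name any stop."""
    flow = flow_doc["flow"]
    stop_names = {stop["name"] for stop in flow.get("stops", [])}
    problems = []
    for path in flow.get("paths", []):
        for end in ("from", "to"):
            # Assumption: "from" and "to" hold stop names, as in the sample.
            if path.get(end) not in stop_names:
                problems.append(f"{end}: unknown stop {path.get(end)!r}")
    return problems


# A trimmed copy of the sample flow (properties omitted for brevity).
SAMPLE_FLOW = """
{
  "flow": {
    "name": "MockData",
    "paths": [
      {"inport": "", "from": "MockData", "to": "ShowData", "outport": ""}
    ],
    "stops": [
      {"name": "MockData", "bundle": "cn.piflow.bundle.common.MockData"},
      {"name": "ShowData", "bundle": "cn.piflow.bundle.external.ShowData"}
    ]
  }
}
"""
```

Calling `check_flow(json.loads(SAMPLE_FLOW))` returns an empty list for the sample; a misspelled stop name in a path shows up as one entry in the returned list.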
Using curl:
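The same kind of HTTP request can be sketched with Python's standard library. Treat the details as assumptions: the `/flow/start` path and the JSON content type are not confirmed by this document, the port 8002 comes from `server.port` in the sample config.properties, and `build_start_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_start_request(base_url, flow_doc):
    """Build (but do not send) a POST request carrying the flow JSON."""
    body = json.dumps(flow_doc).encode("utf-8")
    return urllib.request.Request(
        # Assumption: the server starts a flow at <base_url>/flow/start.
        url=base_url.rstrip("/") + "/flow/start",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it, pass the result to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_start_request("http://10.0.86.191:8002", flow_doc))`, replacing the address with the host running PiFlow Server.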
Using the command line:
Set PIFLOW_HOME
vim /etc/profile
export PIFLOW_HOME=/yourPiflowPath/piflow-bin
export PATH=$PATH:$PIFLOW_HOME/bin
Command examples
piflow flow start yourFlow.json
piflow flow stop appID
piflow flow info appID
piflow flow log appID
piflow flowGroup start yourFlowGroup.json
piflow flowGroup stop groupId
piflow flowGroup info groupId
Pull the Docker image
docker pull registry.cn-hangzhou.aliyuncs.com/cnic_piflow/piflow:v1.1
List the Docker images
docker images
Run a container from the image ID; all PiFlow services start automatically. Be sure to set HOST_IP
docker run -h master -itd --env HOST_IP=*.*.*.* --name piflow-v1.1 -p 6001:6001 -p 6002:6002 [imageID]
Visit "HOST_IP:6001". Startup can be slow, so you may need to wait a few minutes
If something goes wrong, all the applications are in the /opt folder
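Because the services can take a few minutes to come up, a small polling helper saves repeated manual refreshes. The sketch below is an illustration in Python, not something shipped with PiFlow; `wait_for_port` is a hypothetical name, and the default timeout and interval are arbitrary choices.

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, interval=5.0):
    """Return True once a TCP connection succeeds, False after the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the container above, `wait_for_port(HOST_IP, 6001)` with your own HOST_IP value would block until port 6001 accepts connections. An open port only shows that a process is listening, so the web page may still need a little longer to load.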
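Login
Pipeline list
Create a pipeline
Configure a pipeline
Run a pipeline
Monitor a pipeline
Pipeline log
Pipeline group list
Configure a pipeline group
Monitor a pipeline group
Running pipeline list
Pipeline template list
Data sources
Scheduling
Custom components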