Running Kettle Jobs and Transformations from the Command Line with Kitchen and Pan


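Overview:
Build the job and transformation in the Kettle GUI on Windows. Once they work, upload the Kettle installation archive to the Linux server and unpack it. From the unpacked directory, run jobs with kitchen.sh and transformations with pan.sh.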
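The unpack step might look like the sketch below. It assumes the archive unpacks into a data-integration directory (as in the paths used later in this post); the function name and the example paths are placeholders rather than anything prescribed by Kettle.

```shell
# Sketch only: unpack a PDI archive and make the launcher scripts executable.
# deploy_kettle ARCHIVE TARGET_DIR   (both arguments are placeholders)
deploy_kettle() (
    archive=$1      # the uploaded Kettle/PDI zip
    target=$2       # directory to unpack into
    mkdir -p "$target" || exit 1
    unzip -q "$archive" -d "$target" || exit 1
    # An archive built on Windows may not carry Unix execute bits,
    # so set them explicitly on the .sh launchers.
    chmod +x "$target"/data-integration/*.sh
)

# Usage (hypothetical paths):
# deploy_kettle /tmp/pdi.zip /home/kettle
```

After this, kitchen.sh and pan.sh should be runnable from the data-integration directory.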

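1. Preparation
A simple job and a simple trans.

To keep things convenient and the result easy to see, both the job and the trans write output files.

trans: reads the names of all files in the download directory and writes them to a file. [Tested successfully in the GUI]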

2. Running jobs and transformations from the command line on Linux

  1.  Pan is the PDI command-line tool for running transformations.
  2.  Kitchen is the PDI command-line tool for running jobs.

a. Pan command-line options and syntax
Syntax:

        pan.sh -option=value arg1 arg2

Command-line options:

| Switch | Purpose |
|---|---|
| rep | Enterprise or database repository name, if you are using one |
| user | Repository username |
| pass | Repository password |
| trans | The name of the transformation (as it appears in the repository) to launch |
| dir | The repository directory that contains the transformation, including the leading slash |
| file | If you are calling a local KTR file, this is the filename, including the path if it is not in the local directory |
| level | The logging level (Basic, Detailed, Debug, Rowlevel, Error, Nothing) |
| logfile | A local filename to write log output to |
| listdir | Lists the directories in the specified repository |
| listtrans | Lists the transformations in the specified repository directory |
| listrep | Lists the available repositories |
| exprep | Exports all repository objects to one XML file |
| norep | Prevents Pan from logging into a repository. If you have set the KETTLE_REPOSITORY, KETTLE_USER, and KETTLE_PASSWORD environment variables, this option stops Pan from logging into that repository so that you can run a local KTR file instead. |
| safemode | Runs in safe mode, which enables extra checking |
| version | Shows the version, revision, and build date |
| param | Sets a named parameter in name=value format. For example: -param:FOO=bar |
| listparam | Lists information about the named parameters defined in the specified transformation |
| maxloglines | The maximum number of log lines that are kept internally by PDI. Set to 0 to keep all rows (default) |
| maxlogtimeout | The maximum age (in minutes) of a log line while being kept internally by PDI. Set to 0 to keep all rows indefinitely (default) |

Example:

    sh pan.sh -rep=initech_pdi_repo -user=pgibbons -pass=lumburghsux -trans=TPS_reports_2011

Example of running a local trans:

    ./pan.sh -file=/home/hadoop/workplace/kettle/trans/test_cml.ktr -norep
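When a local trans takes named parameters, a small wrapper can keep the invocation readable. The following is only a sketch: PDI_DIR, run_trans, and the parameter name in the usage line are made up for illustration, while the switches (-file, -norep, -level, -logfile, -param:NAME=VALUE) are the ones listed in the table above.

```shell
# Sketch: run a local .ktr with Pan and pass named parameters.
# PDI_DIR is an assumed install location; override it for your server.
PDI_DIR=${PDI_DIR:-/home/kettle/data-integration}

# run_trans KTR_FILE LOG_FILE [NAME=VALUE ...]
run_trans() (
    ktr=$1
    log=$2
    shift 2
    # Turn every remaining NAME=VALUE argument into -param:NAME=VALUE.
    n=$#
    while [ "$n" -gt 0 ]; do
        set -- "$@" "-param:$1"
        shift
        n=$((n - 1))
    done
    sh "$PDI_DIR/pan.sh" -file="$ktr" -norep -level=Basic -logfile="$log" "$@"
)

# Usage (hypothetical parameter name):
# run_trans /home/hadoop/workplace/kettle/trans/test_cml.ktr /tmp/test_cml.log SRC_DIR=/data/download
```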

b. Kitchen command-line options and syntax

The syntax is the same as Pan's; the options differ slightly.

| Switch | Purpose |
|---|---|
| rep | Enterprise or database repository name, if you are using one |
| user | Repository username |
| pass | Repository password |
| job | The name of the job (as it appears in the repository) to launch |
| dir | The repository directory that contains the job, including the leading slash |
| file | If you are calling a local KJB file, this is the filename, including the path if it is not in the local directory |
| level | The logging level (Basic, Detailed, Debug, Rowlevel, Error, Nothing) |
| logfile | A local filename to write log output to |
| listdir | Lists the sub-directories within the specified repository directory |
| listjob | Lists the jobs in the specified repository directory |
| listrep | Lists the available repositories |
| export | Exports all linked resources of the specified job. The argument is the name of a ZIP file. |
| norep | Prevents Kitchen from logging into a repository. If you have set the KETTLE_REPOSITORY, KETTLE_USER, and KETTLE_PASSWORD environment variables, this option stops Kitchen from logging into that repository so that you can run a local KJB file instead. |
| version | Shows the version, revision, and build date |
| param | Sets a named parameter in name=value format. For example: -param:FOO=bar |
| listparam | Lists information about the named parameters defined in the specified job |
| maxloglines | The maximum number of log lines that are kept internally by PDI. Set to 0 to keep all rows (default) |
| maxlogtimeout | The maximum age (in minutes) of a log line while being kept internally by PDI. Set to 0 to keep all rows indefinitely (default) |

Command to run a local job:

    /home/kettle/data-integration/kitchen.sh -file=/home/kettle/transition/move.kjb -logfile=log.log

General form:

    $kitchen_path -file=$job_path -logfile=$log_path
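For scheduled runs it helps to write one log file per day and to check whether the job succeeded. The sketch below assumes that kitchen.sh returns a non-zero exit status when a job fails; PDI_DIR, run_job, and the log naming scheme are illustrative choices, not part of Kettle.

```shell
# Sketch: run a local .kjb with Kitchen, log to a dated file, report failures.
# PDI_DIR is an assumed install location; override it for your server.
PDI_DIR=${PDI_DIR:-/home/kettle/data-integration}

# run_job KJB_FILE LOG_DIR
run_job() (
    kjb=$1
    log_dir=$2
    # One log per job per day, e.g. move_20240131.log
    log_file=$log_dir/$(basename "$kjb" .kjb)_$(date +%Y%m%d).log
    mkdir -p "$log_dir" || exit 1
    sh "$PDI_DIR/kitchen.sh" -file="$kjb" -norep -logfile="$log_file"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "job $kjb failed with exit status $status, see $log_file" >&2
    fi
    # Pass Kitchen's status on so cron or a calling script can react to it.
    exit "$status"
)

# Usage:
# run_job /home/kettle/transition/move.kjb /home/kettle/logs
```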

3. Options I use most often

In my current environment I only run local job and trans files, so the options I use most are:

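| Option | Description |
|---|---|
| -file | Path to the job or trans file |
| -norep | Indicates that the file does not come from a repository |
| -param | Sets a parameter |
| -logfile | Name of the log output file |
| -level | Log level (Basic, Detailed, Debug, Rowlevel, Error, Nothing) |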
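Since these options are shared by both tools, one helper can pick the launcher from the file extension. This is a rough sketch under the assumption that .ktr files go to pan.sh and .kjb files go to kitchen.sh; PDI_DIR, kettle_tool, and run_kettle are hypothetical names.

```shell
# Sketch: choose pan.sh or kitchen.sh from the file extension.
# PDI_DIR is an assumed install location; override it for your server.
PDI_DIR=${PDI_DIR:-/home/kettle/data-integration}

# kettle_tool FILE : print the launcher script that matches the file type
kettle_tool() {
    case $1 in
        *.ktr) echo "$PDI_DIR/pan.sh" ;;      # transformation
        *.kjb) echo "$PDI_DIR/kitchen.sh" ;;  # job
        *) echo "unsupported file: $1" >&2; return 1 ;;
    esac
}

# run_kettle FILE LOG_FILE [LEVEL] : run a local job or trans without a repository
run_kettle() (
    tool=$(kettle_tool "$1") || exit 1
    sh "$tool" -file="$1" -norep -logfile="$2" -level="${3:-Basic}"
)

# Usage:
# run_kettle /home/kettle/transition/move.kjb /tmp/move.log Detailed
```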