Ever thought about how to submit and manage Spark jobs using PowerShell?

I know that sounds offbeat and is not the traditional way to manage Spark jobs, but when I need to get something done quickly, I prefer PowerShell. For quick scripting tasks like this, I find it easier to work with than most traditional languages.

We use the Apache Livy REST API, which the HDInsight Spark cluster exposes, to manage the jobs.

At the end of the day, it's scripting, and it has to be fast.

So I set out to do the following:
1. Submit Spark jobs
2. Check the status of Spark jobs
3. Get logs from Spark jobs

Submit Spark Jobs

We use the built-in PowerShell cmdlet “Invoke-RestMethod” for this purpose.
Below we have a generic function which takes the following parameters and submits the job.
1. $user: Spark user
2. $pwd: Spark user password
3. $sparkUrl: HDInsight cluster URL
4. $body: The job definition, passed as a PowerShell hashtable

As we can see below, we use the Livy API endpoint “/livy/batches” to submit a job. The call uses Basic authentication, with the credentials Base64-encoded in the Authorization header.

Most importantly, the PowerShell function shown below takes $body as an argument. It's just a PowerShell hashtable, and when calling the API we convert it to a JSON-formatted string using the “ConvertTo-Json” cmdlet.

Another important thing to note is the returned $jobId. This is the unique id that Livy assigns to the submitted batch job, and it lets us track the Spark job after submission.

function Trigger-Spark-Job($body, $user, $pwd, $sparkUrl) {
    try {
        # Livy endpoint for submitting batch jobs
        $server = "$sparkUrl/livy/batches"
        # Build the Basic auth value from "user:password"
        $credPair = "$($user):$($pwd)"
        $encodedCredentials = [System.Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes($credPair))
        $headers = @{
            Authorization    = "Basic $encodedCredentials"
            "X-Requested-By" = "admin"
            "Content-Type"   = "application/json"
        }
        # Convert the hashtable to JSON and submit the job
        $jobId = Invoke-RestMethod -Method POST -Uri $server -Headers $headers -Body (ConvertTo-Json $body)
        # The response carries the id of the new batch
        return $jobId.id
    } catch {
        Write-Error "StatusCode: $($_.Exception.Response.StatusCode.value__)"
        Write-Error "StatusDescription: $($_.Exception.Response.StatusDescription)"
        exit 1
    }
}
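
To make the $body hashtable more concrete, here is a minimal sketch of what one could look like. The field names (file, className, args, name, conf) are the ones Livy accepts for a batch; the jar path, class name, arguments and job name are placeholders I made up for illustration, so swap in your own.

```powershell
# Hypothetical job definition: every value below is a placeholder.
$body = @{
    file      = "wasbs:///example/jars/my-spark-job.jar"   # assumed jar location
    className = "com.example.MySparkJob"                   # assumed main class
    args      = @("2024-01-01", "full")                    # assumed job arguments
    name      = "nightly-load"                             # assumed job name
    conf      = @{ "spark.executor.memory" = "4g" }
}

# Preview the JSON that would be sent to Livy.
# -Depth keeps nested values from being cut off if the hashtable grows deeper.
$json = ConvertTo-Json $body -Depth 5
Write-Output $json

# Submitting it with the function above (URL and credentials are placeholders):
# $jobId = Trigger-Spark-Job $body "admin" "<password>" "https://<cluster>.azurehdinsight.net"
```

Printing the JSON before submitting is a cheap way to catch a wrong field name or a value that did not serialize the way you expected.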

Check the status of Spark jobs

We use the same Livy API, but with a different endpoint: “/livy/batches/$jobId/state”.

Below we have a generic function which takes the following parameters and returns the current state of the job.
1. $user: Spark user
2. $pwd: Spark user password
3. $sparkUrl: HDInsight cluster URL
4. $jobId: The unique id we obtained while submitting the job

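This call returns the current state of the job, which can be one of the following values:
“starting”, “not_started”, “running”, “busy”, “idle”, “dead”, “error”
Livy also reports “success” for a job that finished normally and “killed” for one that was stopped.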

function Get-Spark-Job-Status($jobId, $user, $pwd, $sparkUrl) {
    try {
        # Livy endpoint that reports the state of a single batch
        $server = "$sparkUrl/livy/batches/$jobId/state"
        $credPair = "$($user):$($pwd)"
        $encodedCredentials = [System.Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes($credPair))
        $headers = @{
            Authorization    = "Basic $encodedCredentials"
            "X-Requested-By" = "admin"
            "Content-Type"   = "application/json"
        }
        $result = Invoke-RestMethod -Method GET -Uri $server -Headers $headers -ContentType "application/json"
        # The response has the form { id, state }; we only need the state
        return $result.state
    } catch {
        Write-Error "StatusCode: $($_.Exception.Response.StatusCode.value__)"
        Write-Error "StatusDescription: $($_.Exception.Response.StatusDescription)"
    }
}
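
In practice you usually want to keep checking the state until the job is done. Below is a rough sketch of a polling loop. Wait-Spark-Job, its parameters, and the list of states I treat as final are my own assumptions and are not part of Livy or PowerShell. It takes a script block that returns the state, so it is not tied to any one way of fetching it.

```powershell
# Hypothetical helper: polls until the job reaches a final state or we give up.
function Wait-Spark-Job {
    param(
        [scriptblock]$GetState,    # script block that returns the current state string
        [int]$PollSeconds = 10,    # pause between two checks
        [int]$MaxPolls = 360       # give up after this many checks
    )
    # States treated as final here (an assumption based on Livy's state names)
    $terminal = @("success", "dead", "killed", "error")
    for ($i = 0; $i -lt $MaxPolls; $i++) {
        $state = & $GetState
        Write-Host "Poll $($i + 1): state = $state"
        if ($terminal -contains $state) { return $state }
        Start-Sleep -Seconds $PollSeconds
    }
    return "timeout"
}

# Usage with the status function above:
# $final = Wait-Spark-Job -GetState { Get-Spark-Job-Status $jobId $user $pwd $sparkUrl }
```

The MaxPolls limit is there so that a stuck job does not keep the script running forever.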

Get logs from Spark jobs

We use the same Livy API, but with a different endpoint: “/livy/batches/$jobId/log”.

This returns the log lines for that job. You can call this function after the job has completed or failed.

function Get-Spark-Job-Log($jobId, $user, $pwd, $sparkUrl) {
    try {
        # Livy endpoint that returns the log lines of a single batch
        $server = "$sparkUrl/livy/batches/$jobId/log"
        $credPair = "$($user):$($pwd)"
        $encodedCredentials = [System.Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes($credPair))
        $headers = @{
            Authorization    = "Basic $encodedCredentials"
            "X-Requested-By" = "admin"
            "Content-Type"   = "application/json"
        }
        $result = Invoke-RestMethod -Method GET -Uri $server -Headers $headers -ContentType "application/json"
        return $result.log
    } catch {
        Write-Error "StatusCode: $($_.Exception.Response.StatusCode.value__)"
        Write-Error "StatusDescription: $($_.Exception.Response.StatusDescription)"
        exit 1
    }
}
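
For longer jobs it can help to read the log in pages. Livy's log endpoint accepts “from” and “size” query parameters for this. Here is a small sketch that only builds the URL; Get-Spark-Log-Url and its default values are hypothetical names and numbers I picked for illustration.

```powershell
# Hypothetical helper: builds the log URL with paging parameters.
function Get-Spark-Log-Url($sparkUrl, $jobId, $from = 0, $size = 100) {
    # from = offset of the first log line, size = maximum number of lines to return
    return "$sparkUrl/livy/batches/$jobId/log?from=$($from)&size=$($size)"
}

# Example: URL for the first 50 log lines of batch 7 on a placeholder cluster
Get-Spark-Log-Url "https://mycluster.example.net" 7 0 50

# The log comes back as a list of lines, so it can be joined into one string:
# $text = (Get-Spark-Job-Log $jobId $user $pwd $sparkUrl) -join "`n"
```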

These are the most common functions you will need to manage jobs, but as you can see, they can be extended to other use cases based on other API endpoints.
E.g., you can list the long-running jobs and kill them using the API.
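
As a starting point for that, here is a minimal sketch of listing and killing jobs. It relies on Livy's “GET /batches” and “DELETE /batches/$jobId” endpoints. The function names Get-Livy-Headers, Get-Spark-Jobs and Stop-Spark-Job are my own, and I have not added the error handling used in the functions above, to keep it short.

```powershell
# Hypothetical helper: builds the same headers used by the functions above.
function Get-Livy-Headers($user, $pwd) {
    $credPair = "$($user):$($pwd)"
    $encoded = [System.Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes($credPair))
    return @{ Authorization = "Basic $encoded"; "X-Requested-By" = "admin" }
}

# Lists the batches Livy knows about (returned in the "sessions" field).
function Get-Spark-Jobs($user, $pwd, $sparkUrl) {
    $result = Invoke-RestMethod -Method GET -Uri "$sparkUrl/livy/batches" -Headers (Get-Livy-Headers $user $pwd)
    return $result.sessions
}

# Kills a single batch by its id.
function Stop-Spark-Job($jobId, $user, $pwd, $sparkUrl) {
    return Invoke-RestMethod -Method DELETE -Uri "$sparkUrl/livy/batches/$jobId" -Headers (Get-Livy-Headers $user $pwd)
}

# Usage: kill every batch that is still running (placeholders for the variables)
# Get-Spark-Jobs $user $pwd $sparkUrl |
#     Where-Object { $_.state -eq "running" } |
#     ForEach-Object { Stop-Spark-Job $_.id $user $pwd $sparkUrl }
```

Pulling the header code into one helper also removes the copy that is repeated in each of the three functions above.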

Enjoy exploring this further.

Learning every day