Fixing Azure CLI’s Download Limitation and GDPR Concerns

Downloading Application Insights data via the Azure CLI is a convenient way to start a data analysis pipeline. However, this method may not scale as the Application Insights database grows. There are two possible issues with it:

  • Azure CLI downloads of Application Insights data are capped at 500K rows or a file size of ~68MB, whichever is reached first.
  • Production-level App Insights databases that contain user information could lead to GDPR violations if the data has to be downloaded to a local machine for local testing.

To solve both of these issues, Application Insights’ REST API can be used. The REST API calls can be made in a loop, and the downloaded data can be copied directly into a dataframe in the code, as opposed to being downloaded locally. This blog post explains the process and walks through the Python code that accesses the REST API.

Example of the Azure CLI download query

The following command is the original query that hit the download limit once the dataset passed 500K rows.

az monitor app-insights query --app $appId --analytics-query "Sample Query" --offset 360d > sample_data.json
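The JSON that this command redirects into sample_data.json uses the tables/rows/columns layout shared by the CLI and the REST API, so it loads into pandas the same way in both cases. A minimal sketch, using a tiny inline payload as a stand-in for a real download:

```python
import json

import pandas as pd

# A tiny payload in the same tables/rows/columns layout that
# `az monitor app-insights query` writes to sample_data.json.
# Column names here are illustrative, not from a real schema.
raw = '''{"tables": [{"name": "PrimaryResult",
  "columns": [{"name": "timestamp"}, {"name": "itemCount"}],
  "rows": [["2021-10-13T00:00:01Z", 1], ["2021-10-13T00:00:02Z", 3]]}]}'''

logs = json.loads(raw)
tables = {
    table['name']: pd.DataFrame.from_records(
        table['rows'],
        columns=[col['name'] for col in table['columns']])
    for table in logs['tables']
}
df = tables['PrimaryResult']
print(df.shape)
```

The query results of interest live in the table named PrimaryResult; the same extraction reappears in the REST API code below.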

Using Application Insights REST API

Link to the documentation

As the solution to the download limitation, the REST API can be used. The steps are as follows:

  1. Generate an API key; this quick start guide explains how. After that, the App ID and the API key can be used to access Application Insights via the REST API.
  2. After generating the API key, test the functionality via the API explorer sandbox.
  3. Once this is successful, write the code to access the REST API in your language of choice.
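Before writing the full download loop, it helps to see the shape of a single request. The sketch below only builds and inspects the request without sending it; the App ID and API key are hypothetical placeholders, the base URL is the public Application Insights API endpoint, and the query string is just a sample:

```python
import requests

# Hypothetical placeholders: substitute your own values from step 1
APP_ID = '<your-app-id>'
API_KEY = '<your-api-key>'
BASE = 'https://api.applicationinsights.io/v1/apps'

# Build (but do not send) the GET request the download code will issue
req = requests.Request(
    'GET',
    f'{BASE}/{APP_ID}/query',
    headers={'X-Api-Key': API_KEY, 'Accept': 'application/json'},
    params={'query': 'requests | take 5'},  # sample Kusto query
).prepare()
print(req.url)
```

Sending it with `requests.Session().send(req)` (or an equivalent `requests.get`) returns the JSON payload that the code below parses.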

I have used Python for this task as the rest of the codebase was written in Python.

import json

import pandas as pd
import requests

# Public endpoint of the Application Insights REST API
end_point = 'https://api.applicationinsights.io/v1/apps'

def download_data(product_n_data_type):
    # Set a start time according to your requirement
    start_time = "datetime(2021-10-13T00:00:00.001Z)"
    end_time = 'now()'
    download_finished = False
    complete_df = pd.DataFrame()
    appId, API_KEY = get_keys()

    while not download_finished:
        query = get_query(product_n_data_type, start_time, end_time)
        params = {"query": query}
        headers = {'X-Api-Key': API_KEY, 'Accept': 'application/json'}
        url = f'{end_point}/{appId}/query'
        response = requests.get(url, headers=headers, params=params)
        logs = json.loads(response.text)

        # Turn each returned table into a dataframe; the query results
        # live in the table named 'PrimaryResult'
        result = {}
        for table in logs['tables']:
            result[table['name']] = pd.DataFrame.from_records(
                table['rows'],
                columns=[col['name'] for col in table['columns']])
        df_final = result['PrimaryResult']
        complete_df = pd.concat([complete_df, df_final], ignore_index=True)

        if len(df_final) == 500000:
            # Row limit hit: resume the next query from the newest timestamp
            start_time = 'datetime(' + df_final['timestamp'].max() + ')'
        else:
            download_finished = True

    return complete_df

def get_query(product_n_data_type, start_time, end_time):
    # Edit the sample query to fit your need
    return "sample query"

def get_keys():
    # Assign app_Id and API_KEY values to variables or read from the pipeline secrets at run-time
    return app_Id, API_KEY

Link to Github gist

The download_data function returns a data frame with the downloaded data. The algorithm is: if a download returns 500K rows, take the max timestamp in that data frame and use it as the starting timestamp of the next download query. Each time the REST API call returns a data dump, the results are appended to a data frame (complete_df). The REST API call is looped until all the rows are downloaded. It is also important to note that this doesn’t cover possible data download size issues: when the download size is maxed out, an error code is appended to the resulting data dump. Reading this error message helps to diagnose the problem, but the existing code does not handle it (I will address this in a later blog post).
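The pagination rule can be seen in isolation with a stubbed-out page source standing in for the REST call. Everything here is illustrative: the row limit is shrunk to 3 so the loop is easy to trace, and fake date strings stand in for the timestamp column:

```python
MAX_ROWS = 3  # stand-in for the real 500K row limit

# Fake timestamps standing in for the full dataset on the server
ALL_ROWS = [f"2021-10-{day:02d}" for day in range(13, 21)]

def fetch_page(start_time):
    # Stub for the REST call: at most MAX_ROWS rows newer than start_time
    newer = [t for t in ALL_ROWS if t > start_time]
    return newer[:MAX_ROWS]

def download_all(start_time):
    collected = []
    while True:
        page = fetch_page(start_time)
        collected.extend(page)
        if len(page) < MAX_ROWS:      # last page: fewer rows than the limit
            return collected
        start_time = max(page)        # resume from the newest timestamp seen

print(download_all("2021-10-12"))  # all eight days, gathered across three pages
```

One caveat of this scheme, in the sketch as in the real code: if multiple rows share the boundary timestamp, restarting strictly after (or at) that timestamp can drop or duplicate rows at the page edge.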

Let us know your thoughts on this process, especially if you have a cleaner workaround to this problem.




Datascience (growth) @WSO2 | PhD in Computing
