T-SQL Tuesday #140: RAPIDS and SQL Server Containers

This month’s T-SQL Tuesday is being hosted by Anthony Nocentino (@nocentino). Thanks again, Anthony, for this topic.

While this is not exactly Part 2, this blog post builds upon a previous blog post, which you can read here.

What Is RAPIDS? 

From their website, “The RAPIDS suite of open source software libraries and APIs gives you the ability to execute end-to-end data science and analytics pipelines entirely on GPUs. Licensed under Apache 2.0, RAPIDS is incubated by NVIDIA® based on extensive hardware and data science experience. RAPIDS utilizes NVIDIA CUDA® primitives for low-level compute optimization, and exposes GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.” 

Today we are going to get two containers running from scratch and have one container run SQL commands against another container hosting SQL Server 2019 Developer Edition.

Let Us First Begin With a Clean Slate and Run RAPIDS

I already have Ubuntu 20.04 (Focal Fossa) installed on an external hard drive (so that my Windows 10 desktop is unaffected), along with Docker, all of the RAPIDS prerequisites, and RAPIDS itself.

Let us start with a clean slate in Docker: no containers. WARNING! The following command will remove all of your stopped containers. Please do not do this on the wrong machine…

From a terminal window, list your current containers:

sudo docker container list

And then run the nuke command (note that prune only deletes stopped containers):

sudo docker container prune 

To build the commands below, use the selector at the bottom of the RAPIDS Getting Started page: https://rapids.ai/start.html

Note that I have already pulled the latest image (the pull took about ten minutes), so I do not need to run that first command.

I then run the second command, which creates a new container from the RAPIDS image and starts it:

docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    rapidsai/rapidsai:21.06-cuda11.2-runtime-ubuntu20.04-py3.8

If I open a second terminal window and run [sudo docker container list] again, I will see my running container.

Right-click the link in the output and a Jupyter notebook will launch; I then rename my notebook to 13Jul21.

Running SQL Server 2019 Developer Edition As a Container

Now to get SQL Server 2019 Developer Edition installed as a container. Following the steps here, I run the following command in my second terminal window (the image took less than two minutes on my machine to pull, download, and extract):

sudo docker pull mcr.microsoft.com/mssql/server:2019-latest

Then, substituting a strong password of your own, run:

sudo docker run -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=solarwinds123_15_my_P@55w0rd" \
-p 1433:1433 --name sql1 -h sql1 \
-d mcr.microsoft.com/mssql/server:2019-latest

Now if you once again re-run [sudo docker container list] you will see the container running SQL Server with the name sql1.

Installing Azure Data Studio

Because I like to be able to do SQL things via a GUI, I also install Azure Data Studio.

I used the .deb file link for Linux and saved the file; note the version number, as you will need it to install. For me it was azuredatastudio-linux-1.30.0.deb:

cd ~
sudo dpkg -i ./Downloads/azuredatastudio-linux-1.30.0.deb

Then, to launch Azure Data Studio, just type azuredatastudio in your terminal. After following the installation steps in the link above and entering the connection information, I was able to successfully connect to my new SQL container using Azure Data Studio. Yay!

Downloading and Restoring the AdventureWorks2019 Database 

Next, let us download and then restore the OLTP AdventureWorks2019 database .bak file from here.

We now need to “get into” the SQL container and perform an operating system task: create a backup folder, then copy the .bak from my Downloads folder into it.

sudo docker exec -it sql1 "bash" 

Then run:

cd /var/opt/mssql/
mkdir backup

Then type exit to leave the bash shell. How do you copy the .bak from your Downloads folder into the backup folder you just created in the sql1 container?

From a terminal, cd into your Downloads folder and then run:

sudo docker cp AdventureWorks2019.bak sql1:/var/opt/mssql/backup/AdventureWorks2019.bak

Back in Azure Data Studio, run the following to verify you can access the .bak file in the sql1 container:

RESTORE FILELISTONLY FROM DISK = '/var/opt/mssql/backup/AdventureWorks2019.bak'

Then restore the database: 
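A minimal sketch of the restore T-SQL, assuming the logical file names that FILELISTONLY reports (mine were AdventureWorks2017 and AdventureWorks2017_log; substitute whatever yours reports):

```sql
RESTORE DATABASE AdventureWorks2019
FROM DISK = '/var/opt/mssql/backup/AdventureWorks2019.bak'
WITH MOVE 'AdventureWorks2017'     TO '/var/opt/mssql/data/AdventureWorks2019.mdf',
     MOVE 'AdventureWorks2017_log' TO '/var/opt/mssql/data/AdventureWorks2019_log.ldf';
```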

Connecting From RAPIDS Container to SQL Server Container

Now the fun begins. How to connect and do something from one container to another? For that let us do things from the Jupyter notebook. As you will soon see, we will still need to install some additional software so that the RAPIDS container can connect to the SQL Server container. 

In a cell, run import pyodbc – it will fail.

In the notebook, choose File > New Launcher and pick Terminal. This will allow us to run operating system commands more easily from within our notebook.

Note that using pip is not recommended within RAPIDS, so we use conda (already installed in the RAPIDS container).

Following the command here: https://anaconda.org/anaconda/pyodbc 

conda install -c anaconda pyodbc 

Then re-run your cell and it should work fine. 

We now need to get the IP address of the sql1 container; use a regular terminal window, not the one in the notebook. I don’t know why RAPIDS cannot resolve the server name; a problem to solve another time:

sudo docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' sql1

Mine was 172.17.0.3. Then run the following in a new cell (failure is expected behavior):
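A minimal reconstruction of that cell, assuming the container IP found above and the sa password set earlier (both are mine; substitute your own). The connect call is shown commented out because, at this point, it fails with a driver error:

```python
# Hypothetical reconstruction of the connection cell; the IP address and
# sa password are the ones from earlier in this post - yours may differ.
server = "172.17.0.3"            # from `sudo docker inspect ... sql1`
database = "AdventureWorks2019"  # the database we just restored

conn_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    f"SERVER={server};DATABASE={database};"
    "UID=sa;PWD=solarwinds123_15_my_P@55w0rd"
)
print(conn_str)

# import pyodbc
# cnxn = pyodbc.connect(conn_str)  # at this point this raises an error,
#                                  # because the ODBC driver is not installed
```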

Let’s fix that too, using this link as a reference for installing the ODBC Driver 17 for SQL Server.

Back in the Terminal tab in your notebook:

su
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -

curl https://packages.microsoft.com/config/ubuntu/20.04/prod.list > /etc/apt/sources.list.d/mssql-release.list

apt-get update

ACCEPT_EULA=Y apt-get install -y msodbcsql17

Then re-run the cell above and it should work fine. 

Finally! Run the following in a new cell to load a pandas dataframe with AdventureWorks data, then move the dataframe from CPU to GPU and show the results:

import pandas as pd

# cnxn is the pyodbc connection opened in the earlier cell; query is any
# T-SQL SELECT against AdventureWorks2019 (for example, against
# Purchasing.PurchaseOrderHeader, hence the table name POH below).
df = pd.read_sql(query, cnxn)

from blazingsql import BlazingContext
bc = BlazingContext()
bc.create_table('POH', df)         # register the pandas dataframe with BlazingSQL
gdf = bc.sql('SELECT * FROM POH')  # run the query on the GPU
print(gdf)

Conclusion 

What another whirlwind tour! It was a lot, I’ll admit. We were able to pull and then run two Docker containers, one RAPIDS and the other SQL Server, install packages and software, do some troubleshooting, and show how to run a query from one container against the other.

Heh, it’s funny how you stumble on things. Just found this recently too: 

https://www.sentryone.com/blog/restoring-an-adventureworks-database-on-a-sql-server-on-linux-container

Hope you have enjoyed this post, thanks for reading!