Autograding Jupyter using AutoTest V1
Learn about all the steps to set up a basic Jupyter Notebook assignment
CodeGrade AutoTest runs on Ubuntu (18.04.2 LTS) machines, which you can configure in any way you want. Python comes preinstalled, along with packages like NumPy.
In the setup section of your AutoTest, you can upload any files you might need for testing. If you intend to use NBGrader, this is where you upload your NBGrader file (read more about using NBGrader with CodeGrade). These files are called Fixtures and are placed in the $FIXTURES directory on the Virtual Server. The file structure of the server looks like this:
- $FIXTURES/: all of your uploaded fixtures are placed here.
- $STUDENT/: this is where the student's submission is placed. Tests are executed from this directory.
After uploading any files you might need, you can run setup scripts to install any packages you need.
- Global setup script: this is where you install additional pip packages that you want to be available on the server, for instance pandas using python3 -m pip install pandas. The global setup only runs once and is then cached, so the student submission won't be available here yet.
- Per-student setup script: this runs once when running the autograding for a student, before all tests. This is a good place to move files from $FIXTURES to the $STUDENT directory in case the student solution needs some input files to run correctly (see the example below).
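For example, assuming you uploaded a fixture called data.csv (a placeholder name) that the student notebook reads, the Per-student setup script could copy it into the student directory with:

cp $FIXTURES/data.csv $STUDENT/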

Use any command in the Global Setup Script field to install software or run your setup script
Now that you have created the setup, it's time to create the actual tests. Do this by pressing the "Add Level" button and then the "Add Category" button.
All tests in AutoTest fill in a specific rubric category that you select. After selecting the category, you can start creating tests that will fill it in. Multiple test types are available in CodeGrade, which can be used together depending on your needs and wishes.
If you want to manually assess Jupyter Notebooks, you can automatically run them in CodeGrade.
For this, we can use CodeGrade's AutoTest Output functionality. This allows us to generate output using AutoTest that can be displayed in the Code Viewer! This can be done very easily using the pre-installed jupyter package by entering this command in a Run Program step:

jupyter nbconvert --execute --to notebook --output $AT_OUTPUT/jupyter.ipynb $STUDENT/jupyter.ipynb --allow-errors

The automatically run Jupyter Notebook in the Code Viewer
This will output the notebook back to the Code Viewer under AutoTest output.
You can use IO tests to grade most Jupyter Notebooks.
1. Create a Run Program Test.
2. Add the following line to the program to run: jupyter nbconvert --to script YOURFILE.ipynb
3. This will convert the Jupyter Notebook to a Python script, which we can then use in the IO test (see the example after this list).
4. Set the weight of the Run Program test to 0 if you want to make this step ungraded.
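For example, if students hand in a notebook called solution.ipynb (an illustrative name), the Run Program step would be:

jupyter nbconvert --to script solution.ipynb

This produces a solution.py script in the student directory, which the IO test below can then import.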
1. Create an IO Test.
2. As the program to test, use the following command: python3 -ic "import <your script>" (do not add the .py or .ipynb to the end here).

We can now interact with the notebook. For example, we could:
- Print the result of a function like this: print(<your_script>.<your_function>(1, 5)).
- Test a stored variable using <your_script>.<your_variable>.
Autograding variables in a Jupyter Notebook

Autograding functions in a Jupyter Notebook
As the input is regular Python code, you are essentially writing to the Python interpreter. You can call functions, do arithmetic operations and print variables, as can be seen in the examples above. You can test NumPy arrays and pandas DataFrames using IO tests too.
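As a concrete sketch, suppose the converted script is called solution.py and defines a function add (both names are placeholders); an IO test step could then look like this:

Program to test: python3 -ic "import solution"
Input: print(solution.add(2, 3))
Expected output: 5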
Importing Python code without printing
One thing to be aware of is that we are checking the stdout of the scripts, which are run completely when importing. As a result, students can clutter the output with additional print statements outside of functions. There are two ways to prevent this:
1. Import the script with stdout redirected. This can be done using this little code snippet, saved as a file import_without_print.py and uploaded as a $FIXTURE, which you run via python3 -i import_without_print.py:

from contextlib import redirect_stdout
from os import devnull

with redirect_stdout(open(devnull, 'w')):
    import jupyter  # <-- The name of your script

2. If you are providing Jupyter Notebook skeleton code to your students, you can add an if __name__ == "__main__": guard every time you print solutions. This way, the students can see their solutions while interacting with the notebook, but these solutions will not be printed when importing the code (using the script above or with a regular import).
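For example, a skeleton cell could look like this (add is just an illustrative function name):

def add(a, b):
    return a + b

if __name__ == "__main__":
    # Shown when the notebook itself runs, but not printed when the converted script is imported
    print(add(1, 5))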
You can use pytest to write unit tests for Jupyter Notebooks just like you would with regular Python code. Upload a pytest unit test file as a $FIXTURE and automatically run unit tests on the code. For instance, test a student's code with the following unit tests in a file called test_calculator.py:

import pytest
import calculator

def test_add():
    assert calculator.add(3, 2) == 5
    assert calculator.add(1, -1) == 0
    assert calculator.add(-1, -1) == -2

def test_subtract():
    assert calculator.subtract(5, 2) == 3
    assert calculator.subtract(1, -1) == 2
    assert calculator.subtract(-1, -1) == 0

def test_multiply():
    assert calculator.multiply(3, 2) == 6
    assert calculator.multiply(1, -1) == -1
    assert calculator.multiply(-1, -1) == 1

def test_divide():
    assert calculator.divide(10, 2) == 5
    assert calculator.divide(1, -1) == -1
    assert calculator.divide(-1, -1) == 1
    assert calculator.divide(5, 2) == 2.5
    with pytest.raises(ValueError):
        calculator.divide(10, 0)
1. Make sure you have converted the student's notebook to regular Python code, like in the previous section.
2. Upload the unit test file as a $FIXTURE in the CodeGrade setup.
3. In the category where you want the test to live, create a Unit Test under the Run Program Test.
4. Select Pytest.
5. As the File to test, give the following file: $FIXTURES/test_calculator.py

Unit Testing with pytest
CodeGrade integrates a tool called Semgrep to make it easy to perform more complex code analysis, allowing you to write rules in a human-readable format. You can provide generic or language-specific patterns, which are then matched against the code. With its pattern syntax, you can find:
- Equivalences: matching code that means the same thing even though it looks different.
- Wildcards / ellipsis (...): matching any statement, expression or variable.
- Metavariables ($X): matching expressions whose exact form you do not yet know, but that you want to be the same in multiple parts of your pattern.
Semgrep is pre-installed in the Unit Test. You write your Semgrep Code Structure tests in a YAML file and upload this as a fixture in the setup section of AutoTest. Semgrep has an online editor that can be used to check and create your patterns.
For instance, this is a ruleset to detect a for-loop in the student code, and make sure they do not use a while-loop.
rules:
  - id: for-loop
    match-expected: true
    pattern: |
      for $EL in $LST:
        ...
    message: A for-loop was used
    severity: INFO
    languages:
      - python
  - id: no-while-loop
    match-expected: false
    pattern: |
      while $COND:
        ...
    message: No while-loop was used
    severity: INFO
    languages:
      - python
In this file, rules.yml, we define two rules: for-loop and no-while-loop. Within these rules we define a few things:
- Patterns: the ellipsis (...) is used to capture anything. The metavariables $EL (element) and $LST (list) capture the two parts of the for-loop declaration (the naming of these metavariables is irrelevant and could have been anything else). The same is done for the while-loop condition $COND.
- Messages: we can provide understandable messages that are parsed by the wrapper script, making our tests understandable for our students.
- Match-expected: importantly, we have added the match-expected field (this is added by the CodeGrade wrapper script and cannot be tested in regular Semgrep). By setting this field to true for the for-loop rule, we specify that we expect a match in order to pass that test.
- Severity: this dictates the level of severity of failing the rule. The penalty for each severity level can be set in the Unit Test step.
- Languages: here we specify the language of the scripts that Semgrep will be checking.
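For example, student code like the following snippet (purely illustrative) contains a for-loop and no while-loop, so both rules above would pass:

total = 0
for number in [1, 2, 3]:
    total += number
print(total)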
1. Make sure you have converted the student's notebook to regular Python code, like in the previous sections.
2. Upload the YAML (.yml) file with your Code Structure tests in it as a $FIXTURE in the CodeGrade setup.
3. In the category where you want the test to live, create a Unit Test.
4. Select Semgrep.
5. Give the following extra arguments: $FIXTURES/your-test-file.yml $STUDENT
If you run this on your student's code, it will then show up like this:

- Run Program Tests: run a command; if the command runs successfully, this test passes. Great for custom tests or just checking whether the Python file runs.
- Capture Points Tests: use your own custom grading script. Upload it as a $FIXTURE, execute it in this test and make sure it outputs a float between 0.0 and 1.0; this score then determines the number of points a student receives (see the sketch below).
- Checkpoints: only execute tests below the checkpoint if the tests above the checkpoint reach a certain score. Great for separating simple and advanced tests.
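As a minimal sketch of a Capture Points script (assuming the student's converted script is called solution.py and defines a function add; both names are illustrative), the script only has to print a value between 0.0 and 1.0:

# grade.py - hypothetical Capture Points grading script
import solution  # the student's converted notebook (assumed name)

checks = [
    solution.add(1, 1) == 2,
    solution.add(2, 3) == 5,
]

# Print the fraction of passed checks as the score between 0.0 and 1.0
print(sum(checks) / len(checks))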
Once you have created a configuration, press start and your AutoTest will start running. In the General Settings, you should have already uploaded a Test Submission, so you will see the results of this straight away.
Once it's started and your assignment is set to Open or Done, students can hand in and get immediate feedback!