🏗️ Code Structure Tests with Semgrep

Learn how to create effective tests to check that your students employ the required coding patterns.

Semgrep is a static code analysis tool that can check if students have used a specific syntax or coding pattern to reach their solution. AutoTest offers the Code structure test block and its corresponding Positive match and Negative match blocks for running Semgrep tests. These blocks are configured by specifying Semgrep patterns.

To get familiar with Semgrep patterns, we highly recommend the following resources:

This page is divided into two parts. The first part walks you through creating a Code Structure Test in AutoTest. The second part is a collection of Semgrep patterns that you can use out of the box for your tests.

Setting up a Code Structure Test in AutoTest

Follow these steps in the AutoTest Editor to create a Code Structure Test:

  • Add a Code Structure Test Block to your AutoTest configuration.

  • In the Student file input field, write the name of the file you want to check. The default value will check every file in the Student Directory.

  • Add an inner Positive or Negative Match Block within the Code Structure Test Block. The Positive Match will pass if the expected coding pattern is found in the target file, while for the Negative Match, a test passes if the expected coding pattern is not found.

  • Edit the template configuration of the Match Block you want to configure.

More about the Match Block configuration

The template configuration looks like the following:

# Use the semgrep playground to create a rule, then copy the result you find
# in the `advanced` tab into this block. https://semgrep.dev/editor
rules:
  - id: untitled_rule
    pattern: YOUR_PATTERN
    message: Semgrep found a match
    languages: [THE_PROGRAMMING_LANGUAGE]
    severity: WARNING

Note the link to the Semgrep editor in the comment. We recommend developing your Semgrep pattern in this environment and then copy-pasting it back to CodeGrade.

You only need to edit the pattern and languages fields.

  • for the pattern, copy and paste your Semgrep rule.

  • for the languages, type one of the supported languages within square brackets.

Below is an example of a working configuration to check the presence of a function named myFunction in Python:

# Use the semgrep playground to create a rule, then copy the result you find
# in the `advanced` tab into this block. https://semgrep.dev/editor
rules:
  - id: untitled_rule
    pattern: |
      def myFunction(...):
        ...
    message: Semgrep found a match
    languages: [python]
    severity: WARNING
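
To make the rule above concrete, here is a minimal sketch of a hypothetical student submission. The function bodies and the name my_function are invented for illustration; only myFunction comes from the rule.

```python
# Hypothetical student submission; bodies and the second name are illustrative.

def myFunction(values):
    # Found by `def myFunction(...): ...`: the name is literal,
    # while the parameters and the body can be anything.
    return sum(values) / len(values)


def my_function(values):
    # Not found by the pattern: it requires the exact name `myFunction`.
    return sum(values) / len(values)
```

With a Positive Match block, the first definition is enough for the test to pass. With a Negative Match block, the same file would fail the test.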

Wrap a Hide Configuration Block around your Code Structure Test Block to prevent students from seeing the configuration of your Match Block. This can avoid confusion.

Tips and tricks

  • Multiline patterns: whenever your pattern consists of more than one line, use the vertical bar | to indicate a multiline pattern, like in the following Python example:

pattern: |
      def myFunction(...):
        ...
  • Metavariables: metavariables are uppercase names starting with $, like $N or $VAR. They work as placeholders for actual variables, expressions, or values. They help you write flexible patterns, so that the student's code does not have to match a piece of code literally.

  • Ellipsis operator ... : the ... ellipsis operator abstracts away a sequence of zero or more items such as arguments, statements, parameters, fields, and characters.

  • Literal patterns: all the parts in your pattern that do not contain metavariables or the ... operator must be exactly matched by the student's code. Mixing literal and abstract fragments offers a great deal of versatility for devising patterns that can be both as granular and as flexible as desired.
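
As a rough illustration of how metavariables, the ellipsis operator, and literal fragments combine, the sketch below annotates a few hypothetical function calls with the patterns that should find them. The variable names and the function body are assumptions made for this example.

```python
# Hypothetical student code; every name here is invented for illustration.

def myFunction(a, b):
    return a * b


total = myFunction(4, 10)      # `myFunction(4, $VAR)` matches, $VAR binds to 10
scaled = myFunction(4, total)  # `myFunction(4, $VAR)` matches, $VAR binds to total
other = myFunction(5, 10)      # `myFunction(4, $VAR)` does not match: 5 is not the literal 4
same = myFunction(7, 7)        # `myFunction($X, $X)` matches: both arguments are identical

# `myFunction(...)` matches all four calls, because ... stands for any arguments.
```

Note that a metavariable used twice in one pattern, such as $X here, has to stand for the same code in both places.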

Logical operators for rules:

Semgrep allows the creation of composite patterns by applying logical operators on sub-patterns. In CodeGrade, the most commonly used operators are:

  • pattern: a simple single pattern (possibly multiline).

  • patterns: corresponding to the AND logical operator. Other sub-patterns can be nested within it so that the overall pattern will be matched if and only if all the sub-patterns are matched.

patterns:
      - pattern: |
            ...
            db_query(...)
      - pattern: |
            ...
            db_query(..., verify=True, ...)
  • pattern-either: corresponding to the OR logical operator. Other sub-patterns can be nested within it so that the overall pattern will be matched if at least one sub-pattern is matched.

pattern-either:
      - pattern: hashlib.sha1(...)
      - pattern: hashlib.md5(...)
  • pattern-inside: The pattern-inside operator keeps matched findings that reside within its expression. This is useful for finding code inside other pieces of code, such as functions or blocks.

patterns:
      - pattern: return ...
      - pattern-inside: |
          class $CLASS:
            ...
      - pattern-inside: |
          def __init__(...):
              ...
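
To see what the pattern-either example above is looking for, consider this small sketch of hypothetical student code. The variable names are assumptions. Placed in a Negative Match block, such a rule could be used to discourage weak hash functions.

```python
import hashlib

# Hypothetical student code; variable names are illustrative.
data = b"hello"

# Found by the sub-pattern `hashlib.md5(...)`, so the pattern-either matches.
weak_digest = hashlib.md5(data).hexdigest()

# Found by neither `hashlib.sha1(...)` nor `hashlib.md5(...)`.
strong_digest = hashlib.sha256(data).hexdigest()
```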

Semgrep Pattern Collection

While our automatic grading guides include examples of Semgrep patterns, we provide a more extensive collection of patterns for the most common programming languages and use-case scenarios here. If you have further questions, don't hesitate to contact us at support@codegrade.com!

Python

If statements:

  • A simple if statement whose condition and body can be any:

pattern: |
    if ...:
        ...
  • A more elaborate if-else example. The first if condition is that any variable is greater than 5. The elif condition is any. The else body contains a literal print statement that will be matched only if the student prints the same exact string.

pattern: |
      if $X > 5:
        ...
      elif ...:
        ...
      else:
        print("Number is less than 5")
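
A hypothetical submission that the if-else pattern above is expected to find could look like the following sketch. The function name classify and the first two printed strings are invented for illustration.

```python
# Hypothetical student code; `classify` and the first two messages are illustrative.

def classify(number):
    if number > 5:                        # $X binds to `number`
        print("Number is greater than 5")
    elif number == 5:                     # any elif condition is accepted
        print("Number is exactly 5")
    else:
        # Must be exactly this string; "Number is smaller than 5"
        # would not be found by the pattern.
        print("Number is less than 5")
```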

Loops

  • A simple while loop. The condition is that a literal variable x should be less than any variable. The body can be any.

pattern: |
      while x < $VAR:
        ...
  • A simple for loop. The iterator variable can be any. The body of the loop should contain a statement that prints the iterator.

pattern: |
        for $X in range($Y):
                ...
                print($X)
                ...
  • Check the presence of either a for or a while loop:

pattern-either:
      - pattern: |
          while ...:
            ...
      - pattern: |
          for ... in ...:
            ...
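
The sketch below shows how the for loop pattern with print($X) is expected to behave on two hypothetical submissions. Function and variable names are assumptions made for this example.

```python
# Hypothetical student code; all names are illustrative.

def count_up(limit):
    # Expected to match: $X binds to `i`, $Y binds to `limit`,
    # and print(i) appears somewhere in the loop body.
    for i in range(limit):
        square = i * i
        print(i)
        print(square)


def count_squares(limit):
    # Not expected to match: the loop never prints the iterator itself,
    # only the expression `i * i`.
    for i in range(limit):
        print(i * i)
```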

Functions:

  • function definition, with any signature and body:

pattern: |
      def $FUNC(...):
        ...
  • function definition with two parameters, the first of which must be named x, and with a return statement in the body:

pattern: |
        def $F(x, $VAR):
                ...
                return ...
  • a function named myFunction is called with the first argument equal to 4 and the second argument any:

pattern: myFunction(4,$VAR)
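
As an illustration of the function definition pattern def $F(x, $VAR) above, here is a sketch with two hypothetical definitions. The names scale and shift are invented for this example.

```python
# Hypothetical student code; function and parameter names are illustrative.

def scale(x, factor):
    # Expected to match: first parameter is literally `x`,
    # $VAR binds to `factor`, and the body contains a return statement.
    result = x * factor
    return result


def shift(value, offset):
    # Not expected to match: the first parameter is not named `x`.
    return value + offset
```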

Import statements

  • if you want to check that a specific object or module is imported, you can just use a literal pattern, like the following:

pattern: from sklearn.neighbors import KNeighborsClassifier

Classes

  • Check that a class named Dog is defined:

pattern: |
      class Dog:
        ...
  • Check the presence of the __init__ method defined within an arbitrary class. The method should have the following signature: the self parameter, a parameter named x, and a third arbitrary parameter. The method's body should contain a literal assignment for an attribute of the class.

patterns:
      - pattern-inside: |
          class $CLASS:
            ...
      - pattern: |
          def __init__(self, x, $Y):
            ...
            self.x = x
            ...

Note that the following pattern would also achieve the same result:

pattern: |
      class $CLASS:
          ...
          def __init__(self, x, $Y):
            ...
            self.x = x
            ...
          ...
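
For reference, this sketch shows one hypothetical class that the __init__ pattern above is expected to find and one that it is not. The class names Point and Pixel are assumptions made for this example.

```python
# Hypothetical student code; class and attribute names are illustrative.

class Point:
    def __init__(self, x, y):   # $Y binds to `y`
        self.x = x              # the literal assignment required by the pattern
        self.y = y


class Pixel:
    def __init__(self, x, y):
        self.col = x            # not `self.x = x`, so the pattern is not satisfied
        self.row = y
```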
Java

If statements:

  • A simple if statement whose condition and body can be any:

pattern: |
       if (...)
             ...
  • A more elaborate if-else example. The first if condition is that any variable is greater than 5. The else if condition is any. The else body contains a literal println statement that will be matched only if the student prints the same exact string.

pattern: |
      if ($X > 5)
        ...
      else if (...)
        ...
      else
        System.out.println("The number is less than 5.");

Loops

  • A simple while loop. The condition is that a literal variable x should be less than any variable or value. The body can be any.

pattern: |
    while (x < $VAR)
       ...
  • A simple for loop. The iterator is the literal variable i. The loop condition and update can be any. The body of the loop should contain a statement that prints a string that must be matched exactly.

pattern: |
        for (int i = $VAL; ...; ...) {
            ...
            System.out.println("Counting up: " + i);
            ...
        }
  • Check the presence of either a for or a while loop:

pattern-either:
      - pattern: |
          while (...)
            ...
      - pattern: |
          for (...; ...; ...)
            ...

Functions:

  • function definition, with public visibility, return type String, and any signature and body:

pattern: |
      public String $FUNC(...) {
          ...
      }
  • public static function definition that should return a double. We check that the first parameter is an int named x and that there is a return statement in the body:

pattern: |
        public static double $FUNC(int x, ...) {
          ...
          return $VAL;
        }
  • a function named myFunction is called with the first argument equal to 4 and the second argument any:

pattern: myFunction(4,$VAR);

Import statements

  • if you want to check that a specific object or module is imported, you can just use a literal pattern, like the following:

pattern: import java.util.ArrayList;

Classes

  • Check that a class named Car is defined:

pattern: |
        public class Car{
                ...
        }
  • Check the presence of specific attributes and methods within the definition of a class named Car. We check for:

    • the declaration of two literal String instance attributes, named make and model;

    • the declaration of a public constructor that accepts two String parameters.

patterns:
      - pattern-inside: |
          public class Car{
            ...
          }
      - pattern:  |
          ...
          private String make;
      - pattern:  |
          ...
          private String model;
      - pattern: |
          public Car (String $PARAM1, String $PARAM2)
          {
            ...
          }
C++

If statements:

  • A simple if statement whose condition and body can be any:

pattern: |
       if ($COND)
       {
          ...
       }
  • A more elaborate if-else example. The first condition can be any. The else if condition is that some variable is less than zero. The else body prints a message to the screen and must be matched literally.

pattern: |
      if ($COND1) {
        ...
      } else if ($COND2 < 0) {
          ...
      } else {
          std::cout << "The number is zero." << std::endl;
      }

Loops

  • A simple while loop. The condition is that some variable should be less than a second one. The body can be any.

pattern: |
    while($VAR < $VALUE)
      {
        ...
      }
  • A simple for loop. The loop initialization, condition, and update can be any. The body of the loop can be any too.

pattern: |
        for (...; ...; ...) {
                ...
        }
  • Check the presence of an if statement within a while loop:

patterns:
      - pattern-inside: |
          while (...)
          {
            ...
          }
      - pattern: |
          if($COND)
          {
            ...
          }

Functions:

  • definition of a function where we specify the return type and the types of the input parameters. The function name and body can be any, as long as the latter ends with a return statement.

pattern: |
      double $FUNC(int $A, int $B)
      {
        ...
        return $VAL;
      }
  • static function definition that should return a double. We check that the first parameter is of type int and is named x, and that there is a return statement at the end of the body:

pattern: |
        static double $FUNC(int x, ...) {
          ...
          return $VAL;
        }
  • a function named myFunction is called with the first argument equal to 4 and the second argument any:

pattern: myFunction(4,$VAR);

Include statements

  • if you want to check that a specific library is included, you can use a literal pattern, like the following:

pattern: |
      #include <iostream>

Classes

Semgrep does not yet fully support checks for C++ classes. This means that the usual flexibility in designing patterns is not available in this case. Contact us via support@codegrade.com with a description of the check you want to implement, and we will do our best to devise an effective workaround!

C#

If statements:

  • A simple if statement whose condition and body can be any:

pattern: |
       if (...)
       {
          ...
       }
  • A more elaborate if-else example. The first if condition is that some variable is greater than 0, and its body is any. The else if condition is that the same variable as before is less than 0, and its body is any again. The else body contains a Console.WriteLine statement, which may print anything on the screen.

pattern: |
      if ($N > 0)
      {
        ...
      }
      else if($N < 0)
      {
        ...
      }
      else
      {
        Console.WriteLine(...);
      }

Loops

  • A simple while loop. The condition is that some variable, represented by $VAR, should be less than or equal to 10. The body of the loop can be any, as long as it ends with a post-increment of $VAR.

pattern: |
     while($VAR<=10)
     {
            ...
            $VAR++;
     }
  • The most general for loop structure

pattern: |
        for (...; ...; ...)
        {
            ...
        }
  • Check the presence of either a for or a while loop:

pattern-either:
      - pattern: |
          while (...)
          {
            ...
          }
      - pattern: |
          for (...; ...; ...)
          {
            ...
          }

Functions:

  • definition of a static function that should return an integer value. For the function signature, we specify that it should have two parameters, both of type int. The body of the function can be any, but it must terminate with a return statement.

pattern: |
        static int $F(int $A, int $B)
        {
            ...
            return $VAL;
        }
  • In the following example, we check that a function named myFunction has the following properties:

    • it returns a double;

    • its first parameter is a double variable named x. The rest of its signature can be any;

    • it contains an if statement, whose condition is any, and whose body must contain a return statement.

patterns:
        - pattern-inside: |
            double myFunction(double x, ...)
            {
                ...
            }
        - pattern: |
            if(...)
            { 
                ...
                return $VAL;
            }
  • a function named myFunction is called with the first argument equal to 4 and the second argument any:

pattern: myFunction(4,$VAR);

Import statements

  • if you want to check that a specific library is imported, you can just use a literal pattern, like the following:

pattern: using System;

Classes

  • Check that a public class named Car is defined:

pattern: |
        public class Car{
                ...
        }
  • Check the presence of specific attributes and methods within the definition of a class named Car. We check for:

    • the declaration of two literal string data fields, named make and model;

    • the declaration of a public function named DisplayInfo with return type void.

patterns:
      - pattern-inside: |
          public class Car
          {
            ...
          }
      - pattern:  |
          ...
          private string make;
      - pattern:  |
          ...
          private string model;
      - pattern: |
            public void DisplayInfo()
            {
                ...
            }
R

If statements:

  • A simple if statement whose condition and body can be any:

pattern: |
       if (...)
       {
          ...
       }
  • A more elaborate if-else example. The first if condition is that some variable is greater than 0, and its body is any. The else if condition is that the same variable as before is less than 0, and its body must contain a print statement. The else body contains a literal print statement.

pattern: |
        if ($VAR>0) {
            ...
        } else if ($VAR<0) {
            print(...)
        }
        else{
            print("The variable is zero.")
        }

Loops

  • A simple while loop. The condition is that some variable, represented by $VAR, should be less than 100. The body of the loop can be any, as long as it ends with an increment of the variable represented by $VAR by one.

pattern: |
             while ($VAR < 100) 
             {
                          ...        
                          $VAR <- $VAR + 1
             }
  • A for loop structure where the iterator variable can be any, and its range goes from some number up to 5. The body of the loop can be any.

pattern: |
        for ($VAR in $NUM:5) {
            ...
        }
  • Check the presence of either a for or a while loop:

pattern-either:
      - pattern: |
           while (...)
           {
             ...
           }
      - pattern: |
           for ($VAR in $N:$M) 
           {
                ...
           }   

Functions:

  • Definition of a function that must be named add_numbers. It takes any two parameters as input, and its body can be any as long as it ends with a return statement.

pattern: |
        add_numbers <- function($A, $B) {
            ...
            return($RESULT)
        }  
  • In the following example, we check that the student defined a function, whose name can be any, with the following properties:

    • its first parameter is named x. The rest of its signature can be any;

    • it contains an if statement, whose condition is any, and whose body must contain a return statement.

patterns:
        - pattern-inside: |
            $FUN <- function(x, ...) {
                ...
            } 
        - pattern: |
            if(...)
            {
                ...           
                return($VAL)
            }
  • a function named myFunction is called with the first argument equal to 4 and the second argument any:

pattern: myFunction(4,$VAR)

Import statements

  • if you want to check that a specific library is imported, you can just use a literal pattern, like the following:

pattern: library(ggplot2)
