# Code Structure Tests with Semgrep

[Semgrep ](https://semgrep.dev/)is a static code analysis tool that can check if students have used a specific syntax or coding pattern to reach their solution. AutoTest offers the **Code structure test** block and its corresponding **Positive match** and **Negative match** blocks for running Semgrep tests. These blocks are configured by specifying Semgrep patterns.

To get familiar with Semgrep patterns, we highly recommend the following resources:

* Semgrep's [online documentation](https://semgrep.dev/learn/basics/) has many interactive tutorials and examples.
* Semgrep's [online playground](https://semgrep.dev/playground/new) is helpful for quickly developing and debugging your patterns.

This page is divided into two parts. The first part walks you through creating a Code Structure Test in AutoTest. The second part is a collection of Semgrep patterns that you can use out of the box for your tests.

## Setting up a Code Structure Test in AutoTest

Follow these steps in the AutoTest Editor to create a Code Structure Test:

* Add a **Code Structure Test** Block to your AutoTest configuration.
* In the **Student file** input field, write the name of the file you want to check. The default value will check every file in the Student Directory.
* Add an inner **Positive** or **Negative Match Block** within the **Code Structure Test** Block. The Positive Match will pass if the expected coding pattern is found in the target file, while for the Negative Match, a test passes if the expected coding pattern is not found.
* Edit the template configuration of the Match Block you want to configure.&#x20;

<details>

<summary>More about the Match Block configuration</summary>

The template configuration looks like the following:

```yaml
# Use the semgrep playground to create a rule, then copy the result you find
# in the `advanced` tab into this block. https://semgrep.dev/editor
rules:
  - id: untitled_rule
    pattern: YOUR_PATTERN
    message: Semgrep found a match
    languages: [THE_PROGRAMMING_LANGUAGE]
    severity: WARNING
```

Note the link to the Semgrep editor in the comment. We recommend developing your Semgrep pattern in this environment and then copy-pasting it back to CodeGrade.

You only need to edit the **pattern** and **languages** field.

* for the pattern, copy and paste your Semgrep rule.
* for the languages, type one of the supported languages within square brackets.

Below is an example of a working configuration to check the presence of a function named **myFunction** in Python:

```yaml
# Use the semgrep playground to create a rule, then copy the result you find
# in the `advanced` tab into this block. https://semgrep.dev/editor
rules:
  - id: untitled_rule
    pattern: |
      def myFunction(...):
        ...
    message: Semgrep found a match
    languages: [python]
    severity: WARNING
```

</details>

{% hint style="info" %}
Wrap a **Hide Configuration Block** around your **Code Structure Test Block** to prevent students from seeing the configuration of your **Match Block**. This can avoid confusion.
{% endhint %}

<div data-full-width="true"><figure><img src="https://2172486256-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MKAQsDlg_P20iQy3JDs%2Fuploads%2FmN8NzcRWUgRU6jxnRaT9%2FScreenshot%202024-07-01%20at%2016.14.30.png?alt=media&#x26;token=710c8e8e-4d07-421c-8a25-4fb6790b7a0c" alt=""><figcaption><p>A Code Structure Test in AutoTest. This test is connected to a Rubric Category and its configuration is not visible to the students.</p></figcaption></figure></div>

## Tips and tricks

* **Multiline patterns**: whenever your pattern consists of more than one line, use the vertical bar | to indicate a multiline pattern, like in the following Python example:

```yaml
pattern: |
      def myFunction(...):
        ...
```

* **Metavariables:** metavariables are capital case variables starting with $, like `$N` or `$VAR` . They work as placeholders for actual variables, expressions, or values. They are helpful to write flexible and versatile patterns so that the student is not required to match a piece of code literally.
* **Ellipsis operato**r `...` :  the `...` ellipsis operator abstracts away a sequence of zero or more items such as arguments, statements, parameters, fields, and characters.
* **Literal patterns:** all the parts in your pattern that do not contain metavariables or the `...` operator must be exactly matched by the student's code. Mixing literal and abstract fragments offers a great deal of versatility for devising patterns that can be both as granular and as flexible as desired.

### **Logical operators for rules:** &#x20;

Semgrep allows the creation of composite patterns by applying logical operators on sub-patterns.  In CodeGrade, the most commonly used operators are:

* `pattern`: a simple single pattern (possibly multiline).
* `patterns`: corresponding to the AND logical operator. Other sub-patterns can be nested within it so that the overall pattern will be matched if and only if all the sub-patterns are matched.

```yaml
patterns:
      - pattern: |
            ...
            db_query(...)
      - pattern: 
            ...
            db_query(..., verify=True, ...)            
```

* **`pattern-either`**: corresponding to the OR logical operator. Other sub-patterns can be nested within it so that the overall pattern will be matched if at least one sub-pattern is matched.

```yaml
pattern-either:
      - pattern: hashlib.sha1(...)
      - pattern: hashlib.md5(...)
```

* `pattern-inside`: The `pattern-inside` operator keeps matched findings that reside within its expression. This is useful for finding code inside other pieces of code, such as functions or blocks.

```yaml
patterns:
      - pattern: return ...
      - pattern-inside: |
          class $CLASS:
            ...
      - pattern-inside: |
          def __init__(...):
              ...
```

## Semgrep Pattern Collection

While our automatic grading guides include examples of Semgrep patterns, we provide a more extensive collection of patterns for the most common programming languages and use-case scenarios here. If you have further questions, don't hesitate to contact us at <support@codegrade.com>!

<details>

<summary>Python</summary>

### If statements:

* A simple **if** statement whose condition and body can be any:

```yaml
pattern: |
    if ...:
        ...
```

* A more elaborate **if-else** example. The first **if** condition is that any variable is greater than 5. The **elif** condition is any. The **else** body contains a literal **print** statement that will be matched only if the student prints the same exact string.

```yaml
pattern: |
      if $X > 5:
        ...
      elif ...:
        ...
      else:
        print("Number is less than 5")
```

### Loops

* A simple **while** loop.  The condition is that a literal variable **x** should be less than any variable. The body can be any.

```yaml
pattern: |
      while x < $VAR:
        ...
```

* A simple **for** loop. The iterator variable can be any. The body of the loop should contain a statement that prints the iterator.

```yaml
pattern: |
        for $X in range($Y):
                ...
                print($X)
                ...
```

* Check the presence of either a **for** or a **while** loop:

```yaml
pattern-either:
      - pattern: |
          while ...:
            ...
      - pattern: |
          for ... in ...:
            ...
```

### Functions:

* function definition, with any signature and body:&#x20;

```yaml
pattern: |
      def $FUNC(...):
        ...
```

* function definition where we only care that the first parameter is named **x** and that there is a return statement in the body:

```yaml
pattern: |
        def $F(x, $VAR):
                ...
                return ...
```

* a function named **myFunction** is called with the first argument equal to **4** and the second argument any:

```yaml
pattern: myFunction(4,$VAR)
```

### Import statements

* if you want to check that a specific object or module is imported, you can just use a literal pattern, like the following:

```yaml
pattern: from sklearn.neighbors import KNeighborsClassifier
```

### Classes

* Check that a class named **Dog** is define&#x64;**:**

```yaml
pattern: |
      class Dog:
        ...
```

* Check the presence of the **init method** defined within an arbitrary class. The method should have the following signature: the **self** keyword, a variable named **x,** and a third arbitrary variable. The method's body should contain a literal assignment for an attribute of the class.

```yaml
patterns:
      - pattern-inside: |
          class $CLASS:
            ...
      - pattern: |
          def __init__(self, x, $Y):
            ...
            self.x = x
            ...
```

&#x20;      Note that also the following pattern would achieve the same result:

```yaml
pattern: |
      class $CLASS:
          ...
          def __init__(self, x, $Y):
            ...
            self.x = x
            ...
          ...
```

</details>

<details>

<summary>Java</summary>

### If statements:

* A simple **if** statement whose condition and body can be any:

```yaml
pattern: |
       if (...)
             ...
```

* A more elaborate **if-else** example. The first **if** condition is that any variable is greater than 5. The **else if** condition is any. The **else** body contains a literal **println** statement that will be matched only if the student prints the same exact string.

```yaml
pattern: |
      if ($X > 5)
        ...
      else if (...)
        ...
      else
        System.out.println("The number is less than 5.");
```

### Loops

* A simple **while** loop.  The condition is that a literal variable **x** should be less than any variable or value. The body can be any.

```yaml
pattern: |
    while (x < $VAR)
       ...
```

* A simple **for** loop. The iterator is the literal variable **i**. The loop condition and update can be any. The body of the loop should contain a statement that prints a string that must be matched exactly.

```yaml
pattern: |
        for (int i = $VAL; ...; ...) {
            ...
            System.out.println("Counting up: " + i);
            ...
        }
```

* Check the presence of either a **for** or a **while** loop:

```yaml
pattern-either:
      - pattern: |
          while (...)
            ...
      - pattern: |
          for (...; ...; ...)
            ...
```

### Functions:

* function definition, with **public** visibility, return type **String**, and any signature and body:&#x20;

```yaml
pattern: |
      public String $FUNC(...) {
          ...
      }
```

* **public static** function definition that should return a **double**. We check that the first parameter is named **x** and that there is a return statement in the body:

```yaml
pattern: |
        public static double $FUNC(int x, ...) {
          ...
          return $VAL;
        }
```

* a function named **myFunction** is called with the first argument equal to **4** and the second argument any:

```yaml
pattern: myFunction(4,$VAR);
```

### Import statements

* if you want to check that a specific object or module is imported, you can just use a literal pattern, like the following:

```yaml
pattern: import java.util.ArrayList;
```

### Classes

* Check that a class named **Car** is define&#x64;**:**

```yaml
pattern: |
        public class Car{
                ...
        }
```

* Check the presence of specific attributes and methods within the definition of a class named **Car.** We check for:
  * &#x20;the declaration of two literal **String** instance attributes, named **make** and **model;**
  * the declaration of a public Constructor that accepts two **String** parameters.

```yaml
patterns:
      - pattern-inside:
          public class Car{
            ...
          }
      - pattern:  |
          ...
          private String make;
      - pattern:  |
          ...
          private String model;
      - pattern: |
          public Car (String $PARAM1, String $PARAM2)
          {
            ...
          }
```

</details>

<details>

<summary>C++</summary>

### If statements:

* A simple **if** statement whose condition and body can be any:

```yaml
pattern: |
       if ($COND)
       {
          ...
       }
```

* A more elaborate **if-else** example. The first condition can be any. The **else if** condition is that some variable is less than zero. The **else** body prints a message to the screen and must be matched literally.

```yaml
pattern: |
      if ($COND1) {
        ...
      } else if ($COND2 < 0) {
          ...
      } else {
          std::cout << "The number is zero." << std::endl;
      }
```

### Loops

* &#x20;simple **while** loop.  The condition is that some variable should be less than a second one. The body can be any.

<pre class="language-yaml"><code class="lang-yaml"><strong>pattern: |
</strong>    while($VAR &#x3C; $VALUE)
      {
        ...
      }
</code></pre>

* A simple **for** loop.  The loop initialization, condition, and update can be any. The body of the loop can be any too.

```yaml
pattern: |
        for (...; ...; ...) {
                ...
        }
```

* Check the presence of an if statement within a while loop:

```yaml
patterns:
      - pattern-inside: |
          while (...)
          {
            ...
          }
      - pattern: |
          if($COND)
          {
            ...
          }
```

### Functions:

* definition of a function where we specify the return type and the types of the input parameters. The function name and body can be any, as long as the latter ends with a return statement.

```yaml
pattern: |
      double $FUNC(int $A, int $B)
      {
        ...
        return $VAL;
      }
```

* **static** function definition that should return a **double**. We check that the first parameter is of type **int** and is named **x,** and that there is a return statement at the end of the body:

```yaml
pattern: |
        static double $FUNC(int x, ...) {
          ...
          return $VAL;
        }
```

* a function named **myFunction** is called with the first argument equal to **4** and the second argument any:

```yaml
pattern: myFunction(4,$VAR);
```

### Include statements

* if you want to check that a specific library is included, you can use a literal pattern, like the following

```yaml
pattern: |
      #include <iostream>
```

### Classes

Semgrep has not fully supported checks for C++ classes yet. This means that the usual flexibility in designing patterns is not available in this case. Contact us via **<support@codegrade.com>** with a description of the check you want to implement, and we will do our best to devise an effective workaround!

</details>

<details>

<summary>C#</summary>

### If statements:

* A simple **if** statement whose condition and body can be any:

```yaml
pattern: |
       if (...)
       {
          ...
       }
```

* A more elaborate **if-else** example. The first **if** condition is that some variable is greater than 0, and its body is any. The **else if** condition is that the same variable as before is less than 0, and its body is any again.  The **else** condition body contains a **Console.WriteLine** statement, which may print anything on the screen.

```yaml
pattern: |
      if ($N > 0)
      {
        ...
      }
      else if($N < 0)
      {
        ...
      }
      else
      {
        Console.WriteLine(...);
      }
```

### Loops

* A simple **while** loop.  The condition is that some variable, represented by **$VAR**, should be less than 10. The body of the loop can be any, as long as it ends with a post-increment of **$VAR .**

```yaml
 pattern: |
     while($VAR<=10)
     {
            ...
            $VAR++;
     }
```

* The most general **for** loop structure

```yaml
pattern: |
        for (...; ...; ...)
        {
            ...
        }
```

* Check the presence of either a **for** or a **while** loop:

```yaml
pattern-either:
      - pattern: |
          while (...)
          {
            ...
          }
      - pattern: |
          for (...; ...; ...)
          {
            ...
          }
```

### Functions:

* definition of a **static** function that should return an integer value.  For the function signature, we specify that it should have two parameters, both of type **int.** The body of the function can be any, but it must terminate with a **return** statement.

```yaml
pattern: |
        static int $F(int $A, int $B)
        {
            ...
            return $VAL
        }
```

* In the following example, we check that a function named **myFunction** has the following properties:
  * it returns a **double**;
  * its first parameter is a **double** variable named **x**. The rest of its signature can be any;
  * it contains an **if** statement, whose condition is any, and whose body must contain a **return** statement.

```yaml
patterns:
        - pattern-inside: |
            double myFunction(double x, ...)
            {
                ...
            }
        - pattern: |
            if(...)
            { 
                ...
                return $VAL
            }
```

* a function named **myFunction** is called with the first argument equal to **4** and the second argument any:

```yaml
pattern: myFunction(4,$VAR);
```

### Import statements

* if you want to check that a specific library is imported, you can just use a literal pattern, like the following:

```yaml
pattern: using System;
```

### Classes

* Check that a public class named **Car** is define&#x64;**:**

```yaml
pattern: |
        public class Car{
                ...
        }
```

* Check the presence of specific attributes and methods within the definition of a class named **Car.** We check for:
  * &#x20;the declaration of two literal **string** data fields, named **make** and **model;**
  * the declaration of a public function named **DisplayInfo** with return type **void**.

```yaml
patterns:
      - pattern-inside:
          public class Car
          {
            ...
          }
      - pattern:  |
          ...
          private string make;
      - pattern:  |
          ...
          private string model;
      - pattern: |
            public void DisplayInfo()
            {
                ...
            }
```

</details>

<details>

<summary>R</summary>

### If statements:

* A simple **if** statement whose condition and body can be any:

```yaml
pattern: |
       if (...)
       {
          ...
       }
```

* A more elaborate **if-else** example. The first **if** condition is that some variable is greater than 0, and its body is any. The **else if** condition is that the same variable as before is less than 0, and its body must contain a **print** statement.  The **else** condition body contains a literal **print** statement.

```yaml
pattern: |
        if ($VAR>0) {
            ...
        } else if ($VAR<0) {
            print(...)
        }
        else{
            print("The variable is zero.")
        }
```

### Loops

* A simple **while** loop.  The condition is that some variable, represented by **$VAR**, should be less than 100. The body of the loop can be any, as long as it ends with an increment of the variable represented by $VAR by one.

```yaml
 pattern: |
             while ($VAR < 100) 
             {
                          ...        
                          $VAR <- $VAR + 1
             }
```

* A **for** loop structure where the iterator variable can be any, and its range goes to some number up to 5. The body of the loop can be any.

```yaml
pattern: |
        for ($VAR in $NUM:5) {
            ...
        }
```

* Check the presence of either a **for** or a **while** loop:

```yaml
pattern-either:
      - pattern: |
           while (...)
           {
             ...
           }
      - pattern: |
           for ($VAR in $N:$M) 
           {
                ...
           }   
```

### Functions:

* Definition of a function that must be named **add\_numbers.** It takes any two parameters as input, and its body can be any as long as it ends with a **return** statement.

```yaml
pattern: |
        add_numbers <- function($A, $B) {
            ...
            return($RESULT)
        }  
```

* In the following example, we check that the student defined a function, whose name can be any, with the following properties:
  * its first parameter is a **double** variable named **x**. The rest of its signature can be any;
  * it contains an **if** statement, whose condition is any, and whose body must contain a **return** statement.

```yaml
patterns:
        - pattern-inside: |
            $FUN <- function(x, ...) {
                ...
            } 
        - pattern: |
            if(...)
            {
                ...           
                return $VAL
            }
```

* a function named **myFunction** is called with the first argument equal to **4** and the second argument any:

```yaml
pattern: myFunction(4,$VAR)
```

### Import statements

* if you want to check that a specific library is imported, you can just use a literal pattern, like the following:

```yaml
pattern: library(ggplot2)
```

</details>
