# Code Structure Tests with Semgrep

[Semgrep](https://semgrep.dev/) is a static code analysis tool that can check whether students have used a specific syntax or coding pattern in their solution. AutoTest offers the **Code structure test** block and its corresponding **Positive match** and **Negative match** blocks for running Semgrep tests. These blocks are configured by specifying Semgrep patterns.

To get familiar with Semgrep patterns, we highly recommend the following resources:

* Semgrep's [online documentation](https://semgrep.dev/learn/basics/) has many interactive tutorials and examples.
* Semgrep's [online playground](https://semgrep.dev/playground/new) is helpful for quickly developing and debugging your patterns.

This page is divided into two parts. The first part walks you through creating a Code Structure Test in AutoTest. The second part is a collection of Semgrep patterns that you can use out of the box for your tests.

## Setting up a Code Structure Test in AutoTest

Follow these steps in the AutoTest Editor to create a Code Structure Test:

* Add a **Code Structure Test** Block to your AutoTest configuration.
* In the **Student file** input field, write the name of the file you want to check. The default value will check every file in the Student Directory.
* Add an inner **Positive** or **Negative Match Block** within the **Code Structure Test** Block. A Positive Match passes if the expected coding pattern is found in the target file, while a Negative Match passes if the pattern is not found.
* Edit the template configuration of the Match Block you want to configure.

<details>

<summary>More about the Match Block configuration</summary>

The template configuration looks like the following:

```yaml
# Use the semgrep playground to create a rule, then copy the result you find
# in the `advanced` tab into this block. https://semgrep.dev/editor
rules:
  - id: untitled_rule
    pattern: YOUR_PATTERN
    message: Semgrep found a match
    languages: [THE_PROGRAMMING_LANGUAGE]
    severity: WARNING
```

Note the link to the Semgrep editor in the comment. We recommend developing your Semgrep pattern in this environment and then copy-pasting it back to CodeGrade.

You only need to edit the **pattern** and **languages** fields.

* For the pattern, copy and paste your Semgrep rule.
* For the languages, type one of the supported languages within square brackets.

Below is an example of a working configuration to check the presence of a function named **myFunction** in Python:

```yaml
# Use the semgrep playground to create a rule, then copy the result you find
# in the `advanced` tab into this block. https://semgrep.dev/editor
rules:
  - id: untitled_rule
    pattern: |
      def myFunction(...):
        ...
    message: Semgrep found a match
    languages: [python]
    severity: WARNING
```

</details>

{% hint style="info" %}
Wrap a **Hide Configuration Block** around your **Code Structure Test Block** to prevent students from seeing the configuration of your **Match Block**. This can avoid confusion.
{% endhint %}

<div data-full-width="true"><figure><img src="/files/OEcjjxwbQ0TBWzZM9f54" alt=""><figcaption><p>A Code Structure Test in AutoTest. This test is connected to a Rubric Category and its configuration is not visible to the students.</p></figcaption></figure></div>

## Tips and tricks

* **Multiline patterns**: whenever your pattern consists of more than one line, use the vertical bar `|` to indicate a multiline pattern, as in the following Python example:

```yaml
pattern: |
      def myFunction(...):
        ...
```

* **Metavariables:** metavariables are uppercase names starting with `$`, like `$N` or `$VAR`. They work as placeholders for actual variables, expressions, or values. They help you write flexible patterns, so that the student's code does not have to match a piece of code literally.
* **Ellipsis operator `...`:** the `...` ellipsis operator abstracts away a sequence of zero or more items such as arguments, statements, parameters, fields, and characters.
* **Literal patterns:** all the parts in your pattern that do not contain metavariables or the `...` operator must be exactly matched by the student's code. Mixing literal and abstract fragments offers a great deal of versatility for devising patterns that can be both as granular and as flexible as desired.
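
As a rough illustration of how these ingredients combine, the sketch below checks that a file is opened in read mode. The rule id and the choice of `open` are illustrative assumptions and not part of the CodeGrade template.

```yaml
rules:
  - id: open-file-for-reading   # hypothetical rule id
    # `open` and "r" are literal and must appear exactly as written;
    # $F and $PATH are metavariables; `...` allows further arguments.
    pattern: $F = open($PATH, "r", ...)
    message: Semgrep found a match
    languages: [python]
    severity: WARNING
```

With this pattern, `f = open(name, "r")` and `data = open("in.txt", "r", encoding="utf-8")` should both match, while `f = open(name, "w")` should not, because the literal `"r"` is missing.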

### Logical operators for rules

Semgrep allows the creation of composite patterns by applying logical operators to sub-patterns. In CodeGrade, the most commonly used operators are:

* `pattern`: a simple single pattern (possibly multiline).
* `patterns`: corresponding to the AND logical operator. Other sub-patterns can be nested within it so that the overall pattern will be matched if and only if all the sub-patterns are matched.

```yaml
patterns:
      - pattern: |
            ...
            db_query(...)
      - pattern: |
            ...
            db_query(..., verify=True, ...)
```

* `pattern-either`: corresponding to the OR logical operator. Other sub-patterns can be nested within it so that the overall pattern will be matched if at least one sub-pattern is matched.

```yaml
pattern-either:
      - pattern: hashlib.sha1(...)
      - pattern: hashlib.md5(...)
```

* `pattern-inside`: The `pattern-inside` operator keeps matched findings that reside within its expression. This is useful for finding code inside other pieces of code, such as functions or blocks.

```yaml
patterns:
      - pattern: return ...
      - pattern-inside: |
          class $CLASS:
            ...
      - pattern-inside: |
          def __init__(...):
              ...
```
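
Operators can also be nested inside one another. The following sketch, written for Python as an assumption, requires a loop of either kind inside a function named `myFunction` (the function name is only an example):

```yaml
patterns:
      # Only keep matches located inside the body of myFunction
      - pattern-inside: |
          def myFunction(...):
            ...
      # Within that function, accept either kind of loop
      - pattern-either:
          - pattern: |
              for $X in $ITER:
                ...
          - pattern: |
              while $COND:
                ...
```

Here `pattern-either` acts as a single sub-pattern of `patterns`, so the rule reads as "inside `myFunction` AND (a `for` loop OR a `while` loop)".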

## Semgrep Pattern Collection

While our automatic grading guides include examples of Semgrep patterns, we provide a more extensive collection of patterns for the most common programming languages and use-case scenarios here. If you have further questions, don't hesitate to contact us at <support@codegrade.com>!
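
The snippets in this collection show only the pattern part of a rule. To use one in a Match Block, place it in the template configuration instead of the `pattern` field. For composite operators such as `pattern-either` or `patterns`, the operator takes the place of the `pattern` key itself. A minimal sketch, with a hypothetical rule id:

```yaml
rules:
  - id: uses-a-loop   # hypothetical rule id
    # The composite operator replaces the single `pattern` field
    pattern-either:
      - pattern: |
          while ...:
            ...
      - pattern: |
          for ... in ...:
            ...
    message: Semgrep found a match
    languages: [python]
    severity: WARNING
```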

<details>

<summary>Python</summary>

### If statements:

* A simple **if** statement with an arbitrary condition and body:

```yaml
pattern: |
    if ...:
        ...
```

* A more elaborate **if-else** example. The first **if** condition is that any variable is greater than 5. The **elif** condition is arbitrary. The **else** body contains a literal **print** statement that will be matched only if the student prints exactly the same string.

```yaml
pattern: |
      if $X > 5:
        ...
      elif ...:
        ...
      else:
        print("Number is less than 5")
```

### Loops

* A simple **while** loop. The condition is that a literal variable **x** is less than some other variable or value. The body is arbitrary.

```yaml
pattern: |
      while x < $VAR:
        ...
```

* A simple **for** loop over `range` with a single argument. The iterator variable is arbitrary. The body of the loop must contain a statement that prints the iterator.

```yaml
pattern: |
        for $X in range($Y):
                ...
                print($X)
                ...
```

* Check the presence of either a **for** or a **while** loop:

```yaml
pattern-either:
      - pattern: |
          while ...:
            ...
      - pattern: |
          for ... in ...:
            ...
```
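
* A sketch of a check for nested loops, assuming you want one **for** loop to appear directly in the body of another. The metavariable names are arbitrary:

```yaml
pattern: |
      for $X in $OUTER:
        ...
        for $Y in $INNER:
          ...
        ...
```

The `...` lines before and after the inner loop allow other statements around it. It is worth trying the pattern against a few sample solutions in the Semgrep playground before using it for grading.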

### Functions:

* Function definition, with any signature and body:

```yaml
pattern: |
      def $FUNC(...):
        ...
```

* Function definition with two parameters, the first of which must be named **x**, and with a return statement in the body. To allow any number of further parameters, replace `$VAR` with `...`:

```yaml
pattern: |
        def $F(x, $VAR):
                ...
                return ...
```

* A function named **myFunction** is called with the first argument equal to **4** and an arbitrary second argument:

```yaml
pattern: myFunction(4,$VAR)
```
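
* A sketch of a recursion check. It relies on the fact that a metavariable must stand for the same code everywhere it appears in a rule, so `$FUNC` in the call has to be the name bound in the enclosing definition:

```yaml
patterns:
      # Bind $FUNC to the name of the enclosing function
      - pattern-inside: |
          def $FUNC(...):
            ...
      # Look for a call to that same name inside its own body
      - pattern: $FUNC(...)
```

Used in a Positive match block, this should pass when some function calls itself. Used in a Negative match block, it should pass only when no function does.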

### Import statements

* If you want to check that a specific object or module is imported, you can use a literal pattern, like the following:

```yaml
pattern: from sklearn.neighbors import KNeighborsClassifier
```
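
* Students may import the same functionality in different ways. A minimal sketch that accepts two alternative import forms, using the `math` module and `sqrt` purely as example names:

```yaml
pattern-either:
      - pattern: import math
      - pattern: from math import sqrt
```

Add one `pattern` entry per import form you want to accept, and check the result in the Semgrep playground against the variants you expect students to write.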

### Classes

* Check that a class named **Dog** is defined:

```yaml
pattern: |
      class Dog:
        ...
```

* Check the presence of the **\_\_init\_\_** method defined within an arbitrary class. The method should have the following signature: the **self** parameter, a parameter named **x**, and a third arbitrary parameter. The method's body should contain the literal assignment `self.x = x`.

```yaml
patterns:
      - pattern-inside: |
          class $CLASS:
            ...
      - pattern: |
          def __init__(self, x, $Y):
            ...
            self.x = x
            ...
```

Note that the following pattern would achieve the same result:

```yaml
pattern: |
      class $CLASS:
          ...
          def __init__(self, x, $Y):
            ...
            self.x = x
            ...
          ...
```

</details>

<details>

<summary>Java</summary>

### If statements:

* A simple **if** statement with an arbitrary condition and body:

```yaml
pattern: |
       if (...)
             ...
```

* A more elaborate **if-else** example. The first **if** condition is that any variable is greater than 5. The **else if** condition is arbitrary. The **else** body contains a literal **println** statement that will be matched only if the student prints exactly the same string.

```yaml
pattern: |
      if ($X > 5)
        ...
      else if (...)
        ...
      else
        System.out.println("The number is less than 5.");
```

### Loops

* A simple **while** loop. The condition is that a literal variable **x** is less than any variable or value. The body is arbitrary.

```yaml
pattern: |
    while (x < $VAR)
       ...
```

* A simple **for** loop. The iterator is the literal **int** variable **i**. The loop condition and update are arbitrary. The body of the loop should contain a statement that prints a string that must be matched exactly.

```yaml
pattern: |
        for (int i = $VAL; ...; ...) {
            ...
            System.out.println("Counting up: " + i);
            ...
        }
```

* Check the presence of either a **for** or a **while** loop:

```yaml
pattern-either:
      - pattern: |
          while (...)
            ...
      - pattern: |
          for (...; ...; ...)
            ...
```

### Functions:

* Function definition, with **public** visibility, return type **String**, and any signature and body:

```yaml
pattern: |
      public String $FUNC(...) {
          ...
      }
```

* **public static** function definition that returns a **double**. We check that the first parameter is an **int** named **x** and that there is a return statement in the body:

```yaml
pattern: |
        public static double $FUNC(int x, ...) {
          ...
          return $VAL;
        }
```

* A function named **myFunction** is called with the first argument equal to **4** and an arbitrary second argument:

```yaml
pattern: myFunction(4,$VAR);
```
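
* A related sketch, assuming you want to check that the student prints something to standard output without caring whether `println` or `print` is used:

```yaml
pattern-either:
      # `...` accepts any argument list, so the printed value is not checked
      - pattern: System.out.println(...)
      - pattern: System.out.print(...)
```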

### Import statements

* If you want to check that a specific object or module is imported, you can use a literal pattern, like the following:

```yaml
pattern: import java.util.ArrayList;
```

### Classes

* Check that a class named **Car** is defined:

```yaml
pattern: |
        public class Car{
                ...
        }
```

* Check the presence of specific attributes and methods within the definition of a class named **Car**. We check for:
  * the declaration of two private **String** instance attributes, named **make** and **model**;
  * the declaration of a public constructor that accepts two **String** parameters.

```yaml
patterns:
      - pattern-inside: |
          public class Car{
            ...
          }
      - pattern:  |
          ...
          private String make;
      - pattern:  |
          ...
          private String model;
      - pattern: |
          public Car (String $PARAM1, String $PARAM2)
          {
            ...
          }
```

</details>

<details>

<summary>C++</summary>

### If statements:

* A simple **if** statement with an arbitrary condition and body:

```yaml
pattern: |
       if ($COND)
       {
          ...
       }
```

* A more elaborate **if-else** example. The first condition is arbitrary. The **else if** condition is that some variable is less than zero. The **else** body prints a message to the screen and must be matched literally.

```yaml
pattern: |
      if ($COND1) {
        ...
      } else if ($COND2 < 0) {
          ...
      } else {
          std::cout << "The number is zero." << std::endl;
      }
```

### Loops

* A simple **while** loop. The condition is that some variable is less than a second one. The body is arbitrary.

```yaml
pattern: |
      while($VAR < $VALUE)
      {
        ...
      }
```

* A simple **for** loop. The loop initialization, condition, update, and body are all arbitrary.

```yaml
pattern: |
        for (...; ...; ...) {
                ...
        }
```

* Check the presence of an if statement within a while loop:

```yaml
patterns:
      - pattern-inside: |
          while (...)
          {
            ...
          }
      - pattern: |
          if($COND)
          {
            ...
          }
```

### Functions:

* Definition of a function where we specify the return type and the types of the input parameters. The function name and body are arbitrary, as long as the body ends with a return statement.

```yaml
pattern: |
      double $FUNC(int $A, int $B)
      {
        ...
        return $VAL;
      }
```

* **static** function definition that returns a **double**. We check that the first parameter is of type **int** and is named **x**, and that there is a return statement at the end of the body:

```yaml
pattern: |
        static double $FUNC(int x, ...) {
          ...
          return $VAL;
        }
```

* A function named **myFunction** is called with the first argument equal to **4** and an arbitrary second argument:

```yaml
pattern: myFunction(4,$VAR);
```

### Include statements

* If you want to check that a specific library is included, you can use a literal pattern, like the following:

```yaml
pattern: |
      #include <iostream>
```

### Classes

Semgrep does not yet fully support checks for C++ classes, so the usual flexibility in designing patterns is not available in this case. Contact us at **<support@codegrade.com>** with a description of the check you want to implement, and we will do our best to devise an effective workaround!

</details>

<details>

<summary>C#</summary>

### If statements:

* A simple **if** statement with an arbitrary condition and body:

```yaml
pattern: |
       if (...)
       {
          ...
       }
```

* A more elaborate **if-else** example. The first **if** condition is that some variable is greater than 0, with an arbitrary body. The **else if** condition is that the same variable is less than 0, again with an arbitrary body. The **else** body contains a **Console.WriteLine** statement, which may print anything.

```yaml
pattern: |
      if ($N > 0)
      {
        ...
      }
      else if($N < 0)
      {
        ...
      }
      else
      {
        Console.WriteLine(...);
      }
```

### Loops

* A simple **while** loop. The condition is that some variable, represented by **$VAR**, is less than or equal to 10. The body of the loop is arbitrary, as long as it ends with a post-increment of **$VAR**.

```yaml
pattern: |
        while($VAR<=10)
        {
            ...
            $VAR++;
        }
```

* The most general **for** loop structure:

```yaml
pattern: |
        for (...; ...; ...)
        {
            ...
        }
```

* Check the presence of either a **for** or a **while** loop:

```yaml
pattern-either:
      - pattern: |
          while (...)
          {
            ...
          }
      - pattern: |
          for (...; ...; ...)
          {
            ...
          }
```

### Functions:

* Definition of a **static** function that returns an integer value. For the function signature, we specify that it has two parameters, both of type **int**. The body of the function is arbitrary, but it must end with a **return** statement.

```yaml
pattern: |
        static int $F(int $A, int $B)
        {
            ...
            return $VAL;
        }
```

* In the following example, we check that a function named **myFunction** has the following properties:
  * it returns a **double**;
  * its first parameter is a **double** variable named **x**. The rest of its signature is arbitrary;
  * it contains an **if** statement with an arbitrary condition, whose body must contain a **return** statement.

```yaml
patterns:
        - pattern-inside: |
            double myFunction(double x, ...)
            {
                ...
            }
        - pattern: |
            if(...)
            {
                ...
                return $VAL;
            }
```

* A function named **myFunction** is called with the first argument equal to **4** and an arbitrary second argument:

```yaml
pattern: myFunction(4,$VAR);
```

### Import statements

* If you want to check that a specific library is imported, you can use a literal pattern, like the following:

```yaml
pattern: using System;
```

### Classes

* Check that a public class named **Car** is defined:

```yaml
pattern: |
        public class Car{
                ...
        }
```

* Check the presence of specific attributes and methods within the definition of a class named **Car**. We check for:
  * the declaration of two private **string** data fields, named **make** and **model**;
  * the declaration of a public function named **DisplayInfo** with return type **void**.

```yaml
patterns:
      - pattern-inside: |
          public class Car
          {
            ...
          }
      - pattern:  |
          ...
          private string make;
      - pattern:  |
          ...
          private string model;
      - pattern: |
            public void DisplayInfo()
            {
                ...
            }
```

</details>

<details>

<summary>R</summary>

### If statements:

* A simple **if** statement with an arbitrary condition and body:

```yaml
pattern: |
       if (...)
       {
          ...
       }
```

* A more elaborate **if-else** example. The first **if** condition is that some variable is greater than 0, with an arbitrary body. The **else if** condition is that the same variable is less than 0, and its body must contain a **print** statement. The **else** body contains a literal **print** statement.

```yaml
pattern: |
        if ($VAR>0) {
            ...
        } else if ($VAR<0) {
            print(...)
        }
        else{
            print("The variable is zero.")
        }
```

### Loops

* A simple **while** loop. The condition is that some variable, represented by **$VAR**, is less than 100. The body of the loop is arbitrary, as long as it ends by incrementing **$VAR** by one.

```yaml
pattern: |
        while ($VAR < 100)
        {
            ...
            $VAR <- $VAR + 1
        }
```

* A **for** loop structure where the iterator variable is arbitrary and the range runs from some number up to 5. The body of the loop is arbitrary.

```yaml
pattern: |
        for ($VAR in $NUM:5) {
            ...
        }
```

* Check the presence of either a **for** or a **while** loop:

```yaml
pattern-either:
      - pattern: |
           while (...)
           {
             ...
           }
      - pattern: |
           for ($VAR in $N:$M) 
           {
                ...
           }   
```

### Functions:

* Definition of a function that must be named **add\_numbers**. It takes any two parameters as input, and its body is arbitrary as long as it ends with a **return** statement.

```yaml
pattern: |
        add_numbers <- function($A, $B) {
            ...
            return($RESULT)
        }  
```

* In the following example, we check that the student defined a function with an arbitrary name and the following properties:
  * its first parameter is named **x**. The rest of its signature is arbitrary;
  * it contains an **if** statement with an arbitrary condition, whose body must contain a **return** statement.

```yaml
patterns:
        - pattern-inside: |
            $FUN <- function(x, ...) {
                ...
            } 
        - pattern: |
            if(...)
            {
                ...
                return($VAL)
            }
```

* A function named **myFunction** is called with the first argument equal to **4** and an arbitrary second argument:

```yaml
pattern: myFunction(4,$VAR)
```

### Import statements

* If you want to check that a specific library is imported, you can use a literal pattern, like the following:

```yaml
pattern: library(ggplot2)
```
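
* R packages can be loaded in more than one way. A minimal sketch that accepts several common forms, with `ggplot2` as the example package:

```yaml
pattern-either:
      - pattern: library(ggplot2)
      - pattern: library("ggplot2")
      - pattern: require(ggplot2)
```

Each entry is a literal pattern, so only the listed forms are accepted. Try the rule in the Semgrep playground with the loading styles you expect from your students.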

</details>


