fix(sonarqube): mdDesc fallback by samiat4911 · Pull Request #14770 · DefectDojo/django-DefectDojo

samiat4911 · 2026-04-28T10:37:28Z

Description

This PR fixes SonarQube API imports when a rule no longer returns htmlDesc but does return mdDesc, which is the behavior reported in issue #12565.

Before this change, the regular SonarQube importer only populated description, cwe, and rule references when htmlDesc was present. When htmlDesc was missing, the importer fell back to an empty description and None for CWE, even when SonarQube had provided a Markdown rule description in mdDesc.

I fixed this by making the importer resolve rule content from either htmlDesc or mdDesc, converting Markdown to HTML when needed, and then reusing the existing description/CWE/reference extraction flow.

The importer change is:

                rule_details = self.get_rule_details(rule)
                if rule_details:
                    description = self.clean_rule_description_html(
                        rule_details,
                    )
                    cwe = self.clean_cwe(rule_details)
                    references = sonarqube_permalink + self.get_references(rule_details)
                else:
                    description = ""
                    cwe = None
                    references = sonarqube_permalink

The new helper logic and the small hardening around description/reference parsing are:

    @staticmethod
    def clean_rule_description_html(raw_html):
        if not raw_html:
            return ""
        search = re.search(
            r"^(.*?)(?:(<h2>See</h2>)|(<h2>References</h2>)|(<b>References</b>))",
            raw_html,
            re.DOTALL,
        )
        if search:
            raw_html = search.group(1)
        h = html2text.HTML2Text()
        raw_html = raw_html.replace("<h2>", "<b>").replace("</h2>", "</b>")
        return h.handle(raw_html)

    @staticmethod
    def clean_cwe(raw_html):
        search = re.search(r"CWE-(\d+)", raw_html)
        if search:
            return int(search.group(1))
        return None

    @staticmethod
    def get_rule_details(rule):
        if html_desc := rule.get("htmlDesc"):
            return html_desc
        if not (md_desc := rule.get("mdDesc")):
            return ""
        if SonarQubeApiImporter.is_html_description(md_desc):
            return md_desc
        # SonarQube 2025.x can return markdown-only rule descriptions.
        return markdown.markdown(md_desc, extensions=["extra"])

    @staticmethod
    def is_html_description(description):
        return bool(re.search(r"<[a-zA-Z][^>]*>", description))

    @staticmethod
    def get_references(vuln_details):
        if not vuln_details:
            return ""
        parser = etree.HTMLParser()
        details = etree.fromstring(vuln_details, parser)

        rule_references = ""
        if details is not None:
            for a in details.iter("a"):
                rule_references += f"[{a.text}]({a.get('href')})\n"
        return rule_references

I also added a regression fixture that represents a rule with only mdDesc:

{
    "key": "typescript:S1854",
    "repo": "typescript",
    "name": "Dead stores should be removed",
    "createdAt": "2018-01-17T10:11:21-0500",
    "mdDesc": "A dead store happens when a local variable is assigned a value that is not read by any subsequent instruction. Calculating or retrieving a value only to then overwrite it or throw it away, could indicate a serious error in the code. Even if it's not an error, it is at best a waste of resources. Therefore all calculated values should be used.\n\n## Noncompliant Code Example\n\ni = a + b; // Noncompliant; calculation result not used before value is overwritten\ni = compute();\n\n## Compliant Solution\n\ni = a + b;\ni += compute();\n\n## Exceptions\n\nThis rule ignores initializations to -1, 0, 1, `null`, `true`, `false`, `\"\"`, `[]` and `{}`.\n\n## See\n\n- [MITRE, CWE-563](http://cwe.mitre.org/data/definitions/563.html) - Assignment to Variable without Use ('Unused Variable')\n- [CERT, MSC13-C.](https://www.securecoding.cert.org/confluence/x/QYA5) - Detect and remove unused values\n- [CERT, MSC56-J.](https://www.securecoding.cert.org/confluence/x/uQCSBg) - Detect and remove superfluous code and values\n",
    "severity": "MAJOR",
    "status": "READY",
    "isTemplate": false,
    "tags": [],
    "sysTags": [
        "cert",
        "cwe",
        "unused"
    ],
    "lang": "ts",
    "langName": "TypeScript",
    "params": [],
    "defaultDebtRemFnType": "CONSTANT_ISSUE",
    "defaultDebtRemFnOffset": "15min",
    "debtOverloaded": false,
    "debtRemFnType": "CONSTANT_ISSUE",
    "debtRemFnOffset": "15min",
    "defaultRemFnType": "CONSTANT_ISSUE",
    "defaultRemFnBaseEffort": "15min",
    "remFnType": "CONSTANT_ISSUE",
    "remFnBaseEffort": "15min",
    "remFnOverloaded": false,
    "scope": "MAIN",
    "isExternal": false,
    "type": "CODE_SMELL"
}

The regression test I added is:

def dummy_rule_md_desc_only(self, *args, **kwargs):
    with (get_unit_tests_scans_path("api_sonarqube") / "rule_md_desc_only.json").open(encoding="utf-8") as json_file:
        return json.load(json_file)


class TestSonarqubeImporterMarkdownRuleDescription(DojoTestCase):
    fixtures = [
        "unit_sonarqube_toolType.json",
        "unit_sonarqube_toolConfig1.json",
        "unit_sonarqube_toolConfig2.json",
        "unit_sonarqube_product.json",
        "unit_sonarqube_sqcNoKey.json",
        "unit_sonarqube_sqcWithKey.json",
    ]

    def setUp(self):
        product = Product.objects.get(name="product")
        engagement = Engagement(product=product)
        self.test = Test(
            engagement=engagement,
            api_scan_configuration=Product_API_Scan_Configuration.objects.all().last(),
        )

    @mock.patch("dojo.tools.api_sonarqube.api_client.SonarQubeAPI.get_project", dummy_product)
    @mock.patch("dojo.tools.api_sonarqube.api_client.SonarQubeAPI.get_rule", dummy_rule_md_desc_only)
    @mock.patch("dojo.tools.api_sonarqube.api_client.SonarQubeAPI.find_issues", dummy_issues)
    @mock.patch("dojo.tools.api_sonarqube.api_client.SonarQubeAPI.get_hotspot_rule", dummy_hotspot_rule)
    @mock.patch("dojo.tools.api_sonarqube.api_client.SonarQubeAPI.find_hotspots", empty_list)
    def test_parser(self):
        parser = SonarQubeApiImporter()
        findings = parser.get_findings(None, self.test)
        self.assertEqual(2, len(findings))
        finding = findings[0]
        self.assertEqual(563, finding.cwe)
        self.assertIn(
            "A dead store happens when a local variable is assigned a value",
            finding.description,
        )
        self.assertIn(
            "[MITRE, CWE-563](http://cwe.mitre.org/data/definitions/563.html)",
            finding.references,
        )

Test results
I tested this change with the SonarQube importer unit tests in the project’s Docker/Postgres-backed test setup.

Focused regression test:

docker compose -f docker-compose.yml -f docker-compose.override.unit_tests.yml run --rm --no-deps --entrypoint bash uwsgi -lc "export HOME=/tmp PYTHONUSERBASE=/tmp/pyuser PATH=/tmp/pyuser/bin:/usr/local/bin:/usr/bin:/bin; python3 -m pip install --user -r requirements-dev.txt && python3 manage.py test unittests.tools.test_api_sonarqube_importer.TestSonarqubeImporterMarkdownRuleDescription -v2 --keepdb"

Full SonarQube importer test file:

docker compose -f docker-compose.yml -f docker-compose.override.unit_tests.yml run --rm --no-deps --entrypoint bash uwsgi -lc "export HOME=/tmp PYTHONUSERBASE=/tmp/pyuser PATH=/tmp/pyuser/bin:/usr/local/bin:/usr/bin:/bin; python3 -m pip install --user -r requirements-dev.txt >/tmp/requirements-dev-install.log && python3 manage.py test unittests.tools.test_api_sonarqube_importer -v1 --keepdb"

Results:

The focused markdown regression test passed.
The full unittests.tools.test_api_sonarqube_importer suite passed with 17 tests.

Checklist

This checklist is for your information.

dryrunsecurity · 2026-04-28T10:42:04Z

This pull request introduces code that generates and returns HTML from markdown (via markdown.markdown(..., extensions=["extra"])) without sanitization, and returns htmlDesc verbatim from get_rule_details, creating a high-severity potential Cross-Site Scripting (XSS) risk if user-controlled content reaches a rendering sink. The finding recommends sanitizing or escaping rendered HTML before use to prevent XSS.

🟠 Potential Cross-Site Scripting in dojo/tools/api_sonarqube/importer.py (drs_21be8a9f)

Vulnerability	Potential Cross-Site Scripting
Description	The code calls markdown.markdown(md_desc, extensions=["extra"]) and also returns htmlDesc verbatim from get_rule_details; these can produce/contain raw HTML that is later parsed and used without any sanitization or escaping. If user-controlled markdown/html reaches a rendering sink, this can enable XSS.

django-DefectDojo/dojo/tools/api_sonarqube/importer.py

Lines 371 to 374 in 981a86b

    
               return markdown.markdown(md_desc, extensions=["extra"]) 
        
           @staticmethod 
        
           def is_html_description(description):

Comment to provide feedback on these findings.

Report false positive: @dryrunsecurity fp [FINDING ID] [FEEDBACK]
Report low-impact: @dryrunsecurity nit [FINDING ID] [FEEDBACK]

Example: @dryrunsecurity fp drs_90eda195 This code is not user-facing

All finding details can be found in the DryRun Security Dashboard.

fix(sonarqube): mdDesc fallback

981a86b

samiat4911 requested review from Maffooch and mtesauro as code owners April 28, 2026 10:37

github-actions Bot added unittests parser labels Apr 28, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(sonarqube): mdDesc fallback#14770

fix(sonarqube): mdDesc fallback#14770
samiat4911 wants to merge 1 commit intoDefectDojo:masterfrom
samiat4911:fix/sonarqube-md-desc-fallback

samiat4911 commented Apr 28, 2026 •

edited

Loading

Uh oh!

dryrunsecurity Bot commented Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

samiat4911 commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dryrunsecurity Bot commented Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

samiat4911 commented Apr 28, 2026 •

edited

Loading