Skip to content

fix(sonarqube): mdDesc fallback#14770

Open
samiat4911 wants to merge 1 commit intoDefectDojo:masterfrom
samiat4911:fix/sonarqube-md-desc-fallback
Open

fix(sonarqube): mdDesc fallback#14770
samiat4911 wants to merge 1 commit intoDefectDojo:masterfrom
samiat4911:fix/sonarqube-md-desc-fallback

Conversation

@samiat4911
Copy link
Copy Markdown
Contributor

@samiat4911 samiat4911 commented Apr 28, 2026

Description

This PR fixes SonarQube API imports when a rule no longer returns htmlDesc but does return mdDesc, which is the behavior reported in issue #12565.

Before this change, the regular SonarQube importer only populated description, cwe, and rule references when htmlDesc was present. When htmlDesc was missing, the importer fell back to an empty description and None for CWE, even when SonarQube had provided a Markdown rule description in mdDesc.

I fixed this by making the importer resolve rule content from either htmlDesc or mdDesc, converting Markdown to HTML when needed, and then reusing the existing description/CWE/reference extraction flow.

The importer change is:

                rule_details = self.get_rule_details(rule)
                if rule_details:
                    description = self.clean_rule_description_html(
                        rule_details,
                    )
                    cwe = self.clean_cwe(rule_details)
                    references = sonarqube_permalink + self.get_references(rule_details)
                else:
                    description = ""
                    cwe = None
                    references = sonarqube_permalink

The new helper logic and the small hardening around description/reference parsing are:

    @staticmethod
    def clean_rule_description_html(raw_html):
        if not raw_html:
            return ""
        search = re.search(
            r"^(.*?)(?:(<h2>See</h2>)|(<h2>References</h2>)|(<b>References</b>))",
            raw_html,
            re.DOTALL,
        )
        if search:
            raw_html = search.group(1)
        h = html2text.HTML2Text()
        raw_html = raw_html.replace("<h2>", "<b>").replace("</h2>", "</b>")
        return h.handle(raw_html)

    @staticmethod
    def clean_cwe(raw_html):
        search = re.search(r"CWE-(\d+)", raw_html)
        if search:
            return int(search.group(1))
        return None

    @staticmethod
    def get_rule_details(rule):
        if html_desc := rule.get("htmlDesc"):
            return html_desc
        if not (md_desc := rule.get("mdDesc")):
            return ""
        if SonarQubeApiImporter.is_html_description(md_desc):
            return md_desc
        # SonarQube 2025.x can return markdown-only rule descriptions.
        return markdown.markdown(md_desc, extensions=["extra"])

    @staticmethod
    def is_html_description(description):
        return bool(re.search(r"<[a-zA-Z][^>]*>", description))

    @staticmethod
    def get_references(vuln_details):
        if not vuln_details:
            return ""
        parser = etree.HTMLParser()
        details = etree.fromstring(vuln_details, parser)

        rule_references = ""
        if details is not None:
            for a in details.iter("a"):
                rule_references += f"[{a.text}]({a.get('href')})\n"
        return rule_references

I also added a regression fixture that represents a rule with only mdDesc:

{
    "key": "typescript:S1854",
    "repo": "typescript",
    "name": "Dead stores should be removed",
    "createdAt": "2018-01-17T10:11:21-0500",
    "mdDesc": "A dead store happens when a local variable is assigned a value that is not read by any subsequent instruction. Calculating or retrieving a value only to then overwrite it or throw it away, could indicate a serious error in the code. Even if it's not an error, it is at best a waste of resources. Therefore all calculated values should be used.\n\n## Noncompliant Code Example\n\ni = a + b; // Noncompliant; calculation result not used before value is overwritten\ni = compute();\n\n## Compliant Solution\n\ni = a + b;\ni += compute();\n\n## Exceptions\n\nThis rule ignores initializations to -1, 0, 1, `null`, `true`, `false`, `\"\"`, `[]` and `{}`.\n\n## See\n\n- [MITRE, CWE-563](http://cwe.mitre.org/data/definitions/563.html) - Assignment to Variable without Use ('Unused Variable')\n- [CERT, MSC13-C.](https://www.securecoding.cert.org/confluence/x/QYA5) - Detect and remove unused values\n- [CERT, MSC56-J.](https://www.securecoding.cert.org/confluence/x/uQCSBg) - Detect and remove superfluous code and values\n",
    "severity": "MAJOR",
    "status": "READY",
    "isTemplate": false,
    "tags": [],
    "sysTags": [
        "cert",
        "cwe",
        "unused"
    ],
    "lang": "ts",
    "langName": "TypeScript",
    "params": [],
    "defaultDebtRemFnType": "CONSTANT_ISSUE",
    "defaultDebtRemFnOffset": "15min",
    "debtOverloaded": false,
    "debtRemFnType": "CONSTANT_ISSUE",
    "debtRemFnOffset": "15min",
    "defaultRemFnType": "CONSTANT_ISSUE",
    "defaultRemFnBaseEffort": "15min",
    "remFnType": "CONSTANT_ISSUE",
    "remFnBaseEffort": "15min",
    "remFnOverloaded": false,
    "scope": "MAIN",
    "isExternal": false,
    "type": "CODE_SMELL"
}

The regression test I added is:

def dummy_rule_md_desc_only(self, *args, **kwargs):
    with (get_unit_tests_scans_path("api_sonarqube") / "rule_md_desc_only.json").open(encoding="utf-8") as json_file:
        return json.load(json_file)


class TestSonarqubeImporterMarkdownRuleDescription(DojoTestCase):
    fixtures = [
        "unit_sonarqube_toolType.json",
        "unit_sonarqube_toolConfig1.json",
        "unit_sonarqube_toolConfig2.json",
        "unit_sonarqube_product.json",
        "unit_sonarqube_sqcNoKey.json",
        "unit_sonarqube_sqcWithKey.json",
    ]

    def setUp(self):
        product = Product.objects.get(name="product")
        engagement = Engagement(product=product)
        self.test = Test(
            engagement=engagement,
            api_scan_configuration=Product_API_Scan_Configuration.objects.all().last(),
        )

    @mock.patch("dojo.tools.api_sonarqube.api_client.SonarQubeAPI.get_project", dummy_product)
    @mock.patch("dojo.tools.api_sonarqube.api_client.SonarQubeAPI.get_rule", dummy_rule_md_desc_only)
    @mock.patch("dojo.tools.api_sonarqube.api_client.SonarQubeAPI.find_issues", dummy_issues)
    @mock.patch("dojo.tools.api_sonarqube.api_client.SonarQubeAPI.get_hotspot_rule", dummy_hotspot_rule)
    @mock.patch("dojo.tools.api_sonarqube.api_client.SonarQubeAPI.find_hotspots", empty_list)
    def test_parser(self):
        parser = SonarQubeApiImporter()
        findings = parser.get_findings(None, self.test)
        self.assertEqual(2, len(findings))
        finding = findings[0]
        self.assertEqual(563, finding.cwe)
        self.assertIn(
            "A dead store happens when a local variable is assigned a value",
            finding.description,
        )
        self.assertIn(
            "[MITRE, CWE-563](http://cwe.mitre.org/data/definitions/563.html)",
            finding.references,
        )

Test results
I tested this change with the SonarQube importer unit tests in the project’s Docker/Postgres-backed test setup.

Focused regression test:

docker compose -f docker-compose.yml -f docker-compose.override.unit_tests.yml run --rm --no-deps --entrypoint bash uwsgi -lc "export HOME=/tmp PYTHONUSERBASE=/tmp/pyuser PATH=/tmp/pyuser/bin:/usr/local/bin:/usr/bin:/bin; python3 -m pip install --user -r requirements-dev.txt && python3 manage.py test unittests.tools.test_api_sonarqube_importer.TestSonarqubeImporterMarkdownRuleDescription -v2 --keepdb"

Full SonarQube importer test file:

docker compose -f docker-compose.yml -f docker-compose.override.unit_tests.yml run --rm --no-deps --entrypoint bash uwsgi -lc "export HOME=/tmp PYTHONUSERBASE=/tmp/pyuser PATH=/tmp/pyuser/bin:/usr/local/bin:/usr/bin:/bin; python3 -m pip install --user -r requirements-dev.txt >/tmp/requirements-dev-install.log && python3 manage.py test unittests.tools.test_api_sonarqube_importer -v1 --keepdb"

Results:

  • The focused markdown regression test passed.
  • The full unittests.tools.test_api_sonarqube_importer suite passed with 17 tests.

Checklist

This checklist is for your information.

  • Make sure to rebase your PR against the very latest dev.
  • Features/Changes should be submitted against the dev.
  • Bugfixes should be submitted against the bugfix branch.
  • Give a meaningful name to your PR, as it may end up being used in the release notes.
  • Your code is Ruff compliant (see ruff.toml).
  • Your code is python 3.13 compliant.
  • If this is a new feature and not a bug fix, you've included the proper documentation in the docs at https://github.com/DefectDojo/django-DefectDojo/tree/dev/docs as part of this PR.
  • Model changes must include the necessary migrations in the dojo/db_migrations folder.
  • Add applicable tests to the unit tests.
  • Add the proper label to categorize your PR.

@dryrunsecurity
Copy link
Copy Markdown

DryRun Security

This pull request introduces code that generates and returns HTML from markdown (via markdown.markdown(..., extensions=["extra"])) without sanitization, and returns htmlDesc verbatim from get_rule_details, creating a high-severity potential Cross-Site Scripting (XSS) risk if user-controlled content reaches a rendering sink. The finding recommends sanitizing or escaping rendered HTML before use to prevent XSS.

🟠 Potential Cross-Site Scripting in dojo/tools/api_sonarqube/importer.py (drs_21be8a9f)
Vulnerability Potential Cross-Site Scripting
Description The code calls markdown.markdown(md_desc, extensions=["extra"]) and also returns htmlDesc verbatim from get_rule_details; these can produce/contain raw HTML that is later parsed and used without any sanitization or escaping. If user-controlled markdown/html reaches a rendering sink, this can enable XSS.

return markdown.markdown(md_desc, extensions=["extra"])
@staticmethod
def is_html_description(description):


Comment to provide feedback on these findings.

Report false positive: @dryrunsecurity fp [FINDING ID] [FEEDBACK]
Report low-impact: @dryrunsecurity nit [FINDING ID] [FEEDBACK]

Example: @dryrunsecurity fp drs_90eda195 This code is not user-facing

All finding details can be found in the DryRun Security Dashboard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant