Trojan Source attack invisibly threatens code security • The Register

Written by Syndicated News Feed



11/01/2021

The way Unicode handles text that mixes left-to-right and right-to-left scripts could be misused to write malicious code that says one thing to humans and another to compilers, academics are warning.

“What if it were possible to trick compilers into emitting binaries that did not match the logic visible in source code?” ask Cambridge student Nicholas Boucher and Professor Ross Anderson in a paper published today.

They say it is possible, and outline a new threat that could be deployed by future supply chain attackers – making detection of something like the SolarWinds attack at code level even harder than it is already.

The duo’s research, with the underlying flaw tracked as CVE-2021-42574, focuses on so-called bidirectional (“bidi”) control characters in Unicode. These are used so words written in right-to-left languages (such as Arabic and Hebrew) can be inserted into sentences written in left-to-right languages (such as English). Boucher and Anderson discovered that they can be misused to misrepresent source code.

“Embedding multiple layers of LRI and RLI within each other enables the near-arbitrary reordering of strings,” says their paper. “Our key insight is that we can reorder source code characters in such a way that the resulting display order also represents syntactically valid source code.”

“In effect, we anagram program A into program B.”
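For readers who want to see which characters are involved, here is a minimal Python sketch that lists the explicit directional formatting characters defined by Unicode (LRI and RLI among them) and reports where they occur in a piece of text. The names `BIDI_CONTROLS` and `find_bidi_controls` are ours, chosen for illustration; they do not come from the paper.

```python
# Explicit directional formatting characters from the Unicode
# Bidirectional Algorithm, keyed by code point.
BIDI_CONTROLS = {
    0x202A: "LRE",  # left-to-right embedding
    0x202B: "RLE",  # right-to-left embedding
    0x202C: "PDF",  # pop directional formatting
    0x202D: "LRO",  # left-to-right override
    0x202E: "RLO",  # right-to-left override
    0x2066: "LRI",  # left-to-right isolate
    0x2067: "RLI",  # right-to-left isolate
    0x2068: "FSI",  # first strong isolate
    0x2069: "PDI",  # pop directional isolate
}


def find_bidi_controls(text):
    """Return (line, column, abbreviation) for each bidi control in text.

    Lines and columns are 1-based; columns count code points, not bytes.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            name = BIDI_CONTROLS.get(ord(ch))
            if name:
                hits.append((line_no, col, name))
    return hits


# The controls are written as escapes so this listing contains no
# invisible characters itself.
sample = "x = 'a\u2067b\u2069'"
print(find_bidi_controls(sample))  # [(1, 7, 'RLI'), (1, 9, 'PDI')]
```

A check like this only says that the characters are present. It does not judge whether their use is legitimate, as it would be in genuine right-to-left text.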

Concerningly, the academics say that Microsoft’s VS Code and Apple’s Xcode text editors don’t highlight the use of bidi characters as prominently as they might – while praising Vim for showing them as “numerical code points.”
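A rough idea of what "showing them as numerical code points" means can be sketched in a few lines of Python. The function name `reveal_bidi` and the `<U+XXXX>` notation are our own assumptions for illustration; this is not how Vim or any other editor implements its display.

```python
# Code points of the explicit bidi controls: U+202A..U+202E and U+2066..U+2069.
BIDI_CODE_POINTS = (*range(0x202A, 0x202F), *range(0x2066, 0x206A))

# Translation table mapping each control to a visible placeholder.
_VISIBLE = {cp: f"<U+{cp:04X}>" for cp in BIDI_CODE_POINTS}


def reveal_bidi(text):
    """Replace every bidi control in text with a visible <U+XXXX> marker."""
    return text.translate(_VISIBLE)


print(reveal_bidi("''' then \u2067''' ;return"))
# prints: ''' then <U+2067>''' ;return
```

Running a diff or a suspicious line through a filter of this kind before review makes an otherwise invisible character hard to miss.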

Professor Anderson told The Register: “Most programming languages let you put [bidi characters] in string literals and in comments, so you can use them in source code: code that appears innocuous to a human reviewer can actually do something nasty. That’s bad news for projects like Linux and Webkit that accept contributions from random people, subject them to manual review, then incorporate them into critical code.”

The problem is not merely academic: Rust’s maintainers patched rustc against the attack over the weekend after the researchers used it for a successful proof-of-concept, even though Rust acknowledged it has not seen the technique deployed in the wild.

Snippets of the technique exist on GitHub, although the Cambridge pair’s paper says that none of them seemed to be malicious.

Break comment, receive code

Boucher and Anderson’s paper included several examples of this novel attack technique. One, in Python, is presented below.

Code snippet demonstrating the bidirectional character Trojan Source attack

In figure 2 'alice' is defined as being worth 100, followed by a function that subtracts funds from Alice. The final line calls that function with a value of 50, so when executed that little program should give us a result of 50.

However, figure 1 shows how bidi characters can be used to frustrate the program’s intent: inserting an RLI (right-to-left isolate) character switches the text direction from conventional left-to-right to right-to-left. The output of figure 1 becomes 100 in spite of our subtract funds function.

“This is because the word return in the docstring is actually executed due to a bidi override, causing the function to return prematurely and the code which subtracts value from a user’s bank account to never run,” explains the paper.
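The early return described above can be reproduced with a simplified reconstruction in Python. It assumes a single RLI placed just inside the end of the docstring; the paper's own listing may use a different arrangement of control characters, and the helper `final_balance` is hypothetical. The RLI is written as an escape so that what you read here is the order the interpreter sees.

```python
RLI = "\u2067"  # right-to-left isolate, written as an escape for visibility

# Logical (stored) order of the program. The docstring closes right after
# the RLI, so ";return" is ordinary code rather than docstring text.
trojan = (
    "bank = {'alice': 100}\n"
    "\n"
    "def subtract_funds(account, amount):\n"
    f"    ''' Subtract funds from bank account then {RLI}''' ;return\n"
    "    bank[account] -= amount\n"
    "    return\n"
    "\n"
    "subtract_funds('alice', 50)\n"
)

# The same program with the control character and the stray return removed.
clean = trojan.replace(RLI + "''' ;return", "'''")


def final_balance(source):
    """Execute source in a fresh namespace and return Alice's balance."""
    namespace = {}
    exec(source, namespace)
    return namespace["bank"]["alice"]


print(final_balance(trojan))  # 100: the function returns before subtracting
print(final_balance(clean))   # 50
```

The two programs differ by a handful of characters, one of them invisible in many viewers, yet only the second one ever touches the balance.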

The same principle can be applied to other languages, including C, C#, C++ and JavaScript as well as Rust – though for Rust, yesterday’s update to version 1.56.1 sees the compiler rejecting code containing bidi characters.
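A project that cannot wait for its compiler or interpreter to reject such input could run a similar check itself. The sketch below uses Python's standard `tokenize` module to flag string literals and comments that contain bidi controls in Python source. It illustrates the idea only and is not how rustc implements its check; the function name `bidi_in_literals_and_comments` is our own.

```python
import io
import tokenize

# The explicit bidi controls: U+202A..U+202E and U+2066..U+2069.
BIDI = {chr(cp) for cp in (*range(0x202A, 0x202F), *range(0x2066, 0x206A))}


def bidi_in_literals_and_comments(source):
    """Return (line, token_kind) for each string or comment token that
    contains a bidi control character.

    Limitation: on Python 3.12 and later, f-strings are tokenized into
    separate token types and are not inspected by this sketch.
    """
    findings = []
    reader = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(reader):
        if tok.type in (tokenize.STRING, tokenize.COMMENT) and BIDI & set(tok.string):
            findings.append((tok.start[0], tokenize.tok_name[tok.type]))
    return findings


suspect = "x = 'a\u2067b'  # fine\ny = 1  # note \u202e\n"
print(bidi_in_literals_and_comments(suspect))
# [(1, 'STRING'), (2, 'COMMENT')]
```

A non-empty result could fail a pre-commit hook or CI job, leaving a human to decide whether the characters belong there.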

Surely highlighting solves this

Most text editors used by devs highlight various levels of nested code, so you’d imagine bidi attacks would be frustrated by changes immediately showing up. Unfortunately, this isn’t as reliable a defence as you might imagine: the academics say their “experience was mixed” on this front.

“Some attacks provided strange highlighting in a subset of editors, which may suffice to alert developers that an encoding issue is present. However, all syntax highlighting nuances were editor-specific, and other attacks did not show abnormal highlighting in the same settings,” the paper says.

Updated to add

Interestingly, Atlassian issued a security advisory for CVE-2021-42574 affecting a collection of its products, from Confluence to Jira, with multiple software updates to address the issue.

“A vulnerability has been identified affecting multiple Atlassian products where special characters, known as Unicode bidirectional override characters, are not rendered or displayed in the affected applications,” the IT giant said.

“These special characters are typically not displayed by the browser or code editors but can affect the meaning of the source code when it is processed by a compiler or an interpreter.”

Bootnote

Boucher and Anderson’s paper observes: “When writing vulnerability disclosures, descriptions that personalise the potential impact can be needed to drive action. Neutral disclosures like those found in academic papers are less likely to evoke a response than disclosures stating that named products are immediately at risk”.

We reserve the right to arbitrarily rename the next security discovery FLAMINGHELLDEATHPWNAGE. Tenders will be issued in due course for design of a logo and procurement of a snappy domain name.
