Researchers build a bridge from C to Rust and memory safety

Memory errors such as out-of-bounds reads and writes, and use-after-free bugs, have plagued applications for decades, causing problems ranging from minor execution glitches to global security nightmares. The infamous WannaCry, Slammer, and Heartbleed exploits, and the more recent LDAPNightmare, for example, were all enabled by out-of-bounds memory accesses such as buffer overflows and over-reads.

These memory-safety vulnerabilities stem from the use of older programming languages such as C and C++, which lack automated memory management and rely on programmers to handle nitty-gritty details like bounds checking themselves. By contrast, memory-safe languages such as Rust manage the allocation and deallocation of memory automatically and check memory accesses, preventing bugs like out-of-bounds errors.
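As a rough illustration of that difference (a hand-written sketch, not taken from the research discussed below), safe Rust checks slice accesses against the slice's length. The helper read_at is hypothetical; it wraps the standard slice::get method, which returns None for an out-of-range index where a C array read would silently touch adjacent memory.

```rust
// Hypothetical helper for illustration: a bounds-checked read in safe Rust.
// In C, `buf[index]` with index >= length reads whatever memory follows the
// array; Rust's `slice::get` compares the index with the length first.
fn read_at(buf: &[u8], index: usize) -> Option<u8> {
    buf.get(index).copied()
}

fn main() {
    let buf = [10u8, 20, 30, 40];

    // In bounds: the value is returned.
    println!("{:?}", read_at(&buf, 2)); // Some(30)

    // Out of bounds: no memory is read; the caller gets `None` instead.
    println!("{:?}", read_at(&buf, 4)); // None
}
```

Plain indexing such as buf[i] is checked too: an out-of-range index stops the program with a panic rather than reading past the end of the buffer.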

“Approximately 70% of software security issues are related to incorrect memory handling,” observed Anne Thomas, VP & distinguished analyst at Gartner. “That’s why the US Office of the National Cyber Director (ONCD) issued a report in February 2024 calling for the adoption of memory-safe programming languages. Unfortunately, a huge number of applications, low-level utilities, and operating systems are implemented in memory-unsafe languages like C and C++.”

The ONCD report recommended that C and C++ be replaced by memory-safe languages such as Rust, Python, C#, Go, and Java. But that may be easier said than done.

“I know about C and how very thorny it is to get off it,” said Jason Andersen, VP and principal analyst, Moor Insights & Strategy. “One of the biggest challenges is that C is very useful at the system level, making it remarkably sticky and performant. So, technically, it is hard to replicate with anything else.”

A bridge to memory safety

Now a pair of computer scientists has developed a subset of C that can be translated into memory-safe Rust automatically. Dubbed Mini-C, it was created by Aymeric Fromherz of Inria, the French national institute for research in digital science and technology, and Jonathan Protzenko of Microsoft Azure Research. They present their research in the paper "Compiling C to Safe Rust, Formalized."

To produce safe Rust, Fromherz and Protzenko developed what they describe as a "data-oriented, applicative subset of C," which may require users to make what they call "minimal adjustments" to the source C program so that it fits the subset. "Once in this subset," they said, "our approach then automatically produces valid, safe Rust code."

The pair evaluated their approach on two projects: the HACL* verified cryptographic library, which contains 80,000 lines of C code, and the EverParse CBOR parser's 1,400 lines of C. They found that the HACL* source needed only "minimal" adjustments to fit Mini-C, and the EverParse source needed none, before both were translated to Rust.

The researchers noted, "The application of our approach to HACL* results in an 80,000-line verified cryptographic library, written in pure Rust, that implements all modern algorithms without a single use of unsafe — the first of its kind." The Rust keyword unsafe marks blocks of code in which the programmer, rather than the compiler, is responsible for upholding memory safety; it unlocks operations that safe Rust forbids, such as dereferencing raw pointers.
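For readers new to Rust, the following sketch (illustrative only, not from the paper) shows what unsafe changes. Both hypothetical functions return the first element of a slice; the second reads through a raw pointer, which compiles only inside an unsafe block, where the programmer rather than the compiler vouches that the access is valid. A translation with no unsafe at all, as claimed for HACL*, keeps every access in the first, compiler-checked style.

```rust
// Illustrative sketch: the same operation written in safe and unsafe Rust.

/// Safe Rust: the compiler and standard library guarantee validity;
/// an empty slice yields `None` instead of a bad memory read.
fn first_safe(v: &[i32]) -> Option<i32> {
    v.first().copied()
}

/// Unsafe Rust: dereferencing a raw pointer is only allowed in an `unsafe`
/// block. The compiler no longer checks that the pointer is valid, so this
/// function checks the length itself before the read.
fn first_unsafe(v: &[i32]) -> i32 {
    assert!(!v.is_empty(), "slice must not be empty");
    let p: *const i32 = v.as_ptr();
    // SAFETY: `v` is non-empty, so `p` points to a valid, initialized i32.
    unsafe { *p }
}

fn main() {
    let data = [7, 8, 9];
    println!("{:?}", first_safe(&data)); // Some(7)
    println!("{}", first_unsafe(&data)); // 7
    println!("{:?}", first_safe(&[])); // None
}
```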

“I think [this technique] could have a lot of value for developers coding in C since it can jumpstart the transition to Rust and also aid in helping these programmers learn Rust, which can be a challenging transition,” said Jim Mercer, program VP, software development, DevOps & DevSecOps at IDC.

“It is not quite a Cobol situation,” Mercer added, “but many of the key C programmers are nearing retirement, so the maintenance of these applications and systems could become a more prominent issue down the road. Also, we are so focused on memory safety that we neglect to highlight other benefits of Rust. It offers modern language features like concurrency primitives, pattern matching, and a powerful type system, which can lead to more concise, expressive, and maintainable code.”

Moor's Andersen sees room to go further. "One thing that is promising would be using generative AI to help accelerate Mini-C and the migration process further," he said. "We are starting to see cloud providers like AWS and IBM making tools to help migrate from .NET to Java or even Cobol to Java. Maybe a tool like Q Developer could do C to Rust someday."

Caveats and concerns

While Fromherz and Protzenko's approach has many benefits, all three analysts I spoke with also have concerns.

“This joint Microsoft-Inria project has demonstrated success doing this type of transpiling, but it has a significant limitation — it only works on a subset of the C language — what they call Mini-C,” Gartner’s Thomas noted. “The problem is that most C and C++ applications use aspects of C/C++ that don’t fit into the Mini-C subset.”

“If you combine this tool with another tool that could rewrite a C application into the Mini-C subset, that would be huge,” Thomas added. “You could probably use a tool like OpenRewrite to do that type of refactoring/conversion.”

IDC’s Mercer agreed that Mini-C is only a start. “Rewriting or even having AI rewrite C code into Rust can require the developer’s time and considerable testing to ensure the system is still working and performant. There are also challenges with how C uses pointer arithmetic,” he said. “I think this should work well for more straightforward C/C++ code, but when you get into more complex code using things such as pointer arithmetic, these constructs can be hard to duplicate in Rust. However, just because it is hard does not mean it is not solvable or worth doing — it is certainly better than trying to rewrite all the code manually.”
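Pointer arithmetic is hard to carry over because a C pointer such as buf + n can alias the original buffer, something Rust's borrow checker must be able to rule out. One common safe-Rust idiom, shown in the hand-written sketch below, replaces the offset pointer with the standard split_at_mut method, which divides one mutable slice into two non-overlapping halves. The function xor_halves is hypothetical, and the sketch is not presented as the output of the Mini-C translator.

```rust
// Hand-written sketch: a C routine that uses pointer arithmetic, and a
// safe-Rust equivalent. Hypothetical example, not generated by Mini-C.
//
// Original C:
//   void xor_halves(uint8_t *buf, size_t len) {
//       uint8_t *hi = buf + len / 2;     /* second pointer into buf */
//       for (size_t i = 0; i < len / 2; i++)
//           buf[i] ^= hi[i];
//   }

/// XORs each byte of the second half of `buf` into the matching byte of the
/// first half. For an odd length, the middle byte belongs to the second half
/// and the final byte is never read, mirroring the C loop bound `len / 2`.
fn xor_halves(buf: &mut [u8]) {
    let mid = buf.len() / 2;
    // `buf + len / 2` becomes a split into two disjoint slices, so the
    // borrow checker can see the writes to `lo` never alias the reads from `hi`.
    let (lo, hi) = buf.split_at_mut(mid);
    // `zip` stops after `lo.len()` == `mid` iterations, like `i < len / 2`.
    for (l, h) in lo.iter_mut().zip(hi.iter()) {
        *l ^= *h;
    }
}

fn main() {
    let mut buf = [1u8, 2, 3, 4];
    xor_halves(&mut buf);
    println!("{:?}", buf); // [2, 6, 3, 4]
}
```

Splitting the slice, instead of keeping two raw pointers into it, is what lets the compiler prove the loop is memory-safe without an unsafe block.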

However, Moor’s Andersen observed, “Any solution like this would be helpful since it potentially resolves some of the migration issues, but that unfortunately comes with many caveats.” He cited the risk that older code bases may not be as amenable to the technique as HACL*, so “customer mileage will vary significantly.” 

“As application mileage varies, I’d hope to see a community grow around this effort so best practices can be shared and libraries get updated as those learnings happen,” Andersen said. “We need to get something like this into an open-source community and get some committers doing some work.”

But communities need people, and finding developers with the required level of C expertise is difficult. Funding the project is yet another issue.

“While the tech may be promising, you need a lot more than that to get momentum, no matter what the White House says,” Andersen said.
