Question is how to do these in Rust. An example might be a browser DOM: each node has a parent pointer, a list of child pointers, left and right sibling pointers, maybe a CSS node pointer, etc. Inserting or deleting nodes has to repair the pointers to and from the neighboring nodes as needed.
I know this is doable since obviously Servo (Rust’s initial driving application) has to do it. I hope the answer doesn’t involve the word “unsafe”. But I am quite unclear about how to create such a structure under Rust’s rules of pointer ownership, and also how to reliably reclaim storage when nodes and trees/subtrees are deleted. Plus there will be thread safety rules that should be statically enforced if possible.
I’ve heard that doubly linked lists in Rust are handled by an unsafe library, yet this seems even worse. Thanks.
The safe, fast and easy way to do trees is by using Rc<RefCell<T>>. Rc/Arc allows data to be owned multiple times. You want this because this way a node can be referenced by its parent and its child at the same time. However, Rc makes the inner type inmutable. And you probably will want to mutate it in a tree, that’s what RefCell is for. With RefCell you do the borrow checking at run-time instead of at compile-time. This allows you to mutate T even though Rc only gives you an inmutable reference. This is called interior mutability.
RefCell doesn’t eliminate the borrow checker though, you must still follow its rules. If you try to get 2 mutable references to the inner type of RefCell, it will panic.
I know you don’t want to read unsafe, but you gotta hear about the alternative. Just use pointers. Pointers don’t have the borrow checker to restrict them. And self-referencing structures with interior mutability are not easy to borrow-check automatically. You can have the raw pointers as private fields of the struct so the code that is actually unsafe will be a few very small functions.
Here’s why the other options worse than pointers:
Rc<RefCell<T>> will clutter your code with boilerplate and it’s a pain to deal with. Pointers are not too ergonomic in rust (mainly because there is no
->
operator), but they need way less boilerplate. Also, you already need to manually check the mutability rules, why not all the rules.Another option that I’ve seen is “have a hashmap with all the nodes, and just store the id of the node instead of a reference”. This is the same as “have all the nodes on a Vector and store the index”. Think about this for a second. You have a pool of memory and a number that identifies what part of that pool is the memory you want. Seen it yet? That is exactly what a pointer is! If you do that, you’re just disabling the borrow-checker anyway. You just created your own memory allocator and will have to manage your memory manually, at that point just use pointers, it will be the same except with fewer boilerplate and indirection.
Thanks, that is interesting about using a refcell. It hadn’t occurred to me that a refcell was considered immutable for purposes of Rc. What about the issue of refcounting cylic structures though? I thought the route around that was weakrefs per someone else’s suggestion.
By pointers, do you mean unsafe pointers that aren’t borrow checked? I guess that’s not really worse than writing a little bit of the program in C so it may be the right approach pragmatically. So I’m again wondering about Servo. I guess eventually Rust will get verification tools that let you check the soundness of “unsafe” code too. So the discomfort might only be temporary.
Sure, hashmap or numeric indices are abstractly equivalent to pointers, but Rust is supposed to be a low level language that can deal with machine primitives, and pointers are machine primitives. Simulating them through extra levels of tables is unsatisfying. Also, the uses of those pseudo-pointers are no longer checked by the compler, so your program is vulnerable to many (not all) of the same old-fashioned pointer errors that Rust was supposed to rescue us from.
RefCell is neither mutable nor immutable. It’s a type like any other. What is special about RefCell is that it has a method like:
fn borrow_mut(&self) -> &mut T
Which means you can get a mutable reference to its contents by only having a normal reference to the RefCell.
By pointers I mean raw pointers. The pointers themselves are not unsafe. They are just normal pointers like you would have in C.
Rc can be used to generate weak refs. Which is what you want for your tree.
I don’t know about servo. So I can’t tell you much about it.
Don’t hope too much about the temporary unsafe thing. It’s not practical (maybe impossible) to make a safety checker that checks the entire program. The practical way is to make safe abstractions using unsafe code. For example rust itself is built upon plenty of unsafe code, however, it provides to you the appropriate abstractions so what you do is safe.
In this case. You can have a bit of unsafe code on your tree, so that the users of that tree have a safe API to do something that you needed unsafe code for.
For example one of the cases where you cannot automatically check the safety is in dereferencing a raw pointer returned by a FFI function call to a dynamic library. Your automatic safety checker would need to be able to read the library’s documentation. And at that point it’s not a real safety checker because the documentation may lie or have bugs.