Mercurial > public > mercurial-scm > hg
changeset 52411:c2480ac4c5e2
rust-pyo3: retrieving the InnerRevlog of hg-cpython
This allows PyO3-based code to use the InnerRevlog, access its shared data
(core InnerRevlog), which will then allow, e.g., to retrieve references on
the core Index.
On the `hg-cpython` (`rusthg` crate, `rustext` Python extension module),
we had to also build as a Rust library, and open up some accesses (see
notably the public accessor for `inner`, the core `InnerRevlog`).
Retrieving the Rust struct underlying a Python object defined by another
extension module written in Rust is tricky because the Python type objects
are duplicated in the extension modules, leading to failure of the normal
type checking. See the doc-comment of `convert_cpython::extract_inner_revlog`
for a complete explanation.
To solve this, we import the Python type object of `rustext` (defined
by `hg-cpython`) and perform a manual check. Checking the Python type is
necessary, as PyO3 documentation clearly state that downcasting an object
that has not the proper type is Undefined Behaviour.
At this point, we do not have conversion facilities for exceptions (`PyErr`
on both sides), hence the remaining unwraps).
author | Georges Racinet <georges.racinet@cloudcrane.io> |
---|---|
date | Sat, 30 Nov 2024 20:57:02 +0100 |
parents | 15011324a80b |
children | 6e8ba52857df |
files | rust/Cargo.lock rust/hg-cpython/src/revlog.rs rust/hg-pyo3/Cargo.toml rust/hg-pyo3/src/convert_cpython.rs rust/hg-pyo3/src/lib.rs |
diffstat | 5 files changed, 193 insertions(+), 3 deletions(-) [+] |
line wrap: on
line diff
--- a/rust/Cargo.lock Thu Dec 05 18:58:56 2024 +0100 +++ b/rust/Cargo.lock Sat Nov 30 20:57:02 2024 +0100 @@ -712,11 +712,15 @@ name = "hg-pyo3" version = "0.1.0" dependencies = [ + "cpython", "derive_more", "env_logger 0.9.3", "hg-core", + "hg-cpython", + "lazy_static", "log", "pyo3", + "python3-sys", "stable_deref_trait", ]
--- a/rust/hg-cpython/src/revlog.rs Thu Dec 05 18:58:56 2024 +0100 +++ b/rust/hg-cpython/src/revlog.rs Sat Nov 30 20:57:02 2024 +0100 @@ -16,7 +16,7 @@ buffer::{Element, PyBuffer}, exc::{IndexError, ValueError}, ObjectProtocol, PyBool, PyBytes, PyClone, PyDict, PyErr, PyInt, PyList, - PyModule, PyObject, PyResult, PySet, PyTuple, PyType, Python, + PyModule, PyObject, PyResult, PySet, PySharedRef, PyTuple, PyType, Python, PythonObject, ToPyObject, UnsafePyLeaked, }; use hg::{ @@ -51,7 +51,7 @@ pub struct PySharedIndex { /// The underlying hg-core index - pub(crate) inner: &'static Index, + pub inner: &'static Index, } /// Return a Struct implementing the Graph trait @@ -1488,6 +1488,13 @@ /// Forwarded index methods? impl InnerRevlog { + pub fn pub_inner<'p, 'a: 'p>( + &'a self, + py: Python<'p>, + ) -> PySharedRef<'p, CoreInnerRevlog> { + self.inner(py) + } + fn len(&self, py: Python) -> PyResult<usize> { let rust_index_len = self.inner(py).borrow().index.len(); Ok(rust_index_len)
--- a/rust/hg-pyo3/Cargo.toml Thu Dec 05 18:58:56 2024 +0100 +++ b/rust/hg-pyo3/Cargo.toml Sat Nov 30 20:57:02 2024 +0100 @@ -9,8 +9,13 @@ [dependencies] pyo3 = { version = "0.23.1", features = ["extension-module"] } +cpython = { version = "0.7.2", features = ["extension-module"] } +hg-cpython = { path = "../hg-cpython" } +python3-sys = { version = "0.7.2" } hg-core = { path = "../hg-core"} stable_deref_trait = "1.2.0" log = "0.4.17" +derive_more = "0.99.17" env_logger = "0.9.3" -derive_more = "0.99.17" +lazy_static = "*" +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/rust/hg-pyo3/src/convert_cpython.rs Sat Nov 30 20:57:02 2024 +0100 @@ -0,0 +1,173 @@ +//! This module takes care of all conversions involving `rusthg` (hg-cpython) +//! objects in the PyO3 call context. +//! +//! For source code clarity, we only import (`use`) [`cpython`] traits and not +//! any of its data objects. We are instead using full qualifiers, such as +//! `cpython::PyObject`, and believe that the added heaviness is an acceptatble +//! price to pay to avoid confusion. +//! +//! Also it, is customary in [`cpython`] to label the GIL lifetime as `'p`, +//! whereas it is `'py` in PyO3 context. We keep both these conventions in +//! the arguments side of function signatures when they are not simply elided. +use pyo3::exceptions::PyTypeError; +use pyo3::prelude::*; + +use cpython::ObjectProtocol; +use cpython::PythonObject; +use lazy_static::lazy_static; + +use rusthg::revlog::{InnerRevlog, PySharedIndex}; + +/// Force cpython's GIL handle with the appropriate lifetime +/// +/// In `pyo3`, the fact that we have the GIL is expressed by the lifetime of +/// the incoming [`Bound`] smart pointer. We therefore simply instantiate +/// the `cpython` handle and coerce its lifetime by the function signature. +/// +/// Reacquiring the GIL is also a possible alternative, as the CPython +/// documentation explicitely states that "recursive calls are allowed" +/// (we interpret that as saying that acquiring the GIL within a thread that +/// already has it works) *as long as it is properly released* +/// reference: +/// <https://docs.python.org/3.8/c-api/init.html#c.PyGILState_Ensure> +pub(crate) fn cpython_handle<'py, T>( + _bound: &Bound<'py, T>, +) -> cpython::Python<'py> { + // safety: this is safe because the returned object has the 'py lifetime + unsafe { cpython::Python::assume_gil_acquired() } +} + +/// Force PyO3 GIL handle from cpython's. +/// +/// Very similar to [`cpython_handle`] +pub fn pyo3_handle(_py: cpython::Python<'_>) -> Python<'_> { + // safety: this is safe because the returned object has the same lifetime + // as the incoming object. + unsafe { Python::assume_gil_acquired() } +} + +/// Convert a PyO3 [`PyObject`] into a [`cpython::PyObject`] +/// +/// During this process, the reference count is increased, then decreased. +/// This means that the GIL (symbolized by the lifetime on the `obj` +/// argument) is needed. +/// +/// We could make something perhaps more handy by simply stealing the +/// pointer, forgetting the incoming and then implement `From` with "newtype". +/// It would be worth the effort for a generic cpython-to-pyo3 crate, perhaps +/// not for the current endeavour. +pub(crate) fn to_cpython_py_object<'py>( + obj: &Bound<'py, PyAny>, +) -> (cpython::Python<'py>, cpython::PyObject) { + let py = cpython_handle(obj); + // public alias of the private cpython::fii::PyObject (!) + let raw = obj.as_ptr() as *mut python3_sys::PyObject; + // both pyo3 and rust-cpython will decrement the refcount on drop. + // If we use from_owned_ptr, that's a segfault. + (py, unsafe { cpython::PyObject::from_borrowed_ptr(py, raw) }) +} + +/// Convert a [`cpython::PyObject`] into a PyO3 [`PyObject`] +/// +/// During this process, the reference count is increased, then decreased. +/// This means that the GIL (symbolized by the PyO3 [`Python`] handle is +/// needed. +/// +/// We could make something perhaps more handy by simply stealing the +/// pointer, forgetting the incoming and then implement `From` with "newtype". +/// It would be worth the effort for a generic cpython-to-pyo3 crate, perhaps +/// not for the current endeavour. +pub(crate) fn from_cpython_py_object( + py: Python<'_>, + obj: cpython::PyObject, +) -> PyObject { + let raw = obj.as_ptr() as *mut pyo3::ffi::PyObject; + unsafe { Py::from_borrowed_ptr(py, raw) } +} + +/// Convert [`cpython::PyErr`] into [`pyo3::PyErr`] +/// +/// The exception class remains the same as the original exception, +/// hence if it is also defined in another dylib based on `cpython` crate, +/// it will need to be converted to be downcasted in this crate. +pub(crate) fn from_cpython_pyerr( + py: cpython::Python<'_>, + mut e: cpython::PyErr, +) -> PyErr { + let pyo3_py = pyo3_handle(py); + let cpython_exc_obj = e.instance(py); + let pyo3_exc_obj = from_cpython_py_object(pyo3_py, cpython_exc_obj); + PyErr::from_value(pyo3_exc_obj.into_bound(pyo3_py)) +} + +/// Retrieve the PyType for objects from the `mercurial.rustext` crate. +fn retrieve_cpython_py_type( + submodule_name: &str, + type_name: &str, +) -> cpython::PyResult<cpython::PyType> { + let guard = cpython::Python::acquire_gil(); + let py = guard.python(); + let module = py.import(&format!("mercurial.rustext.{submodule_name}"))?; + module.get(py, type_name)?.extract::<cpython::PyType>(py) +} + +lazy_static! { + static ref INNER_REVLOG_PY_TYPE: cpython::PyType = { + retrieve_cpython_py_type("revlog", "InnerRevlog") + .expect("Could not import InnerRevlog in Python") + }; +} + +/// Downcast [`InnerRevlog`], with the appropriate Python type checking. +/// +/// The PyType object representing the `InnerRevlog` Python class is not the +/// the same in this dylib as it is in the `mercurial.rustext` module. +/// This is because the code created with the [`cpython::py_class!`] +/// macro is itself duplicated in both dylibs. In the case of this crate, this +/// happens by linking to the [`rusthg`] crate and provides the `InnerRevlog` +/// that is visible from this crate. The `InnerRevlog::get_type` associated +/// function turns out to return a `static mut` (look for `TYPE_OBJECT` in +/// `py_class_impl3.rs`), which obviously is different in both dylibs. +/// +/// The consequence of that is that downcasting an `InnerRevlog` originally +/// from the `mecurial.rustext` module to our `InnerRevlog` cannot be done with +/// the usual `extract::<InnerRevlog>(py)`, as it would perform the type +/// checking with the `PyType` that is embedded in `mercurial.pyo3_rustext`. +/// We must check the `PyType` that is within `mercurial.rustext` instead. +/// This is what this function does. +fn extract_inner_revlog( + py: cpython::Python, + inner_revlog: cpython::PyObject, +) -> PyResult<InnerRevlog> { + if !(*INNER_REVLOG_PY_TYPE).is_instance(py, &inner_revlog) { + return Err(PyTypeError::new_err("Not an InnerRevlog instance")); + } + // Safety: this is safe because we checked the PyType already, with the + // value embedded in `mercurial.rustext`. + Ok(unsafe { InnerRevlog::unchecked_downcast_from(inner_revlog) }) +} + +/// This is similar to [`rusthg.py_rust_index_to_graph`], with difference in +/// how we retrieve the [`InnerRevlog`]. +pub fn py_rust_index_to_graph( + py: cpython::Python, + index_proxy: cpython::PyObject, +) -> PyResult<cpython::UnsafePyLeaked<PySharedIndex>> { + let inner_revlog = extract_inner_revlog( + py, + index_proxy + .getattr(py, "inner") + .map_err(|e| from_cpython_pyerr(py, e))?, + )?; + + let leaked = inner_revlog.pub_inner(py).leak_immutable(); + // Safety: we don't leak the "faked" reference out of the `UnsafePyLeaked` + Ok(unsafe { leaked.map(py, |idx| PySharedIndex { inner: &idx.index }) }) +} + +pub(crate) fn proxy_index_extract<'py>( + index_proxy: &Bound<'py, PyAny>, +) -> PyResult<(cpython::Python<'py>, cpython::UnsafePyLeaked<PySharedIndex>)> { + let (py, idx_proxy) = to_cpython_py_object(index_proxy); + Ok((py, py_rust_index_to_graph(py, idx_proxy)?)) +}