Vision¶

This page documents the implementation of the visual input received by the simulated fly. Note that in the typical use case, the user should not have to access most of the functions described here. Instead, the visual inputs are given as a part of the observation returned by NeuroMechFly at each time step. Nonetheless, the full API reference is provided here for greater transparency.

Note

For API references of NeuroMechFly simulation with the connectome-constrained model proposed in Lappalainen et al., 2024,, see the Advanced Vision page.

Retina simulation¶

Bases: object

This class handles the simulation of the fly’s visual input. Calculation in this class is vectorized and parallelized using Numba.

Parameters:

ommatidia_id_mapnp.ndarray: Integer NumPy array of shape (nrows, ncols) where the value indicates the ID of the ommatidium (starting from 1). 0 indicates background (outside the hex lattice). By default, the map indicated in the configuration file is loaded.
pale_type_masknp.ndarray: Integer NumPy array of shape (max(ommatidia_id_map),) where the value of each element indicates whether the ommatidium is pale-type (1) or yellow-type (0). By default, the mask indicated in the configuration file is used.
distortion_coefficientfloat: A coefficient determining the extent of fisheye effect applied to the raw MuJoCo camera images. By default, the value indicated in the configuration file is used.
zoomfloat: A coefficient determining the zoom level when the fisheye effect is applied. By default, the value indicated in the configuration file is used.
nrowsint: The number of rows in the raw image rendered by the MuJoCo camera. By default, the value indicated in the configuration file is used.
ncolsint: The number of columns in the raw image rendered by the MuJoCo camera. By default, the value used in the configuration file is used.

Attributes:

ommatidia_id_mapnp.ndarray: Integer NumPy array of shape (nrows, ncols) where the value indicates the ID of the ommatidium (starting from 1). 0 indicates background (outside the hex lattice).
num_pixels_per_ommatidianp.ndarray: Integer NumPy array of shape (max(ommatidia_id_map),) where the value of each element indicates the number of raw pixels covered within each ommatidium.
pale_type_masknp.ndarray: Integer NumPy array of shape (max(ommatidia_id_map),) where the value of each element indicates whether the ommatidium is pale-type (1) or yellow-type (0).
distortion_coefficientfloat: A coefficient determining the extent of fisheye effect applied to the raw MuJoCo camera images.
zoomfloat: A coefficient determining the zoom level when the fisheye effect is applied.
nrowsint: The number of rows in the raw image rendered by the MuJoCo camera.
ncolsint: The number of columns in the raw image rendered by the MuJoCo camera.

correct_fisheye(img: ndarray) → ndarray¶

The raw imaged rendered by the MuJoCo camera is rectilinear. This distorts the image and overrepresents the periphery of the field of view (the same angle near the periphery is reflected by a greater angle in the rendered image). This method applies a fisheye effect to make the same angle represented roughly equally anywhere within the field of view.

Parameters:

img: np.ndarray: The raw MuJoCo camera rendering as a NumPy array of shape (nrows, ncols, 3).

Returns:

np.ndarray: The corrected camera rendering as a NumPy array of shape (nrows, ncols, 3).

Notes

This implementation is based on https://github.com/Gil-Mor/iFish, MIT License.

hex_pxls_to_human_readable(ommatidia_reading: ndarray, color_8bit=False) → ndarray¶

Given the intensity readings for all ommatidia in one eye, convert them to an (nrows, ncols) image with hexagonal blocks that can be visualized as a human-readable image.

Parameters:

ommatidia_readingnp.ndarray: Our simulation of what the fly might see through its compound eyes. It is a (N,) or (N, …) array where the first dimension is for the number of ommatidia.
color_8bitbool: If True, the returned image will be in 8-bit color. This speeds up rendering. Otherwise, the image will be in the same data type as the input ommatidia_reading.

Returns:

np.ndarray: An (nrows, ncols, …) image with hexagonal blocks that can be visualized as a human-readable image. The shape after the 0th dimension matches that of the input ommatidia_reading.

raw_image_to_hex_pxls(raw_img: ndarray) → ndarray¶

Given a raw image from an eye (one camera), simulate what the fly would see.

Parameters:

raw_imgnp.ndarray: RGB image with the shape (H, W, 3) returned by the camera.

Returns:

np.ndarray: Our simulation of what the fly might see through its compound eyes. It is a (N, 2) array where the first dimension is for the N ommatidia, and the third dimension is for the two channels.

Note that sometimes it is helpful to hide certain objects in the arena when rendering the fly’s vision. For example, markers for odor sources that are meant for user visualization only should not be seen by the fly. To accomplish this, we have provided two hook methods in BaseArena that allow the user to modify the arena as needed before and after we simulate the fly’s vision (for example, changing the alpha value of the odor source markers here):

BaseArena.pre_visual_render_hook(physics: dm_control.mjcf.Physics, *args, **kwargs) → None: Make necessary changes (e.g. make certain visualization markers transparent) before rendering the visual inputs. By default, this does nothing.

BaseArena.post_visual_render_hook(physics: dm_control.mjcf.Physics, *args, **kwargs) → None: Make necessary changes (e.g. make certain visualization markers opaque) after rendering the visual inputs. By default, this does nothing.

Visualization tool¶

We have also provided a utility function to generate a video of the visual input during a simulation:

flygym.vision.visualize_visual_input(retina: Retina, output_path: Path, vision_data_li: list[ndarray], raw_vision_data_li: list[ndarray], vision_update_mask: ndarray, vision_refresh_rate: float = 500, playback_speed: float = 0.1)¶

Convert lists of vision readings into a video and save it to disk.

Parameters:

retinaRetina: The retina object used to generate the visual input.
output_pathPath: Path of the output video will be saved. Should end with “.mp4”.
vision_data_lilist[np.ndarray]: List of ommatidia readings. Each element is an array of shape (2, N, 2) where the first dimension is for the left and right eyes, the second dimension is for the N ommatidia, and the third dimension is for the two channels. The length of this list is the number of simulation steps.
raw_vision_data_lilist[np.ndarray]: Same as vision_data_li but with the raw RGB images from the cameras instead of the simulated ommatidia readings. The shape of each element is therefore (2, H, W, 3) where the first dimension is for the left and right eyes, and the remaining dimensions are for the RGB image.
vision_update_masknp.ndarray: Mask indicating which simulation steps have vision updates. This should be taken from NeuroMechFly.vision_update_mask.
vision_refresh_ratefloat, optional: The refresh rate of visual inputs in Hz. This should be consistent with MuJoCoParameters.vision_refresh_rate that is given to the simulation. By default 500.
playback_speedfloat, optional: Speed, as a multiple of the 1x speed, at which the video should be rendered, by default 0.1.

flygym.vision.add_insets(retina, viz_frame, visual_input, panel_height=150)¶

Add insets to the visualization frame.

Parameters:

retinaRetina: The retina object used to generate the visual input.
viz_framenp.ndarray: The visualization frame to add insets to.
visual_inputnp.ndarray: The visual input to the retina. Should be of shape (2, N, 2) as returned in the observation of the environment (obs["vision"]).
panel_heightint, optional: Height of the panel that contains the insets, by default 150.

Returns:

np.ndarray: The visualization frame with insets added.

flygym.vision.save_video_with_vision_insets(sim, cam, path, visual_input_hist, stabilization_time=0.02)¶

Save a list of frames as a video with insets showing the visual experience of the fly. This is almost a drop-in replacement of NeuroMechFly.save_video but as a static function (instead of a class method) and with an extra argument visual_input_hist.

Parameters:

simSimulation: The Simulation object.
camCamera: The Camera object that has been used to generate the frames.
pathPath: Path of the output video will be saved. Should end with “.mp4”.
visual_input_histlist[np.ndarray]: List of ommatidia readings. Each element is an array of shape (2, N, 2) where N is the number of ommatidia per eye.
stabilization_timefloat, optional: Time (in seconds) to wait before starting to render the video. This might be wanted because it takes a few frames for the position controller to move the joints to the specified angles from the default, all-stretched position. By default 0.02s