Vision

This page documents the implementation of the visual input received by the simulated fly. Note that in the typical use case, the user should not have to access most of the functions described here. Instead, the visual inputs are given as a part of the observation returned by NeuroMechFly at each time step. Nonetheless, the full API reference is provided here for greater transparency.
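As a rough illustration of that observation, the sketch below builds a stand-in array with the (2, N, 2) layout that this page gives for obs["vision"] and splits it by eye. The array contents, the tiny value of N, and the reading of index 0 as the left eye are assumptions made only for the example.

```python
import numpy as np

# Stand-in for obs["vision"]; N = 4 ommatidia per eye is a made-up size
# chosen only to keep the example small.
num_ommatidia = 4
rng = np.random.default_rng(0)
vision = rng.random((2, num_ommatidia, 2))  # (eye, ommatidium, channel)

# Assumption: index 0 is the left eye and index 1 is the right eye.
left_eye, right_eye = vision[0], vision[1]  # each of shape (N, 2)

# Mean reading of each eye in each channel, shape (2 eyes, 2 channels).
mean_per_eye_and_channel = vision.mean(axis=1)
```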

Note

For the API reference of the NeuroMechFly simulation with the connectome-constrained model proposed in Lappalainen et al., 2024, see the Advanced Vision page.

Retina simulation

class flygym.vision.Retina(ommatidia_id_map: ndarray | None = None, pale_type_mask: ndarray | None = None, distortion_coefficient: float | None = None, zoom: float | None = None, nrows: int | None = None, ncols: int | None = None)

Bases: object

This class handles the simulation of the fly’s visual input. Calculation in this class is vectorized and parallelized using Numba.

Parameters:
ommatidia_id_map: np.ndarray

Integer NumPy array of shape (nrows, ncols) where the value indicates the ID of the ommatidium (starting from 1). 0 indicates background (outside the hex lattice). By default, the map indicated in the configuration file is loaded.

pale_type_mask: np.ndarray

Integer NumPy array of shape (max(ommatidia_id_map),) where the value of each element indicates whether the ommatidium is pale-type (1) or yellow-type (0). By default, the mask indicated in the configuration file is used.

distortion_coefficient: float

A coefficient determining the extent of fisheye effect applied to the raw MuJoCo camera images. By default, the value indicated in the configuration file is used.

zoom: float

A coefficient determining the zoom level when the fisheye effect is applied. By default, the value indicated in the configuration file is used.

nrows: int

The number of rows in the raw image rendered by the MuJoCo camera. By default, the value indicated in the configuration file is used.

ncols: int

The number of columns in the raw image rendered by the MuJoCo camera. By default, the value indicated in the configuration file is used.

Attributes:
ommatidia_id_map: np.ndarray

Integer NumPy array of shape (nrows, ncols) where the value indicates the ID of the ommatidium (starting from 1). 0 indicates background (outside the hex lattice).

num_pixels_per_ommatidia: np.ndarray

Integer NumPy array of shape (max(ommatidia_id_map),) where the value of each element indicates the number of raw pixels covered within each ommatidium.

pale_type_mask: np.ndarray

Integer NumPy array of shape (max(ommatidia_id_map),) where the value of each element indicates whether the ommatidium is pale-type (1) or yellow-type (0).

distortion_coefficient: float

A coefficient determining the extent of fisheye effect applied to the raw MuJoCo camera images.

zoom: float

A coefficient determining the zoom level when the fisheye effect is applied.

nrows: int

The number of rows in the raw image rendered by the MuJoCo camera.

ncols: int

The number of columns in the raw image rendered by the MuJoCo camera.
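To make the relationship between ommatidia_id_map and num_pixels_per_ommatidia concrete, here is a minimal sketch on a toy 3x4 map. It shows one way the per-ommatidium pixel counts could be derived with np.bincount; it is not necessarily how the Retina class computes them, and the toy map is invented for the example.

```python
import numpy as np

# Toy 3x4 ID map: 0 is background, 1..3 are ommatidium IDs.
ommatidia_id_map = np.array(
    [
        [0, 1, 1, 0],
        [2, 2, 1, 3],
        [0, 2, 3, 3],
    ]
)
num_ommatidia = int(ommatidia_id_map.max())

# Count pixels per ID, then drop the background bin (ID 0) so that
# index i corresponds to ommatidium ID i + 1.
counts = np.bincount(ommatidia_id_map.ravel(), minlength=num_ommatidia + 1)
num_pixels_per_ommatidia = counts[1:]
```

The resulting array has shape (max(ommatidia_id_map),), matching the shape documented for the attribute.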

correct_fisheye(img: ndarray) -> ndarray

The raw image rendered by the MuJoCo camera is rectilinear. This distorts the image and overrepresents the periphery of the field of view (the same angle spans more pixels near the periphery of the rendered image than it does near the center). This method applies a fisheye effect so that the same angle is represented roughly equally anywhere within the field of view.

Parameters:
img: np.ndarray

The raw MuJoCo camera rendering as a NumPy array of shape (nrows, ncols, 3).

Returns:
np.ndarray

The corrected camera rendering as a NumPy array of shape (nrows, ncols, 3).

Notes

This implementation is based on https://github.com/Gil-Mor/iFish, MIT License.
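The following is a simplified sketch of a radial remapping in the spirit of iFish, written with plain NumPy. It is not the flygym implementation: the function name fisheye_sketch is hypothetical, the zoom coefficient is left out, and nearest-neighbor sampling is an assumption made to keep the example short.

```python
import numpy as np


def fisheye_sketch(img: np.ndarray, distortion: float) -> np.ndarray:
    """Apply a simple radial distortion to an (nrows, ncols, 3) image."""
    nrows, ncols = img.shape[:2]
    rows, cols = np.mgrid[0:nrows, 0:ncols]

    # Normalize destination pixel coordinates to the range [-1, 1).
    yn = (2 * rows - nrows) / nrows
    xn = (2 * cols - ncols) / ncols
    r2 = xn**2 + yn**2

    # Radial scaling; leave coordinates unchanged where the denominator is 0.
    denom = 1 - distortion * r2
    denom = np.where(denom == 0, 1.0, denom)

    # Map the scaled coordinates back to (floating-point) source indices.
    src_rows = np.rint((yn / denom + 1) * nrows / 2)
    src_cols = np.rint((xn / denom + 1) * ncols / 2)

    # Only sample source pixels that fall inside the raw image.
    valid = (
        (src_rows >= 0) & (src_rows <= nrows - 1)
        & (src_cols >= 0) & (src_cols <= ncols - 1)
    )
    out = np.zeros_like(img)
    out[valid] = img[src_rows[valid].astype(int), src_cols[valid].astype(int)]
    return out
```

With distortion set to 0 the sketch returns the input unchanged, and for any distortion value the output keeps the (nrows, ncols, 3) shape of the input, as documented for correct_fisheye.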

hex_pxls_to_human_readable(ommatidia_reading: ndarray, color_8bit=False) -> ndarray

Given the intensity readings for all ommatidia in one eye, convert them to an (nrows, ncols) image with hexagonal blocks that can be visualized as a human-readable image.

Parameters:
ommatidia_reading: np.ndarray

Our simulation of what the fly might see through its compound eyes. It is an (N,) or (N, …) array whose first dimension indexes the N ommatidia.

color_8bit: bool

If True, the returned image will be in 8-bit color, which speeds up rendering. Otherwise, the image will have the same data type as the input ommatidia_reading.

Returns:
np.ndarray

An (nrows, ncols, …) image with hexagonal blocks that can be visualized as a human-readable image. The dimensions after the first two match the dimensions of the input ommatidia_reading after its 0th dimension.
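The conversion from per-ommatidium readings back to an image can be pictured as a lookup through the ID map. The sketch below is an illustrative NumPy version under that assumption; the helper name to_image and the choice of 0 as the background value are hypothetical, and the 8-bit color option is not modeled.

```python
import numpy as np


def to_image(ommatidia_reading: np.ndarray, ommatidia_id_map: np.ndarray) -> np.ndarray:
    """Paint each pixel with the reading of the ommatidium that covers it."""
    # Prepend one background entry so that ID 0 indexes the background and
    # ID i indexes the reading of ommatidium i.
    background = np.zeros(
        (1, *ommatidia_reading.shape[1:]), dtype=ommatidia_reading.dtype
    )
    lookup = np.concatenate([background, ommatidia_reading], axis=0)
    return lookup[ommatidia_id_map]


# Toy 3x4 ID map with three ommatidia (0 is background).
id_map = np.array([[0, 1, 1, 0], [2, 2, 1, 3], [0, 2, 3, 3]])
image = to_image(np.array([0.2, 0.5, 0.9]), id_map)  # shape (3, 4)
```

Passing an (N, 2) reading instead of an (N,) one yields an (nrows, ncols, 2) image, which mirrors the shape rule described above.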

raw_image_to_hex_pxls(raw_img: ndarray) -> ndarray

Given a raw image from an eye (one camera), simulate what the fly would see.

Parameters:
raw_img: np.ndarray

RGB image with the shape (H, W, 3) returned by the camera.

Returns:
np.ndarray

Our simulation of what the fly might see through its compound eyes. It is an (N, 2) array where the first dimension is for the N ommatidia and the second dimension is for the two channels.
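As a hedged illustration of how a raw image could be reduced to an (N, 2) array, the sketch below averages the pixels covered by each ommatidium and writes the result into one of two columns according to pale_type_mask. The function name, the collapse of RGB into a single intensity, and the assignment of column 0 to yellow-type and column 1 to pale-type ommatidia are all assumptions for the example, not a description of the Numba implementation.

```python
import numpy as np


def raw_to_ommatidia_sketch(
    raw_img: np.ndarray, ommatidia_id_map: np.ndarray, pale_type_mask: np.ndarray
) -> np.ndarray:
    num_ommatidia = int(ommatidia_id_map.max())
    ids = ommatidia_id_map.ravel()
    inside = ids > 0  # ignore background pixels

    # Assumption: collapse RGB to one intensity per pixel by averaging.
    intensity = raw_img.reshape(-1, 3).mean(axis=1)

    # Sum and count the pixels that fall within each ommatidium.
    sums = np.bincount(
        ids[inside], weights=intensity[inside], minlength=num_ommatidia + 1
    )[1:]
    counts = np.bincount(ids[inside], minlength=num_ommatidia + 1)[1:]
    mean_intensity = sums / np.maximum(counts, 1)

    # Assumption: column 0 holds yellow-type readings, column 1 pale-type.
    readings = np.zeros((num_ommatidia, 2))
    readings[np.arange(num_ommatidia), pale_type_mask] = mean_intensity
    return readings


id_map = np.array([[0, 1, 1, 0], [2, 2, 1, 3], [0, 2, 3, 3]])
pale_mask = np.array([1, 0, 1])  # ommatidia 1 and 3 are pale-type
readings = raw_to_ommatidia_sketch(np.full((3, 4, 3), 0.5), id_map, pale_mask)
```

In this sketch each ommatidium contributes a nonzero value to exactly one of the two channels.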

Note that it is sometimes helpful to hide certain objects in the arena when rendering the fly’s vision. For example, markers for odor sources that are meant for user visualization only should not be seen by the fly. To accomplish this, BaseArena provides two hook methods that allow the user to modify the arena as needed before and after the fly’s vision is simulated (for example, by changing the alpha value of the odor source markers):

BaseArena.pre_visual_render_hook(physics: dm_control.mjcf.Physics, *args, **kwargs) -> None

Make necessary changes (e.g. make certain visualization markers transparent) before rendering the visual inputs. By default, this does nothing.

BaseArena.post_visual_render_hook(physics: dm_control.mjcf.Physics, *args, **kwargs) -> None

Make necessary changes (e.g. make certain visualization markers opaque) after rendering the visual inputs. By default, this does nothing.
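The pattern behind the two hooks can be sketched without the simulator. In the example below, BaseArena is a stand-in class defined locally (not the flygym class), and ArenaWithHiddenMarkers and its marker_rgba dictionary are hypothetical; in a real arena the marker colors would live in the MuJoCo model and be changed through the physics argument.

```python
class BaseArena:
    """Local stand-in for flygym's BaseArena so the sketch runs on its own."""

    def pre_visual_render_hook(self, physics, *args, **kwargs):
        pass  # does nothing by default

    def post_visual_render_hook(self, physics, *args, **kwargs):
        pass  # does nothing by default


class ArenaWithHiddenMarkers(BaseArena):
    """Hypothetical arena whose odor markers are invisible to the fly."""

    def __init__(self):
        # RGBA colors of the visualization markers (hypothetical storage).
        self.marker_rgba = {"odor_source_0": [1.0, 0.0, 0.0, 1.0]}

    def pre_visual_render_hook(self, physics, *args, **kwargs):
        for rgba in self.marker_rgba.values():
            rgba[3] = 0.0  # transparent while the fly's vision is rendered

    def post_visual_render_hook(self, physics, *args, **kwargs):
        for rgba in self.marker_rgba.values():
            rgba[3] = 1.0  # opaque again for the user-facing rendering


arena = ArenaWithHiddenMarkers()
arena.pre_visual_render_hook(physics=None)
alpha_during_vision = arena.marker_rgba["odor_source_0"][3]
arena.post_visual_render_hook(physics=None)
alpha_after_vision = arena.marker_rgba["odor_source_0"][3]
```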

Visualization tool

We have also provided a utility function to generate a video of the visual input during a simulation:

flygym.vision.visualize_visual_input(retina: Retina, output_path: Path, vision_data_li: list[ndarray], raw_vision_data_li: list[ndarray], vision_update_mask: ndarray, vision_refresh_rate: float = 500, playback_speed: float = 0.1)

Convert lists of vision readings into a video and save it to disk.

Parameters:
retina: Retina

The retina object used to generate the visual input.

output_path: Path

Path where the output video will be saved. Should end with “.mp4”.

vision_data_li: list[np.ndarray]

List of ommatidia readings. Each element is an array of shape (2, N, 2) where the first dimension is for the left and right eyes, the second dimension is for the N ommatidia, and the third dimension is for the two channels. The length of this list is the number of simulation steps.

raw_vision_data_li: list[np.ndarray]

Same as vision_data_li but with the raw RGB images from the cameras instead of the simulated ommatidia readings. The shape of each element is therefore (2, H, W, 3) where the first dimension is for the left and right eyes, and the remaining dimensions are for the RGB image.

vision_update_mask: np.ndarray

Mask indicating which simulation steps have vision updates. This should be taken from NeuroMechFly.vision_update_mask.

vision_refresh_rate: float, optional

The refresh rate of visual inputs in Hz. This should be consistent with MuJoCoParameters.vision_refresh_rate that is given to the simulation. By default 500.

playback_speed: float, optional

Speed, as a multiple of the 1x speed, at which the video should be rendered. By default 0.1.
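To illustrate how these arguments relate to one another, the sketch below builds a toy history and uses the update mask to pick out the steps on which the visual input changed. It only illustrates the meaning of the mask and the two rate parameters; the assumption that the output frame rate equals the refresh rate times the playback speed, and all toy values, are not taken from the function's source.

```python
import numpy as np

vision_refresh_rate = 500  # Hz, the documented default
playback_speed = 0.1  # the documented default

# Assumption: the video frame rate is the refresh rate scaled by the
# playback speed, which gives about 50 frames per second here.
output_fps = vision_refresh_rate * playback_speed

# Toy history: 6 simulation steps, vision updated on every other step.
vision_update_mask = np.array([True, False, True, False, True, False])
vision_data_li = [np.full((2, 4, 2), float(step)) for step in range(6)]

# Keep only the readings from steps where the visual input was updated.
updated_frames = [
    frame for frame, updated in zip(vision_data_li, vision_update_mask) if updated
]
```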

flygym.vision.add_insets(retina, viz_frame, visual_input, panel_height=150)

Add insets to the visualization frame.

Parameters:
retina: Retina

The retina object used to generate the visual input.

viz_frame: np.ndarray

The visualization frame to add insets to.

visual_input: np.ndarray

The visual input to the retina. Should be of shape (2, N, 2) as returned in the observation of the environment (obs["vision"]).

panel_height: int, optional

Height of the panel that contains the insets, by default 150.

Returns:
np.ndarray

The visualization frame with insets added.

flygym.vision.save_video_with_vision_insets(sim, cam, path, visual_input_hist, stabilization_time=0.02)

Save a list of frames as a video with insets showing the visual experience of the fly. This is almost a drop-in replacement for NeuroMechFly.save_video, but it is a static function (instead of a class method) and takes an extra argument, visual_input_hist.

Parameters:
sim: Simulation

The Simulation object.

cam: Camera

The Camera object that has been used to generate the frames.

path: Path

Path where the output video will be saved. Should end with “.mp4”.

visual_input_hist: list[np.ndarray]

List of ommatidia readings. Each element is an array of shape (2, N, 2) where N is the number of ommatidia per eye.

stabilization_time: float, optional

Time (in seconds) to wait before starting to render the video. This can be useful because it takes a few frames for the position controller to move the joints from the default, all-stretched position to the specified angles. By default 0.02 s.