Fortran on the Dreamcast!?

Dreamcast

The Dreamcast is a video game console released by Sega in 1998 and discontinued in 2001. At the time it competed against the PlayStation and Xbox. Just over 9 million units were sold in this period, and there still appears to be a strong indie/homebrew community, centered around the dreamcast.wiki website.

According to this tweet, it’s possible to run Rust and Fortran on the Dreamcast thanks to some of the recent work done on GCC:

I found a bit more information about the connection with GCC in this Reddit thread, showing that some C++20 can also run on the Dreamcast. The author of the tweet will also be giving a presentation at FOSDEM: FOSDEM 2024 - Sega Dreamcast Homebrew with GCC

Even if you don’t have a Dreamcast now (second-hand ones go for 100-250 € on eBay, in Germany at least), in principle it should be possible to develop programs and test them using emulators.

I haven’t looked any deeper into how to set up the SDK or a cross-compiler, but it looks like an entertaining project once the initial hurdles are overcome.

SH-4 CPU

The Dreamcast was built around the SH-4 processor (from Hitachi), which uses the SuperH instruction set, a 32-bit RISC ISA. Notable features of the SH-4 include:

  • FPU with four floating-point multipliers, supporting IEEE754 standard 32-bit single-precision and 64-bit double-precision floats (double precision is hardware emulated)
  • 4D floating-point dot-product operation and matrix–vector multiplication (useful for viewpoint changes, angle changes, or movements called vector transformations)
  • 128-bit floating-point bus allowing 3.2 GB/sec transfer rate from the data cache
  • 64-bit external data bus with 32-bit memory addressing, allowing a maximum of 4 GB addressable memory with a transfer rate of 800 MB/sec
  • Built-in interrupt, DMA, and power management controllers

More information can be found in the SH-4 CPU Core Architecture manual. The motivation for the floating-point and matrix processing instructions is described in the paper Entertainment Systems and High-Performance Processor SH-4. I also found this presentation from Hot Chips in 1997.

You can see the processor at work in this demo disc that has been uploaded to YouTube.

PowerVR2 GPU

The Dreamcast uses the PowerVR2 GPU, manufactured by NEC under license from Imagination Technologies. It could draw more than 3 million polygons per second. I found a blog post explaining some of the novelties of the GPU at the time it was launched: Celebrating the 20th anniversary of Dreamcast and PowerVR - Imagination. The PowerVR product line was also used in iPhones until 2017. For comparison, the GeForce 256, which launched NVIDIA into the PC gaming market, was released in 1999.

It looks like cross-compilers are available for Debian and Ubuntu:

The C and C++ compilers, at least, are supported on Compiler Explorer.

BTW the screenshot in that post reports a non-interoperable is_null function returning a Fortran logical that I’m not sure would compile with gcc 13! :man_shrugging:

The famous Hamilton’s quaternions?

They are used for rotations to avoid gimbal lock, but I am not sure if the quoted line refers to them. BTW, there are also other options: Let's remove Quaternions from every 3D Engine (An Interactive Introduction to Rotors from Geometric Algebra) - Marc ten Bosch

You are correct!!! The SH4 arch is still actively maintained in the kernel, so we’re always checking in on what our Linux dev cousins are up to… That was actually how I initially even knew Fortran was still likely to cross-compile on Dreamcast.

Yes, the C and C++ GCC compilers for SH are up on Compiler Explorer… There’s a whole funny story behind that, and I maintain some easy “templates” you can use to quickly get configured for KOS/DC here: SH4 in Compiler Explorer - dreamcast.wiki

In Rust it is called inside an unsafe region, so I guess there is little or no type- and interface-checking. I assume it works in that case because internally in gfortran, .true. maps to 1 and .false. maps to 0, which also matches the rules in Rust for bool to int casting.
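
For comparison, a version meant to be interoperable would normally mark the procedure bind(C) and return logical(c_bool) instead of a default logical. The following is only a sketch under my own assumptions: the name is_null is borrowed from the screenshot, but the body, the module name interop_demo and the test program are hypothetical and are not the code from the demo.

```fortran
module interop_demo
  use, intrinsic :: iso_c_binding, only: c_ptr, c_bool, c_associated
  implicit none
contains
  ! Hypothetical helper: true when the C pointer passed by value is null.
  ! logical(c_bool) is the kind that matches C's _Bool.
  function is_null(p) result(res) bind(C, name="is_null")
    type(c_ptr), value :: p
    logical(c_bool) :: res
    res = .not. c_associated(p)
  end function is_null
end module interop_demo

program interop_main
  use, intrinsic :: iso_c_binding, only: c_int, c_loc, c_null_ptr
  use interop_demo, only: is_null
  implicit none
  integer(c_int), target :: n
  n = 42
  print *, is_null(c_null_ptr)   ! null pointer
  print *, is_null(c_loc(n))     ! address of a real variable
end program interop_main
```

On the C side the matching prototype would be bool is_null(void *p); how that maps onto the current state of Rust-GCC is something I have not checked.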

@Falco, Welcome to Discourse!

It would be nice to learn if the Fortran intrinsic functions like dot_product or matmul get mapped to the SH-4 instructions (at least for the single-precision real32 type). There are some older books on the topic of Fortran and Computer graphics, such as: High-resolution computer graphics using FORTRAN 77 : Angell, Ian O : Free Download, Borrow, and Streaming : Internet Archive
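
To make the question concrete, this is the kind of code I have in mind: a 4x4 matrix applied to a 4-component vector, followed by a 4D dot product, all in real32. It is just a minimal sketch (the program name and the numbers are made up for illustration); whether gfortran turns any of it into the SH-4 instructions is exactly what I don't know.

```fortran
program intrinsics_demo
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
  real(real32) :: m(4,4), v(4), w(4), d
  integer :: i

  ! Homogeneous translation by (1, 2, 3): identity plus a fourth column.
  m = 0.0_real32
  do i = 1, 4
    m(i,i) = 1.0_real32
  end do
  m(1:3,4) = [1.0_real32, 2.0_real32, 3.0_real32]

  v = [1.0_real32, 1.0_real32, 1.0_real32, 1.0_real32]

  w = matmul(m, v)        ! matrix-vector product: (2, 3, 4, 1)
  d = dot_product(v, w)   ! 4D dot product: 2 + 3 + 4 + 1 = 10

  print *, w
  print *, d
end program intrinsics_demo
```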

There are also OpenGL bindings for Fortran, such as this one (cc @Pap):

So the hardware accelerated math is for generic vector/vector and vector/matrix operations, required to translate, scale, rotate, and orient basically every 3D polygon vertex onscreen you see getting rendered. The SH4 doesn’t accelerate quaternions, but you can represent quaternions as 4x4 rotation matrices, as mentioned here: Quaternions and spatial rotation - Wikipedia
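
As a rough illustration of that last point, here is a small Fortran sketch that expands a unit quaternion into a 4x4 homogeneous rotation matrix using the standard formula from the Wikipedia article. The module and function names (quat_demo, quat_to_mat4) are made up for this example and nothing here is Dreamcast-specific; once the matrix exists, applying it is an ordinary matrix-vector product.

```fortran
module quat_demo
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
contains
  ! Convert a unit quaternion q = (w, x, y, z) into a 4x4 homogeneous
  ! rotation matrix, for column vectors: v_rotated = matmul(m, v).
  pure function quat_to_mat4(q) result(m)
    real(real32), intent(in) :: q(4)
    real(real32) :: m(4,4)
    real(real32) :: w, x, y, z
    w = q(1); x = q(2); y = q(3); z = q(4)
    m = 0.0_real32
    m(1,1) = 1 - 2*(y*y + z*z); m(1,2) = 2*(x*y - w*z); m(1,3) = 2*(x*z + w*y)
    m(2,1) = 2*(x*y + w*z); m(2,2) = 1 - 2*(x*x + z*z); m(2,3) = 2*(y*z - w*x)
    m(3,1) = 2*(x*z - w*y); m(3,2) = 2*(y*z + w*x); m(3,3) = 1 - 2*(x*x + y*y)
    m(4,4) = 1.0_real32
  end function quat_to_mat4
end module quat_demo

program quat_main
  use, intrinsic :: iso_fortran_env, only: real32
  use quat_demo, only: quat_to_mat4
  implicit none
  real(real32) :: q(4), v(4), r(4), half

  ! Rotation by 90 degrees about the z axis: half angle is pi/4.
  half = 0.25_real32 * acos(-1.0_real32)
  q = [cos(half), 0.0_real32, 0.0_real32, sin(half)]
  v = [1.0_real32, 0.0_real32, 0.0_real32, 1.0_real32]

  r = matmul(quat_to_mat4(q), v)   ! the x axis ends up on the y axis
  print '(4f8.3)', r
end program quat_main
```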

The SH4 also has fast hardware-accelerated sin/cos approximations for trigonometry for populating the rotation matrices (it does both sin + cos in one instruction), plus it provides, as a single instruction, an approximation of the famous “fast inverse square root” that was popularized by John Carmack.

The reason it has all of this special fancy SIMD/trigonometric stuff is because, unlike modern GPUs, the DC’s PVR was not fully programmable and did not handle vertex transformations and calculations when doing 3D rendering… This meant the main SH4 CPU that was responsible for all of the other game logic, physics, AI, etc, also had to transform potentially millions of polygons per second worth of vertices… So these instructions were added to allow the SH4 to handle this task more efficiently without being bogged down doing it all in software.

Wow, that’s really epic! Do you know what version of OpenGL this is? On Dreamcast, with our PVR GPU and KallistiOS, we’ve got pretty much the entirety of up to OpenGL v1.2 going on DC, which you should be able to call from Fortran similarly. We have a lower-level direct hardware API in KOS as well, but I normally tell people to go the OpenGL route, because it’s more familiar, is a prettier API, and you can find way more books and resources on it…

That is a VERY good question about the Fortran intrinsics… So I’m friends with the GCC SH back-end developer who has done some of these intrinsics. I’ll have to ask him how they’re implemented or map to Fortran… Unfortunately we don’t have the two instructions for dot product and vector * matrix multiplication exposed as intrinsics, because Oleg (the compiler back-end guy) said it was an insane amount of work with very little benefit, since GCC was typically not smart enough at the time to vectorize and use the intrinsics automatically anyway… Instead, it’s like a single line of inline assembly code in C to use these instructions, so that’s how we’ve been accessing them within the scene… Does Fortran do inline assembly too? If not, I’m sure it would be trivial to do it in C and call it from Fortran?
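
For the "do it in C and call it from Fortran" route, the Fortran side would probably look something like the sketch below. Everything named here is hypothetical: sh4_dot4 is an invented name for a C wrapper around the dot-product instruction, with an assumed prototype float sh4_dot4(const float a[4], const float b[4]). So that the example links on any machine, the C side is replaced by a plain Fortran stand-in with the same binding label; on the Dreamcast that stand-in would be dropped and the real C object file linked instead.

```fortran
! Stand-in for the C side (assumption: the real thing would be a C function
! wrapping the inline assembly). Plain Fortran here so the example links.
function sh4_dot4(a, b) result(d) bind(C, name="sh4_dot4")
  use, intrinsic :: iso_c_binding, only: c_float
  implicit none
  real(c_float), intent(in) :: a(4), b(4)
  real(c_float) :: d
  d = dot_product(a, b)
end function sh4_dot4

program call_c_demo
  use, intrinsic :: iso_c_binding, only: c_float
  implicit none

  ! This interface block is all the Fortran caller needs to know about
  ! the external routine, whatever language it is written in.
  interface
    function sh4_dot4(a, b) result(d) bind(C, name="sh4_dot4")
      import :: c_float
      real(c_float), intent(in) :: a(4), b(4)
      real(c_float) :: d
    end function sh4_dot4
  end interface

  real(c_float) :: x(4), y(4), s

  x = [1.0_c_float, 2.0_c_float, 3.0_c_float, 4.0_c_float]
  y = [4.0_c_float, 3.0_c_float, 2.0_c_float, 1.0_c_float]
  s = sh4_dot4(x, y)   ! 4 + 6 + 6 + 4 = 20
  print *, s
end program call_c_demo
```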

The intrinsics that we DO have in C and C++, though, are the fast sin/cos single instruction and the inverse square root approximation, which you can see me enable with the -mfsrra and -mfsca flags in the compiler explorer templates (along with -ffast-math)… I’m guessing none of this stuff is specific to C or C++ and is probably going to be implemented for Fortran as well?
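
One way to find out would be to feed the SH cross gfortran something like the sketch below, with the GCC options -mfsca and -mfsrra plus -ffast-math, and look at the generated assembly. This is untested on SH-4 and the module and function names are made up; it only shows the two source patterns (sin and cos of the same angle, and a reciprocal square root) that would be the natural candidates for those instructions.

```fortran
module fast_math_candidates
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
contains
  ! Candidate for the combined sin/cos instruction: both values of one angle.
  pure function sin_cos(angle) result(sc)
    real(real32), intent(in) :: angle
    real(real32) :: sc(2)
    sc = [sin(angle), cos(angle)]
  end function sin_cos

  ! Candidate for the reciprocal square root instruction: 1/sqrt(x),
  ! used here to scale a 3-vector to unit length.
  pure function normalize(v) result(u)
    real(real32), intent(in) :: v(3)
    real(real32) :: u(3)
    u = v * (1.0_real32 / sqrt(dot_product(v, v)))
  end function normalize
end module fast_math_candidates

program fast_math_demo
  use, intrinsic :: iso_fortran_env, only: real32
  use fast_math_candidates, only: sin_cos, normalize
  implicit none
  print *, sin_cos(0.0_real32)                               ! sin(0) = 0, cos(0) = 1
  print *, normalize([3.0_real32, 0.0_real32, 4.0_real32])   ! about (0.6, 0, 0.8)
end program fast_math_demo
```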

I found the series of IEEE Micro articles from the original chip and console designers to provide a nice overview of its capabilities:

The way gfortran works, it first lowers Fortran into some kind of intermediate C-like language. You can see the intermediate output using the flags:

$ gfortran -O2 -fdump-tree-original demo.f90
$ gfortran -O2 -fdump-tree-optimized demo.f90

For example on the program,

program demo
implicit none
real :: x
read *, x
print *, sin(x), cos(x)
end program

it dumps a file in the current folder containing,

;; Function demo (MAIN__, funcdef_no=0, decl_uid=4267, cgraph_uid=1, symbol_order=0) (executed once)

__attribute__((fn spec (". ")))
void demo ()
{
  real(kind=4) D.4273;
  real(kind=4) D.4272;
  struct __st_parameter_dt dt_parm.1;
  struct __st_parameter_dt dt_parm.0;
  real(kind=4) x;
  real(kind=4) x.3_1;
  real(kind=4) _2;
  real(kind=4) _3;
  complex(kind=4) sincostmp_27;

  <bb 2> [local count: 1073741824]:
  dt_parm.0.common.filename = &"demo.f90"[1]{lb: 1 sz: 1};
  dt_parm.0.common.line = 4;
  dt_parm.0.common.flags = 128;
  dt_parm.0.common.unit = 5;
  _gfortran_st_read (&dt_parm.0);
  _gfortran_transfer_real (&dt_parm.0, &x, 4);
  _gfortran_st_read_done (&dt_parm.0);
  dt_parm.0 ={v} {CLOBBER(eol)};
  dt_parm.1.common.filename = &"demo.f90"[1]{lb: 1 sz: 1};
  dt_parm.1.common.line = 5;
  dt_parm.1.common.flags = 128;
  dt_parm.1.common.unit = 6;
  _gfortran_st_write (&dt_parm.1);
  x.3_1 = x;
  sincostmp_27 = __builtin_cexpif (x.3_1);
  _2 = IMAGPART_EXPR <sincostmp_27>;
  D.4272 = _2;
  _gfortran_transfer_real_write (&dt_parm.1, &D.4272, 4);
  D.4272 ={v} {CLOBBER(eol)};
  _3 = REALPART_EXPR <sincostmp_27>;
  D.4273 = _3;
  _gfortran_transfer_real_write (&dt_parm.1, &D.4273, 4);
  D.4273 ={v} {CLOBBER(eol)};
  _gfortran_st_write_done (&dt_parm.1);
  dt_parm.1 ={v} {CLOBBER(eol)};
  x ={v} {CLOBBER(eol)};
  return;

}



;; Function main (main, funcdef_no=1, decl_uid=4274, cgraph_uid=2, symbol_order=1) (executed once)

__attribute__((externally_visible))
integer(kind=4) main (integer(kind=4) argc, character(kind=1) * * argv)
{
  static integer(kind=4) options.2[7] = {2116, 4095, 0, 1, 1, 0, 31};

  <bb 2> [local count: 1073741824]:
  _gfortran_set_args (argc_2(D), argv_3(D));
  _gfortran_set_options (7, &options.2[0]);
  demo ();
  return 0;

}

Note it uses Euler’s formula to compute the sine and cosine using the complex exponential in a single statement:

sincostmp_27 = __builtin_cexpif (x.3_1);
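
The same identity, exp(i*x) = cos(x) + i*sin(x), can be checked directly in Fortran. This is only a small sketch of what the builtin computes, not of how GCC implements it: the real part of the complex exponential is the cosine and the imaginary part is the sine, which matches the REALPART_EXPR and IMAGPART_EXPR lines in the dump.

```fortran
program euler_demo
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
  real(real32) :: x
  complex(real32) :: z

  x = 0.5_real32
  ! exp of a purely imaginary argument: one call yields both cos and sin.
  z = exp(cmplx(0.0_real32, x, kind=real32))

  ! Both differences should be zero up to rounding.
  print *, real(z) - cos(x), aimag(z) - sin(x)
end program euler_demo
```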

Hi all! I’m the author of that demo. I’m also the owner/administrator of the dreamcast.wiki website mentioned in the OP.

In the past year we have put much effort into updating and modernizing our toolchain, as a result of which we are now on GCC 13 and support modern C/C++ language features in our Dreamcast library OS, KallistiOS. We’ve also been enthusiastic about putting in the work to enable GCC support for other languages on Dreamcast. We have recently been able to add varying degrees of support for Objective-C, D, mruby, and MicroPython.

The primary reason for my demo’s existence is to demonstrate very early support for Rust on SuperH using the Rust-GCC project’s early-stage Rust compiler (the standard Rust compiler/tools are all centered around LLVM which does not support SuperH). When I say early-stage I mean not only is the standard library not yet supported, but neither is libcore, which means a large portion of the language is not working yet. It’s essentially an odd C-like dialect of Rust. Now, in Rust, pointer arithmetic can be done through casting (which seems to not work for pointers on Rust-GCC at this time), or through functions provided in Rust’s standard library. Since we can’t do any of that, I was left no choice but to call another language to do it.

The obvious easy thing to do is to just wrap a C function, but I wanted to use zero C in this demo (other than our KallistiOS library and OpenGL library, of course). Fortran was one of the remaining languages GCC supports that we have not yet done anything with, as no one I know in our community knows or uses Fortran, including myself, so I decided to explore if I could build a working Fortran Dreamcast compiler (and if Fortran could even do what I wanted from it!).

I was able to build a toolchain using our KallistiOS GCC build scripts provided I made the following concessions:

  • Use Newlib 3.3.0 instead of 4.3.0 or 4.4.0, as some headers are needed by gfortran which were deprecated and removed in 4.x.
  • I disabled m4-single-only (single-precision-only floating-point support), as some functions expected double precision.
  • I did not include KallistiOS’s patches against GCC/Newlib in this compiler.

I then altered our Makefile so that f90 files are compiled to objects separately using this compiler. The rest of the project is compiled and linked using the toolchain built from GCCRS (Rust-GCC) sources.

Since I do not know Fortran at all, I asked ChatGPT to generate some suitable Fortran functions for me, but the result was pretty messy and wouldn’t compile no matter how I tried to coerce it into creating valid, working code. But it was enough for me to follow along and learn how C interoperability in Fortran should work, and so I was able to successfully write those functions.

Hi @darcagn, welcome to Fortran Discourse! Nice project!

There are notes about this in the gfortran manual: Interoperability with C (The GNU Fortran Compiler). It works very well IMO, although it can be verbose at times.

Are these changes made in the .../dc-chain/config.mk?

The Fortran standard requires that processors (Fortran terminology for compilers) provide real and double precision by default, so I guess most compilers are configured that way. But there could be another reason.

The changes to use Newlib 3.3.0 are in the config.mk.stable.sample configuration file, which you must copy to the dc-chain directory before initiating make. You must also change the three options in that config file to disable KOS patches, disable Newlib patches, and use the single-threaded model.

To build the toolchain without m4-single-only you must edit this line in dc-chain/scripts/build.mk from:
$(build_sh4_targets): extra_configure_args = --with-multilib-list=m4-single-only --with-endian=little --with-cpu=m4-single-only
to:
$(build_sh4_targets): extra_configure_args = --with-multilib-list=m4,m4-single --with-endian=little --with-cpu=m4