
RFC: int/uint portability to 16-bit CPUs #161

Closed
wants to merge 4 commits

Conversation

@1fish2 commented Jul 12, 2014

Both Issue #14758 and Issue #9940 call for RFCs.

This RFC summarizes those discussions, explains the core issue of
code portability to 16-bit CPUs (also of 64-bit code to 32-bit CPUs),
explains what's meant by "default" integer types, makes 2 specific
proposals, and proposes coding style for integer sizing.


# Background

Rust defines types `int` and `uint` as integers that are wide enough to hold a
Member

Is this even true on 16-bit devices, or do modern ones still use a segmentation system? Are there any relevant 16-bit chips anymore?


XMEGA are 8/16-bit?

Author

Some Atmel AVR controllers http://en.wikipedia.org/wiki/Atmel_AVR and some PIC controllers http://en.wikipedia.org/wiki/PIC_microcontroller have 16-bit address spaces. These tend to have Harvard architectures, that is, separate instruction and data memory/addresses.


And the MSP430...

@errordeveloper

Sounds reasonable.


# Drawbacks

- Renaming `int`/`uint` requires figuring out which of the current uses to replace with `index`/`uindex` vs. `i32`/`u32`/`BigInt`.


And some people will just end up redefining int and uint to be 32-bit in their projects...

@errordeveloper

Overall, it's quite a reasonable thing to do, considering Rust's goals.

Although, maybe the motivation and title could be generalised a bit more...


# Motivation

So Rust libraries won't have new overflow bugs when run on embedded devices with


I'd just replace the entire paragraph with: "Avoid bugs where the programmer presumed a default integer size for indexing of arrays and elsewhere."


You can expand a little to say that this mostly concerns non-32-bit targets: 8-bit and 16-bit MCUs and, to some extent, 64-bit CPUs too.

@errordeveloper

Am I correct in understanding that this is to keep the integral type used for array indexing defaulting to the "native", i.e. fastest, integer? (On the AVR, int32_t is actually slow; not that I care too much about the AVR, just saying.)

By the way, what about suffixes? Would this imply dropping the i and u suffixes?

@huonw
Member

huonw commented Jul 12, 2014

Am I correct in understanding that this is to keep the integral type used for array indexing default to "native", i.e. fastest, integer?

The integral type used for indexing is the smallest one that covers the address space. "fastest"/"native" is irrelevant.

@errordeveloper

@huonw thanks for the better formulation, perhaps that's the way the RFC/docs should state it.

@1fish2
Author

1fish2 commented Jul 12, 2014

Agreed. I'll rephrase that.

@1fish2
Author

1fish2 commented Jul 12, 2014

Yes, the motivation and title can be generalized. I was trying to start
with a concrete reason rather than returning to the intptr/uintptr
discussion.

@dobkeratops

As well as embedded devices, some machines have had coprocessors with smaller address spaces... not so common now, but who knows what the future will bring.

My suggestions would have been ...

[1] Officially define int/uint as max( pointer-size, alu-size, 32bits) .. as most people expect: 32/64
This will prevent unexpected bugs when you move from desktop to embedded, and is friendly to most.

[2] Then add other types which are more specific..
perhaps word=max(pointer-size,alu-size) .. might be 16/32/64
maybe even another for min(pointer-size,alu-size,32bits) .. might be 16/32

These are complementary to the specific types, i32 etc.; code might cluster data dynamically to suit its platform.
The embedded-aware programmer switches from int to word as an optimisation.

[3] Vec could be defined more flexibly as Vec<T,Index=uint> - in the vast majority of my cases I'd be using Vec<T,i32>, even on a 64-bit machine - that's sufficient on machines with up to ~16 GB of RAM (especially when the majority of RAM is full of graphical assets).
Similarly, embedded people could reuse Vec<T,word>.

           68000      x86   x86-64      x32
int        32         32    64          32
word       16         32    64          64
??         16         32    32          32       

It seems the C name 'long' being distinct from int is actually useful; maybe even swapping int out as suggested by the OP would be good, but adding another complementary type would be less disruptive, I think.
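(As a rough sketch of the pointer-size half of suggestion [2] in today's Rust: the alias name `Word` and the use of `cfg(target_pointer_width)` below are illustrative assumptions, not something the comment above specifies.)

```rust
// Pick a per-target "word" alias by pointer width. This only captures
// pointer size, not ALU size, so it is only an approximation of [2].
#[cfg(target_pointer_width = "16")]
type Word = u16;
#[cfg(target_pointer_width = "32")]
type Word = u32;
#[cfg(target_pointer_width = "64")]
type Word = u64;

fn main() {
    println!("Word is {} bits wide", 8 * std::mem::size_of::<Word>());
}
```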

* Crisper/broader motivation.
* "The smallest integers that span the address space" is clearer than
"pointer-sized integers".
* More concise.
* More "not in scope" items.

> In particular, do not use unsigned types to say a number will never be negative. Instead, use assertions for this. ...
>
> Some people, including some textbook authors, recommend using unsigned types to represent numbers that are never negative. This is intended as a form of self-documentation. However, in C, the advantages of such documentation are outweighed by the real bugs it can introduce.
Contributor

This suggestion makes a lot of sense in a context where overflow/underflow silently wraps around. However, if something like RFC PR #146 were to be implemented, then it would once again make sense to use types which more accurately express the range of legal values (i.e., which are self-documenting), because compiler-added checks can be enabled to catch errors where the value would go out of range. Accurate types with compiler-added assertions beat inaccurate types with programmer-added assertions.


@glaebhoerl So would you recommend we wait for PR #146 to be accepted or rejected before evaluating this RFC further?

Contributor

Nah. This was just an ancillary remark on an ancillary part of the proposal. The main part of the proposal (which is about changes to the language to better accommodate [portability to] 16-bit architectures) is unaffected.

(And anyway, the suggestion makes sense in the context of the current language, and the style guide could just be updated again if the language changes.)

Author

Aha! Nice insight, @glaebhoerl.

I'll make the style guide recommendation conditional on overflow-checking.

Q. Does/will overflow checking happen during conversion between integer types?

Contributor

A. It doesn't currently, but in the context of #146, if #[overflow_checks(on)], I think it should.

Rationale: As far as I can tell, `as` is meant to preserve meaning rather than representation, e.g. 5000i32 as f32 is equivalent to 5000f32 and not to transmute::<i32, f32>(5000i32). Therefore, if attempting to transport the meaning of the original value to the target type causes it to overflow, it should be caught.
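A minimal sketch of that distinction in present-day Rust (the particular values are only illustrative):

```rust
use std::mem::transmute;

fn main() {
    let x: i32 = 5000;
    // `as` preserves the numeric meaning of the value...
    assert_eq!(x as f32, 5000f32);
    // ...whereas `transmute` preserves the bit pattern, reinterpreting
    // 5000i32 as a tiny subnormal float rather than 5000.0.
    let reinterpreted: f32 = unsafe { transmute::<i32, f32>(x) };
    assert_ne!(reinterpreted, 5000f32);
    println!("as: {}, transmute: {}", x as f32, reinterpreted);
}
```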

Author

Yes. Otherwise computing a value in one integer type then converting to another would accidentally bypass the overflow checks.
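For comparison, both behaviours can be seen in today's Rust: a plain `as` cast silently truncates, while a checked conversion reports the overflow. This is only a sketch of the idea, not what #146 specifies:

```rust
use std::convert::TryFrom;

fn main() {
    let big: i64 = 70_000;
    // A plain `as` cast truncates silently: 70_000 mod 2^16 = 4_464.
    assert_eq!(big as i16, 4_464);
    // A checked conversion surfaces the overflow instead of hiding it.
    assert!(i16::try_from(big).is_err());
}
```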

@errordeveloper

Also, another point this RFC should consider is how a typical for i in range(..) construct would look... Well, considering that int as well as uint would be scrapped, would an integer literal with suffix i or u then mean index or uindex, or not?

* Recommended unsigned or signed integer types for numbers that
should not be negative -- depending on whether Rust provides integer
overflow checking.

* Crisper integer style guideline section.
@Thiez

Thiez commented Jul 14, 2014

@errordeveloper I doubt that would be a problem because in most cases one would iterate over an indexable collection directly rather than indexing (and paying for bounds checking). Not that I support this RFC...
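For illustration, the two styles being contrasted, written in current Rust:

```rust
fn main() {
    let v = vec![10, 20, 30];

    // Iterating directly: no index variable, no per-element bounds check.
    for x in &v {
        println!("{}", x);
    }

    // Indexing: needs a pointer-sized index and pays a bounds check on every v[i].
    for i in 0..v.len() {
        println!("{}", v[i]);
    }
}
```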

@Ericson2314
Contributor

There should be some integer type that corresponds to pointer size. That is why I like intptr/uintptr much more than just adding an arbitrary 32-bit minimum.

There could be some fancy macro that you give constraints (fastest / smallest, max abs val, signed/unsigned, etc) and it spits out a type or aborts compilation. This seems more versatile and less namespace-cluttering than C99's solution.

BTW, last I checked Rust lets you transmute int/uint to whatever fixed-size integer type fits the current build target. This should arguably be disallowed.

I would love some infrastructure everybody could share to do continuous integration with different int sizes. This probably necessitates virtualizing different CPU architectures (because of int<->ptr transmutations), but it would be cool if it didn't.

I initially didn't think compiler-added overflow checks were too important. But if that is what it takes to make people use unsigned integers for natural numbers, I am all for it.

@huonw
Member

huonw commented Jul 15, 2014

BTW, last I checked rust let you transmute int/uint to whatever fixed-size integers type fit for the current build target. This should be arguably disallowed.

Trying to protect against everything that can change per platform/configuration is impossible. e.g.

#[cfg(windows)]
struct Foo { x: u8 }
#[cfg(not(windows))]
struct Foo { x: u16 }

transmute::<Foo, u8>(...)

@Ericson2314
Contributor

Impossible I think not.

I'd like to somehow match on a list of archs one attempts to support, lest one forget a case, rather than just config-chaining and hoping for the best. This shouldn't be too hard.

More radically, for the purposes of type checking it would be nice to take an intersection type or something analogous, e.g.:

// can't be transmuted / unique size,
// implements all traits that both u8 and u16 do.
type Magic = u8u16;

struct Foo { x: Magic }

This is kind of "mangling of phases", and a rather big step from the way things work currently. The alternative is just to, as part of compilation, brute-force the various configuration options, or just cross-compile and virtualize as I said before.

@alexchandel

Given the purpose of the int and uint types, to be large enough to hold any memory address on the target machine, the intptr/uintptr names seem appropriate. Paralleling the C standard is another advantage, since they would be familiar to a significant target community. Imposing an arbitrary 32 bit minimum would be inconsistent with the purpose of the type, and would waste memory on targets like 8-bit and 16-bit PICs.

@l0kod

l0kod commented Aug 15, 2014

In rust-lang/rust#9940, @thestinger said:

I just don't think this issue has any real benefit beyond painting the bikeshed a bit more to my liking. It's not worth the backwards compatibility break at this point.

I think renaming int and uint is worth the backwards compatibility break because it will be the (only) occasion to check for correct int use in existing code before bad uses create bugs…

@lilyball
Contributor

int and uint have two purposes:

  1. Be sized properly to cover the address range of the architecture.
  2. Be the de-facto integral type to use when the programmer doesn't really care, typically because they're using numbers that are small enough that 32-bit vs 64-bit doesn't matter.

Claiming that purpose 1 is the only purpose for these types is wrong, and yet that's the motivation for renaming to intptr / uintptr.

The only real issue with int/uint right now is that on 16-bit machines, these types are probably 16-bit types, and that's probably unexpectedly small. There is a lesser issue with code that is written on a 64-bit machine overflowing on a 32-bit machine, but anyone who's using values larger than 32 bits should recognize that this is so and deal with it appropriately.

Importantly, renaming these to intptr/uintptr is not going to solve the issue of code going from a 64-bit to a 32-bit machine. Even if we encourage the use of e.g. i32 as the de-facto type, people will still be using intptr/uintptr a lot, because that's the type used for indexing and sizing of containers. If anything, encouraging the use of i32 (or i64) is only going to make things worse as people sprinkle as i32 and as uintptr all over their code whenever they work with containers. In this scenario, someone writing code on a 32-bit machine might be fine, but on a 64-bit machine they may end up with intptr/uintptr values that are >32 bits, and the as i32 overflows. Or perhaps the code is run on a 16-bit machine and the as uintptr now overflows.

Basically, renaming these types does not really do anything at all for overflow, it just encourages people to add more unchecked integral casts to their code.

Because of this, the only approach I can support is keeping int/uint but defining them as being at least 32 bits. Using 32-bit integers on a 16-bit machine doesn't seem like it should be particularly problematic; if int becomes intptr people are just going to end up using 32-bit or 64-bit integers instead anyway. Alternatively, we could just ignore the issue and assume anyone using 16-bit machines is basically writing custom code anyway (or using libraries that explicitly support 16-bit machines).

@l0kod

l0kod commented Aug 15, 2014

Be the de-facto integral type to use when the programmer doesn't really care, typically because they're using numbers that are small enough that 32-bit vs 64-bit doesn't matter.

For a statically typed language, int and uint are kind of weird because of their dynamic/unknown/architecture-dependent size. This particularity should be highlighted.

For this reason, I don't think it's a good idea to promote int or uint as the de-facto integer type. Should a programmer have to ask themselves careful questions when choosing every type except an integer type?
#115 and rust-lang/rust#6023 do not point in that direction:

You should only use a pointer-size integer if it's actually what you need. You can't use a fixed-size integer without thinking about the bounds, so a pointer-size integer is a bad fallback.

Obviously, the architecture-dependent integer is needed for memory-related access (i.e. indexing and sizing of containers). Is there a good reason to hide a type's original purpose and its bug-prone properties (e.g. casts)?

Basically, renaming these types does not really do anything at all for overflow, it just encourages people to add more unchecked integral casts to their code.

That's a possibility, but if they are aware of the architecture-dependent property, they have more reason to make the right choice: to choose the right type everywhere.
The names index or intptr are more meaningful than plain int, which, in many languages, does not denote any memory-related notion.

If that makes sense, the "at least 32-bit" exception is not needed. Moreover, it would add another weird rule to an already weird type.

@huonw
Member

huonw commented Aug 15, 2014

Be the de-facto integral type to use when the programmer doesn't really care, typically because they're using numbers that are small enough that 32-bit vs 64-bit doesn't matter.

This isn't really the case; it's just that using any other type is annoying and historically unfavoured (since we previously had default-to-int functionality, and uint and int get the nice short u/i suffixes). Defaulting to the int/uint type also has downsides with respect to memory use: most numbers are small, so the 32 extra bits of int vs. i32 (on 64-bit platforms) are a complete waste.
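Concretely, on a 64-bit target (using today's `isize` in place of the then-current `int`):

```rust
fn main() {
    // A pointer-sized integer is twice as wide as i32 on a 64-bit platform,
    // so a large collection of small numbers wastes half of its space.
    println!("i32:   {} bytes", std::mem::size_of::<i32>());   // 4
    println!("isize: {} bytes", std::mem::size_of::<isize>()); // 8 on 64-bit
    println!("1000 x i32:   {} bytes", 1000 * std::mem::size_of::<i32>());
    println!("1000 x isize: {} bytes", 1000 * std::mem::size_of::<isize>());
}
```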

@1fish2
Author

1fish2 commented Aug 15, 2014

the architecture-related integer is needed for memory-related access (i.e. indexing and sizing of containers).

Alternatively, declare each array's index type rather than using an architecture-dependent type that spans the address space.

@1fish2
Author

1fish2 commented Nov 7, 2014

Good plan. Would you like me to withdraw this PR and submit a new PR to rename int/uint and select i32 as the fallback type?

And to be sure I have it precisely right, "fallback" means both the type inference default for integer literals and the recommended programmers' go-to type?

@thestinger

@1fish2: Yeah, I think a new RFC with that scope would have a high chance of success.

And to be sure I have it precisely right, "fallback" means both the type inference default for integer literals and the recommended programmers' go-to type?

Yeah, the type inference default (which was accepted again with https://github.com/rust-lang/rfcs/blob/master/text/0212-restore-int-fallback.md), which is essentially the type that the language recommends as a good default.

@l0kod

l0kod commented Nov 7, 2014

For the bikeshed discussion about new int/uint names, see http://discuss.rust-lang.org/t/if-int-has-the-wrong-size/454
It seems isize/usize is favored.

@thestinger

Calling them isize / usize wouldn't really be correct. The maximum object size (including for arrays) needs to be capped 1 bit lower than the pointer size. The types are defined as having the same number of bits as pointers, not as a way of measuring sizes.

@Thiez

Thiez commented Nov 7, 2014

Perhaps a bit offtopic, but suppose we decide to stop using int and uint for anything unrelated to indexing and pointers. What exactly is the advantage of having int at all? Why not have only uint? If int is dropped then so is the silly requirement that the maximum object size needs to be capped.

@netvl

netvl commented Nov 7, 2014

@Thiez, isn't it there to represent a difference between pointers? You can't have it without a sign.

@Thiez

Thiez commented Nov 7, 2014

Sure you can. Suppose we have a machine with 256 bytes of memory, so size_of::<uint>() == size_of::<int>() == 1. We have two pointers, represented as uints: p = 100u; q = 200. What is the difference between p and q? let (diffpq, diffqp) = (q - p, p - q); Then, by virtue of unsigned integer wraparound, we have p + diffpq == q and q + diffqp == p. If for whatever reason we wish to know if p < q, we should just use that test, rather than checking if diffpq > 0.
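The same example, with `u8` standing in for the one-byte pointers; explicit wrapping operations are used because a plain `-` would panic on underflow in a debug build of today's Rust:

```rust
fn main() {
    // A 256-byte machine: u8 plays the role of the pointer-sized uint.
    let p: u8 = 100;
    let q: u8 = 200;

    let diff_pq = q.wrapping_sub(p); // 100
    let diff_qp = p.wrapping_sub(q); // 156, i.e. -100 modulo 256

    // Wraparound makes both differences work as offsets...
    assert_eq!(p.wrapping_add(diff_pq), q);
    assert_eq!(q.wrapping_add(diff_qp), p);

    // ...and ordering is tested on the pointers themselves, not on a signed difference.
    assert!(p < q);
}
```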

@tbu-
Contributor

tbu- commented Nov 7, 2014

I'm currently working on a draft on changing the default fallback type to i32 (Link to the branch).

@Thiez This is indeed off-topic; I don't think it's helping the RFC.

@1fish2
Author

1fish2 commented Nov 7, 2014

OK. I'll do that in a couple days and let you review it before sending the PR.
Do you want to jointly author it?

@errordeveloper

Have we thought of just adding a lint warning when the type in question is
used for anything other than indexing?

@tbu-
Contributor

tbu- commented Nov 7, 2014

@errordeveloper If it's used for indexing it's already automatically inferred to be a uint. Also, you can't index an array using int.

@Ericson2314
Contributor

@thestinger if indexing is done with uint, is there any problem with 32-bit processes on 64-bit machines? I do agree we should call them something along the lines of uptr, iptr, and make i32 the default (because nobody will "program in the small" on a 16-bit machine).

@thestinger

@Ericson2314: There's no problem in terms of uint with 32-bit processes on 64-bit machines. It can still address every byte in the address space. The maximum positive int value needs to be an upper bound on object and dynamic array size in order to keep offset well-defined.

@thestinger

@Thiez: Pointer arithmetic is inherently signed, not unsigned, because it can go in both directions. It is not well-defined to overflow normal (fast) pointer arithmetic.

@Ericson2314
Contributor

@thestinger so negative ptr offsets are an essential thing to support?

@thestinger

@Ericson2314: Yes, being able to calculate pointer differences and do negative offsets is an essential feature. Ensuring correctness requires limiting the maximum object size to int::MAX. LLVM also only offers fast pointer arithmetic via signed offsets, so it's important in the short term from a performance point of view.
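A small illustration of the signed-offset point, using today's raw-pointer API (`offset` and `offset_from`, which postdate this discussion):

```rust
fn main() {
    let v = [1u8, 2, 3, 4];
    unsafe {
        let last = v.as_ptr().add(3);
        // Negative offsets require a signed argument; here we step backwards.
        let earlier = last.offset(-2);
        assert_eq!(*earlier, 2);
        // Pointer differences are signed (isize), which is why object sizes
        // must stay within the signed range for this to be well-defined.
        assert_eq!(last.offset_from(v.as_ptr()), 3);
    }
}
```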

@Ericson2314
Contributor

@thestinger OK, I'm sold. Especially given the performance aspect.

@1fish2
Author

1fish2 commented Nov 13, 2014

The new, simpler draft RFC to replace the present one is at 0000-int-name.md.

Comments?

@thestinger

@1fish2: It looks great to me.

@l0kod

l0kod commented Nov 13, 2014

@1fish2: great!

I would also add the argument that the renaming process would be a good, and probably the only, time to spot future bugs before they appear.

There is also the question of the integer suffixes i and u. They must change (or disappear) along with int/uint; otherwise the bugs related to these names will surely remain in existing code.

A good example of when to use uindex and not index would help too: object indexing and object length (not sure anymore)?

@1fish2 mentioned this pull request Nov 13, 2014
@1fish2
Author

1fish2 commented Nov 13, 2014

Excellent points, Mickaël.

I just sent the PR. Do you want to add these points there? We'll continue the discussion there.

@1fish2
Author

1fish2 commented Nov 13, 2014

I propose to withdraw this RFC in favor of the single-purpose RFC: Renaming int/uint (PR #464).

@errordeveloper

On 13 November 2014 08:46, Jerry Morrison notifications@github.com wrote:

I propose to withdraw this RFC in favor of the single-purpose RFC:
Renaming int/uint (PR #464).

Makes sense.

@emberian
Member

@1fish2 you have the power to close it :)

@brson assigned nrc and unassigned brson Nov 13, 2014
@1fish2 closed this Nov 13, 2014