Best practice for dealing with incompatible usage of integerp

Discussion:

Stephen J. Turnbull

2015-07-15 12:16:35 UTC

> I find that in the nxml code integerp is used to determine whether its
> argument is a single character.

Aidan may have a better idea, but I would guess that's a hangover from
the original code, which was designed for GNU Emacs, and never worked
perfectly in XEmacs. GNU Emacs still does not have a character type,
only integers and strings.

If you want to know exactly when a line was changed and how, use hg
annotate.

Aidan Kehoe

2015-07-15 19:23:05 UTC


Hi FKtPp --

To deal with your last question first, the integerp change dates from XEmacs
20.0, in the mid-1990s. It is a design decision of most modern programming
languages that it is good and useful to be able to tell at runtime whether a
given object is a character, and thus should be displayed as ?a, ?á, ?گ or
?南, or whether it is an integer, and thus should be displayed as a numeric
value.

Of the major languages in current use, C and friends don’t make the
distinction, and nor does GNU Emacs Lisp. XEmacs added it because Emacs Lisp
is already, compared to C, so slow that the performance impact (checking for
whether something is an integer vs. a character) doesn’t matter, and because
it is very useful for interactive use and for debugging in this *text*
editor. Common Lisp, the other big influence on XEmacs, also has this
distinction; MacLisp, the Lisp that both GNU Emacs Lisp and Common Lisp are
ultimately based on, did not.

The Right Way to handle this kind of incompatible integerp usage is to
decide “is the code interested in this value as an integer, or as a
character?” and, if the latter, to replace #'integerp calls with
#'characterp. If it’s very difficult to tell if the writer is treating the
value as a character or as an integer, consider #'char-or-char-int-p.
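
A minimal sketch of that replacement in Emacs Lisp. The names my-range-first-old, my-range-first and my-char-or-char-int-p are hypothetical, as is the range representation (a single character or a cons (FIRST . LAST)); they stand in for nxml-style code rather than quoting it. characterp exists in both XEmacs and GNU Emacs, so the replacement is portable; char-or-char-int-p is XEmacs-only, so the shim assumes GNU Emacs's characterp (which also accepts character-valued integers) is an adequate fallback.

```elisp
;; Hypothetical helper, not taken from nxml: RANGE is either a single
;; character or a cons (FIRST . LAST).
;; GNU Emacs idiom: a character is just an integer, so `integerp'
;; doubles as "is this a single character?".  Under XEmacs,
;; (integerp ?a) is nil, so a character falls into the `car' branch
;; and signals a wrong-type-argument error.
(defun my-range-first-old (range)
  (if (integerp range) range (car range)))

;; The Right Way: ask the question the code actually means.
(defun my-range-first (range)
  "Return the first character of RANGE (a character or a cons)."
  (if (characterp range) range (car range)))

;; When it is hard to tell whether a value is meant as a character or
;; an integer, XEmacs provides `char-or-char-int-p'.  GNU Emacs lacks
;; it, but its `characterp' already accepts character-valued integers.
(defun my-char-or-char-int-p (object)
  (if (fboundp 'char-or-char-int-p)
      (funcall 'char-or-char-int-p object)
    (characterp object)))
```

With these definitions, (my-range-first ?a) and (my-range-first (cons ?a ?z)) both return ?a in either Emacs, while my-range-first-old only behaves correctly in GNU Emacs.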

The following comment from xsd-regexp.el suggests that it is mostly interested
in characters:

;;
;; The semantics of XSD regexps are defined in terms of Unicode.
;; Non-Unicode characters are not allowed in regular expressions and
;; will not match against the generated regular expressions. A
;; Unicode character means a character in one of the Mule charsets
;; ascii, latin-iso8859-1, mule-unicode-0100-24ff,
;; mule-unicode-2500-33ff, mule-unicode-e000-ffff, eight-bit-control
;; or a character translateable to such a character (i.e a character
;; for which `encode-char' will return non-nil).

If it were interested in *Unicode values*, then yes, integerp would still be
appropriate, but here it’s interested in characters. Another hint that the
code is interested in the character value is when values are compared with
something in character syntax, e.g. (eq value ?a) or (memq ?b list).
As far as I can see, xsd-regexp.el would be fine with characterp wherever
integerp is used; if you have any questions about other files, get in touch.
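
To illustrate that hint, here is a hypothetical fragment; the name my-xsdre-special-char-p and the character set are invented for illustration and are not taken from xsd-regexp.el. Because CH is compared against character literals with memq, the author is plainly treating it as a character, so characterp is the right type test; with integerp, XEmacs would reject every real character.

```elisp
;; Hypothetical fragment, not from xsd-regexp.el.  CH is compared
;; against character literals, which shows it is meant as a character.
(defun my-xsdre-special-char-p (ch)
  "Return t if CH is `]', `^' or `-', which have positional meaning
inside an Emacs regexp character alternative."
  (and (characterp ch)                  ; was (integerp ch): nil for ?- in XEmacs
       (memq ch '(?\] ?^ ?-))
       t))
```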

Best,

Aidan

> Hi Aidan, Stephen,
> I find that in the nxml code integerp is used to determine whether its
> argument is a single character.
> To make the code work I changed these integerp calls to characterp.
> However, nxml still mixes character and integer types in some other
> logic, such as #'xsdre-range-list-to-char-alternative, which results in a
> C-level argument validation error for integerp.
> My question is: how has this kind of incompatible integerp usage been
> handled in the past? Is there any good practice to follow?
> Would you mind telling me the history of the integerp change, if possible?
> When was it changed to be incompatible with FSF's version, and why?
> Thanks,
> Kai


It's me FKtPp ;)

2015-07-16 07:10:55 UTC


Thank you, Aidan and Stephen, for your quick responses. I've just composed a
patch for the issue, available at the following link:

https://bitbucket.org/m_pupil/nxml-mode-xemacs/commits/864b4b65bc9c00f95aa8c2e15e05193f2014e940

Thanks,
Kai
