Discussion:
GT.M V4.4-003 available
bhaskar
2003-11-03 15:51:46 UTC
GT.M V4.4-003 has been released. Updated binaries, documentation (new
Programmers Guide and Messages and Recovery Procedures Manual) and
source code are available for download
(http://sourceforge.net/projects/sanchez-gtm), as are release notes
(http://sourceforge.net/docman/display_doc.php?docid=19727&group_id=11026).
For those who have purchased support from Sanchez, you can download
software and documentation from the Sanchez FTP site. Contact your
normal support channel if you need the current password.

The most significant enhancement for this release is functionality to
permit the creation of UNIX shared libraries containing object modules
generated by GT.M on IBM pSeries AIX and HP PA-RISC HP-UX platforms,
extending to those platforms functionality that was previously
available on HP Alpha/AXP Tru64 UNIX. [Note: on HP Alpha/AXP
OpenVMS, object code generated by GT.M can be used to build standard
OpenVMS shared, installable, executable images.]

For users of GT.M on x86 Linux, the main benefits are performance
enhancements and bug fixes. In particular, some bugs reported on
Source Forge have been fixed, as well as a bug affecting users of some
versions of VistA, in which a SET $PIECE statement with a $SELECT in
the second (delimiter) argument gave undefined results (and for which
a temporary fix was previously released with GT.M V4.4-FT01).
Improvements and bug fixes submitted by the GT.M open source user
community are also included. Details are in the release notes.
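
For those curious, the affected construct has this general shape (a
minimal sketch for illustration only, not the actual VistA code):

 set rec="A^B^C",x=1
 set $piece(rec,$select(x:"^",1:"|"),2)="NEW"
 write rec,!   ; expect A^NEW^C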

Please upgrade to this version of GT.M, and as always, please provide
us with your feedback. Thank you very much.

-- Bhaskar
k dot bhaskar at sanchez dot com <-- send e-mail here
bhaskar
2003-11-04 19:44:45 UTC
Having released some significant enhancements to GT.M in the last few
months in V4.4-002 and V4.4-003, we are in the process of prioritizing
the next round of GT.M development (enhancements, misfeature
corrections and bug fixes). This is an opportunity for the community
to tell us what you would like to see in GT.M in the future. What
would you like to have in GT.M that you don't have today? You can
either post a response here on comp.lang.mumps, or you can e-mail me
privately at k dot bhaskar at sanchez dot com (I will personally
acknowledge every e-mail on the topic, so if you don't hear from
me, it means that your e-mail didn't get through, possibly trapped by
some over-zealous spam filter).

If you prefer to phrase your response in terms of what you like and
don't like about GT.M, that too would be valuable.

Thanx in advance for your comments. They will help us make GT.M your
favorite M implementation.

-- Bhaskar
k dot bhaskar at sanchez dot com <-- send e-mail here
Post by bhaskar
GT.M V4.4-003 has been released.
[KSB] <...snip...>
Denver Braughler
2003-11-05 04:19:28 UTC
What would you like to have in GT.M that you don't have today?
I don't use GT.M, but I have been watching it since its early incarnation
on VMS.

I find languages more useful as arbitrary limits are removed.
... in GT.M, variable names can be up to 8 characters long.
If you raise that to 31 characters, you'll comply with the old standard.

These days (unlike the late 1970s) many people seem to agree that
8 is not enough.
[No, I didn't watch that TV show.]

Routine names and tags should be allowed to be up to 31 characters too.

The total length of a global variable value is the block size,
minus the block header, minus the variable name and subscripts.
Since a block can be a maximum of 65,024 bytes, that caps the length
of any single value. That sounds like a problem.

Although raising your block size might increase the maximum overall
database size, which would be good, and having room to grow sounds good
too, that doesn't really solve the problem.
(25 years ago, 32 KB was a tremendous amount of RAM.
Disk blocks were 512 bytes?)

GT.M needs a way to store values that can span blocks.

If a single node in the global database could have a five-byte length
field in its header, the maximum length would be around a trillion bytes
(a five-byte field can count up to 2^40, about 1.1 trillion).

Obviously this cannot be achieved by merely increasing the block size.
As it stands, the maximum length of a global node's value is somewhere
around 65,000 bytes, while the value of a local variable can be up to 1MB.
I very much dislike this mismatch (though it is not unique to GT.M).

If I can store it in a local variable, I should be able to store it
in a global.
A variable can have 32 subscripts.
I remember having an application that used twelve.
But I could have made do with fewer.
So that sounds good enough to me.

But there might be a network problem or game that could use more.
So if it is pretty arbitrary, go ahead and raise it.
I don't know how you implement subscripts.
The way I see it is: if it's not over the total length, who cares
how many commas there are?
Rob Tweed
2003-11-05 10:29:51 UTC
Long variable names and labels/routine names are a must.

Also, a TCP transport that allows multiple connections via the same port
would make life a lot easier.

Rob

On Tue, 04 Nov 2003 23:19:28 -0500, Denver Braughler
Post by Denver Braughler
<...snip...>
---
Rob Tweed
M/Gateway Developments Ltd

Global DOMination with eXtc : http://www.mgateway.tzo.com
---
bhaskar
2003-11-05 14:29:31 UTC
Denver and Rob --

Thank you for your thoughtful comments. They are insightful and valuable.

Regards
-- Bhaskar
Jim Self
2003-11-23 01:20:44 UTC
Post by Denver Braughler
What would you like to have in GT.M that you don't have today?
I don't use GT.M, but I have been watching it since its early incarnation
on VMS.
I find languages more useful as arbitrary limits are removed.
... in GT.M, variable names can be up to 8 characters long.
If you raise that to 31 characters, you'll comply with the old standard.
These days (unlike the late 1970s) many people seem to agree that
8 is not enough.
[No, I didn't watch that TV show.]
Routine names and tags should be allowed to be up to 31 characters too.
I have been working with 8 character names long enough now that I don't
feel the limitation. It did force us to rename some routines in
converting them from DTM.
Post by Denver Braughler
The total length of a global variable value is the block size,
minus the block header minus the variable name and subscripts.
Since a block can be a maximum of 65,024 bytes,
That sounds like a problem.
I agree that an unlimited string size for globals seems like a desirable
feature if it does not detract from performance or stability. However, I
think of the limitation more as a minor annoyance or a slight
complication than as a serious problem since it does not pose a
practical limitation on the overall size of globals.

We have adapted our applications to work with 4KB blocks and a 32KB
limit on string length in local variables. The limit on the length of
strings in globals is close to what we had with DTM, and the overall
limits on the size of globals and on local data in GT.M are much more
relaxed than what we had with DTM.

When we first started working with GT.M on Linux 3 years ago we tried
setting the block size to 8KB or 16KB. We were unsuccessful in getting
the larger block sizes to work correctly at that time and settled on the
4KB block size. I think I recall that issues with the larger block sizes
were resolved some time ago but we have not tried them since to confirm
that.

Applications can work with no special modifications for data values up
to the limits of local variables as long as you break the data up into
smaller chunks for storage. The biggest problem with logical data items
that exceed the limits for local variables is that they can't be passed
or returned as individual values by functions or MUMPS operators.
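
For illustration, the chunking convention amounts to something like this
(a minimal sketch, not our actual code; the helper names and the
4000-byte chunk size are invented for the example):

bigstr ;hypothetical chunking helpers (illustration only)
saveBig(gref,val) ;store val under @gref in chunks, e.g. gref="^BIG(42)"
 new i,max set max=4000 ;chunk size chosen to fit within a 4KB block
 for i=1:1 quit:val=""  set @gref@(i)=$extract(val,1,max),val=$extract(val,max+1,$length(val))
 quit
loadBig(gref) ;reassemble the chunks into one local string
 new i,val set val="",i=""
 for  set i=$order(@gref@(i)) quit:i=""  set val=val_@gref@(i)
 quit val

A caller would DO saveBig^bigstr("^BIG(42)",data) to store, and SET
data=$$loadBig^bigstr("^BIG(42)") to fetch - though, as noted, the
fetching function is itself bounded by the local string limit.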
Post by Denver Braughler
Although raising your blocksize might increase your overall database size,
which would be good, and having room to grow sounds good too, that doesn't
really solve the problem.
(25 years ago, 32 Kb was a tremendous amount of RAM.
Disk blocks were 512 bytes?)
GT.M needs a way to store values that can span blocks.
Berkeley DB (http://sleepycat.com) has such a feature and offers
essentially unlimited string size. Berkeley DB has an excellent
reputation as a low level portable database layer for applications that
require speed and scalability. Its Btree access method appears to be an
ideal starting point for implementing MUMPS globals from scratch.

Back before GT.M was released as Free Software, we used Berkeley DB to
implement MUMPS globals for Perl in MontyPERL. Unfortunately, it was
MUCH slower than any MUMPS implementation we were able to test.

Perhaps some of the techniques used in Berkeley DB could be applied to
GT.M, but I wouldn't want it if it couldn't be implemented without loss
of performance where we have come to expect it.

Kevin O'Kane's MUMPS-to-C project includes an implementation of MUMPS
globals on top of Berkeley DB (as one option of several). His
implementation could be used to confirm the speed difference we observed
or perhaps to show what we did wrong in MontyPERL.

If the speed difference is as significant as it appeared and not just an
artifact of something in MontyPERL, it would be good for the reputation
of MUMPS to have such a comparison confirmed and published.
Post by Denver Braughler
If a single node in the global database could have a $L() in the
that is five bytes long, the maximum length would be around a trillion bytes.
Obviously this cannot be achieved by merely increasing the block size.
the maximum value of a global is somewhere around 65,000 bytes.
The value of a local variable can be up to 1MB.
I am looking forward to relaxing the 32KB string limit of previous
versions of GT.M, but so far all of our applications work comfortably
inside it - almost all of the time. It used to be that we had a limit of
around 32KB for the sum of all local variables. ;)
Post by Denver Braughler
I very much dislike this mismatch (though it is not unique to GT.M).
If I can store it in a local variable, I should be able to store it
in a global.
I do agree that the simpler conceptualization is attractive.
Post by Denver Braughler
<...snip...>
Denver Braughler
2003-11-23 03:48:25 UTC
Post by Jim Self
Post by Denver Braughler
... in GT.M, variable names can be up to 8 characters long.
If you raise that to 31 characters, you'll comply with the old standard.
Routine names and tags should be allowed to be up to 31 characters too.
I have been working with 8 character names long enough now that I don't
feel the limitation. It did force us to rename some routines in
converting them from DTM.
I'm all for naming conventions, but some names are too cryptic!
Post by Jim Self
Post by Denver Braughler
The total length of a global variable value is the block size,
minus the block header minus the variable name and subscripts.
Since a block can be a maximum of 65,024 bytes,
That sounds like a problem.
We have adapted our applications to work with 4KB blocks and a 32KB
limit on string length in local variables. The limit on the length of
strings in globals is close to what we had with DTM plus the overall
limits on the size of globals and the limits on local data in GT.M are
much more relaxed than what we had with DTM.
I still think that a programmer just shouldn't have to worry about it.
I *could* deal with using malloc(). But who really wants to??
Post by Jim Self
Applications can work with no special modifications for data values up
to the limits of local variables as long as you break the data up into
smaller chunks for storage. The biggest problem with logical data items
that exceed the limits for local variables is that they can't be passed
or returned as individual values by functions or MUMPS operators.
It sounds like you agree that it is a problem.


Sure, there are workarounds.
But I have better things to do.
Post by Jim Self
Post by Denver Braughler
GT.M needs a way to store values that can span blocks.
Berkeley DB (http://sleepycat.com) has such a feature and offers
essentially unlimited string size.
I really think that is the way to go.
Post by Jim Self
Post by Denver Braughler
the maximum value of a global is somewhere around 65,000 bytes.
The value of a local variable can be up to 1MB.
I am looking forward to relaxing the 32KB string limit of previous
versions of GT.M, but so far all of our applications work comfortably
inside it - almost all of the time.
It's that 0.01% of the time that it doesn't that eats up the programmer's time.
Post by Jim Self
It used to be that we had a limit of
around 32KB for the sum of all local variables. ;)
Ah, the good old days.
What do you mean 32?
On PDPs with DSM, we had only 8K unless we supersized our job partition to 12K.

7 years ago I was programming MIIS with 2K routine buffers and 2K job partitions.
That was 2K for local variables, the program pointer, stack, and intermediate
calculations all together.

The 2K limit on routines was annoying because it penalized programmers such as
myself who liked to write comments.
But the job partitions probably could have been 1500 bytes, given how carefully
I coded to avoid "?STORE" and "?STACK" errors.

And a string could be a whopping 250 bytes long.

But I made do with it because I had to.
I still don't know why so many Windows applications need 100-200 MB of memory.
Jim Self
2003-11-23 07:56:33 UTC
Post by Denver Braughler
Post by Jim Self
Post by Denver Braughler
... in GT.M, variable names can be up to 8 characters long.
If you raise that to 31 characters, you'll comply with the old standard.
Routine names and tags should be allowed to be up to 31 characters too.
I have been working with 8 character names long enough now that I don't
feel the limitation. It did force us to rename some routines in
converting them from DTM.
I'm all for naming conventions, but some names are too cryptic!
Actually, I think that GT.M does comply with the old standard. Names can
be much longer than 8 characters. It's just that they are distinguished
only by the first 8.
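
For example (a quick sketch of that behavior; both names below share
their first 8 characters, "CUSTOMER", and so would name the same
variable):

GTM>SET CUSTOMERNAME="Smith",CUSTOMERCODE=42
GTM>WRITE CUSTOMERNAME
42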

I have no objection as a MUMPS programmer to extending the lengths of
names; it's just that I don't see the current restrictions on names as a
significant problem, at least for local variables and labels within a
routine. Line labels only have to be unique within one small routine and
if you compose applications from small subroutines and functions, most
local variables live within an even smaller context.

Then again, limiting the names of routines to 8 characters does seem
like a problem when you have thousands of them - or tens of thousands of
them as in VistA.

Also, some MUMPS programmers still code in ALL CAPS. I would feel the 8
character limitation much more if I couldn't use mixed case names.
Post by Denver Braughler
Post by Jim Self
Post by Denver Braughler
The total length of a global variable value is the block size,
minus the block header minus the variable name and subscripts.
Since a block can be a maximum of 65,024 bytes,
That sounds like a problem.
We have adapted our applications to work with 4KB blocks and a 32KB
limit on string length in local variables. The limit on the length of
strings in globals is close to what we had with DTM plus the overall
limits on the size of globals and the limits on local data in GT.M are
much more relaxed than what we had with DTM.
I still think that a programmer just shouldn't have to worry about it.
I *could* deal with using malloc(). But who really wants to??
The point I was trying to make is that our application programmers *do
not* worry about it. Outside of images, data that does not naturally
break down into meaningful units smaller than 4KB is extremely rare
(virtually nonexistent) in my experience so far. The need for functions
to return values longer than 32KB is also rare in my experience. One of
our younger programmers bumped that limit once with some Javascript
source code that he wrote in a single file and then included within an
HTML page generated from MUMPS.
Post by Denver Braughler
Post by Jim Self
Applications can work with no special modifications for data values up
to the limits of local variables as long as you break the data up into
smaller chunks for storage. The biggest problem with logical data items
that exceed the limits for local variables is that they can't be passed
or returned as individual values by functions or MUMPS operators.
It sounds like you agree that it is a problem.
Actually, I was thinking of problems that might arise with relaxing the
limits on global data where it would be possible to have larger values
in globals than you could process in local variables.

I do agree that it would be nice to have unlimited string lengths and we
will be glad to take advantage of extended limits when they are
available. I just don't want to give up performance or stability to get
them.
Post by Denver Braughler
Sure, there are workarounds.
But I have better things to do.
Post by Jim Self
Post by Denver Braughler
GT.M needs a way to store values that can span blocks.
Berkeley DB (http://sleepycat.com) has such a feature and offers
essentially unlimited string size.
I really think that is the way to go.
Post by Jim Self
Post by Denver Braughler
the maximum value of a global is somewhere around 65,000 bytes.
The value of a local variable can be up to 1MB.
I am looking forward to relaxing the 32KB string limit of previous
versions of GT.M, but so far all of our applications work comfortably
inside it - almost all of the time.
It's that 0.01% of the time that it doesn't that eats up the programmer's time.
The 32KB limit per local string is a non-problem for us so far. There
was one incident. It had an easy solution. Still, I take that as an
indication that we will find more situations in the near future where it
would be nice to be able to work with strings longer than 32KB. I think
it will be quite a while before we bump against the new limit of 1MB -
except for grins ;)
Post by Denver Braughler
Post by Jim Self
It used to be that we had a limit of
around 32KB for the sum of all local variables. ;)
Ah, the good old days.
What do you mean 32?
We moved up to 32KB+ for locals with the conversion to Datatree 14-15
years ago. I believe the total for locals in GT.M is only limited by RAM.
Post by Denver Braughler
<...snip...>
Kevin O'Kane
2003-11-23 14:55:45 UTC
Post by Jim Self
Kevin O'Kane's MUMPS-to-C project includes an implementation of MUMPS
globals on top of Berkeley DB (as one option of several). His
implementation could be used to confirm the speed difference we observed
or perhaps to show what we did wrong in MontyPERL.
If the speed difference is as significant as it appeared and not just an
artifact of something in MontyPERL, it would be good for the reputation
of MUMPS to have such a comparison confirmed and published.
Post by Denver Braughler
If a single node in the global database could have a $L() in the
that is five bytes long, the maximum length would be around a trillion bytes.
Obviously this cannot be achieved by merely increasing the block size.
I checked our implementation this morning regarding stored strings. In the
C++ version, there appears to be no limitation other than what can be
accepted by the Berkeley DB. In the Mumps-to-C version, there are internal
limitations based on the max string size (default 4096, set at build time).
Strings can be any length, but this limit is the default declaration size
for all internal temporaries, which can number 20 or more per program.

With regard to timing, I ran the following on an old Athlon 600MHz machine
with 256MB memory running Mandrake 9.1 and KDE 3.1, with 5500 RPM IDE
drives, ext3, and 64-bit file addressing enabled. The Berkeley DB cache was
set to 1MB. Each run was very I/O dominant, with CPU utilization often
in the low single digits.

1. Compiled MUMPS program:

zmain
 set x="" for i=1:1:1000 set x=x_"x"
 set t1=$h
 for i=1:1:10000 do
 . set k1=$r(10000)
 . set k2=$r(10000)
 . set k3=$r(10000)
 . set k4=$r(10000)
 . set ^a(k1,k2,k3,k4)=x
 write $p($h,",",2)-$p(t1,",",2),!

time: 227 seconds.
Berkeley DB size: 211,655,680 bytes

2. Interpreted MUMPS program:

zmain
 x "do ^mmm.mps"

where "mmm.mps" is the first program.
time: 227 seconds
Berkeley DB size: 211,665,920 bytes

3. C++ version:

#include <mumpsc/libmpscpp.h>
#include <time.h>

global a("a");
#define $rand(x) (cvt(rand()%x+1))

int main() {
    char x[1001]="";
    int k1,k2,k3,k4,i;
    long t1;
    string sk1,sk2,sk3,sk4;
    for (i=0; i<1000; i++) x[i]='x';
    x[1000]=0;
    t1=time(NULL);
    for (i=0; i<10000; i++) {
        sk1=$rand(10000);
        sk2=$rand(10000);
        sk3=$rand(10000);
        sk4=$rand(10000);
        a(sk1,sk2,sk3,sk4)=x;
    }
    cout << time(NULL)-t1 << endl;
}

time: 228 seconds
Berkeley DB size 211,497,984 bytes

The difference in database file sizes was due to variations in the random
number sequences, which resulted in different total key lengths and
occasional duplicate references.

The timing similarities were to be expected, as these runs were database
dominant and the database software was the same in each case. The compiled
versions significantly outperform the interpreted version on CPU-dominant
jobs, and the C++ class library can outperform the Mumps-to-C compiled
version, as a rule, due to lower string-handling overhead and the
possibility of faster integer and floating-point calculations.

With regard to the questions concerning structured records, the C++ class
library offers the potential to build classes that manage structured data,
built upon the global array manipulation "global" class. It is also possible
to have other direct access files open that store and retrieve data in
binary form directly from/to structures and deposit only the file offset
address (ftell()) in a global array, e.g.:

global patient("patient"); // makes "patient" an instance of the global class
struct Patient { ..... } p1; // complex structure with multiple data items
string addr;
...
addr=cvt(ftell(fileptr)); // get pre-write file offset & convert to string
write(p1,sizeof(struct Patient),1,fileptr); // write structure
patient(a,b,c,d,...) = addr; // where a,b,c,... are string indices of
// the global array "patient"

In this case, there are only system limitations with regard to the
length of stored data.

With regard to Perl, we have an interface to the Perl Compatible Regular
Expression library (PCRE), which permits Perl pattern matches from C/C++
(written by Philip Hazel, University of Cambridge Computing Service,
Cambridge, England, phone +44 1223 334714, and available online in source
code form).
--
Kevin C. O'Kane
University of Northern Iowa
Cedar Falls, IA 50614-0507
(319) 273 7322 (Office + Voice Mail)
http://www.cs.uni.edu/~okane
***@cs.uni.edu
Denver Braughler
2003-11-23 15:16:47 UTC
Post by Kevin O'Kane
I checked our implementation this morning regarding stored strings. In the
c++ version, there appears to be no limitation other than what can be
accepted by the Berkeley DB. In the Mumps-to-C version, there are internal
limitations based on the max string size (default 4096, set at build time).
Strings can be any length but this limit is the default declaration size
for all internal temporaries which can number 20 or more per program.
If there is even one, that is a problem.
If I want to store 200 MB global nodes, do I have to have 4 GB of internal
temporary storage?
Post by Kevin O'Kane
It is also possible
to have other direct access files open that store and retrieve data in
binary form directly from/to structures and deposit only the file offset
global patient("patient"); // makes "patient" an instance of the global class
struct Patient { ..... } p1; // complex structure with multiple data items
string addr;
...
addr=cvt(ftell(fileptr)); // get pre-write file offset & convert to string
write(p1,sizeof(struct Patient),1,fileptr); // write structure
Would it be a pain if someone wanted to add a new field to Patient?
Kevin O'Kane
2003-11-23 16:47:18 UTC
Post by Denver Braughler
Post by Kevin O'Kane
I checked our implementation this morning regarding stored strings. In the
c++ version, there appears to be no limitation other than what can be
accepted by the Berkeley DB. In the Mumps-to-C version, there are internal
limitations based on the max string size (default 4096, set at build time).
Strings can be any length but this limit is the default declaration size
for all internal temporaries which can number 20 or more per program.
If there is even one, that is a problem.
If I want to store 200 MB global nodes, do I have to have 4 GB of internal
temporary storage?
No, these are just intermediate buffers used during string manipulation
of things like "set x=a_b_c".

The decision (which could be reversed) was made to use fixed allocations
rather than dynamic "malloc()" allocations to improve speed.

The globals are passed through to the data base. The data base has an
adjustable-size cache to improve performance, as does the underlying
Linux file system. More memory would definitely improve performance.

Example: I just ran the C++ version on a 2.25GHz machine with
4GB of memory. It took 8 seconds (vs 228 seconds on my 256MB
600MHz machine).

While the CPU is about 4 or 5 times faster (architecture issues here),
the difference is mainly due to the greater amount of memory. Note:
the Berkeley cache was also set at 1MB, but the underlying Linux
file system cache is also running. Thus, it is likely that no
actual I/O took place during the run (the Linux cache is flushed
periodically, however - about every 10 seconds, I believe).
Post by Denver Braughler
Post by Kevin O'Kane
It is also possible
to have other direct access files open that store and retrieve data in
binary form directly from/to structures and deposit only the file offset
global patient("patient"); // makes "patient" an instance of the global class
struct Patient { ..... } p1; // complex structure with multiple data items
string addr;
...
addr=cvt(ftell(fileptr)); // get pre-write file offset & convert to string
write(p1,sizeof(struct Patient),1,fileptr); // write structure
Would it be a pain if someone wanted to add a new field to Patient?
Yes, somewhat, as the file structure would have to be rewritten with all
records containing the new field. This problem, however, is common
to C/C++ and other languages that use record-oriented direct access I/O
methods which depend upon a fixed record format.
--
Kevin C. O'Kane
University of Northern Iowa
Cedar Falls, IA 50614-0507
(319) 273 7322 (Office + Voice Mail)
http://www.cs.uni.edu/~okane
***@cs.uni.edu
Jim Self
2003-11-24 01:19:30 UTC
Post by Kevin O'Kane
The globals are passed thru to the data base. The data base has an
adjustable size cache to improve performance as does the underlying
Linux file system. More memory would definitely improve performance
Example: I just ran the C++ version on a 2.25GHz machine with
4GB of memory. It took 8 seconds (vs 228 seconds on my 256MB
600MHz machine).
What do you get when you run it repeatedly, as I did in my previous reply?

f test=1:1:10 d ^aTest k ^a
or
f test=1:1:5 d ^aTest,^aTest,^aTest,^aTest w ! k ^a
or
f test=1:1:5 d ^aTest,^aTest,^aTest,^aTest w ! k ^a h 60
Post by Kevin O'Kane
While the CPU is about 4 or 5 time faster (architecture issues here),
the Berkeley cache was also set at 1MB but the underlying Linux
file system cache is also running. Thus, it is likely that no
actual I/O took place during the run (the Linux cache is flushed
periodically, however, about every 10 seconds I believe).
Post by Denver Braughler
Post by Kevin O'Kane
It is also possible
to have other direct access files open that store and retrieve data in
binary form directly from/to structures and deposit only the file offset
global patient("patient"); // makes "patient" an instance of the global class
struct Patient { ..... } p1; // complex structure with multiple data items
string addr;
...
addr=cvt(ftell(fileptr)); // get pre-write file offset & convert to string
write(p1,sizeof(struct Patient),1,fileptr); // write structure
Would it be a pain if someone wanted to add a new field to Patient?
Yes, somewhat, as the file structure would have to be rewritten with all
records containing the new field. This problem, however, is common
to c/c++ and other languages that use record oriented direct access I/O
methods which depend upon a fixed record format.
This is one of those issues that makes me shy away from such languages
for building medical information systems. The ability to easily do
ongoing development, incremental enhancements, bug fixes, redefinition
of data fields, etc. without taking the system offline or having to
schedule updates for 3am seems essential for maintaining the long-term
viability of such systems.
Jim Self
2003-11-24 00:52:52 UTC
Post by Kevin O'Kane
Post by Jim Self
Kevin O'Kane's MUMPS-to-C project includes an implementation of MUMPS
globals on top of Berkeley DB (as one option of several). His
implementation could be used to confirm the speed difference we
observed or perhaps to show what we did wrong in MontyPERL.
If the speed difference is as significant as it appeared and not just
an artifact of something in MontyPERL, it would be good for the
reputation of MUMPS to have such a comparison confirmed and published.
Post by Denver Braughler
If a single node in the global database could have a $L() in the
header that is five bytes long, the maximum length would be around a
trillion bytes.
Obviously this cannot be achieved by merely increasing the block size.
I checked our implementation this morning regarding stored strings. In the
c++ version, there appears to be no limitation other than what can be
accepted by the Berkeley DB. In the Mumps-to-C version, there are internal
limitations based on the max string size (default 4096, set at build time).
Strings can be any length but this limit is the default declaration size
for all internal temporaries which can number 20 or more per program.
With regard to timing, I ran the following on an old Athlon 600MHz machine
with 256MB memory running Mandrake 9.1 and KDE 3.1, with 5500 RPM IDE
drives, ext3, and 64-bit file addressing enabled. The Berkeley DB cache was
set to 1MB. Each run was very I/O dominant, with CPU utilization often
in the low single digits.
zmain
set x="" for i=1:1:1000 set x=x_"x"
set t1=$h
for i=1:1:10000 do
. set k1=$r(10000)
. set k2=$r(10000)
. set k3=$r(10000)
. set k4=$r(10000)
. set ^a(k1,k2,k3,k4)=x
write $p($h,",",2)-$p(t1,",",2),!
time: 227 seconds.
Berkeley DB size: 211,655,680 bytes
I copied your code with minor modifications so I could run it repeatedly
and compare runs.

aTest ;simple timing test
 ;sets data in ^a
 n x,i,k1,k2,k3,k4,t1
 set x="" for i=1:1:1000 set x=x_"x"
 set t1=$h
 for i=1:1:10000 do
 . set k1=$r(10000)
 . set k2=$r(10000)
 . set k3=$r(10000)
 . set k4=$r(10000)
 . set ^a(k1,k2,k3,k4)=x
 write $j($p($h,",",2)-$p(t1,",",2),5)
 q

I ran it on two different computers with different versions of GT.M.
One was a laptop with a Pentium III 640MHz and 256MB RAM, running GT.M V4.3-001B.
The second was a P4 2.4GHz with 512MB RAM, running GT.M V4.4-FT01.

The P4 generally completed the test in 0-1 sec but occasionally took up
to 4 sec.

GTM>f test=1:1:10 d ^aTest k ^a
0 1 1 0 1 0 0 1 0 1
GTM>f test=1:1:10 d ^aTest k ^a
0 1 0 0 1 0 1 0 0 3
GTM>f test=1:1:10 d ^aTest k ^a
1 0 0 4 0 1 0 0 2 0
GTM>f test=1:1:10 d ^aTest k ^a h 10
1 0 0 1 0 1 0 1 0 0
GTM>f test=1:1:10 d ^aTest k ^a h 10
1 0 1 0 0 1 0 1 0 0

The laptop completed the test in 1-26 seconds, usually 4 out of 10 runs at 1 sec.

When the test was repeated 4 times before k ^a, the laptop consistently
took somewhere in the neighborhood of 80 (60-100) secs to complete the
4th round.

The P4 usually completed the test in 0-1 secs but ranged up to 25 sec.

GTM>f test=1:1:5 d ^aTest,^aTest,^aTest,^aTest w ! k ^a
0 1 1 0
0 3 1 0
1 11 1 1
0 1 1 14
0 1 8 0

This suggests to me that there is, in fact, a significant speed
difference in favor of GT.M over Berkeley DB but that the timings are
complicated by cache activity. As I recall from the tests we did on
MontyPerl, the profiling tools showed that almost all of the bottleneck
was in Berkeley DB.
Post by Kevin O'Kane
<...snip...>
With regard to perl, we have an interface to the Perl Compatible Regular
Expression Library (PCRE) which permits perl pattern matches from c/c++
(written by: Philip Hazel, University of Cambridge Computing Service,
Cambridge, England. Phone: +44 1223 334714 and available online in source
code form).
Regular Expressions grow on you. ;)
Kevin O'Kane
2003-11-24 02:29:05 UTC
mea culpa -

The version of the code I pasted had a loop of 10,000
whereas the actual timing for the tests reported used a
loop of 100,000 as evidenced by the size of the database
which was consistent with:

100,000 * 1,000 + 100,000 * [size of the global key]

I had modified the loop value to see if 1/10 the iterations would
yield 1/10 the time. It did not; it yielded 1/100 the time. I
forgot I changed it when I pasted in the code.

The Athlon 600MHz is probably roughly equivalent to the PIII 640MHz.
Before each test, I deleted the old data base and environment
file (__db*) and "sync"'ed the file system (flush all cache buffers
that need to go to disk). This probably results in slightly
more consistent results.

Using a 10,000 loop, I ran a script 10 times (see below). I
did not do the "kill" in the loop but rather deleted the
data base each time and flushed the file system. I also
got erratic performance when I used the "kill" - probably
because the Berkeley btree retains key information even
though the key no longer has data. I suspect they do
garbage collection at intervals. There was substantial disk
activity which could mean many flushes of buffers for reliability
purposes. This may also be the case with GTM. Anyway, the times
could be heavily influenced by the random number generator.

As most applications are mainly inserts and retrieves
rather than deletes, I tried the following (this is a transcript
of the console session with some junk control characters
edited out). The database resides in Mumps.DB, and the __db*
files are environment files used by Berkeley. These are
removed (rm) each cycle in the bash script loop. The "sync"
command flushes the system cache:

=======================================================================

Script started on Sun Nov 23 19:38:04 2003
[***@neamh mumpsc]# cat mmm.mps

zmain
 set x="" for i=1:1:1000 set x=x_"x"
 set t1=$h
 for i=1:1:10000 do
 . set k1=$r(10000)
 . set k2=$r(10000)
 . set k3=$r(10000)
 . set k4=$r(10000)
 . set ^a(k1,k2,k3,k4)=x
 write $p($h,",",2)-$p(t1,",",2),!

[***@neamh mumpsc]# mumpsc mmm.mps

Compiling from Mumps source ...
The Mumps Compiler 6.22 Nov 12 2003
Translating ./mmm.mps to c:
10 lines of Mumps; 314 lines of C generated
using mpsglobal_bdb file system
Compiling generated C code...

[***@neamh mumpsc]# cat t
#!/bin/bash
for ((i=0; i<10; i++)) do
rm Mumps.DB
rm __db*
sync
echo "iteration " $i " " `mmm.cgi`
done

[***@neamh mumpsc]# t
iteration 0 2
iteration 1 2
iteration 2 2
iteration 3 2
iteration 4 2
iteration 5 3
iteration 6 3
iteration 7 3
iteration 8 3
iteration 9 2
[***@neamh mumpsc]# exit
Script done on Sun Nov 23 19:39:14 2003
======================================================================

Using our native mode global array handler (not the Berkeley),
I got all times for the above between 1 and 2 seconds using
the same script:

[***@neamh mumpsc]# t

iteration 0 2
iteration 1 1
iteration 2 2
iteration 3 1
iteration 4 2
iteration 5 1
iteration 6 2
iteration 7 1
iteration 8 2
iteration 9 1

The alternation above suggests the actual time is about 1.5 seconds,
whereas the BDB appears to be about 2.75 seconds.

The native mode globals were written many years ago, originally in
Fortran, and are fast but not reliable in a system crash. They are
capable of managing a key file up to 256 terabytes and a data file
of the same size (the native globals store the data separate from
the keys). The native globals, unlike the Berkeley DB, use the
LGPL license which does not require distribution of the source
code for programs in which they are used. The BDB requires a paid
license if you want to distribute an application in binary form
(no fee if you distribute source). Someday, if time permits,
we'll work on upgrading the crash durability of the native globals.
They are, however, a good choice for applications that can be
re-run or rebuilt from journals in the event of a system crash.

Other data base systems can be swapped into the package in place of
the BDB or the native globals, including an RDBMS, but performance
may be an issue.
Post by Jim Self
<...snip...>
--
Kevin C. O'Kane, Ph.D.
Professor of Computer Science
University of Northern Iowa
Cedar Falls, IA 50614-0507
(319) 273 7322 (Office + Voice Mail)
(319) 266 4131 (Iowa)
(508) 778 9485 (Massachusetts)
http://www.cs.uni.edu/~okane
***@cs.uni.edu
Denver Braughler
2003-11-24 03:59:22 UTC
Post by Kevin O'Kane
I had modified the loop value to see if 1/10 the iterations would
yield 1/10 the time. It did not, it yielded 1/100 the time.
Does this suggest that the larger one has a good bit more disk activity?
Post by Kevin O'Kane
Before each test, I deleted the old data base and environment
file (__db*) and "sync"'ed the file system (flush all cache buffers
that need to go to disk). This probably results in slightly
more consistent results.
Could you also have mmm call sync at the end of the test so that
all disk activity would be committed before the timing ended?




If your old database is nearly twice as fast as BerkeleyDB, I
think that is commendable.
I wouldn't expect crash-proofing it to do much more than double
the time, assuming that you are already journaling.
Kevin O'Kane
2003-11-24 04:55:38 UTC
Post by Denver Braughler
Post by Kevin O'Kane
I had modified the loop value to see if 1/10 the iterations would
yield 1/10 the time. It did not, it yielded 1/100 the time.
Does this suggest that the larger one has a good bit more disk activity?
Yes, vastly more disk activity. When the memory cache of disk
blocks fills, the actual I/O becomes a bottleneck. Until
then, I/O requests are satisfied from the cache. When I
ran the 100,000 loop on a faster machine with 16 times as
much memory, the time went from ~250 seconds to ~8 seconds -
mainly due to the larger memory.
Post by Denver Braughler
Post by Kevin O'Kane
Before each test, I deleted the old data base and environment
file (__db*) and "sync"'ed the file system (flush all cache buffers
that need to go to disk). This probably results in slightly
more consistent results.
Could you also have mmm call sync at the end of the test so that
all disk activity would be committed before the timing ended?
I would need to do the timing outside the program. Sounds like
something for tomorrow.
Post by Denver Braughler
If your old database is nearly twice as fast as BerkeleyDB, I
think that is commendable.
I wouldn't expect that crash-proofing it would much more than double
the time assuming that you are already journaling.
The BDB is very bullet-proof and provides many industrial-strength
transaction commit/rollback/deadlock detection features. For
a critical app, it's worth the wait. Their web site
(http://www.sleepycat.com) details the many applications
in which it is embedded.
--
Kevin C. O'Kane, Ph.D.
Professor of Computer Science
University of Northern Iowa
Cedar Falls, IA 50614-0507
(319) 273 7322 (Office + Voice Mail)
(319) 266 4131 (Iowa)
(508) 778 9485 (Massachusetts)
http://www.cs.uni.edu/~okane
***@cs.uni.edu
Kevin O'Kane
2003-11-25 01:16:48 UTC
Post by Kevin O'Kane
Could you also have mmm call sync at the end of the test so that all
disk activity would be committed before the timing ended?
I would need to do the timing outside the program. Sounds like
something for tomorrow.
If your old database is nearly twice as fast as BerkeleyDB, I think
that is commendable.
I wouldn't expect that crash-proofing it would much more than double
the time assuming that you are already journaling.
The script was modified to include a "sync" during each iteration. The
number of iterations was increased to 20, the total time accumulated,
and the average time calculated by dividing the total time by the number
of iterations. Note: the BDB uses files named Mumps.DB and __db*, and
the native globals use 2 files named data.dat and key.dat. The "rm"
command(s) for these are alternately commented out. Tests were performed
on an Athlon 600MHz with 256MB memory and 5500 RPM IDE drives under
Mandrake 9.1 with Linux kernel 2.4.21. The program "avg.cgi" is a small
MUMPS program that accepts QUERY_STRING parameters and calculates the
average. QUERY_STRING is one of the ways a web server sends data
to a program in the CGI interface.
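
(For reference, avg.cgi presumably amounts to something like the
following - a hypothetical reconstruction, since the actual program was
not posted; it assumes the CGI layer has already parsed QUERY_STRING,
i=<iterations>&t=<total time>, into the locals i and t:)

zmain
 ; hypothetical sketch of avg.cgi, not the actual program
 write "average=",t/i,!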

Results: BDB avg: 4.9 seconds; native globals avg: 1.65 seconds.

Deduct 0.05 seconds per iteration due to program load, initialization,
and script interpretation overhead.

=============================================================================
Trial 1 - Berkeley Data Base
The "rm *.dat" commented out since it is used with native globals
=============================================================================
Script started on Mon Nov 24 18:50:14 2003
[***@neamh mumpsc]# cat mmm.mps

zmain
 set x="" for i=1:1:1000 set x=x_"x"
 set t1=$h
 for i=1:1:10000 do
 . set k1=$r(10000)
 . set k2=$r(10000)
 . set k3=$r(10000)
 . set k4=$r(10000)
 . set ^a(k1,k2,k3,k4)=x
 write $p($h,",",2)-$p(t1,",",2),!

[***@neamh mumpsc]# cat t

#!/bin/bash
echo "begin " `date`
T1=`date +"%s"`
ITER=20
rm Mumps.DB
# rm *.dat
sync
for ((i=0; i<$ITER; i++)) do
rm Mumps.DB
rm __db*
# rm *.dat
echo "iteration " $i " " `./mmm.cgi`
sync
done
T2=`date +"%s"`
T3=$((T2-T1))
echo "Iterations=" $ITER " Total time=" $T3
QUERY_STRING="i=$ITER&t=$T3"
export QUERY_STRING
./avg.cgi

[***@neamh mumpsc]# ./t

begin Mon Nov 24 18:50:25 CST 2003
rm: cannot remove `Mumps.DB': No such file or directory
rm: cannot remove `Mumps.DB': No such file or directory
rm: cannot remove `__db*': No such file or directory
iteration 0 2
iteration 1 2
iteration 2 3
iteration 3 3
iteration 4 3
iteration 5 3
iteration 6 3
iteration 7 3
iteration 8 3
iteration 9 3
iteration 10 3
iteration 11 3
iteration 12 3
iteration 13 3
iteration 14 3
iteration 15 3
iteration 16 3
iteration 17 2
iteration 18 3
iteration 19 3
Iterations= 20 Total time= 98
average=4.9

[***@neamh mumpsc]# exit
Script done on Mon Nov 24 18:52:08 2003

=============================================================================
Trial 2 - Native globals
=============================================================================
Script started on Mon Nov 24 18:46:56 2003
[***@neamh mumpsc]# cat t

#!/bin/bash
echo "begin " `date`
T1=`date +"%s"`
ITER=20
# rm Mumps.DB
rm *.dat
sync
for ((i=0; i<$ITER; i++)) do
# rm Mumps.DB
# rm __db*
rm *.dat
echo "iteration " $i " " `./mmm.cgi`
sync
done
T2=`date +"%s"`
T3=$((T2-T1))
echo "Iterations=" $ITER " Total time=" $T3
QUERY_STRING="i=$ITER&t=$T3"
export QUERY_STRING
./avg.cgi

[***@neamh mumpsc]# cat mmm.mps

zmain
 set x="" for i=1:1:1000 set x=x_"x"
 set t1=$h
 for i=1:1:10000 do
 . set k1=$r(10000)
 . set k2=$r(10000)
 . set k3=$r(10000)
 . set k4=$r(10000)
 . set ^a(k1,k2,k3,k4)=x
 write $p($h,",",2)-$p(t1,",",2),!

[***@neamh mumpsc]# ./t

begin Mon Nov 24 18:47:18 CST 2003
rm: cannot remove `*.dat': No such file or directory
iteration 0 1
iteration 1 1
iteration 2 1
iteration 3 1
iteration 4 1
iteration 5 1
iteration 6 1
iteration 7 2
iteration 8 2
iteration 9 1
iteration 10 1
iteration 11 2
iteration 12 1
iteration 13 1
iteration 14 1
iteration 15 1
iteration 16 1
iteration 17 1
iteration 18 1
iteration 19 1
Iterations= 20 Total time= 33
average=1.65

[***@neamh mumpsc]# exit
Script done on Mon Nov 24 18:47:55 2003
=============================================================================
Basic program loading, initialization, and script overhead is 0.05 seconds
per iteration.
--
Kevin C. O'Kane
University of Northern Iowa
Cedar Falls, IA 50614-0507
http://www.cs.uni.edu/~okane
***@cs.uni.edu
Denver Braughler
2003-11-25 04:56:07 UTC
Post by Kevin O'Kane
#!/bin/bash
echo "begin " `date`
T1=`date +"%s"`
ITER=20
rm Mumps.DB
# rm *.dat
sync
This was not quite what I had in mind.
The rm commands should be excluded just in case they
take appreciable time.


#!/bin/bash
echo "begin " `date`
ITER=20
T3=0
for ((i=0; i<$ITER; i++)) do
rm Mumps.DB
rm __db*
rm *.dat
sync
T1=`date +"%s"`
echo "iteration " $i " " `./mmm.cgi`
sync
T2=`date +"%s"`
T3=$((T2-T1+T3))
done
echo "Iterations=" $ITER " Total time=" $T3
QUERY_STRING="i=$ITER&t=$T3"
export QUERY_STRING
./avg.cgi

The timing precision won't be very good here.
Is there no way to get milliseconds?



How about this?:

#!/bin/bash
rm __db*
rm Mumps.DB*
rm *.dat
echo "begin " `date`
T1=`date +"%s"`
ITER=20
sync
for ((i=0; i<$ITER; i++)) do
echo "iteration " $i " " `./mmm.cgi`
sync
## I hope you have a big disk ##
# mv Mumps.DB Mumps.DB.$i
# mv __db* __db-$i # if there is more than one you'll have to fix this up or rm anyway
mv key.dat key.$i.dat
mv data.dat data.$i.dat
done
T2=`date +"%s"`
T3=$((T2-T1))
echo "Iterations=" $ITER " Total time=" $T3
QUERY_STRING="i=$ITER&t=$T3"
export QUERY_STRING
./avg.cgi

T4=`date +"%s"`
rm __db*
rm Mumps.DB*
rm *.dat
sync
T5=`date +"%s"`
echo "Deleting took " $((T5-T4)) " s."
Kevin O'Kane
2003-11-25 14:43:57 UTC
The "rm" command for the BDB database file including
a subsequent "sync" costs 0.08 seconds per iteration.
Post by Denver Braughler
This was not quite what I had in mind.
The rm commands should be excluded just in case they
take appreciable time.
--
Kevin C. O'Kane
University of Northern Iowa
Cedar Falls, IA 50614-0507
(319) 273 7322 (Office + Voice Mail)
http://www.cs.uni.edu/~okane
***@cs.uni.edu
Denver Braughler
2003-11-25 15:49:11 UTC
Post by Kevin O'Kane
The "rm" command for the BDB database file including
a subsequent "sync" costs 0.08 seconds per iteration.
Does that sound right to you?
80 ms sounds like about four disk chunks.
Perhaps that's enough to move the inode-lists to the free-inodes list.
May I assume that you are using Unix blocks that are a lot bigger than 512 bytes?

I still like the idea of using just mv between iterations.
If there is some major variation caused by file system performance, it
will appear as a greater variance in individual run times.
Kevin O'Kane
2003-11-25 16:20:14 UTC
Permalink
It's difficult to say. I'm using ext3. The inodes are
probably resident, and the operation may very well be
only one or two writes. I'm not sure how the low-level
file system operates. Example times:

iteration 0 3
1069776882 483328000
1069776882 533490000
iteration 1 3
1069776887 323621000
1069776887 376009000
iteration 2 3
1069776892 373104000
1069776892 424094000

where the relevant part of the script was:

date +"%s %N"
rm Mumps.DB
rm __db*
sync
date +"%s %N"

The first number in each timing line is the Linux seconds
(since 1970) value and the second is the nanoseconds
value. In the run above, which I just re-did,
the values are smaller than in my earlier run.
Post by Denver Braughler
Post by Kevin O'Kane
The "rm" command for the BDB database file including
a subsequent "sync" costs 0.08 seconds per iteration.
Does that sound right to you?
80 ms sounds like about four disk chunks.
Perhaps that's enough to move the inode-lists to the free-inodes list.
May I assume that you are using Unix blocks that are a lot bigger than 512 bytes?
I still like the idea of using just mv between iterations.
If there is some major variation caused by file system performance, it
will appear as a greater variance in individual run times.
--
Kevin C. O'Kane, Ph.D.
Professor of Computer Science
University of Northern Iowa
Cedar Falls, IA 50614-0507
(319) 273 7322 (Office + Voice Mail)
(319) 266 4131 (Iowa)
(508) 778 9485 (Massachusetts)
http://www.cs.uni.edu/~okane
***@cs.uni.edu
Denver Braughler
2003-11-25 18:01:40 UTC
Permalink
Post by Kevin O'Kane
iteration 0 3
1069776882 483328000
1069776882 533490000
0.05 s
Post by Kevin O'Kane
In the run above, which I just re-did,
the values are smaller than my earlier run [0.08 seconds].
Okay, I'll accept that it is probably always
inconsequential to an overall iteration that
takes several seconds.

I meant to ask whether you support setlefts, like
set $p(x,"x",10)="x"


I think your test node size is way above average for normal data,
which may be giving an advantage to the Cat.

set x="" for i=1:1:10 set x=x_"x" ;smaller node size
set t1=$h
for i=1:1:1E7 do ;more nodes
. set k1=$r(10000)
. set k2=$r(100000) ;larger key space to reduce collisions
. set k3=$r(100000)
. set k4=$r(100000)
. set ^a(k1,k2,k3,k4)=x
Kevin O'Kane
2003-11-25 21:35:08 UTC
Permalink
Post by Kevin O'Kane
Post by Kevin O'Kane
iteration 0 3
1069776882 483328000
1069776882 533490000
0.05 s
Post by Kevin O'Kane
In the run above, which I just re-did,
the values are smaller than my earlier run [0.08 seconds].
Okay, I'll accept that it is probably always
inconsequential to an overall iteration that
takes several seconds.
I meant to ask whether you support setlefts, like
set $p(x,"x",10)="x"
Yes, although there was a recent fix to this section
of code.
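
For anyone following along, set-left $PIECE pads the target with
delimiters as needed. A minimal sketch of the standard behavior
(the variable and delimiter here are only illustrative):

set x=""
set $piece(x,"^",3)="c"  ; x is now "^^c"
set $piece(x,"^",1)="a"  ; x is now "a^^c"
write x,!                ; prints a^^c
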
Post by Kevin O'Kane
I think your test node size is way above average for normal data
which may be giving an advantage to the Cat.
set x="" for i=1:1:10 set x=x_"x" ;smaller node size
set t1=$h
for i=1:1:1E7 do ;more nodes
. set k1=$r(10000)
. set k2=$r(100000) ;larger key space to reduce collisions
. set k3=$r(100000)
. set k4=$r(100000)
. set ^a(k1,k2,k3,k4)=x
I changed each of the $r() args to 100,000 and got essentially
the same results: 4.5 sec/iteration for the BDB and 1.8 for
the native globals.

Then, I changed the length of "x" to 10 from 1000 and got
1.1 seconds with the native globals but a surprising 1.4
seconds with the BDB. This would seem to indicate that the
BDB is not adept at managing longer stored data items.
It should also be said that both these methods have
many tuning parameters that affect performance. The BDB
settings here, however, are the defaults.
--
Kevin C. O'Kane, Ph.D.
Professor of Computer Science
University of Northern Iowa
Cedar Falls, IA 50614-0507
(319) 273 7322 (Office + Voice Mail)
(319) 266 4131 (Iowa)
(508) 778 9485 (Massachusetts)
http://www.cs.uni.edu/~okane
***@cs.uni.edu
Denver Braughler
2003-11-26 02:30:01 UTC
Permalink
Post by Kevin O'Kane
set $P(x,"x",10)="x" ;smaller node size
set t1=$h
for i=1:1:1E7 do ;more nodes
. set k1=$r(10000)
. set k2=$r(100000) ;larger key space to reduce collisions
. set k3=$r(100000)
. set k4=$r(100000)
. set ^a(k1,k2,k3,k4)=x
I changed each of the $r() args to 100,000 and got essentially
the same results: 4.5 sec/iteration for the BDB and 1.8 for
the native globals.
Then, I changed the length of "x" to 10 from 1000 and got
1.1 seconds with the native globals but a surprising 1.4
seconds with the BDB. This would seem to indicate that the
BDB is not adept at managing longer stored data items.
I don't understand why the times decreased with the longer x
for both databases.

Did you keep the number of global sets at 1E7?
Kevin O'Kane
2003-11-26 05:00:15 UTC
Permalink
Post by Denver Braughler
Post by Kevin O'Kane
set $P(x,"x",10)="x" ;smaller node size
set t1=$h
for i=1:1:1E7 do ;more nodes
. set k1=$r(10000)
. set k2=$r(100000) ;larger key space to reduce collisions
. set k3=$r(100000)
. set k4=$r(100000)
. set ^a(k1,k2,k3,k4)=x
I changed each of the $r() args to 100,000 and got essentially
the same results: 4.5 sec/iteration for the BDB and 1.8 for
the native globals.
Then, I changed the length of "x" to 10 from 1000 and got
1.1 seconds with the native globals but a surprising 1.4
seconds with the BDB. This would seem to indicate that the
BDB is not adept at managing longer stored data items.
I don't understand why the times decreased with the longer x
for both databases.
Did you keep the number of global sets at 1E7?
No, the times decreased with the shorter "x". The size of the
decrease in the BDB was surprising, while the modest decrease
in the native globals was about as expected.
The for loop limit was set to 10,000.
--
Kevin C. O'Kane,
University of Northern Iowa
Cedar Falls, IA 50614-0507
(319) 273 7322 (Office + Voice Mail)
http://www.cs.uni.edu/~okane
***@cs.uni.edu
Denver Braughler
2003-11-26 23:27:04 UTC
Permalink
Post by Kevin O'Kane
No, the times decreased with the shorter "x". The size of the
decrease in the BDB was surprising, while the modest decrease
in the native globals was about as expected.
The for loop limit was set to 10,000.
I was hoping for 1E6 or 1E7 nodes.
When you have nodes that are 10 rather than 1000 bytes,
it makes sense to increase the iterations by a factor of 100.
But the times were so short anyway that increasing to 1E7
would decrease the significance of the administrative overhead.

Maury Pepper
2003-11-05 16:13:37 UTC
Permalink
SSVNs and $INCREMENT. The former gets GT.M at or near the '95 standard.
The latter is a great improvement and will make it easier to keep VistA
working on GT.M.
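
For readers who haven't met it, $INCREMENT atomically adds to a
node and returns the result, which removes the usual LOCK dance
around shared counters. A minimal sketch, assuming the MDC-proposed
semantics; ^ORDSEQ is just an illustrative global name:

; today, without $INCREMENT: serialize with LOCK
lock +^ORDSEQ
set seq=$get(^ORDSEQ)+1,^ORDSEQ=seq
lock -^ORDSEQ
; with $INCREMENT: one atomic step, no LOCK needed
set seq=$increment(^ORDSEQ)
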
Post by bhaskar
Having released some significant enhancements to GT.M in the last few
months in V4.4-002 and V4.4-003, we are in the process of prioritizing
the next round of GT.M development (enhancements, misfeature
corrections and bug fixes). This is an opportunity for the community
to tell us what you would like to see in GT.M in the future. What
would you like to have in GT.M that you don't have today? You can
either post a response here on comp.lang.mumps, or you can e-mail me
privately at k dot bhaskar at sanchez dot com (I will personally
acknowledge every e-mail on the topic, so in case you don't hear from
me, it means that your e-mail didn't get through, possibly trapped by
some over-zealous spam filter).
If you prefer to phrase your response in terms of what you like and
don't like about GT.M, that too would be valuable.
Thanx in advance for your comments. They will help us make GT.M your
favorite M implementation.
-- Bhaskar
k dot bhaskar at sanchez dot com <-- send e-mail here
Post by bhaskar
GT.M V4.4-003 has been released.
[KSB] <...snip...>
bhaskar
2003-11-06 14:10:36 UTC
Permalink
Thank you, Maury.

-- Bhaskar
Post by Maury Pepper
SSVN's and $INCREMENT. The former gets GT.M at or near the '95 standard.
The latter is a great improvement and will make it easier to keep VistA
working on GT.M.
Volodymyr Ilnytskyy
2003-11-07 06:41:01 UTC
Permalink
ZWINTERM mnemonics, as released in MSM.
Post by bhaskar
Having released some significant enhancements to GT.M in the last few
months in V4.4-002 and V4.4-003, we are in the process of prioritizing
the next round of GT.M development (enhancements, misfeature
corrections and bug fixes). This is an opportunity for the community
to tell us what you would like to see in GT.M in the future. What
would you like to have in GT.M that you don't have today? You can
either post a response here on comp.lang.mumps, or you can e-mail me
privately at k dot bhaskar at sanchez dot com (I will personally
acknowledge every e-mail on the topic, so in case you don't hear from
me, it means that your e-mail didn't get through, possibly trapped by
some over-zealous spam filter).
If you prefer to phrase your response in terms of what you like and
don't like about GT.M, that too would be valuable.
Thanx in advance for your comments. They will help us make GT.M your
favorite M implementation.
-- Bhaskar
k dot bhaskar at sanchez dot com <-- send e-mail here
Post by bhaskar
GT.M V4.4-003 has been released.
[KSB] <...snip...>
bhaskar
2003-11-07 14:21:13 UTC
Permalink
Volodymyr --

Thank you for the suggestion. I am not familiar with ZWINTERM. What does it do?

Regards
-- Bhaskar
k dot bhaskar at sanchez dot com <-- send e-mail here
Post by Volodymyr Ilnytskyy
ZWINTERM mnemonics as it released in MSM.
Volodymyr Ilnytskyy
2003-11-12 09:01:07 UTC
Permalink
Mnemonic Namespaces

Overview

MSM's mnemonic namespaces feature allows users to develop M applications
that are relatively device-independent in handling low-level device
functions (for example: clearing a terminal screen or backspacing one
block on a tape). Mnemonic namespaces can be used to access many of the
device types supported by MSM. The device types that can be supported
through namespaces include: terminal devices, sequential block processor
(SBP) devices, host file server (HFS) devices, interjob communication
(IJC) devices, magnetic tape devices, the MSM spool device, and the host
system spool device. For additional information about mnemonic
namespaces, refer to "MSM Commands" in this manual, and "Using
Peripheral Devices" in the MSM-Workstation Technical Reference
documentation.

Two types of mnemonic namespaces are supported in MSM: built-in
namespaces and user-defined namespaces. MSM's standard distribution
includes two built-in namespaces and one user-defined namespace. The
X3.64-1979 built-in namespace is commonly referred to as the ANSI
terminal namespace. The ZWINTERM built-in namespace allows windowing
capabilities on dumb terminals.

The user-defined namespace is called X3.64 TEMPLATE and is distributed
in a DOS file called ANSI.NAM. The system manager can use this namespace
as a template to create other namespaces. The X3.64 TEMPLATE namespace
can be imported into the MSM database using the Import a Namespace
option. The template then can be copied to a new name and edited. It
includes the complete set of ANSI mnemonics.

Because ANSI terminals are the most frequently used terminal type in M
applications, the namespace for this terminal type was directly built
into the system. From a technical point of view, a built-in namespace is
coded directly into the MSM system monitor rather than through
user-supplied M code. As a result, overhead when using this namespace is
significantly reduced.

Because the X3.64-1979 is built-in, it cannot be edited or deleted by
the user. The mnemonics defined within the namespace cannot be listed
through this option of the SYSGEN utility.

-----------------------

ZWINTERM Namespace

The ZWINTERM mnemonic namespace provides a mechanism for programmers to
perform windowing functions on dumb terminals. These functions also are
available on the console device of PC systems operating under
MSM-PC/PLUS. Internally, when the user specifies the ZWINTERM mnemonic
namespace through the USE command, the MSM system builds a copy of the
user's terminal screen in memory. This includes both characters on the
screen and attributes associated with each character. Subsequent updates
to the screen are applied to the memory image, as well as to the
physical screen.

When a window is opened in the ZWINTERM mnemonic namespace, the system
makes a copy from the in-memory screen image of the area that will be
overlaid by the new window. After the window is opened, the program can
use the normal READ and WRITE commands to access the window. When the
window is closed, the system automatically refreshes the area under the
window. No additional action is required on the part of the program to
restore the area.

The ZWINTERM mnemonic namespace also supports the built-in X3.64-1979
namespace for interpreting format controls within the window. The
X3.64-1979 namespace must be initialized using the /INIT format control
after the ZWINTERM namespace has been invoked. User-defined namespaces
are not supported with ZWINTERM.

Attributes

The ZWINTERM mnemonic namespace supports attributes for setting text
attributes (bold, blink, underline, reverse) and colors (background and
foreground) within a window, within a border, and within a title. The
following table describes the supported attributes.

-----------------------

For more detailed information, see
ftp://ftp.intersys.com/pub/msm/docs/msm44/ref.pdf
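
To make the idea concrete for GT.M readers: in standard-M terms a
mnemonicspace is named on OPEN/USE, and its control mnemonics are then
written with a leading slash. A minimal sketch, assuming the MSM-style
USE syntax and the ANSI X3.64 mnemonic names (CUP, ED, SGR) described
above; exact deviceparameter syntax varies by implementation:

use 0::"X3.64-1979"                ; select the ANSI terminal namespace
write /ED(2),/CUP(1,1)             ; erase the display, home the cursor
write "Hello"
write /SGR(7)," reverse ",/SGR(0)  ; set, then reset, reverse video

The point of the feature is that the same WRITE /CUP(1,1) can drive a
different physical device when the namespace is swapped, which is what
makes applications device-independent.
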

Ilnytskyy
Post by bhaskar
Volodymyr --
Thank you for the suggestion. I am not familiar with ZWINTERM. What does it do?
Regards
-- Bhaskar
k dot bhaskar at sanchez dot com <-- send e-mail here
Post by Volodymyr Ilnytskyy
ZWINTERM mnemonics as it released in MSM.
bhaskar
2003-11-12 23:31:51 UTC
Permalink
Volodymyr --

Thank you for the suggestions.

Regards
-- Bhaskar
k dot bhaskar at sanchez dot com <-- send e-mail here
Post by Volodymyr Ilnytskyy
Mnemonic Namespaces
[KSB] <...snip...>
Jim Self
2003-11-23 10:27:38 UTC
Permalink
Post by bhaskar
Having released some significant enhancements to GT.M in the last few
months in V4.4-002 and V4.4-003, we are in the process of prioritizing
the next round of GT.M development (enhancements, misfeature
corrections and bug fixes). This is an opportunity for the community
to tell us what you would like to see in GT.M in the future. What
would you like to have in GT.M that you don't have today? You can
either post a response here on comp.lang.mumps, or you can e-mail me
privately at k dot bhaskar at sanchez dot com (I will personally
acknowledge every e-mail on the topic, so in case you don't hear from
me, it means that your e-mail didn't get through, possibly trapped by
some over-zealous spam filter).
If you prefer to phrase your response in terms of what you like and
don't like about GT.M, that too would be valuable.
Thanx in advance for your comments. They will help us make GT.M your
favorite M implementation.
-- Bhaskar
The most important things to me in a MUMPS implementation are
performance, scalability, reliability, and availability - areas where it
seems to me that GT.M is already excellent. Also, GT.M has powerful
capabilities that we have not yet begun to explore. I would like to see
some examples of how to use them. The possibility of improving
performance and sophistication of applications with library files that
tie into utilities outside of MUMPS seems very promising. So also does
the new capability to embed GT.M inside of applications written in C or
other languages.

In converting VMACS to GT.M we have learned to do without some features
that we thought were very important when we worked with DTM.

User-defined mnemonic namespaces (mentioned by Volodymyr Ilnytskyy) are
one of the features we missed the most in converting from DTM to GT.M.
If we still cared about supporting dumb terminals this would be
extremely important to us. It is much less important to us now that our
efforts are focused on an entirely web based user interface and we have
already converted those applications that we intend to bring forward.
Still, it provides an elegant solution for controlling a variety of
terminal and printer types with a high level of cross-device
functionality that could be extremely helpful to people attempting to
convert existing applications to GT.M from other MUMPS implementations.

The other feature that we missed the most in converting from DTM - that
we still miss - is the angle-bracket syntax for getting and setting
multiple variables in one step. When working with data records
containing many fields in one string, it made the MUMPS code much more
succinct and readable - and it made generated code easier to work with.

For instance:
set $zp="|",<name,addr,city,state,zip,email,phoneHome,...>=record
is equivalent to
set name=$p(record,"|",1)
set addr=$p(record,"|",2)
set city=$p(record,"|",3)
set state=$p(record,"|",4)
set zip=$p(record,"|",5)
set email=$p(record,"|",6)
set phoneHome=$p(record,"|",7)
...

and the reverse
set $zp="|",record=<name,addr,city,state,zip,email,phoneHome,...>
is equivalent to
set $p(record,"|",1)=name
set $p(record,"|",2)=addr
set $p(record,"|",3)=city
set $p(record,"|",4)=state
set $p(record,"|",5)=zip
set $p(record,"|",6)=email
set $p(record,"|",7)=phoneHome
...

and clearly, the effect is greater as the number of fields increases.
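
Lacking that syntax, name indirection can approximate it in GT.M today.
A minimal sketch (unpack and the routine name UTIL are hypothetical, and
the caller's field names must not collide with the formals):

unpack(record,delim,names) ; set the variables listed in names from record
 new i
 for i=1:1:$length(names,",") set @$piece(names,",",i)=$piece(record,delim,i)
 quit

 ; usage: do unpack^UTIL(rec,"|","name,addr,city,state,zip,email,phoneHome")

It is more code per call than the angle-bracket form and pays the cost
of indirection, which is part of why a native syntax would be welcome.
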

There was an MDC proposal, $RECORD, with similar functionality, and I
believe Cache' has something similar in $LIST. I seem to recall that
$RECORD made its way to MDC Type A status but was never implemented.

I think the thing I would like most to have in GT.M that we don't have
now is an object oriented syntax and mechanism, especially for dealing
with data objects. I have been thinking for some time now that it would
be nice if we simply had the option of treating certain subscripts as
object properties, similar to the way they are treated in Javascript.

That is, a local variable 'Patient' might be thought of as an object
with a property 'Name' or 'Age' or 'Guarantor' that might be referred to
as Patient("Name") and Patient("Age") and Patient("Guarantor") or
alternatively as Patient.Name and Patient.Age and Patient.Guarantor.

By itself that gives you a simpler syntax that is easier to read and to
write, but borrowing another idea in Javascript, prototypes, Patient
could have a special property from which additional properties could be
derived, such as Patient.Guarantor.Address or
Patient.LastVisit.AdmitDate.Year.
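
The subscripted form works in any M today; the dot notation and the
prototype chain are the new parts. Purely as an illustration of the
idea (getprop and the reserved "#proto" subscript are hypothetical,
not a GT.M feature), prototype-style lookup can be mimicked with
indirection:

getprop(obj,prop) ; obj is the NAME of a local array, e.g. "Patient"
 quit:$data(@obj@(prop))#2 @obj@(prop)        ; own property wins
 quit:$get(@obj@("#proto"))="" ""             ; end of the chain
 quit $$getprop(@obj@("#proto"),prop)         ; else ask the prototype

 ; usage: set Person("Species")="human"
 ;        set Patient("#proto")="Person",Patient("Name")="Smith"
 ;        write $$getprop("Patient","Species")  ; -> "human"

A native syntax would presumably do this resolution inside the runtime,
where it could be made fast.
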
bhaskar
2003-11-24 15:13:35 UTC
Permalink
Jim --

Thank you for your input.

-- Bhaskar
k dot bhaskar at sanchez dot com <-- send e-mail here