Marklogic Optic API

I’ve been testing migrating one of our systems to Marklogic 9 and using the Optics API.

One of our functions involves grouping claims by member_id, member_name and getting the sums and counts, so I did something like this:

var results = op.fromView('test', 'claims')
  .groupBy(['member_id', 'member_name'], [
         op.count('num_claims', 'claim_no'),
         op.sum('total_amount', 'claim_amount')
         ])
  .orderBy(op.desc('total_amount'))
  .limit(200)
  .result()
  .toArray();

Above works fine. The results are of the form

[
  { 
    member_id: 1, 
    member_name: 'Bob', 
    num_claims: 10, 
    total_amount: 500
  }, 
  ...
]

However, we also have a field “company”, where each claim is filed under a different company. Basically the relevant view columns are claim_no, member_id, member_name, company, claim_amount

I would like to be able to show a column that list the different companies for which the member_id/member_name has filed claims, and how many claims for each company.

i.e. I want my results to be something like:

[
  { 
    member_id: 1, 
    member_name: 'Bob', 
    num_claims: 10, 
    total_amount: 500,
    companies: [
      {
        company: 'Ajax Co',
        num_claims: 8
      },
      {
        company: 'Side Gig',
        num_claims: 2
      }
    ]
  }, 
  ...
]

I tried something like this:

results = results.map((member, index, array) => {
  var companies = op.fromView('test', 'claims')
    .where(op.eq(op.col('member_id'), member.member_id))
    .groupBy('company', [
      op.count('num_claims', 'claim_no')      
    ])
    .result()
    .toArray();
  member.companies = companies;
  return member;
});

And the output seems correct, but it also executes quite slowly - almost a minute (total number of claim documents is around 120k)

In our previous ML8 implementation, we were pre-generating summary documents for each member - so retrieval was reasonably fast with the downside that whenever we got a bunch of new data, all of the summary documents had to be re-generated. I was hoping that ML9’s optic API would make it easier to do the retrieval/grouping/aggregates on the fly so we wouldn’t have to do that.

In theory, I could just add company to the groupBy fields, then merge the rows in the result query as needed. But the problem with that approach is that I can’t guarantee I’ll get the top 200 by total amount (as was my original query)

So, the question is: Is there a better way of doing this with a reasonable execution time? Or should I just stick to pre-generating the summary documents?

Posted by Roy Tang under notes at 16 Nov 2017 6:15pm #marklogic #marklogic-9 #questions #stackoverflow #software development
Also on: stackexchange / 2

Comments

Sam Mefford

2017-11-16 18:24:12

Could you share some sample data to assist with testing?

ehennum

2017-11-16 19:16:15

If I’m understanding correctly, you should be able to implement that with a single Optic query that groups twice.

The first group should aggregate to the company level
The second group should aggregate to the member level, collecting the detail with the array aggregate

The query would probably look something like the following:

const results =
  op.fromView('test', 'claims')
    .groupBy(['member_id', 'company'], [
        'member_name',
        op.count('company_claims', 'claim_no'),
        op.sum('company_amount', 'claim_amount')
        ])
    .select(['member_id',
        'member_name',
        'company_claims',
        'company_amount',
        op.as('company_desc', op.jsonObject([
                op.prop('company',    op.col('company')),
                op.prop('num_claims', op.col('company_claims'))
                ]))
        ])
    .groupBy(['member_id'], [
        'member_name',
        op.sum('num_claims',   'company_claims'),
        op.sum('total_amount', 'company_amount'),
        op.arrayAggregate('companies', 'company_desc')
        ])
    .orderBy(op.desc('total_amount'))
    .limit(200)
    .result()
    .toArray();

By the way, if you specify a column in the aggregates list, it is sampled. Where the column has the same value for the entire group (which I presume is the case with “member_name”), you can sample it instead of specifying it as an additional grouping key.

Also, in modern JavaScript var is usually avoided in favor of const or let.

Hoping that helps,

If you enjoyed my content for some reason, I'd love to hear from you! Here are some options:

You can buy me a coffee!
You can write a reply on your own site and submit the URL as a webmention via the form below.
Or you can just contact me!