Roy Tang

Programmer, engineer, scientist, critic, gamer, dreamer, and kid-at-heart.

Blog Notes Photos Links Archives About

Something similar to my question here: https://stackoverflow.com/questions/40715822/marklogic-node-js-api-group-by-and-sort-by-count

I have documents in Marklogic with fields name and amount. I want to get the total amount for each name. Basically in SQL it would be

select name, sum(amount) from table group by name

I have range indexes for both name and amount. For getting sum aggregates, the documentation suggests something like valuesBuilder.fromIndexes(‘amount’).aggregates(‘sum’), but this only gets the sum for all records, instead of per name like I want.

Any advice?

Comments

can you show a sample document? I assume that you have the same name appear in different documents with different URIs - is that correct?

I don’t think there’s an out-of-the-box group by, but you can build an extension to do it. I made some sample docs like this:

declareUpdate();

for (let index=0; index < 1000; index++) {
  xdmp.documentInsert(
    '/values' + index + '.json',
    {
      "name": "val" + (xdmp.random(2) + 1),
      "amount": xdmp.random(2) + 1
    }
  )
}

I ran this code in Query Console:

'use strict';

var jsearch = require('/MarkLogic/jsearch.sjs');
var tuples = 
  jsearch.tuples(
    [
      cts.jsonPropertyReference('name'), 
      cts.jsonPropertyReference('amount', ["type=int"])
    ]
  )
    .slice(0, 1000) // get all the values
    .map({frequency: "item", names: ['name', 'count', 'frequency']})
    .result();

tuples
  .reduce(function(acc, current, index, array) {
    acc[current.name] = parseInt(current.count, 10) * current.frequency;
    return acc;
  }, {});

That got me a response of:

{
  "val1": 381, 
  "val2": 351, 
  "val3": 324
}

I think that’s what you’re looking for. Other stuff you’ll need:

As an alternative to Dave’s suggestion you can do something similar in Node.js but again, the work/code would have to be developed by you.

Option 1)

db.values.read(
  vb.fromIndexes('name', 'amount')
)
.result()

this code here creates a value co-occurance of names and amounts - you can then go and process this information.

Option 2)

Get the names from the indexes, then feed that information to another function that would get the amounts and sum them up:

function getNames() {
  return db.values.read(
    vb.fromIndexes('name')
  )
  .result();
}

function getSum(names) {
  const promises = names.map(name => {
     return db.values.read(
      vb.fromIndexes('amount')
      .aggregates('sum')
      .where(vb.parsedFrom(`name:${name}`,
             vb.parseBindings(vb.value('name', vb.bind('name')))
    ))).result();
  });
  return Promise.all(promises);
};

getNames()
.then(response => {
  const names = response['values-response'].tuple.map(name => name['distinct-value'].toString());
  console.log(names); //this will hold all the names
  return getSum(names);
})
.then(response => {
  const sums = response.map(r => {
    return r['values-response']['aggregate-result'][0];
  }).map(s => s._value);
  console.log(sums); //this will hold all the sums
})
.catch(error => console.log(error));

Option 3)

What Dave suggested - create the SJS extension, and call that from Node.js

One footnote to the good suggestions from Tamas and Dave – in MarkLogic 9, you can use the Optic API in ways similar to the following example (in this case, using Server-Side JavaScript):

'use strict';
const op = require('/MarkLogic/optic');

const employees = op.fromView(null, 'employees');
const expenses  = op.fromView(null, 'expenses');
employees
   .joinInner(expenses, op.on(
       employees.col('EmployeeID'), expenses.col('EmployeeID')
       ))
   .groupBy('LastName', op.sum('Total', 'Amount'))
   .orderBy('Total')
   .where(op.lt(199, op.col('Total')))
   .result();