All entries tagged software-development.
Jan 2019
-
This year I had the dubious privilege of having to work with a C++ project again. Although my college education was in C, that was a completely different animal. I did self-study C++ for a bit back even before I was working, mostly because I was interested in game development even back then. I remember trying some OpenGL and/or DirectX stuff back with good old Borland Turbo-C++ during the DOS days and using the Dev-C++ IDE when I shifted to Windows.
Dec 2018
-
For any nontrivial software project of at least moderate team size, there can be a significant cost to onboarding a new team member, especially at later stages when you are rushing to meet deadlines. The most significant cost is of course the communication overhead, as described in The Mythical Man-Month. Fun story: the CEO of a company once told me they would add more developers to a delayed project to meet the deadline, and when I pointed out the increased overhead, he said it wasn’t a problem because they would just assign those devs modules with minimal dependencies so they wouldn’t have to communicate so much.
Nov 2018
-
Text editors (and by extension IDEs) are a programmer’s best friend. I thought I’d look back at a number of text editors I’ve used over the years. (I grew up with Windows, so I won’t list vim/emacs/nano here, even though I’m at least a bit proficient with vim by now. That is, I know how to exit vim.) Notepad – of course, the default editor in Windows. The one we turn to when all else fails.
-
According to Malcolm Gladwell’s book Outliers, you need 10,000 hours of continuous sustained practice to become an expert. There are 168 hours in a week. If you never sleep and you eat as you practice, you can become an expert in 60 weeks. (Around 14 months) If you sleep 8 hours a day, you only have 112 hours in a week. If you eat as you practice, you can become an expert in 90 weeks.
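The arithmetic above can be checked with a few lines of JavaScript. This is only a sketch: the function name `weeksToExpert` is made up for illustration, and it assumes every waking hour goes to practice.

```javascript
// Weeks needed to accumulate a target number of practice hours,
// given how many hours per day are NOT spent practicing (e.g. sleep).
function weeksToExpert(targetHours, nonPracticeHoursPerDay) {
  const practiceHoursPerWeek = (24 - nonPracticeHoursPerDay) * 7;
  // Round up: a partial week still has to be worked.
  return Math.ceil(targetHours / practiceHoursPerWeek);
}

console.log(weeksToExpert(10000, 0)); // never sleeping: 168 hours per week
console.log(weeksToExpert(10000, 8)); // sleeping 8 hours a day: 112 hours per week
```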
-
SCM (Software Configuration Management) doesn’t just refer to version control for the software you’re building. It also means controlling the versions of software you depend on. This includes operating systems and programming runtimes. Sometimes even minor version differences can cause issues in running your software. I have two example stories to share: One of our clients asked us for help with upgrading their production servers from CentOS 6.4 to 6.
Oct 2018
-
Systemic change is difficult. I’m talking about software projects/systems, but there are a lot of parallels with societal systems too, like governments or states. I’ve been in large projects with hundreds of thousands of LOC where a lot of the code was painful to read and full of code smells and so on. It happens over time as projects get maintained by different developers and teams or different enhancements or changes are made.
-
Ten years ago this month, I started studying Django by trying to build my own blog application. I found the code lying around while I was going through some backups lately. It’s way out of date; it uses an early version of Django. I thought of bringing it up to speed, but that didn’t seem practical. Instead, for archival purposes, I cleaned it up a bit and uploaded the code to a GitHub repo.
-
Malcolm Gladwell, in an article from 1996 discussing the Challenger disaster, tells us: This kind of disaster is what the Yale University sociologist Charles Perrow has famously called a “normal accident.” By “normal” Perrow does not mean that it is frequent; he means that it is the kind of accident one can expect in the normal functioning of a technologically complex operation. Modern systems, Perrow argues, are made up of thousands of parts, all of which interrelate in ways that are impossible to anticipate.
-
Rockstar was in the gaming news recently because they mentioned that some of them had worked 100-hour weeks on their massive sequel to Red Dead Redemption coming out soon (no idea if I’ll play this). The idea of 100 hour weeks seemed insane to me, and it got me thinking: I’ve done some serious overtime before, have I ever gotten close to that amount of work in a week? Luckily, I didn’t have to speculate too much, because I had data (I love data).
-
Mentoring is one of those tasks that’s to be expected of anyone in a senior software development role. This usually involves reviewing other people’s code, helping them with tough technical issues, and even giving career advice. I’m not sure how good I am when it comes to mentoring other software developers. When I first became technical lead on projects, I got some evaluations from junior developers saying I can be “intimidating”.
Apr 2018
-
As a programmer, I’ve always been a big fan of StackOverflow. I asked my first question there and also wrote my first answer in September 2008, which was the month the site launched, so I was pretty much there from the beginning. The site was a huge boon to programmers when it first came out, because the internet as a venue for asking and answering questions back then was a horrible fragmented landscape of small forums and mailing lists and sites like Experts Exchange, all of which were terribly designed.
Feb 2018
-
According to the documentation here: https://djangobook.com/syndication-feed-framework/
If link doesn’t return the domain, the syndication framework will insert the domain of the current site, according to your SITE_ID setting
However, I’m trying to generate a feed of magnet: links. The framework doesn’t recognize this and attempts to append the SITE_ID, such that the links end up like this (on localhost):
<link>http://localhost:8000magnet:?xt=...</link>
Is there a way to bypass this?
Dec 2017
-
I have a query using MarkLogic node.js that basically boils down to something like this:
db.documents.query(qb.where(qb.collection('test'))).stream()
  .on('data', function(row) {
    console.log("Stream on data");
  })
  .on('end', function() {
    console.log("Stream on end");
  })
  .on('error', function(error) {
    console.log(error);
  });
Now, for a certain collection we have in our database, the ‘end’ function doesn’t fire, i.e. I never see “Stream on end” appear in the log. There’s no error or anything, processing just stops. It’s only for this particular collection, other collections seem fine.
If I query documents in that collection directly using other methods such as qb.value() without using qb.collection(), the end event fires correctly. But once I add qb.collection() into the mix (using qb.and), the end event doesn’t fire.
I’m unsure how to debug this, as this is my first time trying to use streams in the nodejs client library. Any advice as to what I can check?
Thanks!
-
I was thinking about my typical approach to coding. When writing a new feature, I tend to implement in the direction of where the data flows, starting from the user interface then to the backend and back to the frontend and wherever else that goes. I will build incrementally, using debugging tools or console printouts to ensure that each step is working correctly. As an example, here’s how I did a recent web-based function:
-
How to specify that a column in the schema should be nullable?
I tried adding a nullable attribute:
var myFirstTDE = xdmp.toJSON(
  {
    "template": {
      "context": "/match",
      "collections": ["source1"],
      "rows": [
        {
          "schemaName": "soccer",
          "viewName": "matches",
          "columns": [
            {
              "name": "id",
              "scalarType": "long",
              "val": "id",
              "nullable": 0
            },
            {
              "name": "document",
              "scalarType": "string",
              "val": "docUri"
            },
            {
              "name": "date",
              "scalarType": "date",
              "val": "match-date"
            },
            {
              "name": "league",
              "scalarType": "string",
              "val": "league"
            }
          ]
        }
      ]
    }
  }
);
tde.validate(
  [myFirstTDE]
);
But this gave me a template error:
"message": "TDE-INVALIDTEMPLATENODE: Invalid extraction template node: fn:doc('')/template/array-node('rows')/object-node()/array-node('columns')/object-node()[1]/number-node('nullable')"
For a template defined using XQuery, adding nullable to the column works:
<column>
  <name>ISSN</name>
  <scalar-type>string</scalar-type>
  <val>Journal/ISSN</val>
  <nullable>true</nullable>
</column>
How can I do the same thing using JS/JSON?
Nov 2017
-
This is a follow-up to my question here: https://stackoverflow.com/questions/47449002/marklogic-template-driven-extraction-and-triples-dealing-with-array-nodes/47459250#47459250
So let’s say I have a number of documents structured like this:
declareUpdate();
xdmp.documentInsert(
  '/test/tde.json',
  {
    content: {
      name:'Joe Parent',
      children: [
        { name: 'Bob Child' },
        { name: 'Sue Child' },
        { name: 'Guy Child' }
      ]
    }
  },
  {permissions : xdmp.defaultPermissions(), collections : ['test']})
I want to define a template that would extract triples from these documents defining sibling relationships between the children. For the above example, I would want to extract the following triples (the relationship is two-way):
Bob Child sibling-of Sue Child
Bob Child sibling-of Guy Child
Sue Child sibling-of Bob Child
Sue Child sibling-of Guy Child
Guy Child sibling-of Bob Child
Guy Child sibling-of Sue Child
How can I set up my template to accomplish this?
Thanks!
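For reference, the six triples listed above are just every ordered pair of distinct children. A minimal plain-JavaScript sketch of that enumeration follows; it is not a TDE template, the function name `siblingTriples` is hypothetical, and storing the results as actual triples is left out.

```javascript
// Build every ordered [subject, predicate, object] sibling triple
// for a document shaped like the sample above.
function siblingTriples(doc) {
  const children = doc.content.children;
  const triples = [];
  children.forEach((subject, i) => {
    children.forEach((object, j) => {
      // Compare positions, not names, so two children who share a name still pair up.
      if (i !== j) {
        triples.push([subject.name, 'sibling-of', object.name]);
      }
    });
  });
  return triples;
}

const sample = {
  content: {
    name: 'Joe Parent',
    children: [{ name: 'Bob Child' }, { name: 'Sue Child' }, { name: 'Guy Child' }]
  }
};

siblingTriples(sample).forEach((triple) => console.log(triple.join(' ')));
```

A document with n children yields n × (n − 1) triples, so the count grows quickly for large families.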
-
I’ve been studying the examples here: https://docs.marklogic.com/guide/semantics/tde#id_25531
I have a set of documents that are structured with a parent name and an array of children nodes with their own names. I want to create a template that generates triples of the form “name1 is-a-parent-of name2”. Here’s a test I tried, with a sample of the document structure:
declareUpdate();
xdmp.documentInsert(
  '/test/tde.json',
  {
    content: {
      name:'Joe Parent',
      children: [
        { name: 'Bob Child' },
        { name: 'Sue Child' }
      ]
    }
  },
  {permissions : xdmp.defaultPermissions(), collections : ['test']})

cts.doc('/test/tde.json')

var tde = require("/MarkLogic/tde.xqy");
// Load the user template for user profile rows
var template = xdmp.toJSON(
  {
    "template":{
      "context":"content",
      "collections": [ "test" ],
      "triples":[
        {
          "subject": { "val": "xs:string(name)" },
          "predicate": { "val": "sem:iri('is-parent-of')" },
          "object": { "val": "xs:string(children/name)" }
        }
      ]
    }
  }
);
//tde.validate([template]),
tde.templateInsert("/templates/test.tde", template);
tde.nodeDataExtract(
  [cts.doc( '/test/tde.json' )]
)
However, the above throws an Exception:
[javascript] TDE-EVALFAILED: tde.nodeDataExtract([cts.doc("/test/tde.json")]) -- Eval for Object='xs:string(children/name)' returns TDE-BADVALEXPRESSION: Invalid val expression: XDMP-CAST: (err:FORG0001) Invalid cast: (fn:doc("/test/tde.json")/content/array-node("children")/object-node()[1]/text("name"), fn:doc("/test/tde.json")/content/array-node("children")/object-node()[2]/text("name")) cast as xs:string?
What is the proper syntax for extracting array nodes into a triple?
2nd somewhat related question: say I also wanted to have triples of the form “child1 is-sibling-of child2”. For the example above it would be “Bob Child is-sibling-of Sue Child”. What would be the proper syntax for this? I’m not even sure how to begin with this one.
Is TDE even the way to go here? Or is it better to do this programmatically? i.e. on document ingestion, generate those triples inside the document directly?
(If it’s relevant, the ML version being used is 9.)
-
I’ve been testing migrating one of our systems to MarkLogic 9 and using the Optic API.
One of our functions involves grouping claims by member_id, member_name and getting the sums and counts, so I did something like this:
var results = op.fromView('test', 'claims')
  .groupBy(['member_id', 'member_name'], [
    op.count('num_claims', 'claim_no'),
    op.sum('total_amount', 'claim_amount')
  ])
  .orderBy(op.desc('total_amount'))
  .limit(200)
  .result()
  .toArray();
Above works fine. The results are of the form
[
  {
    member_id: 1,
    member_name: 'Bob',
    num_claims: 10,
    total_amount: 500
  },
  ...
]
However, we also have a field “company”, where each claim is filed under a different company. Basically the relevant view columns are claim_no, member_id, member_name, company, claim_amount
I would like to be able to show a column that lists the different companies for which the member_id/member_name has filed claims, and how many claims for each company.
i.e. I want my results to be something like:
[
  {
    member_id: 1,
    member_name: 'Bob',
    num_claims: 10,
    total_amount: 500,
    companies: [
      { company: 'Ajax Co', num_claims: 8 },
      { company: 'Side Gig', num_claims: 2 }
    ]
  },
  ...
]
I tried something like this:
results = results.map((member, index, array) => {
  var companies = op.fromView('test', 'claims')
    .where(op.eq(op.col('member_id'), member.member_id))
    .groupBy('company', [
      op.count('num_claims', 'claim_no')
    ])
    .result()
    .toArray();
  member.companies = companies;
  return member;
});
And the output seems correct, but it also executes quite slowly - almost a minute (total number of claim documents is around 120k)
In our previous ML8 implementation, we were pre-generating summary documents for each member - so retrieval was reasonably fast with the downside that whenever we got a bunch of new data, all of the summary documents had to be re-generated. I was hoping that ML9’s optic API would make it easier to do the retrieval/grouping/aggregates on the fly so we wouldn’t have to do that.
In theory, I could just add company to the groupBy fields, then merge the rows in the result query as needed. But the problem with that approach is that I can’t guarantee I’ll get the top 200 by total amount (as was my original query)
So, the question is: Is there a better way of doing this with a reasonable execution time? Or should I just stick to pre-generating the summary documents?
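To make the "group by company, then merge" idea concrete, here is a rough plain-JavaScript sketch of the merge step. It assumes the grouped query is run without `.limit()` so that every member/company row is available; the sample rows, their per-company amounts, and the function name `topMembers` are made up for illustration, and whether this is fast enough for ~120k claims is untested.

```javascript
// Hypothetical rows, shaped like the output of grouping by
// member_id, member_name and company (not real query output).
const rows = [
  { member_id: 1, member_name: 'Bob', company: 'Ajax Co', num_claims: 8, total_amount: 400 },
  { member_id: 1, member_name: 'Bob', company: 'Side Gig', num_claims: 2, total_amount: 100 },
  { member_id: 2, member_name: 'Sue', company: 'Ajax Co', num_claims: 3, total_amount: 900 }
];

// Fold per-company rows into one entry per member,
// then keep the top `limit` members by total amount.
function topMembers(rows, limit) {
  const byMember = new Map();
  for (const row of rows) {
    let member = byMember.get(row.member_id);
    if (!member) {
      member = {
        member_id: row.member_id,
        member_name: row.member_name,
        num_claims: 0,
        total_amount: 0,
        companies: []
      };
      byMember.set(row.member_id, member);
    }
    member.num_claims += row.num_claims;
    member.total_amount += row.total_amount;
    member.companies.push({ company: row.company, num_claims: row.num_claims });
  }
  return [...byMember.values()]
    .sort((a, b) => b.total_amount - a.total_amount)
    .slice(0, limit);
}

console.log(JSON.stringify(topMembers(rows, 200), null, 2));
```

Because the totals are summed across all of a member's rows before sorting, the top-200 cut is applied to per-member totals rather than per-company ones.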
-
I have a MarkLogic 9 project that I’m configuring with Roxy. I’ve been following these examples: https://github.com/marklogic-community/roxy/wiki/Adding-Custom-Build-Steps
Basically, I have a server-side JS function that I want to call after deploy content. I have something like this:
# then you would define your new method
def deploy_content
  # you can optionally call the original
  original_deploy_content

  # do your stuff here
  execute_query(%Q{
    xquery version "1.0-ml";
    xdmp:javascript-eval('var process = require("/ingestion/process.sjs"); process.postDeployContent();')
  },
  :db_name => @properties["ml.app-name"] + "-content")
end
The xquery being called here evaluates fine when executed via query console. But when I call ml local deploy content, I get the following error:
ERROR: 500 "Internal Server Error"
ERROR: <html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>500 Internal Server Error</title>
    <meta name="robots" content="noindex,nofollow"/>
    <link rel="stylesheet" href="/error.css"/>
  </head>
  <body>
    <span class="error">
      <h1>500 Internal Server Error</h1>
      <dl>
        <dt>XDMP-MODNOTFOUND: var process = require("/ingestion/process.sjs"); process.postDeployContent(); -- Module /ingestion/process.sjs not found</dt>
        <dd></dd>
        <dt>in [anonymous], at 1:14 [javascript]</dt>
        <dd></dd>
        <dt>at 3:6, in xdmp:eval("var process = require(&quot;/ingestion/process.sjs&quot;); proce...") [javascript]</dt>
        <dd></dd>
        <dt>in /eval, at 3:6 [1.0-ml]</dt>
        <dd></dd>
      </dl>
    </span>
  </body>
</html>
Why is the module not found when running via xquery from app_specific.rb?
Or… is there a better way to call a JS module function from here? Sorry, I’m not too familiar with the XQuery side, so I just called a JS function instead.
-
Basically the title. The client is complaining that when he zooms in, the text labels for the nodes are quite large. Is there a way to keep the node labels at a fixed font size even when zooming in or out?
From the nodes documentation (http://visjs.org/docs/network/nodes.html), there’s a scaling.label option, but it doesn’t seem to work. I think this is only relevant if I’m using values to scale the nodes.
-
Does Roxy have support for deploying templates for use with MarkLogic 9’s Template Driven Extraction?
Oct 2017
-
I’m trying out the Roxy deployer. The Roxy app was created using the default app-type. I set up a new ML 9 database, and I ran “ml local bootstrap” using the default ports (8040 and 8041).
Then I set up a node application. I tried the following (sample code from https://docs.marklogic.com/jsdoc/index.html):
var marklogic = require('marklogic');

var conn = {
  host: '192.168.33.10',
  port: 8040,
  user: 'admin',
  password: 'admin',
  authType: 'DIGEST'
}

var db = marklogic.createDatabaseClient(conn);

db.createCollection(
  '/books',
  {author: 'Beryl Markham'},
  {author: 'WG Sebald'}
)
.result(function(response) {
  console.log(JSON.stringify(response, null, 2));
}, function (error) {
  console.log(JSON.stringify(error, null, 2));
});
Running the script gave me an error like:
$ node test.js
{
  "message": "write document list: cannot process response with 500 status",
  "statusCode": 500,
  "body": "<error:error xsi:schemaLocation=\"http://marklogic.com/xdmp/error error.xsd\" xmlns:error=\"http://marklogic.com/xdmp/error\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">\n <error:code>XDMP-IMPMODNS</error:code>\n <error:name>err:XQST0059</error:name>\n <error:xquery-version>1.0-ml</error:xquery-version>\n <error:message>Import module namespace mismatch</error:message>\n <error:format-string>XDMP-IMPMODNS: (err:XQST0059) Import module namespace http://marklogic.com/rest-api/endpoints/config does not match target namespace http://marklogic.com/rest-api/endpoints/config_DELETE_IF_UNUSED of imported module /MarkLogic/rest-api/endpoints/config.xqy</error:format-string>\n <error:retryable>false</error:retryable>\n <error:expr/>\n <error:data>\n <error:datum>http://marklogic.com/rest-api/endpoints/config</error:datum>\n <error:datum>http://marklogic.com/rest-api/endpoints/config_DELETE_IF_UNUSED</error:datum>\n <error:datum>/MarkLogic/rest-api/endpoints/config.xqy</error:datum>\n </error:data>\n <error:stack>\n <error:frame>\n <error:uri>/roxy/lib/rewriter-lib.xqy</error:uri>\n <error:line>5</error:line>\n <error:column>0</error:column>\n <error:xquery-version>1.0-ml</error:xquery-version>\n </error:frame>\n </error:stack>\n</error:error>\n"
}
If I change the port to 8000 (the default appserver that inserts into Documents), the node function executes correctly as expected. I’m not sure if I need to configure anything else with the Roxy-created appserver so that it works with the node.js application.
I’m not sure where the “DELETE_IF_UNUSED” part in the error message is coming from either. There doesn’t seem to be any such text in the configuration files generated by Roxy.
Edit: When accessing 192.168.33.10:8040 via the browser, I get an XML response with a similar error:
<error:error xsi:schemaLocation="http://marklogic.com/xdmp/error error.xsd" xmlns:error="http://marklogic.com/xdmp/error" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <error:code>XDMP-IMPMODNS</error:code>
  <error:name>err:XQST0059</error:name>
  <error:xquery-version>1.0-ml</error:xquery-version>
  <error:message>Import module namespace mismatch</error:message>
  <error:format-string>XDMP-IMPMODNS: (err:XQST0059) Import module namespace http://marklogic.com/rest-api/endpoints/config does not match target namespace http://marklogic.com/rest-api/endpoints/config_DELETE_IF_UNUSED of imported module /MarkLogic/rest-api/endpoints/config.xqy</error:format-string>
  <error:retryable>false</error:retryable>
  <error:expr/>
  <error:data>
    <error:datum>http://marklogic.com/rest-api/endpoints/config</error:datum>
    <error:datum>http://marklogic.com/rest-api/endpoints/config_DELETE_IF_UNUSED</error:datum>
    <error:datum>/MarkLogic/rest-api/endpoints/config.xqy</error:datum>
  </error:data>
  <error:stack>
    <error:frame>
      <error:uri>/roxy/lib/rewriter-lib.xqy</error:uri>
      <error:line>5</error:line>
      <error:column>0</error:column>
      <error:xquery-version>1.0-ml</error:xquery-version>
    </error:frame>
  </error:stack>
</error:error>
If it matters, MarkLogic version is 9.0-3.1. It’s a fresh install too.
Any advice?
-
I’m trying to migrate one of my dev environments from ML8 to ML9. I have an import script that works on the ML8 version, but there’s an error when I try running it against the ML9 database. The ML9 version is 9.0.3.1. The MLCP version is 9.0.3.
My MLCP options file is as follows:
import
-host
192.168.33.10
-port
8041
-username
admin
-password
admin
-input_file_path
d:\maroon\data\mbastest.csv
-mode
local
-input_file_type
delimited_text
-uri_id
ClientId
-output_uri_prefix
/test/records/
-output_uri_suffix
.json
-document_type
json
-transform_module
/ingestion/transform.js
-transform_function
testTransform
-transform_param
test
-content_encoding
windows-1252
-thread_count
1
Here’s the output of a test run with only 2 records in the test CSV file:
17/10/30 14:07:33 INFO contentpump.LocalJobRunner: Content type: JSON
17/10/30 14:07:33 INFO contentpump.ContentPump: Job name: local_455168344_1
17/10/30 14:07:33 INFO contentpump.FileAndDirectoryInputFormat: Total input paths to process : 1
17/10/30 14:07:38 WARN contentpump.TransformWriter: Failed document /test/records/31.json
17/10/30 14:07:38 WARN contentpump.TransformWriter: <error:format-string xmlns:error="http://marklogic.com/xdmp/error" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">XDMP-UNEXPECTED: (err:XPST0003) Unexpected token syntax error, unexpected QName_, expecting $end or SemiColon_</error:format-string>
17/10/30 14:07:38 WARN contentpump.TransformWriter: Failed document /test/records/32.json
17/10/30 14:07:38 WARN contentpump.TransformWriter: <error:format-string xmlns:error="http://marklogic.com/xdmp/error" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">XDMP-UNEXPECTED: (err:XPST0003) Unexpected token syntax error, unexpected QName_, expecting $end or SemiColon_</error:format-string>
17/10/30 14:07:38 INFO contentpump.LocalJobRunner: completed 100%
17/10/30 14:07:38 INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.MarkLogicCounter:
17/10/30 14:07:38 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 2
17/10/30 14:07:38 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 2
17/10/30 14:07:38 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 0
17/10/30 14:07:38 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 2
17/10/30 14:07:38 INFO contentpump.LocalJobRunner: Total execution time: 5 sec
If I remove the transform params, the import works fine.
I thought it might be a parsing issue with my transform module itself, so I tried replacing it with the following example from the documentation:
// Add a property named "NEWPROP" to any JSON input document.
// Otherwise, input passes through unchanged.
function addProp(content, context)
{
  const propVal = (context.transform_param == undefined)
                  ? "UNDEFINED" : context.transform_param;

  if (xdmp.nodeKind(content.value) == 'document' &&
      content.value.documentFormat == 'JSON') {
    // Convert input to mutable object and add new property
    const newDoc = content.value.toObject();
    newDoc.NEWPROP = propVal;

    // Convert result back into a document
    content.value = xdmp.unquote(xdmp.quote(newDoc));
  }
  return content;
};

exports.addProp = addProp;
(Of course I changed the params in the MLCP options file accordingly)
The issue still persists even with just this test function.
Any advice?
-
I’ve been testing out the WebView component, but I can’t seem to get it to render things.
Sample here: https://snack.expo.io/r1oje4C3-
export default class App extends Component {
  render() {
    return (
      <View style={styles.container}>
        <Text style={styles.paragraph}>
          Change code in the editor and watch it change on your phone!
          Save to get a shareable url. You get a new url each time you save.
        </Text>
        <WebView source={{html: '<p>Here I am</p>'}} />
        <WebView source={{ uri: 'http://www.google.com'}} />
      </View>
    );
  }
}
When running the above example in Expo, neither of the two WebView components seems to render. What am I doing wrong?
-
I’m trying out vis.js for generating graph visualization. For every edge in my graph, I have a number that describes the strength of the connection between two nodes. I’d like to render the vis.js graph such that the nodes that have stronger relationships (higher edge values) are closer together (edge length is shorter).
I’ve set the relationship strength (an integer) as the “value” attribute for each edge, but this only seems to make the edge lines slightly thicker for higher values.
What options should I be looking at? I’m not sure if this is supposed to be a function of vis’s physics-based stabilization.
Thanks for any advice!
Sep 2017
-
I have an HTML page with around 10 charts generated by chart.js (so these are canvas elements). I want to be able to export the page content into a PDF file.
I’ve tried using jsPDF’s .fromHTML function, but it doesn’t seem to support exporting the canvas contents. (Either that or I’m doing it wrong). I just did something like:
$(".test").click(function() {
  var doc = new jsPDF()
  doc.fromHTML(document.getElementById("testExport"));
  doc.save('a4.pdf')
});
Any alternative approaches would be appreciated.
-
I’m using ML8. I have a bunch of json documents in the database. Some documents have a certain property “summaryData”, something like:
{
  ...(other stuff)...
  summaryData: {
    count: 100,
    total: 10000,
    summaryDate: (date value)
  }
}
However, not all documents have this property. I’d like to construct an SJS query to retrieve those documents that don’t have this property defined. If it was SQL, I guess the equivalent would be something like “WHERE summaryData IS NULL”
I wasn’t sure what to search for in the docs. Any advice would be helpful.
-
I’m using ML8 and Node.js. The documentation here: http://docs.marklogic.com/guide/node-dev/documents#id_68765 describes how to do conditional updates in ML using the versionId field.
But for example if I want to do a conditional update on a different field, is it possible?
My scenario is: I have JSON documents with elements assignedTo and assignDate (where assignDate is set to current date every time a new value is set to assignedTo)
Now, for my “Assign” operation, I would like to make sure that no one else has changed the assignedTo/assignDate fields between the time I read the document and when I perform the update. I don’t care if other fields in the same document have been updated or not - if other fields have been updated, I can still proceed with the Assign operation (hence I cannot use the versionId approach, since that covers the whole document)
How can this be done?
Aug 2017
-
Sorry if the title is unclear, but I’m finding my problem hard to explain concisely. I have a number of JSON documents with structure like this:
{
  "count": 100,
  "groups": [
    {
      "name": "group A",
      "count": 12
    },
    {
      "name": "group B",
      "count": 22
    },
    {
      "name": "group C",
      "count": 7
    }
  ]
}
Basically, the document has an item count plus a breakdown of that count into smaller groups. So this record represents a collection of 100 items, of which 12 are from group A, 22 are from group B, 7 are from group C.
Now, I have an element range index on “count” and a bunch of such documents. I’d like to be able to sort by any of the options:
- sort by total count (descending or ascending)
- sort by group A count (descending or ascending)
- sort by group B count (descending or ascending)
- sort by group C count (descending or ascending)
I’ve tried
.orderBy(qb.sort("count", "descending"))
This seems to sort by the total count (the one in the document root), but I’m not sure if that’s always true or if I need to specify something else to guarantee it always picks that particular one.
For sorting by a specific group, I have no idea how to specify it.
Any advice?
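To pin down the four orderings being asked for, here is a plain-JavaScript sketch that sorts already-fetched documents in memory. It does not use the range index or `qb.sort` at all, so it only illustrates the intended result and would only be practical for small result sets; the helper names `countFor` and `sortByCount` and the second sample document are hypothetical.

```javascript
// Read the root count, or a named group's count, from one document.
// A document without the named group is treated as having a count of 0.
function countFor(doc, groupName) {
  if (groupName === undefined) return doc.count;
  const group = doc.groups.find((g) => g.name === groupName);
  return group ? group.count : 0;
}

// Return a sorted copy; direction is 'ascending' or 'descending'.
function sortByCount(docs, direction, groupName) {
  const sign = direction === 'descending' ? -1 : 1;
  return [...docs].sort(
    (a, b) => sign * (countFor(a, groupName) - countFor(b, groupName))
  );
}

const docs = [
  { count: 100, groups: [{ name: 'group A', count: 12 }, { name: 'group B', count: 22 }] },
  { count: 50, groups: [{ name: 'group A', count: 30 }] }
];

// Sort by the root count, then by the group A count.
console.log(sortByCount(docs, 'descending').map((d) => d.count));
console.log(sortByCount(docs, 'descending', 'group A').map((d) => d.count));
```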
-
We’ve been trying to set up flexible replication in our system, which uses a MarkLogic database. We followed the instructions from https://docs.marklogic.com/8.0/guide/flexrep/quick_start and have been able to set up flexible replication between two MarkLogic servers. We have verified that new documents created in the master are copied over to the replica. However, the master database currently has more than 47 million records that were there before we configured the replication. Once the replication process was triggered, we observed that the documents are being replicated to the replica very slowly. Roughly 20,000 documents were replicated within the first two hours. At that rate, it would take months for the old records to be fully replicated.
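As a back-of-the-envelope check on the figures above, assuming the observed rate stays constant (the function name `daysToReplicate` is made up for this sketch, and 47 million is a lower bound on the backlog):

```javascript
// Rough time to replicate a backlog at a constant observed rate.
function daysToReplicate(totalDocs, docsObserved, hoursObserved) {
  const docsPerHour = docsObserved / hoursObserved;
  return totalDocs / docsPerHour / 24;
}

// 47 million existing documents, 20,000 replicated in the first 2 hours.
const days = daysToReplicate(47000000, 20000, 2);
console.log(days.toFixed(1) + ' days');
```

That works out to a little over six months for the initial data set alone.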
Our questions are:
- We are looking into increasing the hardware specs of the two servers, but aside from that, would anyone have any advice or documentation as to how we could speed up the replication? I couldn’t find any existing documentation regarding this.
- Failing that, would it be possible to set up flexible replication without needing to replicate the initial data set? FYI, we also tried to clone the master database and use the clone as the replica. (We thought this might mean that the older records don’t have to be replicated.) However, in this case we encountered XDMP-NEWSTAMP and XDMP-EXTIME errors on the replica server, so we gave up on this approach. A sample of the errors encountered on the replica is below:
2017-08-03 18:45:04.376 Notice: exp-rest-content-flexrep: XDMP-NEWSTAMP: Timestamp too new for forest exp-rest-content-001-1 (15017569242290900) 2017-08-03 18:45:04.376 Notice: exp-rest-content-flexrep: in /apply.xqy [1.0-ml] 2017-08-03 18:45:04.379 Notice: TaskServer: XDMP-EXTIME: try { let $raw-module-name := module-path($action-to-execute/p:module) let $module-kind := module-kind($raw-module-name) let $module-name := if ($module-kind = “xquery” or $module-kind = “javascript”) then $raw-module-name else $cpfi:xslt-action return if ($module-name = “") then fn:error((), “CPF-ACTIONNOTFOUND”, “Default success”) else if ($module-kind = “javascript”) then (xdmp:trace(“CPF Action Invoke”, fn:string-join(($caller, xdmp:get-current-user(), $uri, $state-or-status, $raw-module-name), " “)), xdmp:invoke($module-name, (fn:QName("”,“uri”), $uri, xs:QName(“cpf:document-uri”), $uri, fn:QName("”,“transition”), $chosen-transition, options-var-js($action-to-execute)), $invoke-options)) else (xdmp:trace(“CPF Action Invoke”, fn:string-join(($caller, xdmp:get-current-user(), $uri, $state-or-status, $raw-module-name), " “)), xdmp:invoke($module-name, ($vars, xs:QName(“cpf:transition”), $chosen-transition, options-var($action-to-execute), if ($module-kind = “xslt”) then (xs:QName(“cpf:stylesheet-uri”), $raw-module-name) else ()), $invoke-options)) } catch ($e) { let $trace := let $context := fn:concat($caller, " “, $uri, " action failed”) return (cpf:log(fn:string-join(($context, $e/err:format-string), " “), “error”), cpf:log(($context, $e), “fine”)) let $failure-action := ($pipelines/p:failure-action)[1] let $raw-failure-module := module-path($failure-action/p:module) let $failure-kind := module-kind($raw-failure-module) let $failure-module := if ($failure-kind = “xquery” or $failure-kind = “javascript”) then $raw-failure-module else $cpfi:xslt-action return if ($failure-module = “") then fn:error((), “CPF-ACTIONNOTFOUND”, “Default failure action”) else xdmp:invoke($failure-module, 
($vars, xs:QName(“cpf:transition”), $chosen-transition, options-var($failure-action), xs:QName(“cpf:exception”), $e, if ($failure-kind = “xslt”) then (xs:QName(“cpf:stylesheet-uri”), $raw-failure-module) else ()), $invoke-options) } – Time limit exceeded 2017-08-03 18:45:04.379 Notice: TaskServer: in /MarkLogic/cpf/triggers/internal-cpf.xqy, at 213:4, 2017-08-03 18:45:04.379 Notice: TaskServer: in execute-action(“on-state-enter”, “http://marklogic.com/states/initial", “/_smslogs/5849823.xml”, (xs:QName(“trgr:uri”), “/_smslogs/5849823.xml”, xs:QName(“trgr:trigger”), …),
, (fn:doc(“http://marklogic.com/cpf/pipelines/12349495875628658916.xml")/p:pipeline, fn:doc(“http://marklogic.com/cpf/pipelines/3358424510998587926.xml")/p:pipeline, fn:doc(“http://marklogic.com/cpf/pipelines/13179541037342910978.xml")/p:pipeline, …), fn:doc(“http://marklogic.com/cpf/pipelines/3358424510998587926.xml")/p:pipeline/p:state-transition[3]/p:default-action, fn:doc(“http://marklogic.com/cpf/pipelines/3358424510998587926.xml")/p:pipeline/p:state-transition[3]) [1.0-ml] 2017-08-03 18:45:04.379 Notice: TaskServer: $caller = “on-state-enter” 2017-08-03 18:45:04.379 Notice: TaskServer:different-transaction t…
$state-or-status = “http://marklogic.com/states/initial" 2017-08-03 18:45:04.379 Notice: TaskServer: $uri = “/_smslogs/5849823.xml” 2017-08-03 18:45:04.379 Notice: TaskServer: $vars = (xs:QName(“trgr:uri”), “/_smslogs/5849823.xml”, xs:QName(“trgr:trigger”), …) 2017-08-03 18:45:04.379 Notice: TaskServer: $invoke-options = 2017-08-03 18:45:04.379 Notice: TaskServer: $pipelines = (fn:doc(“http://marklogic.com/cpf/pipelines/12349495875628658916.xml")/p:pipeline, fn:doc(“http://marklogic.com/cpf/pipelines/3358424510998587926.xml")/p:pipeline, fn:doc(“http://marklogic.com/cpf/pipelines/13179541037342910978.xml")/p:pipeline, …) 2017-08-03 18:45:04.379 Notice: TaskServer: $action-to-execute = fn:doc(“http://marklogic.com/cpf/pipelines/3358424510998587926.xml")/p:pipeline/p:state-transition[3]/p:default-action 2017-08-03 18:45:04.379 Notice: TaskServer: $chosen-transition = fn:doc(“http://marklogic.com/cpf/pipelines/3358424510998587926.xml")/p:pipeline/p:state-transition[3] 2017-08-03 18:45:04.379 Notice: TaskServer: $e = <error:error xsi:schemaLocation="http://marklogic.com/xdmp/error error.xsd” xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance” xmlns:error="http://marklogic.com/xdmp/error">error:codeXDMP-NEWSTAMP</error:code>error:name/<error:xquery…</error:error> 2017-08-03 18:45:04.379 Notice: TaskServer: in /MarkLogic/cpf/triggers/internal-cpf.xqy, at 342:6, 2017-08-03 18:45:04.379 Notice: TaskServer: in execute-transition(“on-state-enter”, “http://marklogic.com/states/initial", “/_smslogs/5849823.xml”, (xs:QName(“trgr:uri”), “/_smslogs/5849823.xml”, xs:QName(“trgr:trigger”), …), <trgr:trigger xmlns:trgr="http://marklogic.com/xdmp/triggers">trgr:trigger-id6551367241994447650</trgr:trigger-id><trgr:trig…</trgr:trigger>, (fn:doc(“http://marklogic.com/cpf/pipelines/12349495875628658916.xml")/p:pipeline, fn:doc(“http://marklogic.com/cpf/pipelines/3358424510998587926.xml")/p:pipeline, fn:doc(“http://marklogic.com/cpf/pipelines/13179541037342910978.xml")/p:pipeline, …), 
(fn:doc(“http://marklogic.com/cpf/pipelines/12349495875628658916.xml")/p:pipeline/p:state-transition[2], fn:doc(“http://marklogic.com/cpf/pipelines/3358424510998587926.xml")/p:pipeline/p:state-transition[3], fn:doc(“http://marklogic.com/cpf/pipelines/13179541037342910978.xml")/p:pipeline/p:state-transition[1], …), <p:null-transition xmlns:p="http://marklogic.com/cpf/pipelines"><p:state>http://marklogic.com/states/initial</p:state></p:null-transition>) [1.0-ml] 2017-08-03 18:45:04.379 Notice: TaskServer: $caller = cpf:state(“http://marklogic.com/states/initial") 2017-08-03 18:45:04.379 Notice: TaskServer: $state-or-status = () 2017-08-03 18:45:04.379 Notice: TaskServer: $uri = (xs:QName(“trgr:uri”), “/_smslogs/5849823.xml”, xs:QName(“trgr:trigger”), …) 2017-08-03 18:45:04.379 Notice: TaskServer: in /MarkLogic/cpf/triggers/internal-cpf.xqy, at 358:3, 2017-08-03 18:45:04.379 Notice: TaskServer: in int:execute-state-transition(“on-state-enter”, cpf:state(“http://marklogic.com/states/initial"), “/_smslogs/5849823.xml”, (xs:QName(“trgr:uri”), “/_smslogs/5849823.xml”, xs:QName(“trgr:trigger”), …), <trgr:trigger xmlns:trgr="http://marklogic.com/xdmp/triggers">trgr:trigger-id6551367241994447650</trgr:trigger-id><trgr:trig…</trgr:trigger>) [1.0-ml] 2017-08-03 18:45:04.379 Notice: TaskServer: $caller = cpf:state(“http://marklogic.com/states/initial") 2017-08-03 18:45:04.379 Notice: TaskServer: $state = () 2017-08-03 18:45:04.379 Notice: TaskServer: $uri = (xs:QName(“trgr:uri”), “/_smslogs/5849823.xml”, xs:QName(“trgr:trigger”), …) 2017-08-03 18:45:04.379 Notice: TaskServer: in /MarkLogic/cpf/triggers/on-state-enter.xqy, at 41:6 [1.0-ml] 2017-08-03 18:45:04.379 Notice: TaskServer: $state = cpf:state(“http://marklogic.com/states/initial") 2017-08-03 18:45:04.379 Notice: TaskServer: $trace = () 2017-08-03 18:45:04.379 Notice: TaskServer: $vars = (xs:QName(“trgr:uri”), “/_smslogs/5849823.xml”, xs:QName(“trgr:trigger”), …) 2017-08-03 18:45:04.379 Notice: TaskServer: XDMP-NEWSTAMP: 
Timestamp too new for forest exp-rest-content-001-1 (15017569242290900) 2017-08-03 18:45:04.379 Notice: exp-rest-content-flexrep: XDMP-NEWSTAMP: Timestamp too new for forest exp-rest-content-001-1 (15017569242290900) 2017-08-03 18:45:04.379 Notice: exp-rest-content-flexrep: in /apply.xqy [1.0-ml]different-transaction t… -
-
We have a search application using MarkLogic node.js. We use parsedQuery like this:
qb.parsedFrom(prop.search,
  qb.parseBindings(
    qb.word('name', qb.bind('name')),
    qb.word('birthdate', qb.bind('birthdate')),
    qb.range('count', qb.datatype('float'), qb.bind('count'))
  )
)
The above currently supports search syntax like “count GT 50”, etc. We would like to support searching using a derived value such as age. That is, we want to support a search syntax like “age GT 10”, where the age value is not stored in the documents in the database but rather needs to be computed from the birthdate on the fly. We can’t store the age in the documents since the age changes depending on the current date.
Is this possible and, if so, how? If it matters, we are using MarkLogic 8.
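One way to think about the derived value: an age comparison can be rewritten as a comparison on the stored birthdate, because "older than N years" means "born before today minus N years". The sketch below only does that date arithmetic. The helper name ageToBirthdateConstraint is hypothetical and is not part of the MarkLogic API, and it treats age as a continuous number of elapsed years. Using the result in a search would also assume that birthdate is bound to a date range constraint, which is not the case in the word binding shown above.

```javascript
// Hypothetical helper (not a MarkLogic API): rewrite "age <op> N" as an
// equivalent comparison on the birthdate field.
function ageToBirthdateConstraint(operator, age, today = new Date()) {
  // Older people have earlier birthdates, so the comparison direction flips.
  const flipped = { GT: 'LT', GE: 'LE', LT: 'GT', LE: 'GE' };
  if (!(operator in flipped)) {
    throw new Error('Unsupported operator: ' + operator);
  }
  // The birthdate of someone who turns exactly `age` years old today (UTC).
  const cutoff = new Date(Date.UTC(
    today.getUTCFullYear() - age,
    today.getUTCMonth(),
    today.getUTCDate()
  ));
  return {
    operator: flipped[operator],
    value: cutoff.toISOString().slice(0, 10) // YYYY-MM-DD
  };
}

// Example: turn "age GT 10" into a birthdate comparison string.
const constraint = ageToBirthdateConstraint('GT', 10);
const rewritten = 'birthdate ' + constraint.operator + ' ' + constraint.value;
```

Because the cutoff is computed at query time, nothing age-related has to be stored in the documents.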
-
Is there an easy way to delete all the element range indexes on a given database?
Thanks
Jul 2017
-
I’m using the demo of jsPlumb here:
In this demo, there’s no way to move an existing connection to a different target node. Any idea how to do it?
Some of the other examples have movable connections, but they also use specific endpoints on the nodes. I like this particular example where I can drag the connection endpoint to any point of the target node.
-
I’m trying out jsPlumb. I have a copy of this demo: https://jsplumbtoolkit.com/community/demo/statemachine/index.html
In this demo, when I drag one of the nodes outside of the canvas boundary, a scrollbar appears to indicate the canvas area has expanded. However, I still have to manually scroll the view to see the dragged node.
I would like the view to automatically scroll when I drag a node beyond the edge. Same thing when dragging a new connector, I would like to automatically scroll the view - so I can choose to connect to a node currently outside the visible area. Any advice on how to do this?
Secondary question: In the demo above the scrollbars appear as expected when I drag elements off the right or bottom of the canvas, but not when I drag them off the left edge or off the top edge. That is, if I drag a node upward off the canvas and drop it there, I no longer have any way of viewing that node or dragging it back down. Is there a way around this?
-
I have a number of documents from different sources. Many of them reference a company name, but may have stored the information slightly differently. The name is a field in the documents.
I’d like to be able to detect variations on the same name, something like:
- Ajax Company Incorporated
- Ajax Co. Inc.
- Ajax Company Inc.
- Ajax Company
- Ajax Company (formerly Ajax Unlimited)
- etc
Does MarkLogic have any facility to query documents that have “similar” names, as in the list above? I’m not sure if there’s a more technical term that I should be searching for. Preferably for either the Node.js client API or server-side JavaScript.
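Independent of any database feature, one common approach to this kind of matching is to normalize each name to a comparison key before comparing. The sketch below is plain JavaScript and does not use a MarkLogic facility. The function name and the suffix list are assumptions chosen to cover the examples above, not an exhaustive set.

```javascript
// Assumed list of corporate suffixes to ignore; extend as needed.
const COMPANY_SUFFIXES = new Set([
  'incorporated', 'inc', 'company', 'co', 'corporation', 'corp',
  'limited', 'ltd', 'llc'
]);

// Hypothetical helper: reduce a company name to a key for comparison.
function normalizeCompanyName(name) {
  return name
    .toLowerCase()
    .replace(/\([^)]*\)/g, ' ')    // drop parenthetical notes such as "(formerly ...)"
    .replace(/[^a-z0-9\s]/g, ' ')  // drop punctuation
    .split(/\s+/)
    .filter(word => word && !COMPANY_SUFFIXES.has(word))
    .join(' ');
}

// Names that normalize to the same key are treated as variations of one company.
const sameCompany =
  normalizeCompanyName('Ajax Co. Inc.') ===
  normalizeCompanyName('Ajax Company (formerly Ajax Unlimited)');
```

The key could be stored alongside the original name at ingest time so that it can be queried directly, though that is a design choice rather than something the question specifies.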
Jun 2017
-
Random thoughts while walking at night: The structure of government can be a bit analogous to the structure of a software development project. The Constitution is like the requirements for a project. It’s kind of high-level and (I believe) shouldn’t be too detailed. Supposedly the requirements are written by the client. For a country like the Philippines the client is “we the sovereign Filipino people”. Slight tangent: I used to know this guy who was one of those rabid “we need to amend the constitution” types and he asked me to review a “mathematical model to track the budget as a function of tax collection and monetary policy” that he wanted to include in a proposed new constitution.
-
The scenario is: the user can input a URL for an image file, then on form submit I want to download the file and submit it to the server, as if they had selected the file using an <input type='file' />. Of course I can just do it on the server side. But my question: how can I do this on the browser side via JavaScript? Is it possible?
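A minimal sketch of the browser-side idea, assuming a browser that supports fetch, Blob and FormData: download the image as a Blob, then append it to a FormData object the same way a selected file would be. The function names, the default field name 'file' and the fallback filename are hypothetical. Note that the browser will only let the page read the response if the image is served from the same origin or the remote server allows cross-origin requests (CORS).

```javascript
// Hypothetical helper: derive a filename from the last path segment of a URL.
function filenameFromUrl(imageUrl) {
  return new URL(imageUrl).pathname.split('/').pop() || 'download';
}

// Hypothetical helper: fetch the image and wrap it in a FormData object.
// fetchImpl is injectable so the function can be exercised without a network.
async function buildFormDataFromUrl(imageUrl, fieldName = 'file', fetchImpl = fetch) {
  const response = await fetchImpl(imageUrl);
  if (!response.ok) {
    throw new Error('Download failed with status ' + response.status);
  }
  const blob = await response.blob();
  const form = new FormData();
  // A Blob with a filename is sent as a file part, like <input type='file' />.
  form.append(fieldName, blob, filenameFromUrl(imageUrl));
  return form;
}
```

The resulting FormData can then be sent with a POST request from the form's submit handler. The upload endpoint is whatever the form would normally submit to.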
-
A while back I wrote about my experience coding and maintaining an in-house web framework at a previous job. It was a full-stack web framework. We had libraries for everything from front-end JavaScript up to server-side database connections. And the entire stack was tightly coupled. But while the framework was serviceable, it was almost always behind modern trends in web development. I always felt like we were playing catch-up. And as a developer I wanted to widen my horizons and try out more things.
-
Something similar to my question here: https://stackoverflow.com/questions/40715822/marklogic-node-js-api-group-by-and-sort-by-count
I have documents in MarkLogic with fields name and amount. I want to get the total amount for each name. Basically in SQL it would be
select name, sum(amount) from table group by name
I have range indexes for both name and amount. For getting sum aggregates, the documentation suggests something like valuesBuilder.fromIndexes('amount').aggregates('sum'), but this only gets the sum for all records, instead of per name like I want.
Any advice?
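For reference, the grouping that the SQL statement above describes can be sketched in plain JavaScript over records that have already been fetched. This is an in-memory illustration of the desired result, not a server-side MarkLogic aggregate. The function name and the sample records are hypothetical.

```javascript
// Hypothetical helper: total the amount per name, like
// "select name, sum(amount) from table group by name".
function sumAmountByName(records) {
  const totals = new Map();
  for (const { name, amount } of records) {
    totals.set(name, (totals.get(name) || 0) + amount);
  }
  return totals;
}

// Made-up sample data for illustration only.
const totals = sumAmountByName([
  { name: 'alice', amount: 10 },
  { name: 'bob', amount: 5 },
  { name: 'alice', amount: 2.5 }
]);
```

This only scales to result sets small enough to pull into the client, which is why an index-based aggregate on the server would normally be preferable.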
Apr 2017
-
I recently attended a few training sessions for MarkLogic held at an office in a nearby business center. Now, I'll forgive you for not knowing what MarkLogic is, as even I hadn't heard of it before six months ago. MarkLogic is (apparently) the leading Enterprise NoSQL provider. NoSQL is big and sexy right now because of the supposed advantages in handling big data, and large web companies like Google and Facebook use a lot of NoSQL in the backend.
Mar 2017
-
Back in 2004, I signed up for the Google Code Jam for the first time. Unfortunately I didn’t make it past the qualifying round. I was a bit luckier in 2008 and 2010, making it to round 2 both times. In fact in 2008 as I recall I was one of only two participants from the Philippines who made it to round 2, which allowed me to jokingly brag about being the #2 programmer in the country.
-
Recently, a developer needed to undergo a tech interview at US immigration. This may surprise some people I’ve worked with, but I didn’t have formal computer science training in school. I’m not actually a computer science major. Yet I’ve worked as a software developer for more than a decade now. Literally zero times have I needed to write a sorting function or balance a BST.
-
I’ve been hesitant to try Python 3.x because it’s not backward compatible with Python 2.x which I’ve been using for scripting since forever. But recently I found out that since Python 3.3, they’ve included a launcher in the Windows version that supports having both versions installed. You can use the launcher to specify the Python version to use at the command line (it defaults to whichever version was installed first):
Feb 2017
-
I had been meaning to try writing a Twitter bot for a while now. I figured a trivia bot would be pretty easy to implement, so I spent some time over a couple of weekends rigging one together. It’s (mostly) working now: the bot is active as triviastorm on Twitter, with a supporting webapp deployed on https://triviastorm.net/. The bot tweets out a trivia question once every hour. It will then award points to the first five people who give the correct answer.
-
There are a few things that one should consider when using and integrating an open source library into your application: What are the licensing terms for the library? There are some liberal licenses that mostly let you do anything you want. The MIT license is an example of a very permissive license. Other licenses may provide a number of restrictions. Can you integrate with closed-source software? Can you distribute binaries without the source?
-
Back when I was starting out as a software developer, webapps weren’t really a thing. Not as much as they are now anyway. My company provided training to new hires, but I didn’t get any web development training, even though they already had a few web development projects in play at the time. Instead my initial training involved mostly development of so-called client-server software. This was software that was installed and run on the client machine but connected to a remote database server.
-
So after so many months of development you deployed your webapp to production and it’s up and running and everything is fine and you celebrate and your work is done right? Not really. Two days later you get an urgent support call in the middle of the night. (Your clients are halfway across the world.) They’re asking why the website is inaccessible. You check via your browser and sure enough there’s an error 500.
Jan 2017
-
Hopefully by now most developers and project managers are well aware of the mythical man-month and Brooks’ Law: “Adding manpower to a late software project makes it later.” The idea is that communications overhead scales up quickly as you add more people to a project. Oftentimes it is counter-intuitively not worthwhile to keep adding more people to try to catch up. Some implications of larger team/project size may not be immediately obvious.
-
Just a list I’ve been maintaining for a while: (Disclaimer: This list in no way implies that developers who don’t exhibit all of these attributes are terrible human beings who don’t deserve to live. But working with developers who exhibit many of these traits will probably result in a better experience over the course of your developer career.)
- Laziness, Impatience and Hubris – from the well-known (notorious?) Larry Wall quote
- Communicates well; is able to explain and communicate his ideas clearly, especially to nontechnical people; able to write good documentation
- Understands the concerns with scheduling and project management and communicates clearly with the team to avoid problems.
-
So the other day I was reworking a Python script that I had been using for years on my home PC to manage and categorize some downloaded files for me. This time I wanted to add some smarter behavior to make it more able to figure out when to group files into folders without constantly needing manual intervention from me. To do this, I needed to persist some data between runs – so that the script remembers how it categorized previous files and is able to group similar files together.