SANSA 0.4 (Scalable Semantic Analytics Stack using Spark/Flink) Released

Jens Lehmann

Dear all,

The Smart Data Analytics group [1] is happy to announce SANSA 0.4 - the
fourth release of the Scalable Semantic Analytics Stack. SANSA employs
distributed computing via Apache Spark and Flink in order to allow
scalable machine learning, inference and querying capabilities for large
knowledge graphs.

Website:   http://sansa-stack.net
GitHub:    https://github.com/SANSA-Stack
Download:  https://github.com/SANSA-Stack/SANSA-Stack/releases

You can find the FAQ and usage examples at http://sansa-stack.net/faq/.

The following features are currently supported by SANSA:

* Reading and writing RDF files in N-Triples, Turtle, RDF/XML and
  N-Quads formats
* Reading OWL files in various standard formats
* Support for multiple data partitioning techniques
* SPARQL querying via Sparqlify
* Graph-parallel querying of RDF using SPARQL (1.0) via GraphX
  traversals (experimental)
* RDFS, RDFS Simple, OWL-Horst, EL (experimental) forward chaining
  inference
* Automatic inference plan creation (experimental)
* RDF graph clustering with different algorithms
* Terminological decision trees (experimental)
* Anomaly detection (beta)
* Knowledge graph embedding approaches: TransE (beta), DistMult (beta)
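
To illustrate what forward chaining inference means for the RDFS case
listed above, here is a minimal single-machine sketch in plain Python.
It is not SANSA code and does not use the SANSA API; the function name
forward_chain and the ex: example data are made up for illustration.
It applies only two of the standard RDFS entailment rules (rdfs9 and
rdfs11) and repeats them until no new triples appear, whereas SANSA
runs such rules in a distributed fashion on Spark or Flink.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def forward_chain(triples):
    """Apply rules rdfs11 and rdfs9 until no new triples can be derived."""
    closure = set(triples)
    while True:
        # All (subclass, superclass) pairs known so far.
        sub = {(s, o) for s, p, o in closure if p == SUBCLASS_OF}
        new = set()
        # rdfs11: (a subClassOf b), (b subClassOf c) => (a subClassOf c)
        for a, b in sub:
            for b2, c in sub:
                if b == b2:
                    new.add((a, SUBCLASS_OF, c))
        # rdfs9: (x type a), (a subClassOf b) => (x type b)
        for s, p, o in closure:
            if p == RDF_TYPE:
                for a, b in sub:
                    if o == a:
                        new.add((s, RDF_TYPE, b))
        # Fixpoint reached: this round derived nothing new.
        if new <= closure:
            return closure
        closure |= new


# Hypothetical example data, not taken from any real dataset.
triples = {
    ("ex:Student", SUBCLASS_OF, "ex:Person"),
    ("ex:Person", SUBCLASS_OF, "ex:Agent"),
    ("ex:alice", RDF_TYPE, "ex:Student"),
}

inferred = forward_chain(triples) - triples
for triple in sorted(inferred):
    print(triple)
```

For the three input triples, the sketch derives three new ones: that
ex:Student is a subclass of ex:Agent, and that ex:alice is both an
ex:Person and an ex:Agent.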

Noteworthy changes or updates since the previous release are:

* Parser performance has been improved significantly [2]; e.g., DBpedia
  2016-10 can be loaded in <100 seconds on a 7-node cluster
* Support for a wider range of data partitioning strategies
* A better unified API across data representations (RDD, DataFrame,
  DataSet, Graph) for triple operations
* Improved unit test coverage
* Improved distributed statistics calculation (see ISWC paper [3])
* Initial scalability tests on 6 billion triples of Ethereum blockchain
  data on a 100-node cluster [4]
* New SPARQL-to-GraphX rewriter aiming at providing better performance
  for queries exploiting graph locality
* Numeric outlier detection tested on DBpedia (en)
* Improved clustering tested on 20 GB RDF data sets

Deployment and getting started:

* Template projects for SBT and Maven are available for both Apache
  Spark and Apache Flink [5] to help you get started.
* The SANSA jar files are in Maven Central, i.e., in most IDEs you can
  just search for “sansa” to include the dependencies in Maven projects.
* Example code is available for various tasks [6].
* We provide interactive notebooks for running and testing code [7] via
  Docker.

We want to thank everyone who helped to create this release, in
particular the projects supporting us [8]: Big Data Europe, HOBBIT,
SAKE, Big Data Ocean, SLIPO, QROWD, BETTER, BOOST and SPECIAL.

View this announcement on the SANSA blog and Twitter:
  http://sansa-stack.net/sansa-0-4/
  https://twitter.com/SANSA_Stack/status/1011633150257188864

Kind regards,

The SANSA Development Team
(http://sansa-stack.net/community/#Contributors)

 [1] http://sda.tech
 [2] http://sansa-stack.net/sansa-parser-performance-improved/
 [3] http://jens-lehmann.org/files/2018/iswc_distlodstats.pdf
 [4] https://media.consensys.net/alethio-links-up-with-sansa-semantic-analytics-stack-to-analyze-ethereum-at-new-scales-b26055540167
 [5] http://sansa-stack.net/downloads-usage/
 [6] https://github.com/SANSA-Stack/SANSA-Examples
 [7] https://github.com/SANSA-Stack/SANSA-Notebooks
 [8] http://sansa-stack.net/powered-by/


--
Prof. Dr. Jens Lehmann
http://jens-lehmann.org
http://sda.tech
Computer Science Institute       Knowledge Discovery Department
University of Bonn               Fraunhofer IAIS
http://www.cs.uni-bonn.de        http://www.iais.fraunhofer.de