SoftOne BigData Solutions combine highly defined use cases, big data technology solutions, and data management tools to meet complex demands. They can deliver on-demand data sets from Hadoop, Spark, and GreenPlum Database, including governed self-service analytics for large production user bases.
SoftOne BigData components include:
Apache Tika – detects and extracts metadata and text from more than a thousand different file types (for example, PPT, XLS, and PDF). All of these file types can be parsed through a single interface provided by Apache Tika.
Apache Drill – executes SQL queries on semi-structured data stored in NoSQL stores. Its distinguishing feature is independence from the data storage schema: data in various stores can be analyzed without first defining its structure (schema-free).
R – a programming language for statistical data processing and graphics.
Solr – an open-source full-text search platform based on Apache Lucene. Its main features are full-text search, result highlighting, dynamic clustering, database integration, and processing of documents in complex formats (for example, Word, PDF).
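As a rough illustration of how an application might talk to two of the components above, the sketch below builds a schema-free SQL request for Apache Drill's REST interface and a highlighted full-text search URL for Solr. It only constructs the requests and does not send them. The host names, the file path /data/customers.json, the collection name "documents", and the field name "content" are assumptions made for the example, not part of any SoftOne configuration.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoints, chosen only for illustration.
DRILL_URL = "http://localhost:8047/query.json"       # Drill REST query endpoint
SOLR_BASE = "http://localhost:8983/solr/documents"   # "documents" is an assumed collection


def build_drill_request(sql: str) -> bytes:
    """Encode a SQL statement as the JSON body of a Drill REST query."""
    return json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")


def build_solr_search_url(text: str, field: str = "content", rows: int = 10) -> str:
    """Build a Solr select URL with full-text search and result highlighting."""
    params = {
        "q": text,        # the search terms
        "df": field,      # default field to search in (assumed name)
        "hl": "true",     # turn on result highlighting
        "hl.fl": field,   # field to highlight
        "rows": rows,     # number of results to return
        "wt": "json",     # response format
    }
    return f"{SOLR_BASE}/select?{urlencode(params)}"


# Schema-free query: the JSON file is queried directly, with no table
# definition created beforehand. The path and column names are assumed.
drill_body = build_drill_request(
    "SELECT t.name, t.address.city FROM dfs.`/data/customers.json` t LIMIT 5"
)
solr_url = build_solr_search_url("big data")

print(DRILL_URL, drill_body.decode("utf-8"))
print(solr_url)
```

To actually run the queries, the Drill body would be sent as an HTTP POST with a Content-Type of application/json to DRILL_URL, and the Solr URL would be fetched with an HTTP GET; both steps are left out here so the sketch runs without a live cluster.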
- Analytics that can be seamlessly embedded into crucial business applications to drive data monetization with customers and partners.
- Intuitive visual interface to integrate and blend Hadoop data with virtually any other source – including relational databases, NoSQL stores, enterprise applications, and more.
- Ability to design data integration logic faster than hand-coding approaches.
- Deep integration with the Hadoop ecosystem including Spark and compatibility with Kafka, Sqoop, and more.
- Automation to rapidly accelerate the ingestion and onboarding of hundreds or thousands of diverse and changing data sources into Hadoop.
- Support for leading Hadoop distributions, including Cloudera, Hortonworks, Amazon EMR, and MapR, with maximum portability of jobs and transformations between Hadoop platforms.
- Enterprise-level security for Cloudera and Hortonworks Hadoop clusters, with support for Kerberos, Sentry and Ranger.
- Applications: Pentaho BA Server
- Processing and data access: Pig, Hive, Sqoop, ETL, Lucene
- Computing: Map/Reduce
- Storage: HDFS, Spark, GreenPlum Database
- Platform: Kafka, Cloudera, Hortonworks, Amazon EMR,