How to Increase Pageviews and Reduce Bounce Rate in WordPress

Posted on August 12th, 2018

When starting out, most bloggers believe that getting people to their blog is the hard part. Most expert bloggers, on the other hand, think that getting people to your blog is the easy part and that getting them to stay is harder. Most users come to your site and leave without ever visiting a second page. That increases your bounce rate and decreases your pageviews per visit. In the bigger picture, it decreases your ad revenue. In this article, we will share tips and tricks that will help you increase pageviews and reduce bounce rate in WordPress.

Before we start

Let’s cover some basics regarding terminology and technology. Bounce rate is the percentage of visitors who enter your site and “bounce” (leave the site) rather than continue viewing other pages within the same site. A pageview is a request to load a single page on a website. We use Google Analytics to track our data. You are welcome to use another analytics service, or you can simply install Google Analytics on your WordPress site.
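As a rough illustration of the two definitions above, the sketch below computes both metrics from a list of page loads. The log format, the session IDs, and the function name are assumptions made for this example; a real analytics service such as Google Analytics applies its own session rules, so treat this as a simplification.

```python
from collections import Counter


def bounce_rate_and_depth(pageviews):
    """Return (bounce rate in percent, pages per visit).

    pageviews: iterable of (session_id, page_path) tuples, one per page load.
    A session counts as a bounce when it contains exactly one page load.
    """
    pages_per_session = Counter(session_id for session_id, _ in pageviews)
    sessions = len(pages_per_session)
    if sessions == 0:
        return 0.0, 0.0
    bounces = sum(1 for count in pages_per_session.values() if count == 1)
    bounce_rate = 100.0 * bounces / sessions
    pages_per_visit = sum(pages_per_session.values()) / sessions
    return bounce_rate, pages_per_visit


# Hypothetical log: four sessions, two of which view a single page.
log = [
    ("s1", "/"),
    ("s2", "/post-a"), ("s2", "/post-b"),
    ("s3", "/post-a"),
    ("s4", "/"), ("s4", "/post-a"), ("s4", "/post-c"),
]
rate, depth = bounce_rate_and_depth(log)
print(f"bounce rate: {rate:.1f}%, pages per visit: {depth:.2f}")
# bounce rate: 50.0%, pages per visit: 1.75
```

Every technique in this article aims to move the first number down and the second number up.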

Now that we have taken care of the basic terminology, you are probably wondering why the heck these numbers matter.

If you are running a site that is primarily monetized by banner ads, then the number of pageviews matters. If you are trying to build a loyal audience, then the bounce rate matters. Also, the lower your bounce rate, the better eCPM (effective cost per thousand impressions) or CPC (cost per click) you will get. When the same user views the next page, your ad provider most likely has a better ad to serve them, thus giving you a higher eCPM or CPC.

We have consulted with a lot of clients, helping them increase their pageviews and reduce bounce rates. We have also done a lot of experiments on our own sites like List25. So all the methods we share here are ones we have used ourselves and know to work.

P.S. These techniques will ONLY work if you have Good Content.

Video: https://www.youtube.com/watch?v=lxgE4HgXJ1g

Interlink Your Posts

Anytime you can interlink your other posts within the post content, you are going to see an increase in pageviews. In WordPress 3.1, interlinking got even easier because you can simply search for the post you want to link while adding links. Interlinking techniques work great when you have a site with a lot of articles. If you are just starting out, then you will be a bit limited. So how do you go back and interlink older articles when you have something new? You can do it manually, but it will take some time. There are plugins that let you automatically link keywords in WordPress (although that article shows how we did this for affiliate links, you can use it for internal linking as well). Not only does interlinking help you increase pageviews and reduce bounce rates, it also helps with SEO.

If you want to see an example of interlinking, then just look at the paragraph above.

Show Related Posts After the Post

One of the main reasons users leave your blog after reading a post is that you are not showing them what to do next. By showing the user a list of “related posts” or “other popular posts”, you may get them to visit another post on your site. There are a lot of ways you can add related posts to your blog. You can use a plugin called YARPP, which has an advanced algorithm for picking related posts. You can show related posts by category or tags without using a plugin. You can also show other posts by the same author.

Show Excerpts on Front / Archive Pages

Showing excerpts on front/archive pages has two advantages. First, it decreases page load time. Second, it helps increase pageviews. You should almost never show full posts on your front page or archive pages. Imagine having 25 images in one post, and then having 5 such posts on one page. It would be a horrible user experience (slow load time and a super long page), which would make the user leave your site. We have a tutorial on how to display post excerpts in WordPress themes. Most good theme frameworks like Genesis, Thesis, Headway etc. already have this option built in.

Splitting up Long Posts

Are you writing a super long post? Well, you can split it into multiple pages using the WordPress <!--nextpage--> tag in your post. Simply add it wherever you want, and your post will split into multiple pages. You can see an example of how we split our posts into two pages or even into five pages. You have to be very careful when doing this, because if you do not have a sufficient amount of content on each page, the user might get pissed off. We have seen a lot of big name sites like Forbes, NY Times, Wall Street Journal and others utilize this technique.

Interactive Sidebar

Your sidebar can play a crucial role in increasing pageviews and reducing bounce rate. You can show your popular posts in the sidebar. You can even customize it to show popular posts by week, by month, or of all time. You can also show your most recent posts only on single post pages. We have seen sites that create custom images to navigate to specific posts of theirs. You can also integrate other sections of your site in your sidebar. For example, look at our WordPress Coupons section or the gallery section in the sidebar.

Encourage Random Browsing

On List25 we created a feature called I’m Feeling Curious. When a user clicks on this button, they will be redirected to a random post in WordPress. We put the button in our header bar which was a very hot spot. After seeing good results, we ended up putting it on WPBeginner as well and called it Explore.

Results

When we started List25, we faced a lot of criticism. People were saying that sites like these fail to grow because it is hard to keep a loyal audience. We asked around, and a lot of people who had done something similar in the past reported that the bounce rate for a similar site was soaring in the 80% range. The average user would view only one page per visit and leave. We started the site out to get some base data. Our bounce rate was in the 75% range. We slowly started implementing the changes mentioned above. Bounce rate decreased from an average of 76% to 42%. Our pageviews per visit increased to 2.79. Average time spent on page rose to 3 minutes and 40 seconds, and that is the average across our 1 million unique visitors! What are you doing to increase pageviews and reduce bounce rate? Share with us.

Content retrieved from: https://www.wpbeginner.com/beginners-guide/how-to-increase-pageviews-and-reduce-bounce-rate-in-wordpress/.

Deep Learning & Art: Neural Style Transfer – An Implementation with Tensorflow in Python

Posted on August 12th, 2018

Posted by Sandipan Dey on January 2, 2018 at 1:00pm

This problem appeared as an assignment in the online Coursera course Convolutional Neural Networks by Prof Andrew Ng (deeplearning.ai). The description of the problem is taken straight from the assignment.

In this assignment, we shall:

Implement the neural style transfer algorithm
Generate novel artistic images using our algorithm

Most of the algorithms we’ve studied optimize a cost function to get a set of parameter values. In Neural Style Transfer, we shall optimize a cost function to get pixel values!

Problem Statement

Neural Style Transfer (NST) is one of the most fun techniques in deep learning. As seen below, it merges two images, namely,

a “content” image (C) and
a “style” image (S),

to create a “generated” image (G). The generated image G combines the “content” of the image C with the “style” of image S.

In this example, we are going to generate an image of the Louvre museum in Paris (content image C), mixed with a painting by Claude Monet, a leader of the impressionist movement (style image S).

Let’s see how we can do this.

Transfer Learning

Neural Style Transfer (NST) uses a previously trained convolutional network, and builds on top of that. The idea of using a network trained on a different task and applying it to a new task is called transfer learning.

Following the original NST paper, we shall use the VGG network. Specifically, we’ll use VGG-19, a 19-layer version of the VGG network. This model has already been trained on the very large ImageNet database, and thus has learned to recognize a variety of low level features (at the earlier layers) and high level features (at the deeper layers). The following figure shows what a VGG-19 convolutional neural net looks like, without the last fully-connected (FC) layers.

We run the following code to load parameters from the pre-trained VGG-19 model serialized in a MATLAB file. This takes a few seconds.

import pprint

# load_vgg_model is a helper provided with the assignment (not defined in this post)
model = load_vgg_model("imagenet-vgg-verydeep-19.mat")
pprint.pprint(model)

{'avgpool1': <tf.Tensor 'AvgPool_5:0' shape=(1, 150, 200, 64) dtype=float32>,
 'avgpool2': <tf.Tensor 'AvgPool_6:0' shape=(1, 75, 100, 128) dtype=float32>,
 'avgpool3': <tf.Tensor 'AvgPool_7:0' shape=(1, 38, 50, 256) dtype=float32>,
 'avgpool4': <tf.Tensor 'AvgPool_8:0' shape=(1, 19, 25, 512) dtype=float32>,
 'avgpool5': <tf.Tensor 'AvgPool_9:0' shape=(1, 10, 13, 512) dtype=float32>,
 'conv1_1': <tf.Tensor 'Relu_16:0' shape=(1, 300, 400, 64) dtype=float32>,
 'conv1_2': <tf.Tensor 'Relu_17:0' shape=(1, 300, 400, 64) dtype=float32>,
 'conv2_1': <tf.Tensor 'Relu_18:0' shape=(1, 150, 200, 128) dtype=float32>,
 'conv2_2': <tf.Tensor 'Relu_19:0' shape=(1, 150, 200, 128) dtype=float32>,
 'conv3_1': <tf.Tensor 'Relu_20:0' shape=(1, 75, 100, 256) dtype=float32>,
 'conv3_2': <tf.Tensor 'Relu_21:0' shape=(1, 75, 100, 256) dtype=float32>,
 'conv3_3': <tf.Tensor 'Relu_22:0' shape=(1, 75, 100, 256) dtype=float32>,
 'conv3_4': <tf.Tensor 'Relu_23:0' shape=(1, 75, 100, 256) dtype=float32>,
 'conv4_1': <tf.Tensor 'Relu_24:0' shape=(1, 38, 50, 512) dtype=float32>,
 'conv4_2': <tf.Tensor 'Relu_25:0' shape=(1, 38, 50, 512) dtype=float32>,
 'conv4_3': <tf.Tensor 'Relu_26:0' shape=(1, 38, 50, 512) dtype=float32>,
 'conv4_4': <tf.Tensor 'Relu_27:0' shape=(1, 38, 50, 512) dtype=float32>,
 'conv5_1': <tf.Tensor 'Relu_28:0' shape=(1, 19, 25, 512) dtype=float32>,
 'conv5_2': <tf.Tensor 'Relu_29:0' shape=(1, 19, 25, 512) dtype=float32>,
 'conv5_3': <tf.Tensor 'Relu_30:0' shape=(1, 19, 25, 512) dtype=float32>,
 'conv5_4': <tf.Tensor 'Relu_31:0' shape=(1, 19, 25, 512) dtype=float32>,
 'input': <tensorflow.python.ops.variables.Variable object at 0x7f7a5bf8f7f0>}

The next figure shows the content image (C) – the Louvre museum’s pyramid surrounded by old Paris buildings, against a sunny sky with a few clouds.

For the above content image, the activation outputs from the convolution layers are visualized in the next few figures.

How to ensure that the generated image G matches the content of the image C?

As we know, the earlier (shallower) layers of a ConvNet tend to detect lower-level features such as edges and simple textures, and the later (deeper) layers tend to detect higher-level features such as more complex textures as well as object classes.

We would like the “generated” image G to have similar content as the input image C. Suppose we have chosen some layer’s activations to represent the content of an image. In practice, we shall get the most visually pleasing results if we choose a layer in the middle of the network – neither too shallow nor too deep.

First we need to compute the “content cost” using TensorFlow.

The content cost takes a hidden layer activation of the neural network, and measures how different a(C) and a(G) are.
When we minimize the content cost later, this will help make sure G has similar content as C.

import tensorflow as tf

def compute_content_cost(a_C, a_G):
    """
    Computes the content cost

    Arguments:
    a_C -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing content of the image C
    a_G -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing content of the image G

    Returns:
    J_content -- scalar: sum of squared differences between a_C and a_G, scaled by 1 / (4 * n_H * n_W * n_C)
    """

    # Retrieve dimensions from a_G
    m, n_H, n_W, n_C = a_G.get_shape().as_list()

    # Reshape a_C and a_G (both get the same reordering, so the element-wise sum below is unaffected)
    a_C_unrolled = tf.reshape(tf.transpose(a_C), (m, n_H * n_W, n_C))
    a_G_unrolled = tf.reshape(tf.transpose(a_G), (m, n_H * n_W, n_C))

    # Compute the cost with TensorFlow
    J_content = tf.reduce_sum((a_C_unrolled - a_G_unrolled)**2 / (4. * n_H * n_W * n_C))

    return J_content
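
The function above divides the summed squared difference between the two activations by 4 · n_H · n_W · n_C. A minimal NumPy sketch of the same arithmetic, on made-up activations small enough to check by hand, is shown here. The function name and the toy arrays are assumptions for illustration and are not part of the assignment code.

```python
import numpy as np


def content_cost_np(a_C, a_G):
    """NumPy version of the content cost for activations of shape (1, n_H, n_W, n_C)."""
    _, n_H, n_W, n_C = a_G.shape
    return np.sum((a_C - a_G) ** 2) / (4.0 * n_H * n_W * n_C)


# Toy activations: every entry differs by exactly 1.
a_C = np.ones((1, 2, 2, 3))
a_G = np.zeros((1, 2, 2, 3))

cost = float(content_cost_np(a_C, a_G))
print(cost)  # 12 squared differences of 1, divided by 4 * 2 * 2 * 3 = 48, gives 0.25
```

The cost is zero only when the two activation volumes are identical, which is what pulls G towards the content of C during optimization.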

Computing the style cost

For our running example, we will use the following style image (S). It was painted by Claude Monet in the impressionist style.

def gram_matrix(A):
    """
    Argument:
    A -- matrix of shape (n_C, n_H*n_W)

    Returns:
    GA -- Gram matrix of A, of shape (n_C, n_C)
    """

    GA = tf.matmul(A, tf.transpose(A))
    return GA

def compute_layer_style_cost(a_S, a_G):
    """
    Arguments:
    a_S -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image S
    a_G -- tensor of dimension (1, n_H, n_W, n_C), hidden layer activations representing style of the image G

    Returns:
    J_style_layer -- tensor representing a scalar value, the style cost for this layer
    """

    # Retrieve dimensions from a_G
    m, n_H, n_W, n_C = a_G.get_shape().as_list()

    # Reshape the activations to shape (n_C, n_H*n_W): one row per channel
    a_S = tf.reshape(tf.transpose(a_S), (n_C, n_H * n_W))
    a_G = tf.reshape(tf.transpose(a_G), (n_C, n_H * n_W))

    # Compute the Gram matrices for both images S and G
    GS = gram_matrix(a_S)
    GG = gram_matrix(a_G)

    # Compute the loss
    J_style_layer = tf.reduce_sum((GS - GG)**2 / (4. * (n_H * n_W * n_C)**2))

    return J_style_layer
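
To make the Gram matrix step concrete, here is a small NumPy sketch of the same computation on a toy activation. It assumes nothing beyond the shapes used above; the function names and the example arrays are hypothetical. Entry (i, j) of the Gram matrix is the dot product of channel i with channel j over all spatial positions, so it measures how strongly two filters fire together.

```python
import numpy as np


def gram_matrix_np(A):
    """Gram matrix of A, where A has shape (n_C, n_H * n_W)."""
    return A @ A.T


def layer_style_cost_np(a_S, a_G):
    """Style cost for one layer, for activations of shape (1, n_H, n_W, n_C)."""
    _, n_H, n_W, n_C = a_G.shape
    # One row per channel, one column per spatial position.
    S = a_S.reshape(n_H * n_W, n_C).T
    G = a_G.reshape(n_H * n_W, n_C).T
    GS, GG = gram_matrix_np(S), gram_matrix_np(G)
    return np.sum((GS - GG) ** 2) / (4.0 * (n_H * n_W * n_C) ** 2)


# Toy activation with n_H = 1, n_W = 2, n_C = 2: each channel fires at one position.
a_S = np.array([[[[1.0, 0.0], [0.0, 1.0]]]])
a_G = np.zeros_like(a_S)

style_cost = float(layer_style_cost_np(a_S, a_G))
print(style_cost)  # Gram difference is the 2x2 identity: 2 / (4 * (1 * 2 * 2) ** 2) = 0.03125
```

Because the Gram matrix sums over spatial positions, it discards where a feature occurs and keeps only which features occur together, which is why it captures style rather than content.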

The style of an image can be represented using the Gram matrix of a hidden layer’s activations. However, we get even better results by combining this representation from multiple different layers. This is in contrast to the content representation, where using just a single hidden layer is usually sufficient.
Minimizing the style cost will cause the image G to follow the style of the image S.
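
One way to combine several layers, sketched here, is a weighted sum of the per-layer style costs. The choice of layers and the equal weights of 0.2 are assumptions for illustration (the layer names are taken from the model dictionary printed earlier), and the per-layer costs are made-up numbers standing in for the output of compute_layer_style_cost.

```python
def combined_style_cost(layer_costs, layer_weights):
    """Weighted sum of per-layer style costs.

    layer_costs:   dict mapping layer name -> style cost for that layer
    layer_weights: dict mapping layer name -> weight for that layer
    """
    return sum(weight * layer_costs[name] for name, weight in layer_weights.items())


# Hypothetical layer selection with equal weights.
STYLE_LAYERS = {
    "conv1_1": 0.2,
    "conv2_1": 0.2,
    "conv3_1": 0.2,
    "conv4_1": 0.2,
    "conv5_1": 0.2,
}

# Made-up per-layer costs, for illustration only.
example_costs = {
    "conv1_1": 5.0,
    "conv2_1": 4.0,
    "conv3_1": 3.0,
    "conv4_1": 2.0,
    "conv5_1": 1.0,
}

J_style_example = combined_style_cost(example_costs, STYLE_LAYERS)
print(round(J_style_example, 6))  # 0.2 * (5 + 4 + 3 + 2 + 1) = 3.0
```

Raising the weight of a deeper layer would make the generated image follow larger-scale style features more closely; raising a shallow layer’s weight would favor fine textures.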

Defining the total cost to optimize

Finally, let’s create and implement a cost function that minimizes both the style and the content cost. The formula is:

J(G) = α · J_content(C, G) + β · J_style(S, G)

def total_cost(J_content, J_style, alpha=10, beta=40):
    """
    Computes the total cost function

    Arguments:
    J_content -- content cost coded above
    J_style -- style cost coded above
    alpha -- hyperparameter weighting the importance of the content cost
    beta -- hyperparameter weighting the importance of the style cost

    Returns:
    J -- total cost as defined by the formula above.
    """

    J = alpha * J_content + beta * J_style
    return J

The total cost is a linear combination of the content cost J_content(C,G) and the style cost J_style(S,G).
α and β are hyperparameters that control the relative weighting between content and style. (we have used values 10 and 40 respectively for α and β).

Solving the optimization problem

Finally, let’s put everything together to implement Neural Style Transfer!

Here’s what the program will have to do:

Create an Interactive Session
Load the content image
Load the style image
Randomly initialize the image to be generated
Load the VGG19 model
Build the TensorFlow graph:
- Run the content image through the VGG19 model and compute the content cost.
- Run the style image through the VGG19 model and compute the style cost.
- Compute the total cost.
- Define the optimizer and the learning rate.
Initialize the TensorFlow graph and run it for a large number of iterations (we have used 200 iterations), updating the generated image at every step.

Let’s first load, reshape, and normalize our “content” image (the Louvre museum picture) and “style” image (Claude Monet’s painting).
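
A minimal sketch of that preprocessing step, assuming the image is already loaded as a height × width × 3 RGB array: add a batch dimension and subtract per-channel means. The mean values below are the ones commonly used with VGG networks trained on ImageNet and are an assumption here, as are the function name and the tiny example image.

```python
import numpy as np

# Per-channel RGB means commonly used with ImageNet-trained VGG models (assumed here).
VGG_MEANS = np.array([123.68, 116.779, 103.939], dtype="float32").reshape((1, 1, 1, 3))


def reshape_and_normalize(image):
    """Turn an (H, W, 3) image into a mean-centred (1, H, W, 3) float32 batch."""
    image = np.asarray(image, dtype="float32")
    batch = image[np.newaxis, ...]  # add the leading batch dimension
    return batch - VGG_MEANS


# Example: a tiny 2 x 3 image whose pixels all equal the channel means.
example_image = np.zeros((2, 3, 3), dtype="float32") + VGG_MEANS[0, 0, 0]
normalized = reshape_and_normalize(example_image)
print(normalized.shape)  # (1, 2, 3, 3)
```

The leading dimension of 1 matches the (1, 300, 400, ...) shapes in the model dictionary printed earlier.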

Now, we initialize the “generated” image as a noisy image created from the content_image. Initializing the pixels of the generated image to be mostly noise but still slightly correlated with the content image helps the content of the “generated” image match the content of the “content” image more rapidly. The following figure shows the noisy image:
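
One simple way to build such an image is to blend uniform random noise with the content image. The sketch below assumes a noise ratio of 0.6 and noise drawn from the range -20 to 20; both values, the seed, and the function name are illustrative assumptions rather than settings confirmed by this post.

```python
import numpy as np


def generate_noise_image(content_image, noise_ratio=0.6, seed=0):
    """Blend uniform noise with the content image.

    noise_ratio close to 1 gives mostly noise; close to 0 gives mostly content.
    """
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-20.0, 20.0, size=content_image.shape).astype("float32")
    return noise * noise_ratio + content_image * (1.0 - noise_ratio)


# Example with a flat, already normalized content image of shape (1, 4, 5, 3).
content_image = np.full((1, 4, 5, 3), 100.0, dtype="float32")
generated_image = generate_noise_image(content_image)
print(generated_image.shape)  # (1, 4, 5, 3)
```

With these settings each pixel is 40% content plus noise of at most ±12, so the starting point is dominated by noise but still carries a faint trace of the content image.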

Next, let’s load the pre-trained VGG-19 model.

To get the program to compute the content cost, we will now assign a_C and a_G to be the appropriate hidden layer activations. We will use layer conv4_2 to compute the content cost. We need to do the following:

Assign the content image to be the input to the VGG model.
Set a_C to be the tensor giving the hidden layer activation for layer “conv4_2”.
Set a_G to be the tensor giving the hidden layer activation for the same layer.
Compute the content cost using a_C and a_G.

Next, we need to compute the style cost and compute the total cost J by taking a linear combination of the two. Use alpha = 10 and beta = 40.

Then we are going to set up the Adam optimizer in TensorFlow, using a learning rate of 2.0.

Finally, we need to initialize the variables of the TensorFlow graph, assign the input image (initial generated image) as the input of the VGG19 model, and run the model to minimize the total cost J for a large number of iterations.
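
The key idea in this step is that the optimizer updates the pixels of the generated image rather than any network weights. The toy sketch below illustrates that idea only: it hand-codes the Adam update in NumPy and swaps the VGG-based total cost for a simple squared distance to a target image, which is an assumption made to keep the example self-contained. The learning rate of 2.0 and the 200 iterations come from the text; every name in the code is hypothetical.

```python
import numpy as np


def adam_minimize(grad_fn, x0, learning_rate=2.0, iterations=200,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a cost by applying Adam updates directly to the array x."""
    x = np.array(x0, dtype="float64")  # copy so the caller's array is untouched
    m = np.zeros_like(x)  # running mean of gradients
    v = np.zeros_like(x)  # running mean of squared gradients
    for t in range(1, iterations + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= learning_rate * m_hat / (np.sqrt(v_hat) + eps)
    return x


# Stand-in cost: squared distance between the "generated" pixels and a target.
target = np.full((4, 4, 3), 100.0)


def pixel_cost(pixels):
    return float(np.sum((pixels - target) ** 2))


def pixel_grad(pixels):
    return 2.0 * (pixels - target)


start = np.zeros_like(target)
result = adam_minimize(pixel_grad, start)
print(pixel_cost(start), pixel_cost(result))
```

In the real program the gradient comes from TensorFlow differentiating the total cost J with respect to the input image, but the update loop has the same shape.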

Results

The following figures show the generated images (G) with different content (C) and style images (S) at different iterations in the optimization process.

[Figures: eight content / style / generated image triples]
1. Content; Style (Claude Monet’s The Poppy Field near Argenteuil); Generated
2. Content; Style; Generated
3. Content; Style; Generated
4. Content; Style (Van Gogh’s The Starry Night); Generated
5. Content; Style; Generated
6. Content (Victoria Memorial Hall); Style (Van Gogh’s The Starry Night); Generated
7. Content (Taj Mahal); Style (Van Gogh’s Starry Night Over the Rhone); Generated
8. Content (me); Style (Van Gogh’s Irises); Generated

Content retrieved from: https://www.datasciencecentral.com/profiles/blogs/deep-learning-amp-art-neural-style-transfer-an-implementation.

Comparing AI Strategies – Systems of Intelligence

Posted on August 11th, 2018

Posted by William Vorhies on July 24, 2018 at 9:00am

Summary: The fourth and final AI strategy we’ll review is Systems of Intelligence (SOI). This is getting nearly as much attention as the Vertical strategy we previously reviewed. It’s appealing because it seems to offer the financial advantages of a Horizontal strategy but its ability to create a defensible moat requires some fine tuning.

In the last several articles we’ve been looking at different strategies for successful AI companies.

We described Data Dominance, and the Vertical and Horizontal strategies. This brings us to the fourth and last AI strategy, Systems of Intelligence (SOI).

The Systems of Intelligence strategy, like the Vertical strategy, is the brainchild of a successful VC, Jerry Chen of Greylock Partners. Like many VCs, Mr. Chen is struggling to define criteria for investing in a technology market that is changing so rapidly. So far the Vertical strategy and this Systems of Intelligence strategy are getting the most press. This may or may not indicate agreement from the investing community at large.

The single factor that ties together all four of these strategies however is the need to create a moat of defensibility that will prevent fast followers from simply copying your idea.

Old Moats Have Fallen

It’s interesting to review how the technology market has evolved over time and to remember the moats that were established in each epoch. Gil Dibner, a self-described venture investor who has offered perhaps the most thoughtful response to SOI, also offers us this brief historical recap of tech VC investing.

1970–1985: The “Silicon” Era (e.g. Intel, founded 1968)
1975–1990: The “Information” Era (e.g. Microsoft, founded 1975, and Oracle, founded 1977)
1985–2000: The “Physical Network” Era (e.g. Cisco, founded 1984)
1995–2010: The “Logical Network” Era (e.g. Netscape, founded 1994)
2000–2015: The “SaaS” Era (e.g. Salesforce, founded 1999)
2005–2020: The “Network Effect” Era (e.g. Facebook, founded 2004 and AirBnB founded 2008)
2015–2030: The “System of Intelligence” Era?

If you date the rise of modern ML and AI from the advent of open source NoSQL and Hadoop in 2007 it’s easy to spot that VC investing strategy has been dominated by the Network Effect and the SaaS strategies.

It’s not that these have become invalid. It’s that they have been fully exploited and with very few exceptions the leaders with these strategies have been established, largely freezing out future competitors. On the mind of VCs, startup founders, and all of us interested in AI-first businesses is what comes next? Mr. Chen offers:

“I believe that deep technology moats aren’t completely gone and defensible business models can still be built around IP. If you pick a place in the technology stack and become the absolute best of breed solution you can create a valuable company. However, this means picking a technical problem with few substitutes, that requires hard engineering, and needs operational knowledge to scale.”

What’s the Underlying Concept behind Systems of Intelligence?

In his seminal article on SOI, Chen makes these assertions:

Today the market favors full stack solutions that frequently rely on the SaaS model. The details of the technology are no longer important as long as the elements of the stack function well together.
Today’s full-stack is grounded on Systems of Record. There are four fundamental systems of record, one for your customers (CRM), one for your employees (HCM), and two for your assets (ERP financials/ITSM). These are the databases on which your applications are built. The leading players in this SaaS strategy are established and dominant.

Systems of Engagement are the interfaces that sit atop the Systems of Record and allow or control how users can utilize and interface with the data. These have migrated up from mainframe terminals, through dashboard visualizers like Tableau, until today we have Slack, Alexa, Wechat, chatbots and every other variant on text and conversational UIs. Older SOE applications tend not to go away but continue to coexist with newer forms.

To gain competitive advantage an SOE must rely on network effect (the friendliness and utility of its interface to attract maximum users) since the data it provides resides in SORs and is shared with other SOEs. The result is today’s modern enterprise stack where SOEs reside atop SORs.

The disruptive core of Systems of Intelligence is that there is emerging a middle layer, the SOI layer that enhances the value of the SORs by adding data from multiple, sometimes external sources, or adding previously unseen insight through the addition of ML/AI.

The ability to bridge multiple SORs makes these applications more defensible against the internal capabilities of an SAP or a PeopleSoft, and adding value from external data sources (for example web logs to produce web analytics) makes their moat even wider.

Of course the ability to add value to multiple data sources relies on the value add from ML/AI.

What Do These SOI Opportunities Look Like

First of all it should be evident that this is a horizontal strategy that applies to enterprise platforms. Horizontal strategies are those that represent solutions that can be repurposed across multiple industries.

But unlike the broad horizontal strategy that requires the user/customer to adapt the solution to the specific use case requiring more effort than they are likely to want to apply, the SOI strategy most likely results in a fairly complete standardized application (full stack) around a specific set of processes. The assumption is that these are sufficiently similar across industries that adaptation and configuration can be accomplished through repeatable processes. Chen offers these three potential scenarios:

Customer facing applications around the customer journey.
Employee facing applications like HCM.
ITSM, financials, or infrastructure systems like security, compute/ storage/ networking, and monitoring/ management.

The core capabilities are data blending and ML/AI analysis leading to predictive or even prescriptive actions. You could build these with modern advanced analytics platforms like Alteryx, SAS, or SPSS, or simply write them in R or Python.

Of course the focus could be more narrowly a group of target companies across a more specialized industry like finance or construction. Here in addition to requiring process subject matter expertise, industry expertise would also be needed.

Does the Systems of Intelligence Strategy Provide a Defensible Moat?

It’s not clear that it does, at least without adding additional factors to the equation. One of the best discussions and criticisms of SOI comes from Gil Dibner, who self identifies as a venture investor.

The first issue that Dibner identifies is that any fast follower can recreate your product if your only advantage is expertise with ML/AI. Thanks to the open source ethos of ML/AI, none of that IP is truly proprietary.

It may be possible to achieve at least a temporary lead if the technical complexity of your solution is very high, especially in the developing arts of image, text, and speech. However, this advantage is not likely to last. AI is rapidly becoming a commodity.

There are some features that Dibner believes can add some defensibility, at least in some cases. One, mentioned above, is when the AI technology is extremely hard to build.

Another is a variation on the Network Effect. Remember that the idea of network effect says that the value of the network becomes greater as the number of users increases. Generally this is fully exploited and no longer provides a moat.

However, Dibner envisions the possibility of Systems of Network Intelligence. He imagines a system that works across various parties in a supply chain where shared information across customers can be seen to add value.

Another possible moat he offers is the ability to integrate human-in-the-loop applications with the AI so that the resulting hybrid system learns more rapidly and more effectively than the AI alone.

At that point, Dibner starts to suggest features that are clearly from the vertical strategy: domain expertise, data dominance, and full stack applications.

The horizontal cross-industry approach of Systems of Intelligence will no doubt remain its most appealing feature. If defensible, the strength of a single process-focused application that could be sold across a variety of industries sounds like a pot of gold.

Whether there is any real moat here is questionable.

Content retrieved from: https://www.datasciencecentral.com/profiles/blogs/comparing-ai-strategies-systems-of-intelligence.

Top Trends in AI in 2018

Posted on August 10th, 2018

Posted by Pradeep Menon on February 19, 2018 at 10:00pm

According to Gartner’s 2017 hype cycle for emerging technologies, Deep Learning and Machine Learning have reached the peak of inflated expectations. Artificial General Intelligence (AGI) and Deep Reinforcement Learning are in the innovation trigger phase.

We are in 2018. The sentiment over Artificial Intelligence (AI) is euphoric. Every technology firm is jumping on the AI first bandwagon. Companies like Google, Microsoft, Amazon, and Alibaba are pushing the frontiers. There are a plethora of smaller players that are doing cutting-edge work in a niche area. AI is permeating into everyday lives.

As an active practitioner in this field, my views on the top AI trends to look out for in 2018 are as follows:

Firstly, let’s get the context of AI correct.

AI encompasses the following:

Machine Learning (subset of AI)
Deep Learning (subset of Machine Learning)

Trend #1: Machine Learning to Automated Machine Learning

A typical machine learning process involves the following stages:

A data scientist spends a lot of time understanding the data. A data scientist tries to fit multiple models. They try out multiple algorithms to find the model fit that provides the best result.

Automated machine learning attempts to automate the process of performing exploratory analysis. It tries to automate the process of finding hidden patterns. It automates the training of multiple algorithms. In short, automated machine learning saves a lot of a data scientist’s time. Data scientists spend less time on model building and more time on evaluation. Automated machine learning is also a blessing for non-data scientists. It helps them build decent machine learning models without deep-diving into the mathematics of data science.

In 2018, I see this trend becoming mainstream. Google recently launched AutoML on its cloud computing platform. There are niche companies like DataRobot that specialise in this area and are becoming mainstream.

“Automated Machine Learning will mature in 2018.”

Trend #2: Increase in Cloud Adoption for Machine Learning

Machine learning is a lot about data. It is the process of storing data. It is a process of analyzing data, training models and evaluating them. It is a data and compute-intensive process. It is iterative with hits and misses.

Cloud computing provides an ideal platform where machine learning thrives. Cloud computing is not a new concept. Traditional cloud offerings were limited to Infrastructure as a Service (IaaS). Over the past few years, public cloud providers have started offering Machine Learning as a Service. All the big cloud providers have a competitive offering in Machine Learning as a Service.

I see this trend continuing to increase in 2018. The cost of computing and storage in the cloud is lower and on-demand. The costs are controllable. The cloud providers provide out-of-the-box solutions. Data scientists can now spin up analytical sandboxes in the cloud, perform the analysis, experiment with a model and shut it down. They can automate the process as well. Machine learning in the cloud makes the life of a data scientist easier.

“Cloud computing would continue to enable Machine Learning acceleration in 2018.”

Trend #3: Deep Learning Becomes Mainstream

Deep learning is a subset of machine learning that utilizes neural network-based algorithms for machine learning tasks. Deep learning methods have proven to be very useful in the field of computer vision, natural language processing, and speech recognition.

Deep learning has been around for some time now. However, deep learning was in relative obscurity all these years. This obscurity was because of the following two reasons:

The sheer amount of data required to train deep neural networks.
The sheer computing power required to train deep neural networks.

These constraints no longer apply. There is data now. There is abundant computing power available. Research in deep learning has never been as active as it is now. Increasingly, deep learning is powering complex use cases. Its applications range from workplace safety to smart cities to image recognition and online-offline shopping.

This trend will continue in 2018.

“Deep Learning will continue to be rapidly adopted by enterprises in 2018.”

Trend #4: AI Regulation Discussion Gains Traction

In 2017, the data science community avidly followed the debate between Elon Musk and Mark Zuckerberg. The topic of the debate: Should we fear the rise of AI? Elon Musk had a pessimistic view on the topic. His view: the rise of AI poses imminent dangers for humanity. On the other hand, Mark Zuckerberg had a much more optimistic outlook on the topic. His view: AI would benefit humans.

This debate between these tech tycoons has everyone thinking about AI and its regulation. In Jan 2018, Microsoft chimed in saying that AI needs to be regulated before it’s too late. There is no easy answer to this question. AI is still an evolving field. Excessive regulations have always stifled innovation. Maintaining a delicate balance is crucial. The regulation of AI is uncharted territory with technical, legal and even ethical undertones. This is a healthy discussion point.

“Should AI be regulated? This will be a key discussion point in 2018.”

Content retrieved from: https://www.datasciencecentral.com/profiles/blogs/top-trends-in-ai-in-2018.

Comparing AI Strategies – Vertical vs. Horizontal

Posted on August 8th, 2018

Posted by William Vorhies on July 17, 2018 at 7:00am

Summary: Getting an AI startup to scale for an IPO is currently elusive. Several different strategies are being discussed around the industry and here we talk about the horizontal strategy and the increasingly favored vertical strategy.

Looks like there’s a problem brewing in AI startup land. While AI is most certainly destined to be the next great general purpose technology, on a par with the steam engine, the automobile, and electrification, there just aren’t any examples of new AI-first companies that look like they’ll grow that big.

OK, in the 80s it took a long time for the ‘computer age’ to show up in the financial statistics and maybe we’re at the same place. Still, a bunch of people, especially VCs are wondering how to grow an AI company all the way to IPO. This is just now beginning to lead to several different visions of what a successful AI strategy should look like.

Recently we wrote about our own favorite strategy, Data Dominance. There are two or three others you should be aware of and we’ll talk here about two leading strategies, horizontal and vertical.

How Do We Know There’s a Problem?

CB Insights reports that of the 120 AI companies that exited the market in 2017, 115 did so by acquisition. And the majority of those acquisitions were made by just 9 companies. Guess who.

AI startups are only a few years old and it’s fair to say that when those founders started out they hoped not only to change the world but to make some life-changing money. That traditionally means IPO.

But thanks to the shortage of AI talent, most of these startups ended up being nothing more than acquihires. We hope that at least some made that life-changing money but all those guppies ended up swallowed by whales and are now just features or products, not world changing businesses.

Not all startups will make it. And it certainly requires some soul searching when Google or Amazon come calling. But there’s an open question about whether the strategies of these startups were viable, and that’s where our conversation begins.

Horizontal Strategy

The horizontal strategy is where our industry started, almost by accident. The core concept is to make an AI product or platform that can be used by many industries to solve problems more efficiently than we could before AI.

Then came what VCs refer to as the monoliths: Google, Amazon, IBM, and Microsoft. As we now know, their dominance in their respective areas of data gave them an almost insurmountable lead in offering generalized image, video, speech, and text AI tools using the familiar MLaaS model.

None of these companies started out to be AI-first companies. This grew out of the phenomenon that was once called the data-sidecar strategy. It was there, so why not add it to their offerings.

The monoliths have become so dominant that there’s a widely held principle among VCs that startups should have a maximum distance from these core competencies in order to be defensible.

There’s a second tranche of horizontal competitors below the monoliths, including startups. These reach all the way down to some of the newer automated machine learning (AML) companies like OneClick.AI, which integrates deep learning with the standard assortment of ML algos.

Horizontal Strategy is Out

The bottom line is that the horizontal strategy is out for a variety of reasons.

Thanks to the open source ethos of AI there’s really no defensible IP in any ‘proprietary’ DL algorithms that can’t be copied by a fast follower.
Horizontal companies don’t own the customer’s core problem. They simply provide a tool that must either be adapted by consultants familiar with that industry, or requires the client customer to learn more about DL than they probably wish to.
They don’t own the training data unique to the customer’s problem so there’s no data defensibility.
These tools tend to be incrementally better than their traditional MLaaS counterparts, but not breakthrough better. This includes direct competition from the monoliths.

In short, this is not where VCs are putting their money.

Vertical Strategy

The vertical strategy isn’t the only alternative but it’s certainly the most widely talked about these days. It shares the concern over data dominance with that strategy but goes further in specifying other aspects required for success.

The most vocal advocate for vertical is Bradford Cross, a founding partner at Data Collective DCVC, self-described as the world’s leading machine learning and big data venture capital fund. I don’t know if Bradford is the inventor of the vertical strategy but he can certainly claim to have published more in detail than others.

The vertical strategy has four primary principles:

Full Stack Products: Provide a full-stack fully-integrated solution to the end customer that solves a true ‘hair on fire’ problem. Full stack means from interface to the DL models to the data that drives the models, and all the functionality in between.
Subject Matter Expertise: Pick an industry and focus. This requires deep subject matter expertise beyond deep learning. This means bringing in industry leaders early in the process which greatly facilitates not only defining and addressing the problem, but also addresses trust and relationships within the industry when it’s time to go to market.
Proprietary Data: Owning the interface allows you to instrument it and gather proprietary data. Then you are able to build high value models that drive the acquisition of additional data in that virtuous cycle of customer – application – data. You control the data value chain giving you both data dominance and pricing power. In most instances, determining how to acquire the initial data will be the most difficult aspect of this strategy.
AI Must Deliver the Core Value: AI is not an incremental add to the solution, it is the core to unlocking a totally new opportunity. AI plus proprietary data gathered with the product itself should allow you to build increasingly attractive and valuable solutions for the industry.

If you’d like to read more from Bradford on the Vertical Strategy try this excellent article.

Is That All There Is to the Vertical Strategy?

Well no. For starters, as we pointed out in our Data Dominance article, picking the right industry with the right problems suitable for AI is no small challenge.

Although any industry that meets the criteria for the vertical strategy might be ripe for defensible exploitation, Bradford Cross adds some insight unique to the VC world.

First of all, Mr. Cross quotes that 90% of AI startups are focused on enterprise markets, not consumer apps that have been so common up to this point.

“Compared with consumer startups since 1995, enterprise startups have returned 40% more capital overall. Enterprise and consumer startups have generated equivalent IPO value, but enterprise has generated 2.5X the M&A value.”

Second, market exits come in cohorts. Interest in ML and AI in the M&A market tends to occur in groupings around hot industries in which a number of viable startup targets arise together. Right now that tends to favor fintech and healthcare, followed perhaps by energy, utilities, basic industry, and transportation.

The point is simply that there are going to be more buyers in the market if you are a member of a cohort focusing on a hot industry.

Does this mean you should skip opportunities in other industries? Not at all. Just be aware that VCs are more likely to be generous with funding in favored industries than in outliers.

An Observation about Full-Stack Solutions

In general I agree with the desirability of a full-stack solution, one that provides an easy interface and solves a really important customer problem. It’s worth backing up a minute however to consider how much full-stack is enough full-stack.

Using predictive maintenance in IoT as an example, a recent BCG report listed all of the following as parts of a full-stack predictive maintenance solution.

Identifying at-risk component failure prediction.
Optimizing resource scheduling and staffing.
Matching technician and Inventory to the maintenance and repair work to be done.
Ensuring tools and repair equipment availability.
Ensuring first-time-fix optimization.
Optimizing parts and MRO inventory.
Predicting component fixability.
Optimizing the logistics of parts, tools and technicians.
Leveraging cohorts analysis to improve service and repair predictability.
Leveraging event association analysis to determine how weather, economic and special events impact device and machine maintenance and repair needs.

As you can see, providing this full-stack solution is a tall order. Pretty much on the scale of inventing a SAP or PeopleSoft integrated ERP solution. Would this be defensible – you bet. Is this necessary in the original vision of your AI-first business solution – probably not.

As a longer-term goal this would be an ideal case but you can probably build a significant business on just 3 or 4 of these elements.

Looking back at the examples from our previous article, Blue River Technology, Axon, and Stitch Fix, these three are well on their way to defensible businesses using data dominance as their core goal. Will they grow to full-stack at some point – perhaps, but the solutions they currently offer are very high value.

What they share with the vertical strategy beyond data dominance is subject matter expertise and AI-first value creation. Full stack, however you wish to interpret that, can follow so long as the application solves the immediate customer problem easily, efficiently, and accurately.

Content retrieved from: https://www.datasciencecentral.com/profiles/blogs/comparing-ai-strategies-vertical-vs-horizontal.

AI with Pyramids of Self Programmable Gates

Posted on August 7th, 2018

Posted by Vincent Granville on May 2, 2018 at 6:30am

Guest blog post by David Enríquez Arriano. For more information or to get higher-resolution pictures, contact the author (see contact information at the bottom of this article).

Introduction

This is a different approach to solving the AI problem. It is a cognitive math based on pyramids built from logic gates that program themselves through learning.

A Boolean polynomial associated with a given truth table can be implemented with electronic logic gates. These circuits have pyramidal structures. I then built pyramids that realize the generic form for any of these problems.
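As a rough illustration of how a Boolean polynomial can be read off a truth table, here is a minimal sketch of the textbook sum-of-products form: one AND term per row whose output is 1, all joined by OR. This is the standard construction, not the author's pyramid algorithm, and the names `sum_of_products` and `majority` are hypothetical.

```python
from itertools import product


def sum_of_products(table: dict[tuple[int, ...], int]):
    """Build a function equal to the OR of one AND term per row whose output is 1."""
    minterms = [row for row, out in table.items() if out == 1]

    def polynomial(*inputs: int) -> int:
        # A term fires when every input equals the value recorded in its row.
        return int(any(all(x == bit for x, bit in zip(inputs, row)) for row in minterms))

    return polynomial


# Hypothetical three-input example: the output is 1 when at least two inputs are 1.
majority = {row: int(sum(row) >= 2) for row in product((0, 1), repeat=3)}
f = sum_of_products(majority)
print(all(f(*row) == out for row, out in majority.items()))  # True
```

The resulting function reproduces every row of the table it was built from, which is the property a gate-level implementation of the polynomial would also need.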

Although I can choose the balance between pure logic and pure memory in which they operate, I generally prefer to use the maximum cognitive power mathematically possible.

The result is an algorithm that makes you feel like a teacher in front of an infinitely intelligent human who learns by looking for whatever logic might exist in the (input, output) patterns fed in training.

This cognitive math allows continuous learning, immediate adaptation to new tasks, and focus on target concepts. It also lets us choose the degree of plasticity and implement control and supervision systems, although all of this can be fully self-regulated and self-scaling if we so desire.

It is an absolutely simple and fundamental algorithm. It extends the 1940s foundations of modern computing as far as they can go.

At this level of pyramids everything is more crystallographic or mineral than biological. I use several of these pyramids and a few more pieces to build an artificial neuron. But the power of these pyramids is so great that so far I have not needed to build neurons, much less networks of them, although I know exactly how to do it, and why and when it would be appropriate to take that step.

Experimental example of crystallographic evolution of cognitive structure in the pyramid; here pyramids pointing down:

One simple example in EXCEL, here pyramids pointing down:

This algorithm allows nesting or embedding cognitive structures already learned inside new, larger ones. I have detected possibilities for recombining certain structures to generate others, but that is something I have yet to explore in more depth.

This algorithm works in binary and with two-dimensional pyramids because I have proven that it is the way to achieve the greatest possible cognitive power, although it can be operated in any other base and dimension at the cost of losing cognitive power.

Here is an example of one layer of four-input binary gates in 3D square-based pyramids, which allows implementing them without having to use the corresponding 4D tetrahedrons. Four of these gates feed one gate on the next layer:

Here is another example of two layers of three-input binary gates in 3D triangular-based pyramids, which allows implementing them by building the corresponding 3D tetrahedrons. Again, three of these gates in one layer feed one gate in the next layer:

But I repeat, in binary and with two-dimensional pyramids the efficiency is the best, and it is so enormous that it can be computed on any mobile device, although a World Wide Web online service can easily be offered by keeping the algorithms secret on one’s own servers.

The transmission of the cognitive structure is also enormously effective in terms of the amount of information that needs to be transmitted: simple short strings of characters.

In addition, everything is inherently encrypted by the definition of the algorithm, because the pyramids only see zeros and ones at their inputs and give outputs following their learned internal logic; they do not need to know what the data refers to. In fact, only the user, which may be a robot, will use the data and know its meaning.

This math establishes a distance metric in an n-dimensional binary space that allows any learning to be optimized automatically. In addition, we use progressive deepening in the cognitive learning, but always over the entire incomplete data space. That space forms a landscape of data, a physical map in which the best teacher guides from general cognition to specific cognition, deepening progressively.
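The article does not spell out its metric. As a stand-in, the sketch below assumes plain Hamming distance (the number of differing bits) between binary patterns, which is one common metric on an n-dimensional binary space. The function names and the sample patterns are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_to_cloud(pattern: str, cloud: list[str]) -> int:
    """Distance from a new pattern to the nearest already-learned pattern."""
    return min(hamming(pattern, known) for known in cloud)


# Assumed example data: three learned patterns and two candidates.
learned = ["0000", "0001", "0011"]
print(distance_to_cloud("0111", learned))  # 1: adjacent to the cloud
print(distance_to_cloud("1100", learned))  # 2: farther from the cloud
```

Under this assumption, a teacher practising progressive learning would feed the nearer pattern first, so each new pattern stays close to what has already been learned.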

Basic graph for the generalization of the contiguity metric on pattern distance to dimensions higher than three:

First order transitions table for the fundamental and simplest binary gate with two inputs.

Graph of all the possible state transitions bit to bit on the simplest gate:

These 16 states on this 2D circle are actually the vertices of a 4D hypercube in the binary B4 hyperspace. To configure the physical map of the data, we humans mark the importance of data mainly with life emotions. We can use a few more pyramids to program the same emotional response and imprint it on the structure of any of these pyramids, or choose any other criterion to do so.
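The 16 gate states and their bit-by-bit transitions can be enumerated in a few lines. In this sketch each state is written as the gate's 4-bit output column for inputs (0,0), (0,1), (1,0), (1,1); that ordering and the helper name `one_bit_neighbors` are assumptions made for illustration.

```python
from itertools import product

# Each two-input gate is identified by its 4-bit output column,
# listed for inputs (0,0), (0,1), (1,0), (1,1).
STATES = ["".join(bits) for bits in product("01", repeat=4)]


def one_bit_neighbors(state: str) -> list[str]:
    """States reachable by flipping exactly one output bit."""
    flip = {"0": "1", "1": "0"}
    return [state[:i] + flip[state[i]] + state[i + 1:] for i in range(len(state))]


# Collect undirected edges; each pair of neighbors is counted once.
edges = {frozenset((s, n)) for s in STATES for n in one_bit_neighbors(s)}
print(len(STATES), len(edges))  # 16 32
print(one_bit_neighbors("0001"))  # the four states one bit away from AND
```

Sixteen vertices with four neighbors each give 32 edges, which is the edge count of a 4D hypercube.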

All of this is already finished and tested. The examples shown in this blog are built in a bare spreadsheet, but for large-scale implementation I choose:

This cognitive math allows quickly evaluating and correcting all current AI techniques. It is really handy for any researcher or AI developer.

STEPS 1 and 2 require no more than 2 to 4 person-hours of programming and debugging. STEP 3 can take up to 1 person-month of programming, depending on the use you want for this technology.

This cognitive math is already finished, so no further investment is necessary in this regard. Perhaps you or your company would like to plan future actions to further explore its enormous implications and its application in all other areas of knowledge. If so, please don’t hesitate to contact me. I have verified this technology, but it will not be completely published.

Multidisciplinary Math

I have spent decades reviewing AI work everywhere; it is no coincidence that I have covered so many areas in so many disciplines.

Now I can share some main keys. When you feed these pyramids a new pattern that is mathematically far from the clouds of previous patterns, the previously learned logic normally explodes; visually, the logic crystals explode like supernovas. This also happens in humans when confronting anything far from their previous knowledge. But my pyramids always preserve as much as possible of the pre-learned logic intact, even in this “catastrophic” situation. At first I thought this was a big problem, an error; then I understood that, mathematically, it has to be this way. The math itself was guiding me. Only a good teacher can avoid this catastrophic event, by using progressive learning. The metric defined on this Bn space allows such progression.

To create all of this, I went through many walls of misunderstanding, like that supernova-like explosion. But again, I learned to be taught by this new cognitive math.

In science we often pay no attention to points of information that lie far from the regular cloud, yet those distant points are usually the ones carrying new, relevant information.

Multilevel Programming, Parallel Computation and Embedded Nesting

This cognitive math allows implementing “what happens if” tests, or parallel self-supervision.

But as in human training, you must prepare in advance for the emergency, because when the problem appears there is normally no time to think or to learn; if you are lucky, you may have that time later. In this regard, I am sorry to tell you that my cognitive math is so human, but mathematically it tells me that this is the way it has to be. Otherwise, it doesn’t work.

Anyway, these machines don’t get tired, learn faster than humans and we can easily and automatically clone the best.

The origin of my cognitive math is this: when you impose a law of behaviour on the agents of an energized chaotic system, order appears.

Complex structures become agents for higher structures. It is fractal. The four DGTI principles that I enumerate for true AI are always the same at any level:

To protect and empower Diversity.
The Group target always has priority over the individual one.
Transmit and live these four principles to all agents.
There is always an Intelligent solution to any conflict; you only need to increase perspective, vision.

The difficult part with this math is to accept the universal reality of those four principles.

One example: this technology allows putting a learning AI pilot next to any human commercial pilot in the flight cabin. We then examine those AI pilots on a simulator, choose the best, file the others, and improve the learning process with clones of the best on any flight. Every AI pilot collects experience and learns, but we choose the best exactly as we do with humans. Of course, for a complex task like flying a commercial plane or driving a car, you need a system with many modules interacting in parallel, exactly as the human brain has. My math mismatches all of this.

We can always implement some big pyramids to do any whole task, but the adaptation capability needed to optimize any solution takes us from almost crystallographic pyramids to something much more biological: neuronals, nets, nodules… Here, however, we will build neuronal processing machinery based on pyramids.

After studying the layers of cellular computation (DNA, epigenetics, RNA, protein folding, membrane computation, intercellular communication…), it seems these pyramids’ cognitive math foundations could be at work at the microtubule level in the living neuron’s body and axon, where computing logic storage seems to be held on hydrophilic and hydrophobic molecules attached to alpha and beta tubulin dimer proteins, conditioning their resonant states.

I am very sorry to say that with the public state of the art in neural nets, working with weights, filters and backpropagation feedback, I feel far from building even one single artificial neuron, much less a net like living ones.

But there is another way to solve the current problems in AI with these pyramids when there is little time to think and you need them for ongoing, long-term live learning, as in humans. In fact our own brain does so: we can nest pyramids, embedding them inside other pyramids and augmenting the cognitive structure to encompass new, unknown patterns. This is derived from the fourth DGTI principle I found: increase the cognitive structure to avoid conflict.

We can also use my cognitive math with any other system; applying this complete new concepts bunch

I like to recall how Alan Turing succeeded in decoding Enigma when he realized that, in order to learn, the machine needs to know whether the answer given is correct or not. In real life we, and also machines, my pyramids included, can only do that testing through experience. The good thing: when you have a lot of good experience and good training, the ideas or answers that you give to any problem will tend to be better, but you will still need to test them in real life to be sure. My pyramids do so.

We can easily program my pyramids to decrease the “weight” of unused logic structures over time, making them more likely to change when confronting new patterns in live operation. We can adjust that cognitive loss in many ways, or even automate a parallel control of this behaviour. My pyramids have cognitive memory, and also memory of the strength of those memories during learning.

During training, as when humans dream, if something not really important is not learned (important depending on the goal of interest and/or emotions or trauma), the program erases it from the list of new patterns to learn. But if it is something important, the system is forced to add it to the previous cognitive structure. It is here that patterns far from previous knowledge can create “trauma” whose only exit is the typical catastrophic event, in which almost all previous knowledge is destroyed in the supernova-like explosion. Of course, emotions, if needed to any degree, are only another program running in parallel.

We must be careful not to mix fundamental cognitive computing concepts with problems or concepts regarding higher cognitive structures.

Try to make a quick estimate of how many logical cognitive combinations one of my pyramids has when it is built with my logic gates, each one with 16 possible states. Any single gate, a stone of the pyramid, can be one of the 16 basic gates: AND, OR, XOR, NAND, NOR, XNOR … Each of these self-programming gates transits bit by bit among those 16 possible states through learning…
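The quick estimate suggested above can be sketched as follows. The layout is an assumption: a 2D pyramid whose base layer has b gates and whose layers each shrink by one gate up to a single apex gate, giving b(b+1)/2 gates in total. With 16 independent choices per gate, the count is 16 raised to that number. The function names are hypothetical.

```python
def gate_count(base_gates: int) -> int:
    """Gates in a 2D pyramid whose layers shrink by one gate up to the apex (assumed layout)."""
    return base_gates * (base_gates + 1) // 2


def configurations(base_gates: int) -> int:
    """Upper bound on distinct gate assignments: 16 choices per gate."""
    return 16 ** gate_count(base_gates)


for base in (1, 2, 3, 4):
    print(base, gate_count(base), configurations(base))
```

This counts gate assignments, not distinct input-output behaviours; different assignments can compute the same overall function, so the figure is an upper bound.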

When a gate needs to answer and has no previous knowledge, it randomly tries 0 or 1. This solves the initial value problem, and the values 0 and 1 have the same significance, depending on the place and the local case. Even though these pyramids work like super-Turing machines, if we use exactly the same given list of those random 0s and 1s, the machine is completely replicable, always following the very same path through learning, as a Turing machine does. But we are lucky: when the program asks for those random 0s and 1s, random means truly random, not a given list.
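The difference between a given list of random bits and fresh randomness can be shown with Python's standard library. This is only a sketch of the replicability point: the helper `guesses` and the seed value are hypothetical, and `secrets` draws from the operating system's entropy source, which is unpredictable in practice but is not claimed here to be the author's source of randomness.

```python
import random
import secrets


def guesses(rng: random.Random, n: int) -> list[int]:
    """Draw n initial 0/1 answers for gates that have no prior knowledge."""
    return [rng.randint(0, 1) for _ in range(n)]


# Same seed, same list of guesses: the learning path is fully replicable.
run_a = guesses(random.Random(42), 8)
run_b = guesses(random.Random(42), 8)
print(run_a == run_b)  # True

# Drawing from the operating system's entropy source instead of a fixed list.
unpredictable = [secrets.randbelow(2) for _ in range(8)]
print(unpredictable)
```

Keeping the seeded variant around is useful for debugging a training run, since it lets the exact same path be replayed.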

This math is multidisciplinary; it is a General Systems Theory. Knowing this new math implies a new state of awareness on everything.

A very important key regarding plasticity: it is necessary to allow changes in the learned cognition in order to add new knowledge, even destroying almost everything when the new knowledge is far (in mathematical distance) from the cloud of previous patterns. But a clone can carry out that process in parallel before taking the place of the previous one. This is not a weakness; it is the only way it works, as in the human brain. Humans need to go to sleep and dream to try to integrate those new patterns properly, but machines using clones don’t need to stop.

So, I don’t put my pyramids to sleep/learn experiences, I can do that with their clones.

Cognition versus Memory

My cognitive math shows how to implement pure memory or pure logic, also choosing the point of balance: pure memory versus pure logic. But I normally prefer pure logic.

Pure memory only stores the output given for a specific input, with no internal operational logic structure at all that relates it to the logic of other patterns. In contrast, my cognitive math creates that internal operative logic from the very first pattern fed.

My pyramids only see and give zeros and ones. The meaning of those binary vectors IN and OUT doesn’t matter to my algorithm, and despite this, my pyramids always look for, and find, some internal logic in the training patterns. And the better the teacher, the better the cognitive logic in the pyramids.

Everything that has been asked of AI in so many places and for so many decades is already implicit in this cognitive math:

Continual learning
Adaptation to new tasks and circumstances
Goal-driven perception, context-mission
Selective plasticity
Safety and monitoring.

For this last point, safety and monitoring, I can implement surveillance pyramids automatically trained for this task, running “what if” tests, but as with humans, it is always preferable to improve the supervised pyramid by training a clone when time is not a problem. Principles such as following orders from a specific human must be included.

We can train specific AI personalities and behaviours when needed.

We can use what we already have, such as vision and speech recognition, and implement with my technology the AI brain that uses those capabilities. Or we can let my cognitive math develop those capabilities itself, anywhere and at any level needed.

To go further, I build neuronals and put them to live in a virtual membrane with valleys where more information moves, so that the neuronals’ taxis toward activity guides them. Obviously I pre-wire an initial structure learned or cloned from previous tests. Every neuronal circuit and every neuronal is connected, all with all, through another transmission membrane where waves of activity connect them all, EEG-like. This whole next level is much more biological.

My pyramids learn by changing their cognitive logic. With this same technology we can recreate the effects of natural chemical neurotransmitters to modulate behaviour; if needed, we can change or modulate the learning rules. We can also automate this modulation, as we do in humans through training and education.

My math teaches me that to reach higher social evolution, at some point we have to be competent collaborators instead of competent predators. I love brainstorming and exchanging ideas with other groups.

Responsibility

This new cognitive math works. It is pure logic, pure math. I have put to work enough models and demos. Sincerely, I think it is irresponsible to build this in any environment without the proper control of human and material resources. Who could provide such resources on this planet? At this point, I am pretty sure that everybody perfectly knows and understands the final implications of this technology.

It Works Itself

As in humans, my AI algorithm is capable of giving adequate answers to previously unknown patterns, and of doing it well if the previous training has been correct (good teachers), exactly like humans. But with machines, we can quickly clone the best and put them to work. My cognitive math allows mathematical optimization of training. If we so desire, everything can be done without human intervention at any moment.

It is self-scalable. It allows choosing the proper balance of cognition versus memory. It uses mathematically minimal resources. Even though it can run on any device worldwide, my technology allows keeping the secret AI algorithm safe on servers.

It is pure logic, taking Boolean logic from the 1940s to the quantum level on any personal device.

Appendix to go deeper

THE TRAVEL OF LEARNING, JOURNEY OF KNOWLEDGE

We walk on the shoulders of giants. Great men who connected the information points that surround us, drawing wonderful conclusions that today allow us to live as we live and, sometimes, even create new connections, new knowledge, new cognition, the real information.

But perhaps these giants passed some connection, some bifurcation in the path of knowledge, some unexplored branch whose ramifications we cannot find by following the paths already marked.

Are we capable of daring to come down from their distinguished shoulders? Do we dare to put our feet in the sand where they step and look under all those stones that no one raises? Those stones that pave the path of modern knowledge that we all take for granted. Stones, knowledge, that perhaps hide under fringes, connections, cognition, branches not yet explored.

Do we dare to look at the edge of the road that we all know, and try to open other completely new paths by walking where nobody has done before? In the following lines, we are not just going to make such a trip. We will leave the highway of common knowledge, comfortable and well-behaved, that comfortably travels the valley, advancing slowly and surely as the new cognitions in sight clear the way. Here we abandon it and, going cross country, we will climb to the top of one of the mountains that surround us, and from this vantage point we will dare to break a small hole in the veil that often clouds our global vision. We will look around through this orifice, glimpsing the many other peaks and valleys that surround us. Unknown peaks, valleys not yet explored, not even dreamed of, in any area of knowledge. With this augmented vision, this greater perspective, we will descend again to the valley, but no longer on the path by which we climbed to the top. In the valley, with the new perspective acquired, we will see how the knowledge highway that we left is still very far from the place we have reached.

Then we are back in the valley, but now in the middle of the wild forest, in which there are still no roads, nor paths, nor giants on which to feel safe. And we have found great knowledge, but now we are alone and we have to find a way to advance the highway to where we are, to tell everyone about the other peaks and cognitive valleys we have seen from the watchtower.

THE JOURNEY

We have come down from the shoulders of the giants. We have our bare feet in the sand. We look under one of those stones that they have stepped on so many times, with us on top of them: Boole’s algebra. We form truth tables of four lines and three columns. The first two columns contain the four possible combinations of the two binary variables on which we perform a logical operation. In the third column we fill in the lines defining the type of logical operation: AND, OR, NOR, XOR… With only AND, OR and NOT, we have built all of modern computing. These fundamental operations are the root of everything a microprocessor knows how to do; with them we do everything else, going up in levels of complexity, nesting one inside another. Now we are going to expand this foundation. With the stone in our hands we look off the path, because there are 16 possible logical operations or gates. Yes, 16, and we need them all to build proper AI.

Given any case of binary inputs that we feed to a “black box” that must give us specific binary outputs, we define a truth table for that black box. We assign a column to each input variable, and for each output variable we add another column to the truth table. Each line of this table represents a combination of the inputs and the corresponding output that the black box we program must give us. Each line is a pattern: an input and its corresponding output. If we do not know all the possible patterns, it can happen that at some point our black box hangs because it has faced an input not registered in its table, an input for which it has no output recorded. Boole tells us how to write and reduce algebraic polynomials capable of operating on the inputs to give the outputs. These polynomials are one way to operate or program the black box; another way is by simple memory in the table.
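The count of 16 follows from the four-line truth table: each of the four output cells can be 0 or 1, so there are 2^4 possible output columns. A minimal sketch that enumerates them, and also shows a black box programmed by pure memory failing on an unregistered input, is given below. The dictionary layout and the names `ALL_GATES`, `NAMED` and `memory_box` are hypothetical choices for illustration.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))  # (0,0), (0,1), (1,0), (1,1)

# Every possible output column for two binary inputs: 2**4 = 16 gates.
ALL_GATES = [dict(zip(INPUTS, outputs)) for outputs in product((0, 1), repeat=4)]

NAMED = {
    "AND": {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1},
    "OR": {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1},
    "XOR": {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0},
    "NOR": {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 0},
}

print(len(ALL_GATES))  # 16
print(all(table in ALL_GATES for table in NAMED.values()))  # True

# A black box programmed by pure memory has no answer for an unseen input.
memory_box = {(0, 0): 0, (1, 1): 1}  # incomplete table: two patterns only
print(memory_box.get((0, 1)))  # None: this entry was never registered
```

The familiar named gates are just four of the sixteen columns; the remaining ones include constants, single-input copies and their negations.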

Boole also tells us that to write the polynomial we can look at the ones: for each line of the table whose output is one, we AND together the inputs of that line (taking the negation of any input that is zero on that line), and then we OR those terms together for each output column. Thus we have a Boolean polynomial for each output column. When we implement these polynomials with electronic logic gates, pyramidal structures usually appear: one pyramid for each output variable, all sharing the same input variables at their base. These pyramids usually have several logic gates across their base, but the number of gates shrinks as we move through the processing of the polynomial, until we reach a single gate at the exit, at the top of the pyramid. It is logical, and never better said: the base of the pyramid processes information close to the input data and specific to it, while deeper into the processing, toward the output, the information becomes increasingly general and takes broader combinations of the inputs into account. Experimentally, pyramids usually show in their cognitive logic a characteristic crystallography near their base, associated with the primary decoding of the input vector.

Can we build a generic pyramid for any given truth table? We can lay the stones from the base to the top, each one covering the joint between two stones in the line below. The stones underneath feed their outputs as inputs to the stone in the line above. We should allow each stone to be any of the 16 possible logic gates. We will also want the pyramid to program itself through learning; we will take care of this. At the base of the pyramid, the gates (stones) must between them receive every possible pairing of the inputs, because we want the pyramid to be generic and it must therefore always contain every possible relation among the inputs. To find these pairings we can draw a two-dimensional double-entry table, with the inputs listed along both the rows and the columns. In this matrix of possible pairs, the diagonal gives us nothing, since it relates each input variable to itself. We need only the combinations in the upper triangular part, because they are the same as those in the lower triangular part.

These combinations are the ones we use at the input base of the pyramid. For example, with 4 inputs a, b, c, d, the input gates at the base of the pyramid are fed with the pairs:

(a, b), (a, c), (a, d), (b, c), (b, d), (c, d)

So this pyramid will have 6 gates in its base:

(4-1) + 2 + 1 = 6

In general, given n input variables, the base of the pyramid will have:

(n² – n) / 2 gates
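The count can be checked by listing the upper-triangular pairs directly. This is a small sketch; the helper names `base_pairs` and `base_gate_count` are made up for the illustration.

```python
from itertools import combinations

def base_pairs(inputs):
    """Upper-triangular pairs of inputs: every unordered pair, no diagonal."""
    return list(combinations(inputs, 2))

def base_gate_count(n):
    """Number of gates at the base for n inputs: (n^2 - n) / 2."""
    return (n * n - n) // 2

pairs = base_pairs("abcd")
print(pairs)            # the six pairs feeding the base gates
print(len(pairs))       # 6, matching 3 + 2 + 1
```

`itertools.combinations` yields each unordered pair once, which corresponds to reading only the upper triangle of the double-entry table.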

This number is also the number of lines of the pyramid to the exit at the top. Obviously we are working in binary and therefore with two-dimensional pyramids. We can generalize all of this in other bases, or with gates of more than two inputs, and with multidimensional pyramids. But the greatest possible connectivity and therefore the highest logical power is achieved in binary, with two-way gates and two-dimensional pyramids. However, the multidimensional exploration leads to a beautiful conclusion that relates the infinite and zero, and the balance between pure cognition and pure memory. We move in a space that I call Bn, in which each new binary variable adds a new dimension with a single point that can be zero or one. These concepts are important to get into the universe of incomplete sample spaces, in which pyramids are the program inside the black box, which will learn through experience from incomplete sets of patterns.
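To make the layout of the generic pyramid concrete, here is a minimal sketch of how one could be stored and evaluated. It rests on several assumptions that the article does not spell out: each stone is a 4-bit gate code (one of the 16 two-input gates), each stone above the base reads the two neighbouring stones beneath it, and the gate codes are simply drawn at random because no learning rule is described in this section. All function names are hypothetical.

```python
from itertools import combinations
import random

def gate(code, a, b):
    """Two-input gate `code` (0..15); bit (2*a + b) is the output."""
    return (code >> (2 * a + b)) & 1

def random_pyramid(n_inputs, rng):
    """Gate codes for every stone; requires n_inputs >= 2.

    The base has (n^2 - n) / 2 stones and each line above has one fewer,
    so the number of lines equals the width of the base.
    """
    width = (n_inputs * n_inputs - n_inputs) // 2
    return [[rng.randrange(16) for _ in range(width - k)] for k in range(width)]

def evaluate(pyramid, inputs):
    """Run one input vector through the pyramid and return the top bit."""
    # Base line: one gate per unordered pair of inputs.
    pairs = list(combinations(inputs, 2))
    values = [gate(code, a, b) for code, (a, b) in zip(pyramid[0], pairs)]
    # Each higher stone covers the joint of two neighbouring stones below.
    for line in pyramid[1:]:
        values = [gate(code, values[i], values[i + 1])
                  for i, code in enumerate(line)]
    return values[0]

pyramid = random_pyramid(4, random.Random(0))
print([len(line) for line in pyramid])   # widths of the lines, base first
print(evaluate(pyramid, (1, 0, 1, 1)))   # a single output bit
```

A learning procedure would adjust the gate codes until `evaluate` reproduces the known lines of the truth table; that part is left out here.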

Content retrieved from: https://www.datasciencecentral.com/profiles/blogs/ai-with-pyramids-of-self-programmable-gates.

Difference between Machine Learning, Data Science, AI, Deep Learning, and Statistics

Posted on August 6th, 2018

Posted by Vincent Granville on January 2, 2017 at 8:30pm

In this article, I clarify the various roles of the data scientist, and how data science compares and overlaps with related fields such as machine learning, deep learning, AI, statistics, IoT, operations research, and applied mathematics. As data science is a broad discipline, I start by describing the different types of data scientists that one may encounter in any business setting: you might even discover that you are a data scientist yourself, without knowing it. As in any scientific discipline, data scientists may borrow techniques from related disciplines, though we have developed our own arsenal, especially techniques and algorithms to handle very large unstructured data sets in automated ways, even without human interactions, to perform transactions in real-time or to make predictions.

1. Different Types of Data Scientists

Recently (August 2016) Ajit Jaokar discussed Type A (Analytics) versus Type B (Builder) data scientist:

The Type A Data Scientist can code well enough to work with data but is not necessarily an expert. The Type A data scientist may be an expert in experimental design, forecasting, modelling, statistical inference, or other things typically taught in statistics departments. Generally speaking though, the work product of a data scientist is not “p-values and confidence intervals” as academic statistics sometimes seems to suggest (and as it sometimes is for traditional statisticians working in the pharmaceutical industry, for example). At Google, Type A Data Scientists are known variously as Statistician, Quantitative Analyst, Decision Support Engineering Analyst, or Data Scientist, and probably a few more.

Type B Data Scientist: The B is for Building. Type B Data Scientists share some statistical background with Type A, but they are also very strong coders and may be trained software engineers. The Type B Data Scientist is mainly interested in using data “in production.” They build models which interact with users, often serving recommendations (products, people you may know, ads, movies, search results).

I also wrote about the ABCD’s of business processes optimization where D stands for data science, C for computer science, B for business science, and A for analytics science. Data science may or may not involve coding or mathematical practice, as you can read in my article on low-level versus high-level data science. In a startup, data scientists generally wear several hats, such as executive, data miner, data engineer or architect, researcher, statistician, modeler (as in predictive modeling) or developer.

While the data scientist is generally portrayed as a coder experienced in R, Python, SQL, Hadoop and statistics, this is just the tip of the iceberg, made popular by data camps focusing on teaching some elements of data science. But just like a lab technician can call herself a physicist, the real physicist is much more than that, and her domains of expertise are varied: astronomy, mathematical physics, nuclear physics (which is borderline chemistry), mechanics, electrical engineering, signal processing (also a sub-field of data science) and many more. The same can be said about data scientists: fields are as varied as bioinformatics, information technology, simulations and quality control, computational finance, epidemiology, industrial engineering, and even number theory.

In my case, over the last 10 years, I specialized in machine-to-machine and device-to-device communications, developing systems to automatically process large data sets, to perform automated transactions: for instance, purchasing Internet traffic or automatically generating content. It implies developing algorithms that work with unstructured data, and it is at the intersection of AI (artificial intelligence), IoT (Internet of things), and data science. This is referred to as deep data science. It is relatively math-free, and it involves relatively little coding (mostly APIs), but it is quite data-intensive (including building data systems) and based on brand new statistical technology designed specifically for this context.

Prior to that, I worked on credit card fraud detection in real time. Earlier in my career (circa 1990) I worked on image remote sensing technology, among other things to identify patterns (or shapes or features, for instance lakes) in satellite images and to perform image segmentation. At that time my research was labeled as computational statistics, but the people doing the exact same thing in the computer science department next door at my home university called their research artificial intelligence. Today, it would be called data science or artificial intelligence, the sub-domains being signal processing, computer vision or IoT.

Also, data scientists can be found anywhere in the lifecycle of data science projects, at the data gathering stage, or the data exploratory stage, all the way up to statistical modeling and maintaining existing systems.

2. Machine Learning versus Deep Learning

Before digging deeper into the link between data science and machine learning, let’s briefly discuss machine learning and deep learning. Machine learning is a set of algorithms that train on a data set to make predictions or take actions in order to optimize some systems. For instance, supervised classification algorithms are used to classify potential clients into good or bad prospects, for loan purposes, based on historical data. The techniques involved, for a given task (e.g. supervised clustering), are varied: naive Bayes, SVM, neural nets, ensembles, association rules, decision trees, logistic regression, or a combination of many.

All of this is a subset of data science. When these algorithms are automated, as in automated piloting or driver-less cars, it is called AI, and more specifically, deep learning. If the data collected comes from sensors and if it is transmitted via the Internet, then it is machine learning or data science or deep learning applied to IoT.

Some people have a different definition for deep learning. They consider deep learning to be neural networks (a machine learning technique) with deeper layers. The question was asked on Quora recently, and below is a more detailed explanation (source: Quora):

AI (Artificial intelligence) is a subfield of computer science, that was created in the 1960s, and it was (is) concerned with solving tasks that are easy for humans, but hard for computers. In particular, a so-called Strong AI would be a system that can do anything a human can (perhaps without purely physical things). This is fairly generic, and includes all kinds of tasks, such as planning, moving around in the world, recognizing objects and sounds, speaking, translating, performing social or business transactions, creative work (making art or poetry), etc.

NLP (Natural language processing) is simply the part of AI that has to do with language (usually written).

Machine learning is concerned with one aspect of this: given some AI problem that can be described in discrete terms (e.g. out of a particular set of actions, which one is the right one), and given a lot of information about the world, figure out what is the “correct” action, without having the programmer program it in. Typically some outside process is needed to judge whether the action was correct or not. In mathematical terms, it’s a function: you feed in some input, and you want it to produce the right output, so the whole problem is simply to build a model of this mathematical function in some automatic way. To draw a distinction with AI, if I can write a very clever program that has human-like behavior, it can be AI, but unless its parameters are automatically learned from data, it’s not machine learning.

Deep learning is one kind of machine learning that’s very popular now. It involves a particular kind of mathematical model that can be thought of as a composition of simple blocks (function composition) of a certain type, and where some of these blocks can be adjusted to better predict the final outcome.

What is the difference between machine learning and statistics?

This article tries to answer the question. The author writes that statistics is machine learning with confidence intervals for the quantities being predicted or estimated. I tend to disagree, as I have built engineer-friendly confidence intervals that don’t require any mathematical or statistical knowledge.

3. Data Science versus Machine Learning

Machine learning and statistics are part of data science. The word learning in machine learning means that the algorithms depend on some data, used as a training set, to fine-tune some model or algorithm parameters. This encompasses many techniques such as regression, naive Bayes or supervised clustering. But not all techniques fit in this category. For instance, unsupervised clustering – a statistical and data science technique – aims at detecting clusters and cluster structures without any a-priori knowledge or training set to help the classification algorithm. A human being is needed to label the clusters found. Some techniques are hybrid, such as semi-supervised classification. Some pattern detection or density estimation techniques fit in this category.
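The contrast between learning from a labeled training set and detecting clusters without one can be illustrated with a toy example. This is only a sketch: the one-dimensional "scores", the labels, and the two small algorithms chosen (a nearest-centroid classifier and a two-cluster k-means with a fixed starting point) are assumptions picked for brevity, not methods named in the article.

```python
def nearest_centroid_fit(points, labels):
    """Supervised: use the labels to compute one mean per class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose mean is closest."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def two_means(points, iterations=10):
    """Unsupervised: split a non-empty list of points into two clusters.

    No labels are used; the centres start at the smallest and largest value.
    """
    lo, hi = min(points), max(points)
    left, right = list(points), []
    for _ in range(iterations):
        left = [x for x in points if abs(x - lo) <= abs(x - hi)]
        right = [x for x in points if abs(x - lo) > abs(x - hi)]
        if not right:  # all points coincide, nothing to split
            break
        lo, hi = sum(left) / len(left), sum(right) / len(right)
    return left, right

# Hypothetical historical data: a score per client and a known outcome.
scores = [20, 25, 30, 80, 90, 100]
labels = ["bad", "bad", "bad", "good", "good", "good"]

centroids = nearest_centroid_fit(scores, labels)
print(nearest_centroid_predict(centroids, 70))  # class of a new client
print(two_means(scores))                        # two unnamed groups
```

The supervised model returns a named class because the training set supplied the names; the clustering step returns only two groups, and a human still has to decide what each group means.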

Data science is much more than machine learning though. Data, in data science, may or may not come from a machine or mechanical process (survey data could be manually collected, clinical trials involve a specific type of small data) and it might have nothing to do with learning as I have just discussed. But the main difference is the fact that data science covers the whole spectrum of data processing, not just the algorithmic or statistical aspects. In particular, data science also covers:

data integration
distributed architecture
automating machine learning
data visualization
dashboards and BI
data engineering
deployment in production mode
automated, data-driven decisions

Of course, in many organisations, data scientists focus on only one part of this process.

Content retrieved from: https://www.datasciencecentral.com/profiles/blogs/difference-between-machine-learning-data-science-ai-deep-learning.

How to Set Up Shopify Ecommerce Platform

Posted on August 3rd, 2018

Not without reason, Shopify is one of the most popular online store solutions out there. Its main benefit? For a very affordable price, it lets you build a functional online store all by yourself. And not just any online store … what I’m talking about here is a truly quality result. Something that looks and works just as if a pro had built it.

So, in this tutorial, we’re going to discuss how to set up your first eCommerce store on Shopify. We’re covering the entire process, from a blank canvas to a fully functional eCommerce store ready to welcome your customers.

(This guide – “How to set up your eCommerce store on Shopify” – has been written with the beginner in mind, no coding or website building skills needed.)

But first…

What is Shopify?

Shopify is a complete, all-in-one eCommerce solution. Once you sign up for an account with Shopify, it allows you to:

Build your online store all by yourself.
Pick a design that suits your needs.
Pick a unique name and domain for your store.
Add your products and their details (price, description, etc.).
Process orders from your customers.
Handle online payments through solutions like PayPal and others.
Run special promotions, discounts, and sales.
And much more.

At the same time, Shopify is a way more affordable choice than hiring someone to build the store for you. Not to mention that it’s also more reliable over the long haul (the software itself is constantly upgraded, and there’s great customer support available).

To learn more about the ins and outs of Shopify, and find out whether or not it’s the perfect solution for your specific needs, feel free to visit our review. There, we point out all the pros and cons of Shopify and discuss the most important characteristics of the platform.

How to get started with Shopify

Although this might seem hard to believe at first, in order to get started and set up your eCommerce store with Shopify, all you need is an hour of your time.

First, go to shopify.com and click the “Get started” button:

After that, you’ll be taken to the sign-up form. Just a classic thing … you’ll need your email, password, and a name for your store. I went with “Hats R Great”:

As you can see, there’s a free 14-day trial here, so you don’t need to spend any money to test things out and see if Shopify really fits you and your eCommerce store.

The next step is an important one from a legal point of view. Basically, if you want to operate like an actual store, you need to provide some details about your business.

Just a common web form:

After you click “Next,” Shopify will ask you about a handful of additional business details. Based on what you select, Shopify will try to optimize your experience going forward:

At this stage, it’s time to start setting up the parameters of your new eCommerce store on Shopify.

How to set up your first online store with Shopify

Let’s start with looking through the main dashboard of Shopify:

(1) The homepage of your dashboard. It’s where you can get to the details of what’s going on with your Shopify store.
(2) It’s where your first orders will show up, and where you’ll be able to manage every incoming order.
(3) It’s where you can add and manage your products, inventory, product collections, gift cards.
(4) It’s where you can manage your customers and every piece of info you have about them. This includes their past orders, their personal details, their stats (e.g. sales volume), and much more.
(5) It’s where you can generate all kinds of reports about your store’s performance.
(6) It’s where you can create discount codes and distribute them to people.
(7) It’s where you can install new apps and extend Shopify with new functionality. Really cool stuff. For instance, you can install an app for SEO, for email marketing, and so on. Particularly useful for advanced users.
(8) This is where all settings related to the store can be found. This includes things like your personal info, payments settings, checkout settings, taxes, shipping, and more.
(9) The main screen of each panel.

Let’s stop on that last item for a minute. When you visit the dashboard for the first time, Shopify will present you with a list of actions that you can take to fully customize and launch your store to the public. Let’s do that.

In the case of my new store, here’s what Shopify tells me to do:

From the first drop-down, I’m going to select “Fashion and Apparel.”
Then, I’m going to add my first product. Just click on the “Add a product” button that you can see above. Shopify will then take me to the “Add product” page, where I’m able to complete the process:

I’m calling my first product “French Bulldog Hat.” I’ll add some description, plus a picture of the hat.

The nice thing here is that adding images works through drag-and-drop. So just take an image from your desktop, and drag it onto the section marked “Drop files to upload.” Like so:

At this point, you can set legal parameters such as taxes, shipping variants, and also set your inventory (but that’s all optional).

When you’re done, just click on the “Save product” button.

If you go to your products list right now, you’ll see the first product just waiting there to be bought!

Okay, let’s come back to the dashboard now and take care of the remaining settings of our eCommerce store on Shopify (the “Home” link in the sidebar).

At this point, we have an option to:

From those four, the most useful option is going to be “Create an online store.”

The remaining three are about, respectively, selling products in person in your actual brick-and-mortar store (Shopify can help with that too), selling directly via Facebook, and adding stand-alone single products to a website outside of Shopify (this means that you can, for instance, add a buy button to your WordPress blog).

I’m just going to click on “Add Online Store.”

Picking a design for your Shopify eCommerce store

This is what you’ve been waiting for … it’s time to pick a design for your new eCommerce store:

Let’s start by clicking the “Select a free theme” button.

Shopify offers a load of great-looking free designs. All you have to do is just pick one. No design skills needed at all to launch a truly good-looking eCommerce store on Shopify.

Okay, so browsing through what’s available:

… I think I’ll go with the one simply called … “Simple”:

(Note. Every design offered by Shopify is optimized to be viewed on desktop, mobile and tablets. Also, when selecting your design, please don’t worry about the kind of products that are showcased in the theme’s demo. Those are just examples. After you launch your store, you can sell whatever you wish through any of the themes.)

At this point, let’s just give the “Publish theme” button a click, and your design will be set.

In the next step, Shopify will present you with an example of what your store looks like, along with its appearance on mobile.

If you want to, you can click on the “Customize theme” button to adjust the design you’ve selected. There’s a couple of handy options here, and I invite you to experiment with them on your own, but let’s just point out some basic details. Here’s the customization screen:

(1) The main view of your eCommerce store on Shopify – this is what it currently looks like.
(2) The device switcher – see what your store looks like on desktop, tablets, and mobile.
(3) The options panel – this is where you do your adjustments.
(4) The saving panel – save or cancel your changes there.

Right now is a good moment for you to go through the options panel, section by section and experiment with what’s there.

For example:

Go into “Presets” – see if you like any of the pre-made variants of the theme’s design.
Go into “Colors” – set the colors for your text, icons, headings, etc.
Go into “Typography” – change the default fonts (it’s the easiest way to give your store a unique look).
Go into “Header” – set the logo that’s going to appear on your Shopify store. Like so:

Go into “Home page” – choose a banner image for your store. Like so:

Go into “Social media” – link your Shopify store with your social accounts.

When you’re done having fun there, just click the “Back to Themes page” link:

Setting up page content in your store

So, with the products and the design handled, let’s now add some text to your pages, particularly the homepage. For that, let’s go to Online Store / Pages (from the left sidebar):

Once there, click on “Home page”:

This is where you get to write a few sentences about your store:

After you’re done with all that, you can save your homepage and see what your store looks like. At any stage, you can do that by clicking this icon:

Important things not to forget

Right now, you’re done with the basic process and you’ve learned how to set up your eCommerce store on Shopify. But there’s still a couple of required steps to get your store officially ready for customers.

Selecting a domain on Shopify

By default, your store gets a subdomain, like:

You can upgrade to a standard, custom domain right through Shopify. Just click the “Buy new domain” button, and Shopify will guide you through the process:

Enabling online payments within Shopify

Before your customers can buy anything, you need to integrate your store with an online payments gateway. This can be done in Settings / Payments.

By default, Shopify comes integrated with PayPal. It also lets you accept credit cards through various other gateways and provides you with a handful of alternative payment methods.

The easiest way to get started is certainly working with PayPal. Shopify actually handles this out of the box for you. No need to adjust anything, as long as you use the same email address for your Shopify store and your PayPal. If not, you can change this in the settings by clicking the “Edit” button:

After every sale, Shopify will credit your PayPal account automatically.

How to make your Shopify store public?

The last step on your way to setting up an eCommerce store on Shopify is to make it public. To do so, go to Online Store / Overview, and click the “Unlock your store” button:

After your 14-day trial is up, you’ll have to select one of the available plans with Shopify to continue operating. The current options are:

And that’s it! Right now, your new eCommerce store on Shopify is up and running. Shopify takes care of all of your products, sales, orders, and the overall appearance of your online store on the web.

So what do you think? Willing to give Shopify a try? Or maybe you have any questions related to how to set up your eCommerce store on Shopify?

Content retrieved from: https://ecommerceguide.com/guides/setup-shopify-store/.

What Comes After Deep Learning?

Posted on August 1st, 2018

Posted by William Vorhies on March 20, 2018 at 12:55pm
https://www.datasciencecentral.com/profiles/blogs/what-comes-after-deep-learning

Summary: We’re stuck. There hasn’t been a major breakthrough in algorithms in the last year. Here’s a survey of the leading contenders for that next major advancement.

We’re stuck. Or at least we’re plateaued. Can anyone remember the last time a year went by without a major notable advance in algorithms, chips, or data handling? It was so unusual to go to the Strata San Jose conference a few weeks ago and see no new eye-catching developments.

As I reported earlier, it seems we’ve hit maturity and now our major efforts are aimed at either making sure all our powerful new techniques work well together (converged platforms) or making a buck from those massive VC investments in same.

I’m not the only one who noticed. Several attendees and exhibitors said very similar things to me. And just the other day I had a note from a team of well-regarded researchers who had been evaluating the relative merits of different advanced analytic platforms and concluded there weren’t any differences worth reporting.

Why and Where are We Stuck?

Where we are right now is actually not such a bad place. Our advances over the last two or three years have all been in the realm of deep learning and reinforcement learning. Deep learning has brought us terrific capabilities in processing speech, text, image, and video. Add reinforcement learning and we get big advances in game play, autonomous vehicles, robotics and the like.

We’re in the earliest stages of a commercial explosion based on these: huge savings from customer interactions through chatbots; new personal convenience apps like personal assistants and Alexa; and level 2 automation in our personal cars like adaptive cruise control, accident avoidance braking, and lane maintenance.

Tensorflow, Keras, and the other deep learning platforms are more accessible than ever, and thanks to GPUs, more efficient than ever.

However, the known list of deficiencies hasn’t moved at all.

The need for too much labeled training data.
Models that take either too long or too many expensive resources to train and that still may fail to train at all.
Hyperparameters especially around nodes and layers that are still mysterious. Automation or even well accepted rules of thumb are still out of reach.
Transfer learning that means only going from the complex to the simple, not from one logical system to another.

I’m sure we could make a longer list. It’s in solving these major shortcomings where we’ve become stuck.

What’s Stopping Us

In the case of deep neural nets the conventional wisdom right now is that if we just keep pushing, just keep investing, then these shortfalls will be overcome. For example, from the 80’s through the 00’s we knew how to make DNNs work, we just didn’t have the hardware. Once that caught up then DNNs combined with the new open source ethos broke open this new field.

All types of research have their own momentum. Especially once you’ve invested huge amounts of time and money in a particular direction you keep heading in that direction. If you’ve invested years in developing expertise in these skills you’re not inclined to jump ship.

Change Direction Even If You’re Not Entirely Sure What Direction that Should Be

Sometimes we need to change direction, even if we don’t know exactly what that new direction might be. Recently leading Canadian and US AI researchers did just that. They decided they were misdirected and needed to essentially start over.

This insight was verbalized last fall by Geoffrey Hinton who gets much of the credit for starting the DNN thrust in the late 80s. Hinton, who is now a professor emeritus at the University of Toronto and a Google researcher, said he is now “deeply suspicious” of back propagation, the core method that underlies DNNs. Observing that the human brain doesn’t need all that labeled data to reach a conclusion, Hinton says “My view is throw it all away and start again”.

So with this in mind, here’s a short survey of new directions that fall somewhere between solid probabilities and moon shots, but are not incremental improvements to deep neural nets as we know them.

These descriptions are intentionally short and will undoubtedly lead you to further reading to fully understand them.

Things that Look Like DNNs but are Not

There is a line of research closely hewing to Hinton’s shot at back propagation that believes that the fundamental structure of nodes and layers is useful but the methods of connection and calculation need to be dramatically revised.

Capsule Networks (CapsNet)

It’s only fitting that we start with Hinton’s own current new direction in research, CapsNet. This relates to image classification with CNNs and the problem, simply stated, is that CNNs are insensitive to the pose of the object. That is, if the same object is to be recognized with differences in position, size, orientation, deformation, velocity, albedo, hue, texture etc. then training data must be added for each of these cases.

In CNNs this is handled with massive increases in training data and/or increases in max pooling layers that can generalize, but only by losing actual information.
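The information loss in max pooling can be seen on a tiny example. The sketch below uses plain Python lists as hypothetical 4×4 "images" and a hand-written 2×2 pooling function; it is an illustration of the idea, not code from any CNN library.

```python
def max_pool_2x2(image):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    return [
        [max(image[r][c], image[r][c + 1],
             image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, len(image[0]), 2)]
        for r in range(0, len(image), 2)
    ]

# The same bright pixel at two different positions inside one pooling window.
image_a = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
image_b = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

print(max_pool_2x2(image_a))
print(max_pool_2x2(image_b))
```

Both inputs pool to the same 2×2 grid, so the layers above can no longer tell where inside the window the feature was. That tolerance to small shifts is what lets pooling generalize, and it is also exactly the positional information that gets thrown away.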

The following description comes from one of many good technical descriptions of CapsNets, this one from Hackernoon.

Capsule is a nested set of neural layers. So in a regular neural network you keep on adding more layers. In CapsNet you would add more layers inside a single layer. Or in other words nest a neural layer inside another. The state of the neurons inside a capsule capture the above properties of one entity inside an image. A capsule outputs a vector to represent the existence of the entity. The orientation of the vector represents the properties of the entity. The vector is sent to all possible parents in the neural network. Prediction vector is calculated based on multiplying its own weight and a weight matrix. Whichever parent has the largest scalar prediction vector product, increases the capsule bond. Rest of the parents decrease their bond. This routing by agreement method is superior to the current mechanism like max-pooling.

CapsNet dramatically reduces the required training set and shows superior performance in image classification in early tests.

gcForest

In February we featured research by Zhi-Hua Zhou and Ji Feng of the National Key Lab for Novel Software Technology, Nanjing University, displaying a technique they call gcForest. Their research paper shows that gcForest regularly beats CNNs and RNNs at both text and image classification. The benefits are quite significant.

Requires only a fraction of the training data.
Runs on your desktop CPU device without need for GPUs.
Trains just as rapidly and in many cases even more rapidly and lends itself to distributed processing.
Has far fewer hyperparameters and performs well on the default settings.
Relies on easily understood random forests instead of completely opaque deep neural nets.

In brief, gcForest (multi-Grained Cascade Forest) is a decision tree ensemble approach in which the cascade structure of deep nets is retained but where the opaque edges and node neurons are replaced by groups of random forests paired with completely-random tree forests.

Pyro and Edward

Pyro and Edward are two new probabilistic programming languages built on top of deep learning frameworks. Pyro is the work of Uber and Google, while Edward comes out of Columbia University with funding from DARPA. The result in each case is a framework that allows deep learning systems to measure their confidence in a prediction or decision.

In classic predictive analytics we might approach this by using log loss as the fitness function, penalizing confident but wrong predictions (such as confident false positives). So far there’s been no counterpart for deep learning.
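
To show how log loss punishes confident mistakes, here is a small sketch of the standard binary log loss formula in plain Python. The function name, the clipping constant, and the example probabilities are assumptions made for illustration.

```python
import math


def log_loss(labels, probs, eps=1e-15):
    """Mean binary log loss; probs are predicted probabilities of class 1."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# The true label is 1 in every case; only the stated confidence changes.
confident_right = log_loss([1], [0.99])
hesitant_wrong = log_loss([1], [0.40])
confident_wrong = log_loss([1], [0.01])
print(confident_right, hesitant_wrong, confident_wrong)
```

The loss grows without bound as a wrong prediction approaches full confidence, so a model scored this way is pushed to report probabilities it can stand behind.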

Where this promises to be of use, for example, is in self-driving cars or aircraft, allowing the control system to have some sense of confidence or doubt before making a critical and potentially catastrophic decision. That’s certainly something you’d like your autonomous Uber to know before you get on board.

Both Pyro and Edward are in the early stages of development.

Approaches that Don’t Look Like Deep Nets

I regularly run across small companies who have very unusual algorithms at the core of their platforms. In most of the cases that I’ve pursued they’ve been unwilling to provide sufficient detail to allow me to even describe for you what’s going on in there. This secrecy doesn’t invalidate their utility but until they provide some benchmarking and some detail, I can’t really tell you what’s going on inside. Think of these as our bench for the future when they do finally lift the veil.

For now, the most advanced non-DNN algorithm and platform I’ve investigated is this:

Hierarchical Temporal Memory (HTM)

Hierarchical Temporal Memory (HTM) uses Sparse Distributed Representation (SDR) to model the neurons in the brain and to perform calculations that outperform CNNs and RNNs at scalar predictions (future values of things like commodity, energy, or stock prices) and at anomaly detection.
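
To make the idea of a Sparse Distributed Representation more concrete, the sketch below encodes scalars as small sets of active bits and compares them by overlap. It is a simplified scalar encoder written for this article, not Numenta's library; the function names and parameters (400 bits, 21 of them active) are illustrative assumptions.

```python
def encode_scalar(value, min_val=0.0, max_val=100.0, n_bits=400, n_active=21):
    """Return the set of active bit indices representing value.

    Nearby values map to overlapping runs of active bits, so similarity
    in the input shows up as shared bits in the representation.
    """
    value = min(max(value, min_val), max_val)
    n_buckets = n_bits - n_active + 1
    fraction = (value - min_val) / (max_val - min_val)
    start = int(round(fraction * (n_buckets - 1)))
    return set(range(start, start + n_active))


def overlap(a, b):
    """Number of active bits two representations share."""
    return len(a & b)


price_50 = encode_scalar(50.0)
price_51 = encode_scalar(51.0)
price_90 = encode_scalar(90.0)
print(overlap(price_50, price_51))  # similar values share many bits
print(overlap(price_50, price_90))  # distant values share none
```

Only about 5% of the bits are active in each representation here, which is what makes the encoding sparse; comparing by overlap is also tolerant of a few flipped bits.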

This is the devotional work of Jeff Hawkins of Palm Pilot fame in his company Numenta. Hawkins has pursued a strong AI model based on fundamental research into brain function that is not structured with layers and nodes as in DNNs.

HTM discovers patterns very rapidly, from as few as about 1,000 observations. This compares with the hundreds of thousands or millions of observations necessary to train CNNs or RNNs.

Also the pattern recognition is unsupervised and can recognize and generalize about changes in the pattern based on changing inputs as soon as they occur. This results in a system that not only trains remarkably quickly but also is self-learning, adaptive, and not confused by changes in the data or by noise.

Some Incremental Improvements of Note

We set out to focus on true game changers but there are at least two examples of incremental improvement that are worthy of mention. These are clearly still classical CNNs and RNNs with elements of back prop but they work better.

Network Pruning with Google Cloud AutoML

Google and Nvidia researchers use a process called network pruning to make a neural network smaller and more efficient to run by removing the neurons that do not contribute directly to output. This advancement was rolled out recently as a major improvement in the performance of Google’s new AutoML platform.
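
One simple way to picture pruning is sketched below: hidden neurons are ranked by the total magnitude of their outgoing weights and the weakest are dropped. This magnitude heuristic is an assumption chosen for illustration, since the description above does not say which criterion the Google and Nvidia researchers use. The function name, the weight layout, and the toy weights are likewise hypothetical.

```python
def prune_neurons(w_in, w_out, keep_ratio=0.5):
    """Drop the hidden neurons with the smallest outgoing weight magnitude.

    w_in[h]  holds the incoming weights of hidden neuron h.
    w_out[h] holds the outgoing weights of hidden neuron h.
    Returns the pruned (w_in, w_out) and the indices of the kept neurons.
    """
    scores = [sum(abs(w) for w in outgoing) for outgoing in w_out]
    n_keep = max(1, int(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda h: scores[h], reverse=True)
    keep = sorted(ranked[:n_keep])  # preserve the original neuron order
    return [w_in[h] for h in keep], [w_out[h] for h in keep], keep


# Toy layer: four hidden neurons, two of which barely affect the output.
w_in = [[0.2, 0.7], [0.5, -0.3], [-0.6, 0.1], [0.4, 0.4]]
w_out = [[0.9, -0.8], [0.0, 0.01], [0.5, 0.4], [-0.02, 0.0]]
small_in, small_out, kept = prune_neurons(w_in, w_out)
print(kept)
```

In practice a pruned network is usually fine-tuned afterwards to recover any lost accuracy, which this sketch leaves out.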

Transformer

Transformer is a novel approach, useful initially in language processing tasks such as language-to-language translation, which have been the domain of CNNs, RNNs, and LSTMs. Released late last summer by researchers at Google Brain and the University of Toronto, it has demonstrated significant accuracy improvements in a variety of tests, including an English/German translation test.

The sequential nature of RNNs makes it more difficult to fully take advantage of modern fast computing devices such as GPUs, which excel at parallel and not sequential processing. CNNs are much less sequential than RNNs, but in CNN architectures the number of steps required to combine information from distant parts of the input still grows with increasing distance.

The accuracy breakthrough comes from the development of a ‘self-attention function’ that reduces the number of steps to a small constant. In each step, it applies a self-attention mechanism that directly models relationships between all words in a sentence, regardless of their respective positions.
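
A minimal sketch of self-attention in plain Python follows. It uses the scaled dot-product form from the Transformer work, simplified by assuming that queries, keys, and values are all the raw token vectors; a real Transformer applies learned projections and several attention heads. The function names and toy vectors are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Relate every token to every other token in a single step.

    tokens is a list of equal-length vectors, one per word.
    Returns (outputs, weights), where weights[i][j] is how much
    token i attends to token j.
    """
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        row = softmax(scores)
        weights.append(row)
        outputs.append(
            [sum(w * value[c] for w, value in zip(row, tokens)) for c in range(dim)]
        )
    return outputs, weights


# Tokens 0 and 2 are identical but sit at opposite ends of the sequence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
print(weights[0])
```

Token 0 attends to token 2 exactly as strongly as to itself, even though they are not adjacent, which illustrates why distance in the sentence does not add steps.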

A Closing Thought

If you haven’t thought about it, you should be concerned at the massive investment China is making in AI and its stated goal to overtake the US as the AI leader within a very few years.

In an article, Steve LeVine, who is Future Editor at Axios and teaches at Georgetown University, makes the case that China may be a fast follower but will probably never catch up. The reason is that US and Canadian researchers are free to pivot and start over anytime they wish, while the institutionally guided Chinese could never do that. This quote is from LeVine’s article:

“In China, that would be unthinkable,” said Manny Medina, CEO at Outreach.io in Seattle. AI stars like Facebook’s Yann LeCun and the Vector Institute’s Geoff Hinton in Toronto, he said, “don’t have to ask permission. They can start research and move the ball forward.”

As the VCs say, maybe it’s time to pivot.

AI and the future of work

Posted on July 31st, 2018