Scraping content from authenticated SharePoint Online


As some of you may know, I run both the public and intranet websites for Mathematica. Intranet sites always have outside dependencies on things such as HR systems, Active Directory, and SharePoint. The scenario I was presented with: our on-premises SharePoint site was being moved to the cloud. The SharePoint team told me it would be a simple URL change, but that was not the case. With SharePoint Online (the cloud version), you need to authenticate against the SharePoint site root before you can access a list page. That meant rewriting how we scraped content from SharePoint.

Let’s use a fake SharePoint link to parse:
https://whyamiusing.sharepoint.com/sites/SomeSharePointSite/ContentPageToScrap.aspx

NOTE: You need to authenticate against the site root, not the full page URL. For the link above, the site root is: https://whyamiusing.sharepoint.com/sites/SomeSharePointSite

NOTE: For SharePoint Online, your username must be an email address!

NOTE: The list name (SharePointListName in the code below) is ContentPageToScrap.

So you authenticate against the site root first, then drill into the list within the site.
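Those notes boil down to splitting the page URL into two pieces: the site root you authenticate against and the list name you query. Here is a minimal sketch of that split, assuming every URL has the shape https://tenant.sharepoint.com/sites/<Site>/<Page>.aspx and that the list title matches the page's file name, as it does for ContentPageToScrap. SharePointUrlHelper and Split are hypothetical names, not part of CSOM.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of CSOM). Assumes URLs shaped like
// https://tenant.sharepoint.com/sites/<Site>/<Page>.aspx, where the list
// title is the page's file name without the .aspx extension.
public static class SharePointUrlHelper
{
    public static void Split(string pageUrl, out string siteUrl, out string listName)
    {
        var uri = new Uri(pageUrl);

        // "/sites/SomeSharePointSite/ContentPageToScrap.aspx"
        //   -> ["sites", "SomeSharePointSite", "ContentPageToScrap.aspx"]
        string[] segments = uri.AbsolutePath.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);

        if (segments.Length != 3 || !segments[0].Equals("sites", StringComparison.OrdinalIgnoreCase))
        {
            throw new ArgumentException(
                "Expected a URL like https://tenant.sharepoint.com/sites/<Site>/<Page>.aspx", nameof(pageUrl));
        }

        // Site root to pass to ClientContext: scheme + host + /sites/<Site>
        siteUrl = uri.GetLeftPart(UriPartial.Authority) + "/sites/" + segments[1];

        // List title: page file name without its extension, with escapes such as %20 decoded
        listName = Uri.UnescapeDataString(Path.GetFileNameWithoutExtension(segments[2]));
    }
}
```

With the example link, this yields https://whyamiusing.sharepoint.com/sites/SomeSharePointSite as the URL for ClientContext and ContentPageToScrap as the list name. Pages that live deeper in a site, such as under SitePages, are rejected rather than guessed at.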

**CODE** with notes in comments. Library used: Microsoft.SharePoint.Client (the SharePoint client-side object model, CSOM).

using System;
using System.Collections.Generic;
using System.Security;
using Microsoft.SharePoint.Client;

// Assumed to be defined elsewhere: userName (an email address), password,
// SharePointURL (the site root, e.g. https://whyamiusing.sharepoint.com/sites/SomeSharePointSite)
// and SharePointListName (e.g. ContentPageToScrap).

// Convert the plain-text password to a SecureString:
SecureString securePassWord = new SecureString();
foreach (char cc in password)
{
    securePassWord.AppendChar(cc);
}
securePassWord.MakeReadOnly();

// Create a client context for the SharePoint site root
using (ClientContext clientContext = new ClientContext(SharePointURL))
{
    clientContext.Credentials = new SharePointOnlineCredentials(userName, securePassWord);

    // Loading the web first confirms that authentication against the site root works
    Microsoft.SharePoint.Client.Web web = clientContext.Web;
    clientContext.Load(web);
    clientContext.ExecuteQuery();

    // Next, get the list that you want to parse: ContentPageToScrap
    List list = web.Lists.GetByTitle(SharePointListName);
    clientContext.Load(list);
    clientContext.ExecuteQuery();

    // Request every item in the list
    CamlQuery queryAll = CamlQuery.CreateAllItemsQuery();
    ListItemCollection items = list.GetItems(queryAll);
    clientContext.Load(items);
    clientContext.ExecuteQuery();

    // Next, parse the list items returned
    foreach (ListItem listItem in items)
    {
        // FieldValues is keyed by each column's internal name (a space is stored as _x0020_)
        Dictionary<string, object> fieldValuesDict = listItem.FieldValues;
        string projectNumber = GetFieldValue(fieldValuesDict, "Project_x0020_Number");
        // Process into an IEnumerable or whatever else you need to return!
    }
}

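// Returns a field's value as a string, or string.Empty if the key is missing.
// Convert.ToString handles non-string values (numbers, dates, booleans) that a
// direct (string) cast would reject. Lookup and user fields come back as
// FieldLookupValue/FieldUserValue objects, so read their LookupValue instead.
private static string GetFieldValue(Dictionary<string, object> fieldValuesDict, string key)
{
    string returnVal = string.Empty;
    try
    {
        object value;
        if (fieldValuesDict.TryGetValue(key, out value))
        {
            returnVal = Convert.ToString(value); // null becomes string.Empty
        }
    }
    catch (Exception ex)
    {
        Sitecore.Diagnostics.Log.Warn(string.Format("Error processing KEY: {0} Ex:{1}", key, ex), "Http");
    }
    return returnVal;
}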
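The Project_x0020_Number key reflects how SharePoint names columns internally: characters that are not valid in an XML name, such as a space, are escaped as _xHHHH_ using the character's hex code. As a rough sketch, assuming a column's internal name was generated from its current display name, .NET's XmlConvert.EncodeName produces the same escaping for common cases like spaces. FieldNameHelper and GuessInternalName are hypothetical names. Treat the result as a guess only: a renamed column keeps its original internal name, and SharePoint may shorten long names, so the authoritative value is the InternalName property of the list's Field objects.

```csharp
using System;
using System.Xml;

// Hypothetical helper: guesses a column's internal name from its display name.
// XmlConvert.EncodeName escapes characters that are invalid in XML names as
// _xHHHH_ (a space becomes _x0020_). A renamed or shortened column will not
// match, so confirm against Field.InternalName when in doubt.
public static class FieldNameHelper
{
    public static string GuessInternalName(string displayName)
    {
        if (string.IsNullOrEmpty(displayName))
        {
            throw new ArgumentException("Display name is required.", nameof(displayName));
        }
        return XmlConvert.EncodeName(displayName);
    }
}
```

For example, GetFieldValue(fieldValuesDict, FieldNameHelper.GuessInternalName("Project Number")) looks up the same key as the hard-coded "Project_x0020_Number".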
